
26. 07. 2012 Enforcing flexible memory limits on SGE

Change proposal

Up to now we haven't applied memory limits on SGE, but since we got 5 server crashes in the last few days it's now time to apply them. That said, I propose to:

  • Assign h_vmem values to each host according to its memory capacity, for instance 25G to t3wn[10-29] and 50G to t3wn[30-40]. For each WN we'll run:
    • qconf -se WN
    • qconf -aattr exechost complex_values h_vmem=25G WN
    • qconf -se WN
  • Configure the h_vmem complex as a consumable with a default value, switching:
    • its consumable property from NO to YES
    • its default value from 0 to 3G.
    • Run qconf -sc to see the complexes before and after the change.
  • For each queue Q, configure the hard limit h_vmem as 6G, so users can request more memory than the 3G default, but no more than 6G.
    • The h_vmem limit is per server slot and is an effective way to enforce an upper bound on h_vmem; even better, we could write a JSV script to reject unsatisfiable job requests outright instead of leaving them queued forever in the cluster.
    • Our JSV can be installed as a forced check in /gridware/sge/util/sge_request.
  • We modify /etc/security/limits.conf to enforce the memory limit '@cms as 6500000' as a final safety mechanism, in case SGE should stop respecting its h_vmem accounting or because of an SGE admin misconfiguration.
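A real JSV would use the SGE JSV shell library (jsv_on_verify, jsv_get_param, jsv_reject, sourced from $SGE_ROOT/util/resources/jsv/jsv_include.sh); the memory-comparison core it would need can be sketched independently. This is a hypothetical sketch, not our actual JSV; to_bytes and the hard-coded 6G cap are illustrative names:

```shell
# Hypothetical core of the proposed JSV check: normalise an h_vmem request
# like "5G" or "7000M" to bytes and compare it against the 6G queue cap.
# A production JSV would wrap this in jsv_on_verify()/jsv_reject().
to_bytes() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 * 1024 * 1024 )) ;;
        *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
        *K) echo $(( ${1%K} * 1024 )) ;;
        *)  echo "$1" ;;
    esac
}

cap=$(to_bytes 6G)
for req in 3G 5G 7G 7000M; do
    if [ "$(to_bytes "$req")" -gt "$cap" ]; then
        echo "$req: reject (unsatisfiable, > 6G)"
    else
        echo "$req: accept"
    fi
done
```

With this logic, 3G and 5G requests are accepted while 7G and 7000M are rejected immediately instead of queuing forever.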

Some logs collected during the change

Configuring the h_vmem limit per each host

[root@t3ce02 ~]# for i in `seq 10 29` ; do qconf -aattr exechost complex_values h_vmem=25G t3wn$i ; done   
root@t3ce02.psi.ch modified "t3wn10.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn11.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn12.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn13.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn14.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn15.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn16.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn17.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn18.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn19.psi.ch" in exechost list
No modification because "h_vmem" already exists in "complex_values" of "exechost"
root@t3ce02.psi.ch modified "t3wn21.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn22.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn23.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn24.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn25.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn26.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn27.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn28.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn29.psi.ch" in exechost list
[root@t3ce02 ~]# for i in `seq 30 40` ; do qconf -aattr exechost complex_values h_vmem=50G t3wn$i ; done   
root@t3ce02.psi.ch modified "t3wn30.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn31.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn32.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn33.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn34.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn35.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn36.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn37.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn38.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn39.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn40.psi.ch" in exechost list

After the h_vmem complex change

[root@t3ce02 ~]# qconf -sc | egrep 'h_vmem|shortcut'
#name                 shortcut         type        relop requestable consumable default  urgency 
h_vmem                h_vmem           MEMORY      <=    YES         YES        3G       0
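As a quick scripted sanity check on that line: in qconf -sc output the 6th field is "consumable" and the 7th is "default", so after the change they must read YES and 3G. The line is pasted here as a string; on the cluster you would pipe `qconf -sc | grep h_vmem` instead:

```shell
# Verify the h_vmem complex is now a consumable with a 3G default
# (fields 6 and 7 of the qconf -sc line).
line='h_vmem                h_vmem           MEMORY      <=    YES         YES        3G       0'
echo "$line" | awk '$6 == "YES" && $7 == "3G" { print "h_vmem is a consumable with a 3G default" }'
```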
Interestingly, since I've enforced a default value, even the already running jobs were affected; this is reported by qhost:
[root@t3ce02 ~]# qhost -F h_vmem | grep h_vmem
    Host Resource(s):      hc:h_vmem=16.000G    <-- WN10
    Host Resource(s):      hc:h_vmem=19.000G
    Host Resource(s):      hc:h_vmem=19.000G
    Host Resource(s):      hc:h_vmem=19.000G
    Host Resource(s):      hc:h_vmem=16.000G
    Host Resource(s):      hc:h_vmem=19.000G
    Host Resource(s):      hc:h_vmem=16.000G
    Host Resource(s):      hc:h_vmem=16.000G
    Host Resource(s):      hc:h_vmem=16.000G
    Host Resource(s):      hc:h_vmem=16.000G
    Host Resource(s):      hc:h_vmem=25.000G
    Host Resource(s):      hc:h_vmem=16.000G
    Host Resource(s):      hc:h_vmem=16.000G
    Host Resource(s):      hc:h_vmem=19.000G
    Host Resource(s):      hc:h_vmem=19.000G
    Host Resource(s):      hc:h_vmem=19.000G
    Host Resource(s):      hc:h_vmem=19.000G
    Host Resource(s):      hc:h_vmem=16.000G
    Host Resource(s):      hc:h_vmem=16.000G
    Host Resource(s):      hc:h_vmem=19.000G
    Host Resource(s):      hc:h_vmem=2.000G   <-- WN30, 16 jobs running => 16*3G=48G, and 50G-48G = 2.000G
    Host Resource(s):      hc:h_vmem=2.000G
    Host Resource(s):      hc:h_vmem=2.000G
    Host Resource(s):      hc:h_vmem=2.000G
    Host Resource(s):      hc:h_vmem=2.000G
    Host Resource(s):      hc:h_vmem=2.000G
    Host Resource(s):      hc:h_vmem=2.000G
    Host Resource(s):      hc:h_vmem=2.000G
    Host Resource(s):      hc:h_vmem=2.000G
    Host Resource(s):      hc:h_vmem=2.000G
    Host Resource(s):      hc:h_vmem=2.000G  <-- WN40
Strangely, qstat -j JOBID doesn't report the implicit h_vmem=3G request; nevertheless the limit is taken into account when deciding whether a new job can start, and once the job starts it is applied as a ulimit -v memory limit.
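The per-host accounting seen in the qhost listing above can be reproduced with plain shell arithmetic, since each running job is charged the 3G default against the host's h_vmem consumable:

```shell
# Reproduce the hc:h_vmem value qhost reports for WN30:
# 16 running jobs, each charged the 3G default, against a 50G host.
capacity_gb=50   # h_vmem assigned to t3wn[30-40]
jobs=16          # slots in use on WN30
default_gb=3     # default h_vmem charged per job
echo "free h_vmem: $(( capacity_gb - jobs * default_gb ))G"
```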

Soft and hard limits for the 6G corner case

[root@t3ce02 ~]# qconf -sq all.q | grep vmem
s_vmem                5.9G
h_vmem                6G
[root@t3ce02 ~]# qconf -sq short.q | grep vmem
s_vmem                5.9G
h_vmem                6G
[root@t3ce02 ~]# qconf -sq long.q | grep vmem
s_vmem                5.9G
h_vmem                6G
[root@t3ce02 ~]# qconf -sq all.q.admin | grep vmem
s_vmem                5.9G
h_vmem                6G

3GB ulimit applied

[root@t3wn40 ~]# ps fax  | grep --color 2644280 -A 2
14192 ?        S      0:00  \_ sge_shepherd-2644280 -bg
14193 ?        Ss     0:00  |   \_ /bin/bash /gridware/sge/default/spool/t3wn40/job_scripts/2644280 /shome/fronga/work/CBAF8prod/Plotting/
14245 ?        D      1:48  |       \_ ./Selective_Plot_Generator_14193.exec --results

[root@t3wn40 ~]# cd /proc/14245

[root@t3wn40 14245]# cat limits 
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            unlimited            unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             401408               401408               processes 
Max open files            1024                 1024                 files     
Max locked memory         32768                32768                bytes     
Max address space         3221225472           3221225472           bytes        <-------
Max file locks            unlimited            unlimited            locks     
Max pending signals       401408               401408               signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0      
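The highlighted "Max address space" value is exactly the 3G default converted to bytes; note that ulimit -v takes KiB, so the equivalent manually set limit would be 3145728:

```shell
# The 3G default expressed in the units seen above:
echo $(( 3 * 1024 * 1024 * 1024 ))   # bytes, as in /proc/PID/limits
echo $(( 3 * 1024 * 1024 ))          # KiB, the unit ulimit -v expects
```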
-- FabioMartinelli - 2012-07-26



Topic revision: r3 - 2012-07-26 - FabioMartinelli