
26. 07. 2012 Enforcing flexible memory limits on SGE

Change proposal

Up to now we haven't applied memory limits on SGE, but since we got 5 server crashes in the last few days it's now time to enforce them; that said, I propose to:

  • Assign h_vmem values to each host according to its memory capacity, for instance 25G to t3wn[10-29] and 50G to t3wn[30-40]. For each WN we'll run:
    • qconf -se WN
    • qconf -aattr exechost complex_values h_vmem=25G WN
    • qconf -se WN
  • Configure the h_vmem complex as a consumable with a default value, i.e. switch:
    • its consumable property from NO to YES
    • its default value from 0 to 3G.
    • Run qconf -sc to see the complexes before and after the change.
  • For each queue Q configure the hard limit h_vmem to 6G, so users can request more memory than the default 3G, but no more than 6G.
    • The h_vmem limit is per slot and it's an effective way to enforce an upper bound, but even better we might write a JSV to reject unsatisfiable job requests instead of leaving them queued forever (see the sketch after this list).
    • Our JSV can be installed as a mandatory check via /gridware/sge/util/sge_request.
  • We'll also modify /etc/security/limits.conf to enforce the memory limit '@cms as 6500000' as a final safety net, should SGE stop honouring its h_vmem deal or be misconfigured by an SGE admin (see the example after this list).
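The JSV mentioned above hasn't been written yet; a minimal sketch of what it could look like follows, assuming the bash JSV helper functions shipped with SGE in $SGE_ROOT/util/resources/jsv/jsv_include.sh (the 6G threshold and the to_bytes helper are illustrative, not a deployed script):

#!/bin/bash
# Sketch of a JSV: reject at submission time any job whose h_vmem request
# exceeds the 6G per-slot queue limit, instead of leaving it queued forever.

# Convert an SGE memory string (e.g. 3G, 512M, 5.9G) to bytes
to_bytes()
{
   echo "$1" | awk '/[Gg]$/ { printf "%.0f\n", $1 * 1024 * 1024 * 1024; next }
                    /[Mm]$/ { printf "%.0f\n", $1 * 1024 * 1024; next }
                    /[Kk]$/ { printf "%.0f\n", $1 * 1024; next }
                            { printf "%.0f\n", $1 }'
}

jsv_on_start()
{
   return
}

jsv_on_verify()
{
   limit=$(to_bytes 6G)
   # an explicit h_vmem request shows up in the hard resource list (l_hard)
   if [ "$(jsv_sub_is_param l_hard h_vmem)" = "true" ]; then
      req=$(jsv_sub_get_param l_hard h_vmem)
      if [ "$(to_bytes "$req")" -gt "$limit" ]; then
         jsv_reject "h_vmem=$req exceeds the 6G per-slot limit, the job would never start"
         return
      fi
   fi
   jsv_accept "Job is accepted"
}

. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh
jsv_main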
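And for the limits.conf safety net, the entry would look roughly like this (value in KB, so about 6.2G; I'm assuming a hard limit on the address space item, since the '@cms as 6500000' notation above omits the type field):

# /etc/security/limits.conf -- last-resort cap on the address space of any
# process started by a user in the cms group (value in KB)
@cms            hard    as      6500000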

Some logs collected during the change

Configuring the h_vmem limit on each host

[root@t3ce02 ~]# for i in `seq 10 29` ; do qconf -aattr exechost complex_values h_vmem=25G t3wn$i ; done   
root@t3ce02.psi.ch modified "t3wn10.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn11.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn12.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn13.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn14.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn15.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn16.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn17.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn18.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn19.psi.ch" in exechost list
No modification because "h_vmem" already exists in "complex_values" of "exechost"
root@t3ce02.psi.ch modified "t3wn21.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn22.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn23.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn24.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn25.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn26.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn27.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn28.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn29.psi.ch" in exechost list
[root@t3ce02 ~]# for i in `seq 30 40` ; do qconf -aattr exechost complex_values h_vmem=50G t3wn$i ; done   
root@t3ce02.psi.ch modified "t3wn30.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn31.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn32.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn33.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn34.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn35.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn36.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn37.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn38.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn39.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn40.psi.ch" in exechost list

The h_vmem complex after the change

[root@t3ce02 ~]# qconf -sc | egrep 'h_vmem|shortcut'
#name                 shortcut         type        relop requestable consumable default  urgency 
h_vmem                h_vmem           MEMORY      <=    YES         YES        3G       0
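The complex change itself isn't captured in the logs; it would typically be done with qconf -mc, which opens the complex list in an editor. The h_vmem line went roughly from the stock default to the value shown above (reconstructed from memory, not a captured log):

qconf -mc      (opens the complex list in $EDITOR; edit the h_vmem line)
before:  h_vmem   h_vmem   MEMORY   <=   YES   NO    0    0
after:   h_vmem   h_vmem   MEMORY   <=   YES   YES   3G   0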
Interesting to note: since I've enforced a default value, even the already running jobs were affected, as reported by qhost:
[root@t3ce02 ~]# qhost -F h_vmem | grep h_vmem
    Host Resource(s):      hc:h_vmem=16.000G    <-- WN10
    Host Resource(s):      hc:h_vmem=19.000G
    Host Resource(s):      hc:h_vmem=19.000G
    Host Resource(s):      hc:h_vmem=19.000G
    Host Resource(s):      hc:h_vmem=16.000G
    Host Resource(s):      hc:h_vmem=19.000G
    Host Resource(s):      hc:h_vmem=16.000G
    Host Resource(s):      hc:h_vmem=16.000G
    Host Resource(s):      hc:h_vmem=16.000G
    Host Resource(s):      hc:h_vmem=16.000G
    Host Resource(s):      hc:h_vmem=25.000G
    Host Resource(s):      hc:h_vmem=16.000G
    Host Resource(s):      hc:h_vmem=16.000G
    Host Resource(s):      hc:h_vmem=19.000G
    Host Resource(s):      hc:h_vmem=19.000G
    Host Resource(s):      hc:h_vmem=19.000G
    Host Resource(s):      hc:h_vmem=19.000G
    Host Resource(s):      hc:h_vmem=16.000G
    Host Resource(s):      hc:h_vmem=16.000G
    Host Resource(s):      hc:h_vmem=19.000G
    Host Resource(s):      hc:h_vmem=2.000G   <-- WN30, 16 jobs running => 16*3G=48G, and 50G-48G = 2.000G
    Host Resource(s):      hc:h_vmem=2.000G
    Host Resource(s):      hc:h_vmem=2.000G
    Host Resource(s):      hc:h_vmem=2.000G
    Host Resource(s):      hc:h_vmem=2.000G
    Host Resource(s):      hc:h_vmem=2.000G
    Host Resource(s):      hc:h_vmem=2.000G
    Host Resource(s):      hc:h_vmem=2.000G
    Host Resource(s):      hc:h_vmem=2.000G
    Host Resource(s):      hc:h_vmem=2.000G
    Host Resource(s):      hc:h_vmem=2.000G  <-- WN40
Strangely, qstat -j JOBID doesn't report the implicit h_vmem request; nonetheless this limit is taken into account to decide whether or not a new job can start, and is later applied like a ulimit -v memory limit.
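From the user side, with the limits above in place, a job that needs more than the 3G default just requests h_vmem explicitly at submission time (myjob.sh is a placeholder):

qsub -l h_vmem=5G myjob.sh     # above the 3G default, within the 6G queue limit
qsub -l h_vmem=8G myjob.sh     # exceeds the queue limit: it would stay queued forever, or be rejected by the JSV sketched above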

Soft and hard limits for the 6G corner case

[root@t3ce02 ~]# qconf -sq all.q | grep vmem
s_vmem                5.9G
h_vmem                6G
[root@t3ce02 ~]# qconf -sq short.q | grep vmem
s_vmem                5.9G
h_vmem                6G
[root@t3ce02 ~]# qconf -sq long.q | grep vmem
s_vmem                5.9G
h_vmem                6G
[root@t3ce02 ~]# qconf -sq all.q.admin | grep vmem
s_vmem                5.9G
h_vmem                6G
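The queue limits above were set earlier; those commands weren't logged, but they would look along these lines (a sketch, not a captured log):

for Q in all.q short.q long.q all.q.admin ; do qconf -mattr queue s_vmem 5.9G $Q ; qconf -mattr queue h_vmem 6G $Q ; done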

-- FabioMartinelli - 2012-07-26

