26. 07. 2012 Enforcing flexible memory limits on SGE
Change proposal
Up to now we have not applied memory limits on SGE, but since we suffered 5 server crashes in the last few days it is time to enforce them; therefore I propose to:
- Assign h_vmem values to each host according to its memory capacity, for instance 25G to t3wn[10-29] and 50G to t3wn[30-40]. For each WN we'll run:
- qconf -se WN
- qconf -aattr exechost complex_values h_vmem=25G WN
- qconf -se WN
- Configure the h_vmem complex as a consumable with a forced default value, i.e. switch:
- its consumable property from NO to YES
- its default value from 0 to 3G.
- Run qconf -sc to see the complexes before and after the change.
- For each queue Q configure the hard limit h_vmem to 6G, so users can request more memory than the default 3G but no more than 6G.
- The h_vmem limit is per server slot and it's an effective method to enforce an upper memory limit, but even better we might program a JSV to reject unsatisfiable job requests instead of leaving them queued forever; see the sketch after this list.
- Our JSV can be installed as a forced check in /gridware/sge/util/sge_request.
- We modify the file /etc/security/limits.conf to enforce the memory limit '@cms as 6500000' (sketched below) as a final safety mechanism, in case SGE should stop respecting its h_vmem accounting or because of an SGE Admin misconfiguration.
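A JSV is just a script invoked for every job submission; SGE ships shell helper functions in $SGE_ROOT/util/resources/jsv/jsv_include.sh. The following is a minimal, hypothetical sketch of such a check (not the script we actually deploy, and it only understands h_vmem requests written as whole gigabytes):

#!/bin/sh
# hypothetical JSV sketch: reject jobs whose h_vmem request can never be
# satisfied by our queues (hard limit 6G) instead of leaving them queued forever
jsv_on_start()
{
   return
}
jsv_on_verify()
{
   if [ "`jsv_sub_is_param l_hard h_vmem`" = "true" ]; then
      vmem=`jsv_sub_get_param l_hard h_vmem`
      # crude pattern check, valid only for whole-gigabyte requests like 8G or 12G
      case "$vmem" in
         [7-9]G|[1-9][0-9]*G|*T)
            jsv_reject "h_vmem=$vmem can never be satisfied, the queue hard limit is 6G"
            return
            ;;
      esac
   fi
   jsv_accept "Job is accepted"
}
. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh
jsv_main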
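About the limits.conf entry: its lines follow the syntax <domain> <type> <item> <value>, where the item 'as' is the address-space limit expressed in kB, so 6500000 kB is roughly 6.2 GB; a sketch of the intended entry (the <type> column, here assumed to be 'hard', must be given as well):

# /etc/security/limits.conf
# <domain>   <type>   <item>   <value in kB>
@cms         hard     as       6500000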
Some logs collected during the change
Configuring the h_vmem limit for each host
[root@t3ce02 ~]# for i in `seq 10 29` ; do qconf -aattr exechost complex_values h_vmem=25G t3wn$i ; done
root@t3ce02.psi.ch modified "t3wn10.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn11.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn12.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn13.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn14.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn15.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn16.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn17.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn18.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn19.psi.ch" in exechost list
No modification because "h_vmem" already exists in "complex_values" of "exechost"
root@t3ce02.psi.ch modified "t3wn21.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn22.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn23.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn24.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn25.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn26.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn27.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn28.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn29.psi.ch" in exechost list
[root@t3ce02 ~]# for i in `seq 30 40` ; do qconf -aattr exechost complex_values h_vmem=50G t3wn$i ; done
root@t3ce02.psi.ch modified "t3wn30.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn31.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn32.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn33.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn34.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn35.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn36.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn37.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn38.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn39.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn40.psi.ch" in exechost list
The h_vmem complex after the change
[root@t3ce02 ~]# qconf -sc | egrep 'h_vmem|shortcut'
#name shortcut type relop requestable consumable default urgency
h_vmem h_vmem MEMORY <= YES YES 3G 0
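The consumable/default switch itself was presumably applied with the interactive qconf -mc editor; a non-interactive equivalent (a sketch, using a temporary file path of my choice) would be to dump, edit and reload the complex list:

qconf -sc > /tmp/complexes
# edit the h_vmem line, turning
#   h_vmem  h_vmem  MEMORY  <=  YES  NO   0   0
# into
#   h_vmem  h_vmem  MEMORY  <=  YES  YES  3G  0
qconf -Mc /tmp/complexes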
Interesting to note: since I've enforced a default value, even the already running jobs were affected; this is reported by qhost:
[root@t3ce02 ~]# qhost -F h_vmem | grep h_vmem
Host Resource(s): hc:h_vmem=16.000G <-- WN10
Host Resource(s): hc:h_vmem=19.000G
Host Resource(s): hc:h_vmem=19.000G
Host Resource(s): hc:h_vmem=19.000G
Host Resource(s): hc:h_vmem=16.000G
Host Resource(s): hc:h_vmem=19.000G
Host Resource(s): hc:h_vmem=16.000G
Host Resource(s): hc:h_vmem=16.000G
Host Resource(s): hc:h_vmem=16.000G
Host Resource(s): hc:h_vmem=16.000G
Host Resource(s): hc:h_vmem=25.000G
Host Resource(s): hc:h_vmem=16.000G
Host Resource(s): hc:h_vmem=16.000G
Host Resource(s): hc:h_vmem=19.000G
Host Resource(s): hc:h_vmem=19.000G
Host Resource(s): hc:h_vmem=19.000G
Host Resource(s): hc:h_vmem=19.000G
Host Resource(s): hc:h_vmem=16.000G
Host Resource(s): hc:h_vmem=16.000G
Host Resource(s): hc:h_vmem=19.000G
Host Resource(s): hc:h_vmem=2.000G <-- WN30, 16 jobs running => 16*3G=48G, and 50G-48G = 2.000G
Host Resource(s): hc:h_vmem=2.000G
Host Resource(s): hc:h_vmem=2.000G
Host Resource(s): hc:h_vmem=2.000G
Host Resource(s): hc:h_vmem=2.000G
Host Resource(s): hc:h_vmem=2.000G
Host Resource(s): hc:h_vmem=2.000G
Host Resource(s): hc:h_vmem=2.000G
Host Resource(s): hc:h_vmem=2.000G
Host Resource(s): hc:h_vmem=2.000G
Host Resource(s): hc:h_vmem=2.000G <-- WN40
Strangely, a qstat -j JOBID doesn't report the implicit h_vmem request; nevertheless this limit is taken into account by the scheduler to decide whether a new job can start, and it is later applied to the job as a ulimit -v memory limit.
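That last point can be double checked from inside a job, for instance (a sketch; the output file name is arbitrary and the value is the expected one, not a captured log):

echo 'ulimit -v' | qsub -l h_vmem=4G -cwd -j y -o ulimit_check.out
# once the job has finished, ulimit_check.out should contain 4194304,
# i.e. the requested 4G expressed in kB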
Soft and hard limits for the 6G corner case
[root@t3ce02 ~]# qconf -sq all.q | grep vmem
s_vmem 5.9G
h_vmem 6G
[root@t3ce02 ~]# qconf -sq short.q | grep vmem
s_vmem 5.9G
h_vmem 6G
[root@t3ce02 ~]# qconf -sq long.q | grep vmem
s_vmem 5.9G
h_vmem 6G
[root@t3ce02 ~]# qconf -sq all.q.admin | grep vmem
s_vmem 5.9G
h_vmem 6G
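These per-queue values can be set either interactively with qconf -mq QUEUE or non-interactively; a sketch of the non-interactive variant:

for q in all.q short.q long.q all.q.admin ; do
   qconf -mattr queue s_vmem 5.9G $q
   qconf -mattr queue h_vmem 6G   $q
done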
--
FabioMartinelli - 2012-07-26