Go to
previous page /
next page of Tier3 site log
26. 07. 2012 Enforcing flexible memory limits on SGE
Change proposal
Up to now we haven't applied memory limits on SGE, but since we had 5 server crashes in the last few days it's time to apply them. I propose to:
- Assign an h_vmem value to each host according to its memory capacity, e.g. 25G to t3wn[10-29] and 50G to t3wn[30-40]. For each WN we'll run:
  - qconf -se WN
  - qconf -aattr exechost complex_values h_vmem=25G WN
  - qconf -se WN
- Configure the h_vmem complex as a consumable with a default value, switching:
  - its consumable property from NO to YES
  - its default value from 0 to 3G
- Run qconf -sc to see the complexes before and after the change.
- For each queue Q, set the hard limit h_vmem to 6G, so users can request more memory than the 3G default, up to 6G.
- The h_vmem limit is per slot and is an effective way to enforce an upper bound on memory, but even better, we could write a JSV (Job Submission Verifier) script that rejects unsatisfiable job requests instead of leaving them queued forever in the cluster.
- Our JSV can be enforced for every job submission by adding it to /gridware/sge/util/sge_request.
- Modify /etc/security/limits.conf to enforce the limit '@cms as 6500000' (address space in KB, about 6.2 GiB, just above the 6G SGE limit) as a final safety net, in case SGE stops honouring its h_vmem limit or an SGE admin misconfigures it.
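The JSV idea in the proposal boils down to one comparison: a per-slot h_vmem request larger than the queue's 6G hard limit can never be scheduled, so it should be rejected at submission time. Below is a minimal sketch of that check, assuming requests use plain byte counts or the 1024-based suffixes K, M, G; the variable and function names are hypothetical, and the commented JSV wiring uses the `jsv_*` helpers we expect in SGE's `jsv_include.sh`, which should be checked against the local installation.

```shell
# Illustrative policy values mirroring the proposal (not SGE settings):
# 3G default request, 6G per-slot hard cap on every queue.
HVMEM_DEFAULT=3G
HVMEM_MAX=6G

# Convert an SGE memory value such as 3G, 512M or 5.9G to bytes.
# Assumption: only plain numbers and the 1024-based suffixes K, M, G are
# handled; anything else prints nothing.
mem_to_bytes() {
    printf '%s\n' "$1" | awk '
        /^[0-9]+(\.[0-9]+)?[KMG]?$/ {
            n = $0; m = 1; s = substr(n, length(n), 1)
            if (s == "K") m = 1024
            else if (s == "M") m = 1024 * 1024
            else if (s == "G") m = 1024 * 1024 * 1024
            if (m > 1) n = substr(n, 1, length(n) - 1)
            printf "%.0f\n", n * m
        }'
}

# Print "accept" or "reject" for a per-slot h_vmem request.
# An empty request means the 3G default applies; an unparseable request or
# one above the cap is rejected instead of being left pending forever.
check_hvmem() {
    req=${1:-$HVMEM_DEFAULT}
    bytes=$(mem_to_bytes "$req")
    max=$(mem_to_bytes "$HVMEM_MAX")
    if [ -z "$bytes" ] || [ "$bytes" -gt "$max" ]; then
        echo reject
    else
        echo accept
    fi
}

# Hypothetical wiring into a real JSV (verify the helper names in
# $SGE_ROOT/util/resources/jsv/jsv_include.sh; a JSV also needs
# jsv_on_start and a final jsv_main call):
#   jsv_on_verify() {
#       req=""
#       if [ "$(jsv_sub_is_param l_hard h_vmem)" = "true" ]; then
#           req=$(jsv_sub_get_param l_hard h_vmem)
#       fi
#       if [ "$(check_hvmem "$req")" = "accept" ]; then
#           jsv_accept "h_vmem ok"
#       else
#           jsv_reject "h_vmem must be <= $HVMEM_MAX per slot"
#       fi
#   }
```

For example, `check_hvmem 8G` prints `reject`, while `check_hvmem` with no argument accepts the 3G default.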
Some logs collected during the change
Configuring the h_vmem limit on each host
[root@t3ce02 ~]# for i in `seq 10 29` ; do qconf -aattr exechost complex_values h_vmem=25G t3wn$i ; done
root@t3ce02.psi.ch modified "t3wn10.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn11.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn12.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn13.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn14.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn15.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn16.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn17.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn18.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn19.psi.ch" in exechost list
No modification because "h_vmem" already exists in "complex_values" of "exechost"
root@t3ce02.psi.ch modified "t3wn21.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn22.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn23.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn24.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn25.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn26.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn27.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn28.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn29.psi.ch" in exechost list
[root@t3ce02 ~]# for i in `seq 30 40` ; do qconf -aattr exechost complex_values h_vmem=50G t3wn$i ; done
root@t3ce02.psi.ch modified "t3wn30.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn31.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn32.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn33.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn34.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn35.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn36.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn37.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn38.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn39.psi.ch" in exechost list
root@t3ce02.psi.ch modified "t3wn40.psi.ch" in exechost list
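In the first loop, t3wn20 was skipped because `-aattr` refuses to add an h_vmem entry that already exists. A minimal sketch of a loop that covers both node ranges in one pass is shown below; it assumes `qconf -mattr` updates an existing complex_values entry (and, as far as we know, adds it when missing), which should be confirmed in qconf(1). The helper names are hypothetical, and the commands are only printed so they can be reviewed first.

```shell
# Hypothetical helper: map a worker-node number to its h_vmem capacity as in
# the proposal (t3wn10-29 -> 25G, t3wn30-40 -> 50G); other numbers print nothing.
hvmem_for_node() {
    if [ "$1" -ge 10 ] && [ "$1" -le 29 ]; then
        echo 25G
    elif [ "$1" -ge 30 ] && [ "$1" -le 40 ]; then
        echo 50G
    fi
}

# Print (without running) one qconf call per node. -mattr is used instead of
# -aattr so that a host that already has an h_vmem entry is updated rather
# than skipped.
print_hvmem_cmds() {
    for i in $(seq 10 40); do
        echo "qconf -mattr exechost complex_values h_vmem=$(hvmem_for_node "$i") t3wn$i"
    done
}
```

After checking the output of `print_hvmem_cmds`, it could be piped to `sh` on the admin host.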
After the h_vmem complex change
[root@t3ce02 ~]# qconf -sc | egrep 'h_vmem|shortcut'
#name shortcut type relop requestable consumable default urgency
h_vmem h_vmem MEMORY <= YES YES 3G 0
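The complex change above can also be scripted instead of edited interactively with `qconf -mc`. A minimal sketch, assuming the column order printed in the header above and that `qconf -Mc` reloads the whole complex list from a file; the function name and the temporary file path in the usage note are hypothetical.

```shell
# Rewrite the h_vmem row of a `qconf -sc` listing into a consumable with a
# 3G default. Columns: name shortcut type relop requestable consumable
# default urgency. All other rows, including the "#name" header, are printed
# unchanged; the edited row is re-joined with single spaces.
make_hvmem_consumable() {
    awk '$1 == "h_vmem" { $6 = "YES"; $7 = "3G" } { print }'
}
```

Usage: `qconf -sc | make_hvmem_consumable > /tmp/complex.new && qconf -Mc /tmp/complex.new`, then `qconf -sc | egrep 'h_vmem|shortcut'` to compare with the output above.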
Interestingly, since I enforced a default value, even the jobs already running were affected, as reported by qhost:
[root@t3ce02 ~]# qhost -F h_vmem | grep h_vmem
Host Resource(s): hc:h_vmem=16.000G <-- WN10
Host Resource(s): hc:h_vmem=19.000G
Host Resource(s): hc:h_vmem=19.000G
Host Resource(s): hc:h_vmem=19.000G
Host Resource(s): hc:h_vmem=16.000G
Host Resource(s): hc:h_vmem=19.000G
Host Resource(s): hc:h_vmem=16.000G
Host Resource(s): hc:h_vmem=16.000G
Host Resource(s): hc:h_vmem=16.000G
Host Resource(s): hc:h_vmem=16.000G
Host Resource(s): hc:h_vmem=25.000G
Host Resource(s): hc:h_vmem=16.000G
Host Resource(s): hc:h_vmem=16.000G
Host Resource(s): hc:h_vmem=19.000G
Host Resource(s): hc:h_vmem=19.000G
Host Resource(s): hc:h_vmem=19.000G
Host Resource(s): hc:h_vmem=19.000G
Host Resource(s): hc:h_vmem=16.000G
Host Resource(s): hc:h_vmem=16.000G
Host Resource(s): hc:h_vmem=19.000G
Host Resource(s): hc:h_vmem=2.000G <-- WN30, 16 jobs running => 16*3G=48G, and 50G-48G = 2.000G
Host Resource(s): hc:h_vmem=2.000G
Host Resource(s): hc:h_vmem=2.000G
Host Resource(s): hc:h_vmem=2.000G
Host Resource(s): hc:h_vmem=2.000G
Host Resource(s): hc:h_vmem=2.000G
Host Resource(s): hc:h_vmem=2.000G
Host Resource(s): hc:h_vmem=2.000G
Host Resource(s): hc:h_vmem=2.000G
Host Resource(s): hc:h_vmem=2.000G
Host Resource(s): hc:h_vmem=2.000G <-- WN40
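The qhost values follow from simple bookkeeping: free h_vmem = host capacity - running jobs × per-job request. A minimal sketch of that arithmetic, assuming every running job carries only the implicit 3G default; the function name is hypothetical.

```shell
# Free consumable h_vmem (in G) on a host:
# capacity - jobs * per-job request (per-job request defaults to 3G).
free_hvmem_g() {
    capacity_g=$1
    jobs=$2
    per_job_g=${3:-3}
    echo $(( capacity_g - jobs * per_job_g ))
}
```

`free_hvmem_g 50 16` gives 2, as annotated for WN30; on a 25G node, 16.000G and 19.000G correspond to 3 and 2 running jobs respectively under the same assumption.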
Strangely, qstat -j JOBID doesn't report the implicit h_vmem=3G request; nevertheless, this limit is taken into account when deciding whether a new job can start, and once the job starts it is applied as a ulimit -v memory limit.
Soft and hard limits on each queue for the 6G corner case
[root@t3ce02 ~]# qconf -sq all.q | grep vmem
s_vmem 5.9G
h_vmem 6G
[root@t3ce02 ~]# qconf -sq short.q | grep vmem
s_vmem 5.9G
h_vmem 6G
[root@t3ce02 ~]# qconf -sq long.q | grep vmem
s_vmem 5.9G
h_vmem 6G
[root@t3ce02 ~]# qconf -sq all.q.admin | grep vmem
s_vmem 5.9G
h_vmem 6G
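The same two queue attributes were set on all four queues. A minimal sketch of applying them non-interactively, assuming `qconf -mattr queue <attribute> <value> <queue>` is used rather than editing each queue with `qconf -mq`; the variable and function names are hypothetical, and the commands are only printed for review.

```shell
# Queues shown in the log above.
QUEUES="all.q short.q long.q all.q.admin"

# Print (without running) the qconf calls setting the per-slot soft and
# hard limits seen in the log: s_vmem 5.9G and h_vmem 6G on every queue.
print_queue_limit_cmds() {
    for q in $QUEUES; do
        echo "qconf -mattr queue s_vmem 5.9G $q"
        echo "qconf -mattr queue h_vmem 6G $q"
    done
}
```

Running `qconf -sq <queue> | grep vmem` afterwards should reproduce the output above.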
3G ulimit applied to a running job
[root@t3wn40 ~]# ps fax | grep --color 2644280 -A 2
14192 ? S 0:00 \_ sge_shepherd-2644280 -bg
14193 ? Ss 0:00 | \_ /bin/bash /gridware/sge/default/spool/t3wn40/job_scripts/2644280 /shome/fronga/work/CBAF8prod/Plotting/
14245 ? D 1:48 | \_ ./Selective_Plot_Generator_14193.exec --results
[root@t3wn40 ~]# cd /proc/14245
[root@t3wn40 14245]# cat limits
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size unlimited unlimited bytes
Max core file size unlimited unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 401408 401408 processes
Max open files 1024 1024 files
Max locked memory 32768 32768 bytes
Max address space 3221225472 3221225472 bytes <-------
Max file locks unlimited unlimited locks
Max pending signals 401408 401408 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
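The "Max address space" value 3221225472 is exactly 3 × 1024³ bytes, i.e. the 3G default with a 1024-based G. A minimal sketch for reading that value from `/proc/<pid>/limits` and converting it, assuming the Linux layout shown above (soft limit in the 4th field); the function names are hypothetical.

```shell
# Read the soft "Max address space" value (4th field) from a
# /proc/<pid>/limits listing on stdin; prints "unlimited" or a byte count.
address_space_soft() {
    awk '/^Max address space/ { print $4 }'
}

# Convert a byte count to whole GiB (1024^3); non-numeric input such as
# "unlimited" is echoed back unchanged.
bytes_to_gib() {
    case "$1" in
        ''|*[!0-9]*) echo "$1" ;;
        *) echo $(( $1 / 1073741824 )) ;;
    esac
}
```

For the job above, `bytes_to_gib "$(address_space_soft < /proc/14245/limits)"` would print 3.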
--
FabioMartinelli - 2012-07-26