03. 05. 2015 Son of Grid Engine 8.1.8 cpuset error and fix
Ref:
http://linux.oracle.com/documentation/EL6/Red_Hat_Enterprise_Linux-6-Resource_Management_Guide-en-US.pdf
Catching the error with strace:
[root@t3vmui01 ~]# strace -ff -p `pidof sge_execd` -o ./log
[root@t3vmui01 ~]# grep cpuse log.*
log.3070:read(4, "v/cpuset cgroup rw,relatime,cpus"..., 1024) = 89
log.3070:read(4, "1:cpuset:/\n", 1024) = 11
log.3070:openat(3, "dev/cpuset//cpus", O_RDONLY) = 4
log.3070:openat(3, "dev/cpuset//mems", O_RDONLY) = 4
log.3070:read(5, "v/cpuset cgroup rw,relatime,cpus"..., 1024) = 89
log.3070:stat("/dev/cpuset/sge", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
log.3070:stat("/dev/cpuset/sge/cpuset.mems", 0x7fff55e0b650) = -1 ENOENT (No such file or directory)
log.3070:open("/dev/cpuset/sge/mems", O_RDONLY) = 5
log.3070:open("/dev/cpuset/sge/cpus", O_RDONLY) = 5
log.3070:stat("/dev/cpuset/sge/18.1", 0x7fff55e0c7c0) = -1 ENOENT (No such file or directory)
log.3159:read(3, "1:cpuset:/\n", 1048576) = 11
log.3159:write(1, "1:cpuset:/\n", 11) = 11
log.3161:read(3, "1:cpuset:/\n", 1048576) = 11
log.3161:write(1, "1:cpuset:/\n", 11) = 11
[root@t3vmui01 ~]#
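The failing stat() on /dev/cpuset/sge/cpuset.mems right next to a successful open() of plain mems suggests sge_execd probes both cpuset file-naming conventions: a cgroup mount exposes the files with a cpuset. prefix, while the dedicated cpuset filesystem (or a cgroup mount with the noprefix option) exposes them bare. A minimal sketch to tell the two apart on a given directory (the function and its name are mine, not part of SGE):

```shell
# cpuset_style DIR -> print which naming convention DIR uses:
#   "prefixed" = cgroup-style cpuset.mems
#   "plain"    = cpuset-filesystem-style mems
#   "none"     = neither file present
cpuset_style() {
    dir=$1
    if [ -e "$dir/cpuset.mems" ]; then
        echo prefixed
    elif [ -e "$dir/mems" ]; then
        echo plain
    else
        echo none
    fi
}
```

On the node above, `cpuset_style /dev/cpuset` should print `plain`, matching the strace output where only the unprefixed mems file exists.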
Fixed by:
[root@t3vmui01 ~]# cat /etc/sysconfig/sgeexecd
export SGE_CGROUP_DIR=/dev/cpuset/sge
and
[root@t3vmui01 ~]# grep -Hn setup-cgroups-etc /etc/init.d/sgeexecd.p6444
/etc/init.d/sgeexecd.p6444:427: /opt/sge/util/resources/scripts/setup-cgroups-etc start
plus a couple of sgeexecd service stop/starts.
Some logs showing proper behaviour:
05/03/2015 23:12:27 [0:7176]: shepherd called with uid = 0, euid = 0
05/03/2015 23:12:27 [0:7176]: starting up 8.1.8
05/03/2015 23:12:27 [0:7176]: setpgid(7176, 7176) returned 0
05/03/2015 23:12:27 [0:7176]: do_core_binding: explicit
05/03/2015 23:12:27 [0:7176]: bind_process_to_mask: SGE_BINDING env var created
05/03/2015 23:12:27 [0:7176]: do_core_binding: explicit: binding done
05/03/2015 23:12:27 [0:7176]: do_core_binding: finishing
05/03/2015 23:12:27 [0:7176]: set cpuset cpus per core binding
05/03/2015 23:12:27 [0:7176]: no prolog script to start
05/03/2015 23:12:27 [0:7176]: parent: forked "job" with pid 7177
05/03/2015 23:12:27 [0:7176]: parent: job-pid: 7177
05/03/2015 23:12:27 [0:7177]: child: starting son(job, /opt/sge/default/spool/t3vmui01/job_scripts/26, 0, 4096);
...
05/03/2015 23:21:24 [0:7176]: writing usage file to "usage"
05/03/2015 23:21:24 [0:7176]: no epilog script to start
[root@t3vmui01 ~]# grep cpuset /opt/sge/default/spool/t3vmui01/messages
...
05/03/2015 23:21:25| main|t3vmui01|I|removing task cpuset /dev/cpuset/sge/26.1
And again, some files showing proper behaviour:
[root@t3vmui01 ~]# find /dev/cpuset/sge
/dev/cpuset/sge
/dev/cpuset/sge/32.1
/dev/cpuset/sge/32.1/5094
/dev/cpuset/sge/32.1/5094/memory_spread_slab
/dev/cpuset/sge/32.1/5094/memory_spread_page
/dev/cpuset/sge/32.1/5094/memory_pressure
/dev/cpuset/sge/32.1/5094/memory_migrate
/dev/cpuset/sge/32.1/5094/sched_relax_domain_level
/dev/cpuset/sge/32.1/5094/sched_load_balance
/dev/cpuset/sge/32.1/5094/mem_hardwall
/dev/cpuset/sge/32.1/5094/mem_exclusive
/dev/cpuset/sge/32.1/5094/cpu_exclusive
/dev/cpuset/sge/32.1/5094/mems
/dev/cpuset/sge/32.1/5094/cpus <--------------- 0 inside, nice.
/dev/cpuset/sge/32.1/5094/cgroup.event_control
/dev/cpuset/sge/32.1/5094/notify_on_release
/dev/cpuset/sge/32.1/5094/cgroup.procs
/dev/cpuset/sge/32.1/5094/tasks <---- 5094 5095 5180 5182 5184 5185 ( namely the procs created by my SGE job )
/dev/cpuset/sge/32.1/0
/dev/cpuset/sge/32.1/0/memory_spread_slab
/dev/cpuset/sge/32.1/0/memory_spread_page
/dev/cpuset/sge/32.1/0/memory_pressure
/dev/cpuset/sge/32.1/0/memory_migrate
/dev/cpuset/sge/32.1/0/sched_relax_domain_level
/dev/cpuset/sge/32.1/0/sched_load_balance
/dev/cpuset/sge/32.1/0/mem_hardwall
/dev/cpuset/sge/32.1/0/mem_exclusive
/dev/cpuset/sge/32.1/0/cpu_exclusive
/dev/cpuset/sge/32.1/0/mems
/dev/cpuset/sge/32.1/0/cpus
/dev/cpuset/sge/32.1/0/cgroup.event_control
/dev/cpuset/sge/32.1/0/notify_on_release
/dev/cpuset/sge/32.1/0/cgroup.procs
/dev/cpuset/sge/32.1/0/tasks
/dev/cpuset/sge/32.1/memory_spread_slab
/dev/cpuset/sge/32.1/memory_spread_page
/dev/cpuset/sge/32.1/memory_pressure
/dev/cpuset/sge/32.1/memory_migrate
/dev/cpuset/sge/32.1/sched_relax_domain_level
/dev/cpuset/sge/32.1/sched_load_balance
/dev/cpuset/sge/32.1/mem_hardwall
/dev/cpuset/sge/32.1/mem_exclusive
/dev/cpuset/sge/32.1/cpu_exclusive
/dev/cpuset/sge/32.1/mems
/dev/cpuset/sge/32.1/cpus <------------------ 0-7 inside
/dev/cpuset/sge/32.1/cgroup.event_control
/dev/cpuset/sge/32.1/notify_on_release
/dev/cpuset/sge/32.1/cgroup.procs
/dev/cpuset/sge/32.1/tasks
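The per-task directory holding `0` in its cpus file while the job directory holds `0-7` is the core binding at work: the job owns cores 0-7 and the task is pinned to one of them. A small helper to check that a task's core really falls inside the parent's range (cpus files hold comma-separated range lists like `0`, `0-7`, or `0,2-3`; the function is my sketch, not an SGE tool):

```shell
# in_range_list CPU LIST -> exit 0 if CPU is contained in the
# comma-separated cpuset range list LIST (e.g. "0-7" or "0,2-3").
in_range_list() {
    cpu=$1
    echo "$2" | tr ',' '\n' | while IFS=- read -r lo hi; do
        # a bare "N" entry has no upper bound; treat it as "N-N"
        [ "$cpu" -ge "$lo" ] && [ "$cpu" -le "${hi:-$lo}" ] && echo yes
    done | grep -q yes
}
```

For the job above one would run something like `in_range_list "$(cat /dev/cpuset/sge/32.1/5094/cpus)" "$(cat /dev/cpuset/sge/32.1/cpus)"`, which should succeed since 0 lies in 0-7.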
Checking if all the procs created by my SGE job are running on the same CPU core:
[root@t3wn42 5094]# ps -F 5094 5095 5180 5182 5184 5185
UID PID PPID C SZ RSS PSR STIME TTY STAT TIME CMD
root 5094 5058 0 15418 6412 0 15:29 ? S 0:00 sge_shepherd-32 -bg
2980 5095 5094 0 26833 1460 0 15:29 ? Ss 0:00 -sh /opt/sge/default/spool/t3wn42/job_scripts/32
2980 5180 1 4 28074 1212 0 15:29 ? D 0:55 find /bla
2980 5182 1 1 28070 1200 0 15:29 ? D 0:21 find /blabla
2980 5185 5095 0 25226 564 0 15:29 ? S 0:00 sleep 20000
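The PSR column reading 0 for every process confirms the pinning. The same check can be scripted from a cpuset's tasks file; the helper below (my sketch, not an SGE tool) reads `PID PSR` pairs, as produced by `ps -o pid=,psr= -p PID...`, and succeeds only if all tasks share one core:

```shell
# same_core: read "PID PSR" lines on stdin; exit 0 iff every line
# reports the same processor (PSR) value.
same_core() {
    awk 'NR == 1 { first = $2 } $2 != first { bad = 1 } END { exit bad }'
}
```

Assuming the layout shown above, a check for the whole job would look like `xargs ps -o pid=,psr= -p < /dev/cpuset/sge/32.1/5094/tasks | same_core`.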
For MPI, this seems interesting, but I didn't check it.