<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page to be viewable only by internal people
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->
---+ Site Specific Modifications

This page contains details about modifications made to software that are specific to CSCS. Full details should be found on the relevant service page; this page is intended to give a brief overview and to be more readable than cfengine. If something is mentioned here but not described in detail on the wiki page for the service, please inform the service maintainer listed at the link below.

https://wiki.chipp.ch/twiki/bin/view/LCGTier2/ServiceInformation

%TOC%

---++ General modifications

---+++ Polyinstantiated /tmp on WNs

To make sure that jobs see a specific /tmp directory on GPFS, we need to polyinstantiate /tmp and put it on the shared filesystem. Following http://tech.ryancox.net/2013/07/per-user-tmp-and-devshm-directories.html, we can configure it as follows:

   1. Add the following to =/etc/rc.local=: <verbatim>
# MG 05.05.14 10:00 as per http://tech.ryancox.net/2013/07/per-user-tmp-and-devshm-directories.html
#
#mkdir -pm 000 /tmp/usertmp
mkdir -pm 000 /dev/shm/usertmp
mount --make-shared /
mount --bind /tmp /tmp
mount --make-private /tmp
mount --bind /dev/shm /dev/shm
mount --make-private /dev/shm
mount --bind /cvmfs /cvmfs
mount --make-rshared /cvmfs
mount --bind /gpfs_pp /gpfs_pp
mount --make-rshared /gpfs_pp</verbatim>
   1. Add the proper PAM configuration:
      a. At the end of =/etc/pam.d/sshd=, add the following: <verbatim>
session    required     pam_namespace.so ignore_instance_parent_mode</verbatim>
      a. At the end of =/etc/pam.d/slurm=, add the following: <verbatim>
auth       required     pam_localuser.so
account    required     pam_unix.so
session    required     pam_limits.so
session    required     pam_namespace.so ignore_instance_parent_mode</verbatim> NOTE: the argument =ignore_instance_parent_mode= is there to allow =/tmp= to be polyinstantiated on a subdirectory whose parent is not created with =000= permissions (i.e. =/gpfs_pp=).
   1. The file =/etc/security/namespace.conf= needs to have the following lines: <verbatim>
/tmp        /gpfs_pp/usertmp/    user    root
/dev/shm    /dev/shm/usertmp/    user    root</verbatim>
   1. All that is left is to create the directory where the polyinstantiated /tmp instances will be stored: <verbatim>
mkdir -pm 000 /gpfs_pp/usertmp</verbatim>

*NOTE* Please take into account that this directory is shared across the whole cluster, so if users' jobs create directories in /tmp, they have to be *uniquely named*.
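A quick way to check that the polyinstantiation works is to log in over SSH as an unprivileged test user and confirm that =/tmp= is now a per-user bind mount on GPFS. The snippet below is only a sketch: it assumes =findmnt= (util-linux) is available (=mount | grep /tmp= works as well) and that the =user= method configured in =namespace.conf= creates instance directories named =/gpfs_pp/usertmp/&lt;username&gt;=. <verbatim>
# As an unprivileged user (root is excluded in namespace.conf), after logging in via ssh:
findmnt /tmp                     # SOURCE should point at a per-user directory on GPFS, not at the root filesystem
touch /tmp/namespace_canary      # drop a marker file into the polyinstantiated /tmp

# As root (only root can traverse the mode-000 /gpfs_pp/usertmp directory):
ls -l /gpfs_pp/usertmp/*/namespace_canary   # the marker should appear under that user's instance directory</verbatim>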
---++++ SLURM

To enable this for SLURM jobs, additional steps need to be taken:

   1. Make sure the directive =UsePAM= is enabled in =/etc/slurm/slurm.conf=: <verbatim>
UsePAM=1</verbatim>
   1. Make sure that there *is no slurmd information on /tmp* by properly configuring the following variables: <verbatim>
#--------------------------------------------------------------------------------------
# PATHS
#--------------------------------------------------------------------------------------
SlurmdSpoolDir=/var/spool/slurmd    # this is for slurmd, must be local to each system
StateSaveLocation=/var/spool        # in PROD /slurm/spool/state, this is for slurmctld</verbatim>
   1. Also, because *cvmfs* works via =autofs= and because of the way namespaces behave, a Prolog script that makes sure cvmfs is mounted on the WN before the job runs is required:
      a. Add the following entries to =slurm.conf= (TaskProlog and TaskEpilog existed before). The idea is not to verify that CVMFS works (that is done by the Node Health Check), but to make sure that those filesystems are mounted if they are supposed to be there. Of course we could add more complex checks here, but that would stress the system much more. <verbatim>
# http://slurm.schedmd.com/prolog_epilog.html
# Prolog & Epilog to be run before/after each batch task, to set default environment variables etc.
TaskProlog=/etc/slurm/TaskProlog.sh
TaskEpilog=/etc/slurm/TaskEpilog.sh
# Prolog & Epilog to be run before/after the job is actually executed
Prolog=/etc/slurm/Prolog.sh
Epilog=/etc/slurm/Epilog.sh</verbatim>
      a. The contents of =Prolog.sh= and =Epilog.sh= are these:
         * = -- Prolog.sh -- = <verbatim>
#! /bin/bash
if [ -e /usr/bin/cvmfs_config ]; then
   /usr/bin/cvmfs_config probe 2>&1 >/dev/null
fi
exit 0</verbatim>
         * = -- Epilog.sh -- = <verbatim>
#! /bin/bash</verbatim>

---++++ Keeping a mixed environment with polyinstantiated /tmp only on some nodes

In order to accomplish this, make sure that all modifications shown above are applied. It is especially important that =/etc/pam.d/slurm= exists on *all nodes*, otherwise the nodes on which it does not exist will be marked as =DOWN=. Then, the following change must be applied:

   1. On the nodes on which we want to enforce the polyinstantiated /tmp, the following line must be present: <verbatim>
session    required     pam_namespace.so ignore_instance_parent_mode</verbatim>
   1. On the nodes which should keep the standard behaviour, it must be commented out: <verbatim>
#session    required     pam_namespace.so ignore_instance_parent_mode</verbatim>

---+++ du on GPFS

When running du on a file on the GPFS filesystem, the disk usage is reported as twice the size of the file. This is because we use the GPFS native RAID, which ensures there are two copies of the file. The ATLAS pilot jobs had an issue with this, so du on the worker nodes was replaced with a script that halves the size reported when run on GPFS; a sketch of the idea is shown below.
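The wrapper actually deployed at CSCS is managed via cfengine and is not reproduced here. The following is only a minimal sketch of the idea, and every name in it is an assumption: it presumes the original binary has been kept as =/usr/bin/du.real= and that GPFS is mounted under =/gpfs=. A real wrapper would also have to cope with options such as =-h=. <verbatim>
#!/bin/bash
# Hypothetical du wrapper -- NOT the script deployed at CSCS, just an illustration.
# Assumes the original binary was moved to /usr/bin/du.real and GPFS is mounted on /gpfs.
REAL_DU=/usr/bin/du.real
GPFS_MOUNT=/gpfs

# Check whether any argument points into GPFS.
on_gpfs=0
for arg in "$@"; do
    case "$arg" in
        ${GPFS_MOUNT}|${GPFS_MOUNT}/*) on_gpfs=1 ;;
    esac
done

# Outside GPFS: behave exactly like the real du.
if [ $on_gpfs -eq 0 ]; then
    exec "$REAL_DU" "$@"
fi

# On GPFS: halve the size column to compensate for the second copy of the data.
"$REAL_DU" "$@" | awk '{ $1 = int($1 / 2); print }'</verbatim>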
---++ Cream

---+++ Publishing

Only cream01 should publish values for the CPUs in the cluster; the other CREAMs should not report this. The relevant file is: <verbatim>
/var/lib/bdii/gip/ldif/static-file-Cluster.ldif
</verbatim>

The following file determines whether the CREAM CE publishes as production or draining. This is controlled via cfengine. <verbatim>
/var/lib/bdii/gip/plugin/glite-info-dynamic-ce
</verbatim>

This in turn runs a script from root's home directory which queries SLURM in order to generate certain values, e.g. <verbatim>
vim /root/fakeinfo.bash
...
MaxJobsPerGroup=$(${SACCTMGR_CMD} -n -r list association account=${VO} format=grpjobs | awk '{print $1}')
...
</verbatim>

---+++ Sym links

=/usr/local/bin/sacct= needs to be a symbolic link to =/usr/bin/sacct=: <verbatim>
ll /usr/local/bin/sacct
lrwxrwxrwx 1 root root 14  9. Okt 10:14 /usr/local/bin/sacct -> /usr/bin/sacct
</verbatim>

---+++ Job working directory

On the worker nodes, to ensure jobs run in =tmpdir_slurm= rather than the user's home directory, the following file is modified: <verbatim>
/etc/glite/cp_1.sh
</verbatim>

---+++ Priorities

The following file is modified so that jobs submitted with sgm/ops accounts are run on a reservation: <verbatim>
/usr/libexec/slurm_local_submit_attributes.sh
</verbatim>

We saw pilot failures across VOs because their pilot jobs were queued for too long; the reservation described under "Reservation for ops" below was implemented as a result.

---+++ Accounting

The following files of the SLURM log parser used by APEL have been patched according to this [[https://ggus.eu/ws/ticket_info.php?ticket=98409][GGUS ticket]]: <verbatim>
/usr/lib/python2.6/site-packages/apel/parsers/slurm.py
/usr/lib/python2.6/site-packages/apel/common/datetime_utils.py
</verbatim>
A few issues still need to be solved: the correct version should be included in a new release of the =apel-parsers= package (currently 1.1.2), which will be installed when available.

---+++ Reservation for ops

As we are unable to reserve a single core on a node, we have reserved two nodes to run these jobs: <verbatim>
ReservationName=priority_jobs StartTime=24 Oct 16:29 EndTime=24 Oct 2014 Duration=365-00:00:00
   Nodes=wn65,wn73 NodeCnt=2 CoreCnt=64 Features=(null) PartitionName=(null) Flags=IGNORE_JOBS,SPEC_NODES
   Users=(null) Accounts=ops,dteam Licenses=(null) State=ACTIVE
</verbatim>

Jobs are directed to this reservation by a modification to the submission script: <verbatim>
vim /usr/libexec/slurm_local_submit_attributes.sh
...
# DTEAM
REGEX="dteam[0-9][0-9][0-9]"
USER=`whoami`
if [[ ( $USER =~ $REGEX ) ]] ; then
    # This extracts the queue from the SUDO command and assigns the dteam reservation if required
    QUEUE=$(echo $SUDO_COMMAND | awk -F'-q ' '{print $2}' | sed 's/ -n.*$//g' 2>&1)
    if [ "$QUEUE" == "cscs" ]; then
        echo "#SBATCH --reservation=priority_jobs"
    fi
fi
...
</verbatim>

---++ Arc

---+++ Submit with non-ATLAS user

The file below can be modified to allow a user to submit to the ARC CE: <verbatim>
vim /usr/share/arc/ARC0ClusterInfo.pm

if ($q->{'name'} eq "cscs" and $sn !~ m/Pablo Fern/) {
    next;
}
</verbatim>

---+++ Accounting

Currently the publishing of accounting data via _jura_ is under investigation, but since temporary accounting data were filling up the disks of =arc[01,02]=, those data have been stored on the NAS for future reference: <verbatim>
nas.lcg.cscs.ch:/ifs/LCG/shared/apel_accounting_backup    /opt/apel_accounting_backup
</verbatim>

For each machine a specific directory has been created to which temporary data (i.e. APEL-compliant records not yet sent) can be moved from time to time to free some space on the disk: <verbatim>
[root@arc02:~]# mv /var/spool/arc/ssm/test-msg02.afroditi.hellasgrid.gr/outgoing/00000000/* /opt/apel_accounting_backup/arc02_outgoing_tmp/ssm/test-msg02/
[root@arc01:~]# mv /var/spool/arc/ssm/test-msg02.afroditi.hellasgrid.gr/outgoing/00000000/* /opt/apel_accounting_backup/arc01_outgoing_tmp/ssm/test-msg02/
</verbatim>

Another file that can be moved in order to free some space is: <verbatim>
[root@arc02:~]# mv /var/spool/nordugrid/jobstatus/job.logger.errors /gpfs/apel_test/job.logger.errors_arc02_20140205
</verbatim>
This file can easily grow to a few GB if _jura_ reports sending errors.

---+++ Modify the job comment to reflect the DN

The file =/usr/share/arc/submit-SLURM-job= has been modified so that the DN under which a job was submitted is visible in the job comment. This gives much more detail when looking at things like squeue. <verbatim>
MYUSERDN=$(/usr/bin/openssl x509 -in ${X509_USER_PROXY} -subject -noout | sed -r 's/.*= (.*)/\1/g' 2>&1)
MYHN=$(hostname -s)
COMMENT="\"$MYHN,$MYUSERDN\""
echo "#SBATCH --comment=$COMMENT" >> $LRMS_JOB_SCRIPT
</verbatim>

Previously there were issues with the memory size requested by jobs; this has since been resolved upstream.
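As an illustration (the exact format string is not part of the CSCS configuration), the comment added by =submit-SLURM-job= can be displayed next to each job with =squeue='s =%k= field, or inspected for a single job with =scontrol=: <verbatim>
# Show the job comment (submit host + DN) as an extra column; %k is the comment field.
squeue -o "%.10i %.9P %.8u %.2t %.10M %k"

# Or look at a single job:
scontrol show job <jobid> | grep -i comment</verbatim>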
---++ dCache

No real modifications specific to dCache itself for CSCS. See the dCache wiki page for the set-up.

   * Storage pools =se0[1-8]= have the python26 package installed by hand from *epel*: <verbatim>
pdsh -w se0[1-8] 'yum install python26 --enablerepo=epel -y' | dshbak -c</verbatim>

---+++ Prevent publishing of the file access protocol

With the NFSv4.1 domain the =file= access protocol is published and not easily disabled. As the WNs do not mount =/pnfs=, pilots will fail. To work around this we perform a =sed= on the info provider script. Below is an example run prior to modifying it, showing the lines that get removed: <verbatim>
/var/lib/bdii/gip/provider/info-based-infoProvider.sh > /tmp/info.orig
sed -e '232,246d' /tmp/info.orig > /tmp/info.mod
diff /tmp/info.*
231a232,246
> dn: GlueSEAccessProtocolLocalID=NFSv41-storage02@nfs-storage02Domain,GlueSE
> UniqueID=storage01.lcg.cscs.ch,mds-vo-name=resource,o=grid
> objectClass: GlueSETop
> objectClass: GlueSEAccessProtocol
> objectClass: GlueKey
> objectClass: GlueSchemaVersion
> GlueSEAccessProtocolLocalID: NFSv41-storage02@nfs-storage02Domain
> GlueSEAccessProtocolType: file
> GlueSEAccessProtocolEndpoint: file://storage02.lcg.cscs.ch:2049
> GlueSEAccessProtocolMaxStreams: 5
> GlueSEAccessProtocolCapability: file transfer
> GlueSEAccessProtocolVersion: file
> GlueSchemaVersionMajor: 1
> GlueSchemaVersionMinor: 3
> GlueChunkKey: GlueSEUniqueID=storage01.lcg.cscs.ch
</verbatim>

It appears that in later versions of dCache a second modification is required: the following property must be set to a blank value (by default it is set to =file=). We noticed this when upgrading to 2.6.27. <verbatim>
nfs.published.name =
</verbatim>

https://ggus.eu/index.php?mode=ticket_info&ticket_id=105586#update#6

-- Main.GeorgeBrown - 2013-11-11