
Node Type: OLDNFShomeServer

Firewall requirements

local port | open to | reason


Emergency Measures

Too frequent or too heavy writes

You can identify which user files are open in write mode with:
[root@t3admin01 ~]# salt 't3*' cmd.run " lsof -w -N | grep shome | grep REG | egrep ' [0-9]*u | [0-9]*w ' | awk '{ print \$9}' | xargs -I {} bash -c 'ls -lh {}' "
t3bdii02.psi.ch:
t3ldap02.psi.ch:
t3frontier01.psi.ch:
t3ldap01:
t3wn42.psi.ch:
t3ce01.psi.ch:
t3wn18.psi.ch:
t3wn28.psi.ch:
t3wn38.psi.ch:
t3wn40.psi.ch:
t3wn12.psi.ch:
t3wn23.psi.ch:
t3wn26.psi.ch:
t3wn17.psi.ch:
t3wn19.psi.ch:
t3fs14.psi.ch:
t3bdii01.psi.ch:
t3nagios.psi.ch:
t3wn31.psi.ch:
t3wn21.psi.ch:
t3ui17.psi.ch:
    -rw-r--r-- 1 mquittna ethz-ecal 16K Jul  8 09:18 /shome/mquittna/CMSSW/EXO_7_4_0_pre9/src/diphotons/Analysis/macros/.combine_maker.py.swp
    -rw-r--r-- 1 gaperrin ethz-susy 20K Jul  8 13:31 /shome/gaperrin/tnp_gael/SSDLBkgEstimationTP/TandP/.FitInvMassBkg.C.swp
    -rw-r--r-- 1 gaperrin ethz-susy 12K Jul  8 13:34 /shome/gaperrin/tnp_gael/SSDLBkgEstimationTP/TandP/.start_job2.sh.swp
    -rw-r--r-- 1 gaperrin ethz-susy 12K Jul  8 11:43 /shome/gaperrin/tnp_gael/SSDLBkgEstimationTP/TandP/.job2.sh.swp
    -rw-r--r-- 1 gaperrin ethz-susy 48K Jul  8 13:50 /shome/gaperrin/tnp_gael/SSDLBkgEstimationTP/TandP/.DrawInvMassBkg_combi.cc.swp
    -rw-r--r-- 1 gaperrin ethz-susy 52K Jul  8 13:45 /shome/gaperrin/tnp_gael/SSDLBkgEstimationTP/TandP/.MC_Ratio.C.swp
    -rw-r--r-- 1 gaperrin ethz-susy 48K Jul  8 13:47 /shome/gaperrin/tnp_gael/SSDLBkgEstimationTP/TandP/.TandP.C.swp
    -rw-r--r-- 1 gaperrin ethz-susy 36K Jul  8 13:50 /shome/gaperrin/tnp_gael/SSDLBkgEstimationTP/TandP/.CompareMCvsTandP.cc.swp
    -rw-r--r-- 1 gaperrin ethz-susy 12K Jul  8 13:52 /shome/gaperrin/tnp_gael/SSDLBkgEstimationTP/TandP/.start_job.sh.swp
    -rw-r--r-- 1 mdunser ethz-susy 88K May  5 10:33 /shome/mdunser/FakeLeptonFW/macros/.closure.py.swo
    ls: cannot access /shome/bianchi/TTH-72X-heppy/CMSSW/src/TTH/MEIntegratorStandalone/test/validate_^W7^A: No such file or directory
    -rw-r--r-- 1 mdunser ethz-susy 102K May  5 16:21 /shome/mdunser/.ipython/profile_default/history.sqlite
    -rw-r--r-- 1 mdunser ethz-susy 102K May  5 16:21 /shome/mdunser/.ipython/profile_default/history.sqlite
t3wn27.psi.ch:
t3wn22.psi.ch:
t3wn16.psi.ch:
t3wn11.psi.ch:
t3wn10.psi.ch:
t3wn33.psi.ch:
t3ui05.psi.ch:
    -rw-r--r-- 1 casal ethz-susy 16K May 18 15:22 /shome/casal/CMSSW/sms_prod/CMSSW_5_3_7_patch5/src/MT2analysis/Code/MT2AnalysisCode/RootMacros/.treeConversion.py.swp
t3wn32.psi.ch:
t3wn34.psi.ch:
t3service01:
t3ce02.psi.ch:
t3vmui01.psi.ch:
t3wn20.psi.ch:
t3cmsvobox01.psi.ch:
t3wn39.psi.ch:
t3wn43.psi.ch:
t3wn35.psi.ch:
t3fs13.psi.ch:
t3ui19.psi.ch:
t3ui12.psi.ch:
    -rw-rw-r-- 1 jngadiub uniz-higgs 0 Jul  8 14:05 /shome/jngadiub/EXOVVAnalysisRunII/CMSSW_7_4_3/tmp/slc6_amd64_gcc491/src/CalibMuon/DTCalibration/plugins/CalibMuonDTCalibrationPlugins/SealModule.o
    -rw-rw-r-- 1 jngadiub uniz-higgs 0 Jul  8 14:05 /shome/jngadiub/EXOVVAnalysisRunII/CMSSW_7_4_3/tmp/slc6_amd64_gcc491/src/TrackingTools/TrackAssociator/test/testTrackingToolsTrackAssociator/TestTrackAssociator.o
    -rw-rw-r-- 1 jngadiub uniz-higgs 0 Jul  8 14:05 /shome/jngadiub/EXOVVAnalysisRunII/CMSSW_7_4_3/tmp/slc6_amd64_gcc491/src/TrackingTools/TrackAssociator/test/testCaloMatchingExample/CaloMatchingExample.o
    -rw-rw-r-- 1 jngadiub uniz-higgs 0 Jul  8 14:05 /shome/jngadiub/EXOVVAnalysisRunII/CMSSW_7_4_3/tmp/slc6_amd64_gcc491/src/TrackingTools/TrackAssociator/plugins/TrackingToolsTrackAssociatorPlugins/DetIdAssociatorESProducer.o
    -rw-rw-r-- 1 jngadiub uniz-higgs 0 Jul  8 14:05 /shome/jngadiub/EXOVVAnalysisRunII/CMSSW_7_4_3/tmp/slc6_amd64_gcc491/src/TrackingTools/TrackAssociator/plugins/TrackingToolsTrackAssociatorPlugins/MuonDetIdAssociator.o
    -rw-rw-r-- 1 jngadiub uniz-higgs 0 Jul  8 14:05 /shome/jngadiub/EXOVVAnalysisRunII/CMSSW_7_4_3/tmp/slc6_amd64_gcc491/src/TrackingTools/TrackAssociator/plugins/TrackingToolsTrackAssociatorPlugins/modules.o
    -rw-rw-r-- 1 jngadiub uniz-higgs 0 Jul  8 14:05 /shome/jngadiub/EXOVVAnalysisRunII/CMSSW_7_4_3/tmp/slc6_amd64_gcc491/src/SimTracker/TrackerHitAssociation/plugins/SimTrackerTrackerHitAssociationPlugins/ClusterTPAssociationProducer.o
    -rw-rw-r-- 1 jngadiub uniz-higgs 0 Jul  8 14:05 /shome/jngadiub/EXOVVAnalysisRunII/CMSSW_7_4_3/tmp/slc6_amd64_gcc491/src/SimTracker/VertexAssociatorESProducer/src/SimTrackerVertexAssociatorESProducer/SealModules.o
    -rw-rw-r-- 1 jngadiub uniz-higgs 0 Jul  8 14:05 /shome/jngadiub/EXOVVAnalysisRunII/CMSSW_7_4_3/tmp/slc6_amd64_gcc491/src/SimTracker/VertexAssociatorESProducer/src/SimTrackerVertexAssociatorESProducer/VertexAssociatorByTracksESProducer.o
    -rw-r--r-- 1 jpata ethz-higgs 0 Jul  8 13:40 /shome/jpata/TTH-72X-heppy-dev/CMSSW/src/VHbbAnalysis/Heppy/test/test/log.txt
    -rw-r--r-- 1 jpata ethz-higgs 228 Jul  8 13:40 /shome/jpata/TTH-72X-heppy-dev/CMSSW/src/VHbbAnalysis/Heppy/test/test/pileup.root
    -rw-r--r-- 1 jpata ethz-higgs 14M Jul  8 14:05 /shome/jpata/TTH-72X-heppy-dev/CMSSW/src/VHbbAnalysis/Heppy/test/test/tree.root
t3wn13.psi.ch:
t3wn24.psi.ch:
t3mon01:
t3wn37.psi.ch:
t3wn41.psi.ch:
t3ui18.psi.ch:
t3wn15.psi.ch:
t3se01.psi.ch:
t3wn29.psi.ch:
t3wn25.psi.ch:
t3wn30.psi.ch:
t3wn44.psi.ch:
t3wn50.psi.ch:
t3ui16.psi.ch:
t3ui15.psi.ch:
t3wn14.psi.ch:
t3wn36.psi.ch:
t3dcachedb03.psi.ch:
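
To confirm the write load on the server side as well, a minimal sketch run on the file server itself (assuming the pool is named shome, as in the quota commands further down):

zpool iostat -v shome 5      # per-vdev read/write bandwidth, refreshed every 5 seconds
fsstat /shome 5              # per-filesystem operation counters (Solaris)
nfsstat -s                   # server-side NFS operation counters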

RPC program nfs version 3 tcp is not running
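
If you hit this error, a quick diagnostic sketch (assuming the standard Solaris 10 SMF service names on t3fs06):

rpcinfo -p t3fs06 | grep nfs          # from any client: is the NFS RPC program registered at all?
# on t3fs06 itself:
svcs -xv svc:/network/nfs/server      # why is the NFS server service not running?
svcadm clear svc:/network/nfs/server ; svcadm enable svc:/network/nfs/server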

In Nov 2014 we got this case: CMSTier3Log67
  • Check nagios.
  • If t3fs06 fails, the t3ui1* and t3wn* servers that mount t3fs06:/shome will be affected immediately. If you cannot quickly recover t3fs06:/shome (e.g. because of a failed motherboard), you will have to umount /shome on those servers and mount t3fs05:/shome2 instead, which is supposed to be an identical copy of t3fs06:/shome; you will probably also need to create symbolic links /shome2 -> /shome (see the sketch after this list).
  • On t3fs05, obviously stop the cron job that rsyncs /swshare to t3fs06.
  • Tweak t3nagios so it forgets about t3fs06.
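
A minimal sketch of the client-side failover, assuming t3fs05 exports /shome2 over NFSv3 and that the commands are run on each affected t3ui1*/t3wn* client; mount options and the link direction are illustrative, not a verified procedure:

# run on each client that currently mounts t3fs06:/shome (illustrative, untested)
umount -l /shome || umount -f /shome                 # lazy/forced umount in case of stale NFS handles
mkdir -p /shome2
mount -t nfs -o vers=3,hard,intr t3fs05:/shome2 /shome2
rmdir /shome 2>/dev/null && ln -s /shome2 /shome     # recreate the /shome path users expect; adjust to your setup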

Regular Maintenance work

Nagios

check nagios

Installation

crontab -l root

                                                                                                                                                                                                                                                                                                                                                 
#ident  "@(#)root       1.21    04/03/23 SMI"
#
# The root crontab should be used to perform accounting data collection.
#
#
10 3 * * * /usr/sbin/logadm
15 3 * * 0 /usr/lib/fs/nfs/nfsfind
30 3 * * * [ -x /usr/lib/gss/gsscred_clean ] && /usr/lib/gss/gsscred_clean
#
# The rtc command is run to adjust the real time clock if and when
# daylight savings time changes.
#
1 2 * * * [ -x /usr/sbin/rtc ] && /usr/sbin/rtc -c > /dev/null 2>&1
#
# create regular snapshots of the shome file system
#
#20 00 * * * /root/psit3-tools/regular-snapshot-new -f shome -v -s t3fs05 -r shome2/shomebup 2>&1 | /usr/bin/tee /var/cron/lastsnap.txt 2>&1 ; [[ $? -ne 0 ]] && /usr/bin/mail cms-tier3@lists.psi.ch  < /var/cron/lastsnap.txt 
#
# Added by cswcrontab for CSWlogwatch
02 4 * * * /opt/csw/bin/logwatch
#
# for ganglia monitoring of shome space
53 * * * * /root/gmetric/gmetric_partition_space-cron.sh
#
# for detailed local monitoring of user space 
44 01 * * * /shome/monuser/shome-du.cron.sh
#
43 3 * * * [ -x /opt/csw/bin/gupdatedb ] && /opt/csw/bin/gupdatedb --prunepaths="/shome /dev /devices /proc /tmp /var/tmp" 1>/dev/null 2>&1 # Added by CSWfindutils
# 09/03/2015 - F.Martinelli 
22 03 * * * /opt/zfsnap/zfssnap -v shome && /opt/csw/bin/rsync --progress -v --delete -a  -e "ssh -c arcfour"   /shome/ t3fs05:/shome2  2>&1 | /usr/bin/tee /var/cron/zfssnap.shome.log 2>&1  
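
To verify that the nightly snapshot and rsync replication actually ran, a quick sketch (the log path comes from the crontab above; the grep pattern is only illustrative):

zfs list -t snapshot -o name,creation,used | grep zfssnap     # recent shome snapshots
tail -n 20 /var/cron/zfssnap.shome.log                        # output of the last 03:22 cron run
ssh t3fs05 'df -h /shome2'                                    # the rsync target on t3fs05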

Shared File Systems on ZFS - OLD

  • /shome: Two 9-disk raidz2 vdevs are used for /shome.
  • /vmshare: a raidz2 vdev plus spares; hosts some of the older VMs.
  • spare disks
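
The actual vdev membership and spare status can be cross-checked against the physical slot map below; this is a sketch assuming the pools are named after their mount points:

zpool status shome       # raidz2 vdevs backing /shome
zpool status vmshare     # raidz2 vdev and spares backing /vmshare
zpool status -x          # only pools with problems, for a quick health check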

---------------------SunFireX4500------Rear----------------------------

36:   37:   38:   39:   40:   41:   42:   43:   44:   45:   46:   47:
c6t3  c6t7  c5t3  c5t7  c8t3  c8t7  c7t3  c7t7  c1t3  c1t7  c0t3  c0t7
^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++
24:   25:   26:   27:   28:   29:   30:   31:   32:   33:   34:   35:
c6t2  c6t6  c5t2  c5t6  c8t2  c8t6  c7t2  c7t6  c1t2  c1t6  c0t2  c0t6
^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++
12:   13:   14:   15:   16:   17:   18:   19:   20:   21:   22:   23:
c6t1  c6t5  c5t1  c5t5  c8t1  c8t5  c7t1  c7t5  c1t1  c1t5  c0t1  c0t5
^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++
 0:    1:    2:    3:    4:    5:    6:    7:    8:    9:   10:   11:
c6t0  c6t4  c5t0  c5t4  c8t0  c8t4  c7t0  c7t4  c1t0  c1t4  c0t0  c0t4
^b+   ^b+   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++
-------*-----------*-SunFireX4500--*---Front-----*-----------*----------

User quotas

After upgrading the ZFS version, it was necessary to initialise the accounting information. This can take quite some time...

zfs userspace shome

User quotas can be set and viewed as follows (you can use either the user name or the numeric uid):

zfs set userquota@3896=500G  shome

zfs get userquota@3896  shome
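
Equivalently, by user name (someuser is just a placeholder here):

zfs set userquota@someuser=500G  shome
zfs get userquota@someuser  shome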

The current usage of all users can be seen with

zfs userspace shome
zfs userspace -p -s used shome  # exact value and sorted
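
For a compact per-user report of usage against quota, sorted by usage (a sketch using standard zfs userspace options):

zfs userspace -o name,used,quota -s used shome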

Existing snapshots can be listed with

zfs list -t snapshot

                                                                                                                                                                                                                                                                                                                                                
NAME                                          USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT/s10x_u8wos_08a@2011-Feb-18_11-51  46.3M      -  3.70G  -
rpool/ROOT/s10x_u8wos_08a@python-20110303    49.7M      -  3.77G  -
rpool/ROOT/s10x_u8wos_08a@31-May-2012         142M      -  4.11G  -
rpool/ROOT/s10x_u8wos_08a@09-Apr-2013         118M      -  5.35G  -
rpool/ROOT/s10x_u8wos_08a@05-Jun-2013         133M      -  5.43G  -
rpool/ROOT/s10x_u8wos_08a@28-Nov-2013         121M      -  5.53G  -
rpool/ROOT/s10x_u8wos_08a@21-03-2014          131M      -  5.55G  -
rpool/ROOT/s10x_u8wos_08a@24-Jun-2014         144M      -  5.62G  -
rpool/ROOT/s10x_u8wos_08a@11-Sep-2014         155M      -  5.64G  -
rpool/ROOT/s10x_u8wos_08a@20-01-2015          163M      -  5.66G  -
rpool/ROOT/s10x_u8wos_08a@30-01-2015          164M      -  5.66G  -
rpool/ROOT/s10x_u8wos_08a@06-03-2015          165M      -  5.66G  -
rpool/ROOT/s10x_u8wos_08a@03-06-2015             0      -  5.67G  -
shome@zfssnap_2015-05-25_03.22.00--10d       3.08G      -  4.98T  -  <-- /opt/zfsnap/zfssnap -v shome && /opt/csw/bin/rsync --progress -v --delete -a  -e "ssh -c arcfour"   /shome/ t3fs05:/shome2
shome@zfssnap_2015-05-26_03.22.00--10d       3.12G      -  4.98T  -
shome@zfssnap_2015-05-27_03.22.00--10d       6.70G      -  4.94T  -
shome@zfssnap_2015-05-28_03.22.00--10d       6.00G      -  4.95T  -
shome@zfssnap_2015-05-29_03.22.00--10d       4.08G      -  4.95T  -
shome@zfssnap_2015-05-30_03.22.00--10d       2.87G      -  4.94T  -
shome@zfssnap_2015-05-31_03.22.00--10d       2.87G      -  4.94T  -
shome@zfssnap_2015-06-01_03.22.00--10d       6.48G      -  4.94T  -
shome@zfssnap_2015-06-02_03.22.00--10d       3.06G      -  4.96T  -
shome@zfssnap_2015-06-03_03.22.00--10d       3.22G      -  4.95T  -
swshare2/swsharebup@auto2015-05-27_06:00:00  2.36G      -   560G  -
swshare2/swsharebup@auto2015-05-28_06:00:01  1.07M      -   560G  -
swshare2/swsharebup@auto2015-05-29_06:00:00   950K      -   559G  -
swshare2/swsharebup@auto2015-05-30_06:00:00  1.47M      -   559G  -
swshare2/swsharebup@auto2015-05-31_06:00:00  1.36M      -   559G  -
swshare2/swsharebup@auto2015-06-01_06:00:00   725K      -   560G  -
swshare2/swsharebup@auto2015-06-02_06:00:00  1.00M      -   559G  -
swshare2/swsharebup@auto2015-06-03_06:00:00      0      -   560G  -

daily snapshots and backup

A script, /root/psit3-tools/regular-snapshot, is called by root's crontab to make a daily incremental snapshot of the shome ZFS file system and transfer it to t3fs05. Users can retrieve files from these snapshots by themselves, as documented in HowToRetrieveBackupFiles. The script also deletes the older snapshots. See also the tests on incremental snapshot transfers in CMSTier3Log12.
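
Conceptually, the incremental transfer looks like the following sketch; the snapshot names are illustrative and the receiving dataset shome2/shomebup is taken from the commented-out crontab entry above:

# on t3fs06 (illustrative, not the exact logic of regular-snapshot)
zfs snapshot shome@2015-06-04
zfs send -i shome@2015-06-03 shome@2015-06-04 | ssh t3fs05 zfs receive -F shome2/shomebup
zfs destroy shome@2015-05-25     # prune the oldest local snapshot once replication succeeded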

ZFS Backup server on t3fs05 - OLD

  • shome2 : Backup of shome area
  • vmshare : Backup of virtual machine area (for the older vmware-server based machines)
  • swshare : cluster's shared software space (e.g. experiment SW)
  • spare disks

---------------------SunFireX4500------Rear----------------------------

36:   37:   38:   39:   40:   41:   42:   43:   44:   45:   46:   47:
c6t3  c6t7  c5t3  c5t7  c8t3  c8t7  c7t3  c7t7  c1t3  c1t7  c0t3  c0t7
^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++
24:   25:   26:   27:   28:   29:   30:   31:   32:   33:   34:   35:
c6t2  c6t6  c5t2  c5t6  c8t2  c8t6  c7t2  c7t6  c1t2  c1t6  c0t2  c0t6
^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++
12:   13:   14:   15:   16:   17:   18:   19:   20:   21:   22:   23:
c6t1  c6t5  c5t1  c5t5  c8t1  c8t5  c7t1  c7t5  c1t1  c1t5  c0t1  c0t5
^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++
 0:    1:    2:    3:    4:    5:    6:    7:    8:    9:   10:   11:
c6t0  c6t4  c5t0  c5t4  c8t0  c8t4  c7t0  c7t4  c1t0  c1t4  c0t0  c0t4
^b+   ^b+   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++
-------*-----------*-SunFireX4500--*---Front-----*-----------*----------
NodeTypeForm
Hostnames: t3fs06 - OUTDATED!
Services: NFS (user home area), backup on t3fs05
Hardware: SUN X4500 (2*Opteron 290, 16 GB RAM, 48*500 GB SATA)
Install Profile: none
Guarantee/maintenance until: t3fs05,06: 2011-02-14
Topic revision: r20 - 2016-11-04 - FabioMartinelli
 