<!-- keep this as a security measure:
#uncomment if the topic should only be modifiable by the listed groups
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup
#uncomment this if you want the page to be viewable only by the listed groups
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup
-->

%TOC%

%ICON{arrowleft}% Go to [[CMSTier3LogXX][previous page]] / [[CMSTier3LogXX][next page]] of Tier3 site log %M%

---+ 10.03.2010 Testing the SEs

We want to test different SE setups and their behaviour under a realistic user case. For this we plan to use CMSSW_3_3_6 and some standard cfgs. The test will be performed running N = 1, 2, 4, 6, 8, 10 parallel jobs, monitoring the SE performance through =dstat= (on the WN) and =dtrace= (on the file server pool).

---++ File server setup

   * t3fs01: 4 * (9 disk raidz1) + 1 * (8 disk raidz1), disk brand: HITACHI HUA7250S
%TWISTY{showlink="Show" hidelink="Hide" showimgleft="%ICONURLPATH{toggleopen-small}%" hideimgleft="%ICONURLPATH{toggleclose-small}%"}%<pre>
root@t3fs01.psi.ch # zpool status
  pool: data1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data1       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c7t6d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c6t6d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c5t7d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
            c4t7d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c7t7d0  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
            c6t7d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c1t7d0  ONLINE       0     0     0
        spares
          c0t3d0    AVAIL
          c0t7d0    AVAIL

errors: No known data errors
</pre>%ENDTWISTY%
   * t3fs05: 4 * (11 disk raidz2), disk brand: HITACHI HUA7250S
%TWISTY{showlink="Show" hidelink="Hide" showimgleft="%ICONURLPATH{toggleopen-small}%" hideimgleft="%ICONURLPATH{toggleclose-small}%"}%<pre>
bash-4.0# zpool status
  pool: data1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data1       ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     0
            c8t0d0  ONLINE       0     0     0
            c8t4d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
            c8t1d0  ONLINE       0     0     0
            c8t5d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c6t6d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0
            c8t2d0  ONLINE       0     0     0
            c8t6d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
            c7t6d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
            c6t7d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c5t7d0  ONLINE       0     0     0
            c8t3d0  ONLINE       0     0     0
            c8t7d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c7t7d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c1t7d0  ONLINE       0     0     0
        spares
          c0t3d0    AVAIL
          c0t7d0    AVAIL

errors: No known data errors
</pre>%ENDTWISTY%

---++ Choosing the cfg

The chosen cfg must put a sizeable load on the pool and also be sensitive to the RAID configuration (random reads). One good candidate is the PAT production with standard CMSSW settings (application-only, cache=0). This can be seen from these preliminary tests (1 = fs01, 2 = fs05, 3 = fs03):

   * T3_CH_PSI-PAT-MultiTest-Std-dstat-NET_Rx.jpg: <br /> <img width="696" alt="T3_CH_PSI-PAT-MultiTest-Std-dstat-NET_Rx.jpg" src="/twiki/pub/CmsTier3/CMSTier3Log10/T3_CH_PSI-PAT-MultiTest-Std-dstat-NET_Rx.jpg" height="472" />

It can be seen that for fs05 (raidz2) the performance is degraded. You can also see that lazy-download with cache=20 puts a much higher load on the network, but in bursts. Nevertheless, to generate a high load a JetAnalyzer job is better suited (again in lazy-download configuration), since it requests data at a higher frequency (see https://twiki.cern.ch/twiki/bin/view/Main/LSIOforAnalysisOperations-SingleTests#TestJPE and [[https://twiki.cern.ch/twiki/pub/Main/LSIOforAnalysisOperations-SingleTests/T3_CH_PSI-JPE-20100220-1_2.png][here]]). This should be less sensitive to the underlying RAID cfg (bigger read chunks, to be verified).

So, the proposal is to use:

   * PAT jobs reading 2k events each (2 ROOT files as source), ~10 min duration each, to measure differences between the RAID setups
   * JetAnalyzer jobs reading ~10k events each (~8 files) with the lazy-download setup, ~5 min each

The number of events depends on how long we want the jobs to run.

---++ Setting up the test

First of all, we need to place some test ROOT files as a source on the desired pool. We can just use one RECO file, =/store/mc/Summer09/QCD_Pt170-herwig/GEN-SIM-RECO/MC_31X_V3-v1/0002/549DB055-FF7D-DE11-BF9C-00221981BA9B.root=, and replicate it in several copies on the pools. One idea is to use a naming convention for the files, such as test_${POOL}_${N}_${n}-${i}.root, where:

   * ${POOL} is the pool where the file resides
   * ${N} is the total number of jobs running in parallel
   * ${n} is the number of the job
   * ${i} is the number of the file (from 1 to 8?)

Each file weighs ~1 GB. A script for doing that is:
<verbatim>
#!/bin/bash

MOTHER=/store/mc/Summer09/QCD_Pt170-herwig/GEN-SIM-RECO/MC_31X_V3-v1/0002/549DB055-FF7D-DE11-BF9C-00221981BA9B.root
SON=test.root
SEs=(t3fs07 t3fs08 t3fs09)
RUNS=(1 4 8)

# fetch the source file once to /tmp
srmcp -2 srm://t3se01.psi.ch:8443/srm/managerv2\?SFN=/pnfs/psi.ch/cms/trivcat/${MOTHER} file:////tmp/${SON}

# replicate it on the SE following the test_${POOL}_${N}_${n}-${i}.root naming convention
for POOL in ${SEs[*]}; do
  for N in ${RUNS[*]}; do
    for n in `seq 1 ${N}`; do
      srmcp -2 file:////tmp/${SON} srm://t3se01.psi.ch:8443/srm/managerv2\?SFN=/pnfs/psi.ch/cms/trivcat/store/user/leo/TestBed/test_${POOL}_${N}_${n}-1.root
      srmcp -2 file:////tmp/${SON} srm://t3se01.psi.ch:8443/srm/managerv2\?SFN=/pnfs/psi.ch/cms/trivcat/store/user/leo/TestBed/test_${POOL}_${N}_${n}-2.root
      srmcp -2 file:////tmp/${SON} srm://t3se01.psi.ch:8443/srm/managerv2\?SFN=/pnfs/psi.ch/cms/trivcat/store/user/leo/TestBed/test_${POOL}_${N}_${n}-3.root
      srmcp -2 file:////tmp/${SON} srm://t3se01.psi.ch:8443/srm/managerv2\?SFN=/pnfs/psi.ch/cms/trivcat/store/user/leo/TestBed/test_${POOL}_${N}_${n}-4.root
    done
  done
done
</verbatim>
The pools are populated one at a time, by putting all the file servers except the chosen one into read-only mode (for Derek: which command?).
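
Before launching any jobs it may be worth checking that all the replicas really ended up on the SE. This is only a minimal sketch, assuming the same SRM endpoint and the naming convention used by the population script above, and that =srmls= from the same SRM client suite accepts the =-2= flag like =srmcp= does:
<verbatim>
#!/bin/bash
# Sketch: verify that every test replica exists on the SE before starting the runs.
# Mirrors the SEs/RUNS arrays and the naming convention of the population script above.
SRM_BASE="srm://t3se01.psi.ch:8443/srm/managerv2?SFN=/pnfs/psi.ch/cms/trivcat/store/user/leo/TestBed"
SEs=(t3fs07 t3fs08 t3fs09)
RUNS=(1 4 8)

for POOL in ${SEs[*]}; do
  for N in ${RUNS[*]}; do
    for n in `seq 1 ${N}`; do
      for i in 1 2 3 4; do
        FILE=test_${POOL}_${N}_${n}-${i}.root
        # srmls exits successfully and prints size+path if the file exists
        if srmls -2 "${SRM_BASE}/${FILE}" > /dev/null 2>&1; then
          echo "OK      ${FILE}"
        else
          echo "MISSING ${FILE}"
        fi
      done
    done
  done
done
</verbatim>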
Now, let's prepare a CMSSW project dir with =$ scram p CMSSW CMSSW_3_3_6= and get the JetAnalyzer code:
<verbatim>
$ cd CMSSW_3_3_6/src
$ export SCRAM_ARCH=slc4_ia32_gcc345 && eval `scram ru -sh` && cmsenv
$ scp -r /shome/leo/Installations/Perf_3_3_6/CMSSW_3_3_6/src/RecoJets .
$ scram b
</verbatim>
Create a directory for our tests (TestBed) and paste the two cfgs at the end of this page into two files called =PAT_template.py= and =JPE_CHLD_CACHE20_template.py=. A script then creates the cfgs and runs the jobs in parallel:
<verbatim>
#!/bin/bash

CFG=PAT
DIR=Site.T3_CH_PSI-Label.HT-Setting.ParallelTest-Cfg.${CFG}-Set.
SEDIR=/store/user/leo/Test
SEs=(fs01 fs05)
RUNS=(1 2 4 6 8 10)

# one logging directory per set of parallel jobs
for N in `seq 1 10`; do mkdir ${DIR}${N}; done

for i in ${SEs[*]}; do
  for N in ${RUNS[*]}; do
    M=$(( ${N}-1 ))
    # first N-1 jobs of the set, in the background
    for n in `seq 1 ${M}`; do
      echo $n $N $i
      DIRNAME=${DIR}${N}
      FILENAME=${CFG}-${N}_${n}_${i}
      echo Config:${FILENAME}
      # build the cfg for this job from the template (4 input files, one output file)
      cat ${CFG}_template.py | sed "s|<FILES>|\'${SEDIR}/test_${i}_${N}_${n}-1.root\',\'${SEDIR}/test_${i}_${N}_${n}-2.root\',\'${SEDIR}/test_${i}_${N}_${n}-3.root\',\'${SEDIR}/test_${i}_${N}_${n}-4.root\'|g" > tmp.py
      cat tmp.py | sed "s|<OUTFILE>|out_${i}_${N}_${n}.root|g" > ${FILENAME}.py
      # start dstat logging, then the job; kill dstat once the job has finished
      dstat -T -c -d -m -n --nocolor --noheaders --output ${DIRNAME}/${FILENAME}_dstat.csv 30 > /dev/null &
      PID=$!
      echo $PID
      sleep 60 && ( /usr/bin/time cmsRun -j ${DIRNAME}/${FILENAME}.xml ${FILENAME}.py ) &> ${DIRNAME}/${FILENAME}.stdout && \
        echo PID is ${PID} && kill ${PID} &
    done
    # last job of the set, in the foreground
    n=${N}
    DIRNAME=${DIR}${N}
    FILENAME=${CFG}-${N}_${n}_${i}
    echo Config:${FILENAME}
    cat ${CFG}_template.py | sed "s|<FILES>|\'${SEDIR}/test_${i}_${N}_${n}-1.root\',\'${SEDIR}/test_${i}_${N}_${n}-2.root\',\'${SEDIR}/test_${i}_${N}_${n}-3.root\',\'${SEDIR}/test_${i}_${N}_${n}-4.root\'|g" > tmp.py
    cat tmp.py | sed "s|<OUTFILE>|out_${i}_${N}_${n}.root|g" > ${FILENAME}.py
    dstat -T -c -d -m -n --nocolor --noheaders --output ${DIRNAME}/${FILENAME}_dstat.csv 30 > /dev/null &
    PID=$!
    echo $PID
    sleep 60 && ( /usr/bin/time cmsRun -j ${DIRNAME}/${FILENAME}.xml ${FILENAME}.py ) &> ${DIRNAME}/${FILENAME}.stdout && \
      echo PID is ${PID} && kill ${PID}
    sleep 300
  done
done
</verbatim>
This ugly script (no time to do it better) creates a logging directory for each set of parallel jobs, then for each pool runs 1, 2, 4, ..., 10 jobs in parallel, also saving the information retrieved from dstat. Before each run there is one minute of sleep to record the initial conditions, and 300 seconds of sleep between different parallel sets for the same reason (and also to get a clean view in the Ganglia plots).
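
For a quick overview of the dstat logs, each CSV can be condensed into an average and a peak network receive rate. This is only a sketch: the column index is an assumption about the layout produced by this particular =dstat -T -c -d -m -n --output= combination (timestamp, 6 CPU columns, 2 disk, 4 memory, then recv/send), so check the header rows of one file and adjust =COL= if your dstat version differs.
<verbatim>
#!/bin/bash
# Sketch: summarise the network receive rate (bytes/s) from the dstat CSV logs.
# COL is assumed to be the "net recv" column for the option set used above; verify it
# against the header rows of one CSV before trusting the numbers.
COL=14
for f in Site.T3_CH_PSI-*/*_dstat.csv; do
  awk -F, -v col=${COL} '
    $1 ~ /^[0-9]/ {            # data rows start with the epoch timestamp
      sum += $col; n++
      if ($col > max) max = $col
    }
    END {
      if (n > 0)
        printf "%-70s avg %8.2f MB/s  peak %8.2f MB/s\n", FILENAME, sum/n/1e6, max/1e6
    }' "$f"
done
</verbatim>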
---++ Cfg

---+++ PAT
<verbatim>
from PhysicsTools.PatAlgos.patTemplate_cfg import *
process.load("PhysicsTools.PatAlgos.patSequences_cff")

from PhysicsTools.PatAlgos.tools.coreTools import *
restrictInputToAOD(process)

from PhysicsTools.PatAlgos.tools.tauTools import *
switchTo31Xdefaults(process)

from PhysicsTools.PatAlgos.tools.jetTools import *
switchJetCollection(process,
                    cms.InputTag('sisCone5CaloJets'),
                    doJTA            = True,
                    doBTagging       = True,
                    jetCorrLabel     = ('SC5','Calo'),
                    doType1MET       = True,
                    genJetCollection = cms.InputTag("sisCone5GenJets"),
                    doJetID          = False,
                    jetIdLabel       = "ak5"
                    )

process.p = cms.Path( process.patDefaultSequence )

process.source.fileNames = [
    <FILES>
]

# important, as the files are actually the same
process.source.duplicateCheckMode = cms.untracked.string('noDuplicateCheck')

#process.AdaptorConfig = cms.Service("AdaptorConfig",
#                                    tempDir=cms.untracked.string(""),
#                                    cacheHint=cms.untracked.string("auto-detect"),
#                                    readHint=cms.untracked.string("auto-detect")
#                                    )
#process.source.cacheSize = cms.untracked.uint32(20*1024*1024)

process.maxEvents.input = 2000        ## (e.g. -1 to run on all events)

process.out.fileName = '<OUTFILE>'    ## (e.g. 'myTuple.root')
process.out.outputCommands = cms.untracked.vstring('drop *')

process.options.wantSummary = False   ## (to suppress the long output at the end of the job)
process.MessageLogger.cerr.threshold = 'ERROR'

process.CPU = cms.Service("CPU",
                          useJobReport = cms.untracked.bool(True),
                          reportCPUProperties = cms.untracked.bool(True)
                          )
</verbatim>

---+++ JetAna
<verbatim>
import FWCore.ParameterSet.Config as cms

process = cms.Process("Ana")
process.load("FWCore.MessageService.MessageLogger_cfi")

process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(10000) )

process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring( <FILES> )
)
# important, as the files are actually the same
process.source.duplicateCheckMode = cms.untracked.string('noDuplicateCheck')

process.calo = cms.EDAnalyzer("CaloJetPlotsExample",
    JetAlgorithm  = cms.string('iterativeCone5CaloJets'),
    HistoFileName = cms.string('<OUTFILE>'),
    NJets         = cms.int32(2)
)

process.p = cms.Path(process.calo)

process.MessageLogger.cerr.FwkReport.reportEvery = 50000

process.AdaptorConfig = cms.Service("AdaptorConfig",
                                    tempDir=cms.untracked.string(""),
                                    cacheHint=cms.untracked.string("lazy-download"),
                                    readHint=cms.untracked.string("auto-detect"))
process.source.cacheSize = cms.untracked.uint32(20*1024*1024)
</verbatim>

-- Main.LeonardoSala - 2010-03-10

---

%ICON{arrowleft}% Go to [[CMSTier3LogXX][previous page]] / [[CMSTier3LogXX][next page]] of Tier3 site log %M%
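
Follow-up note on collecting the results: the run script above wraps each job in =/usr/bin/time=, so the wall-clock time of every job can be harvested from the =.stdout= files. A minimal sketch, assuming the default GNU time output format (where the elapsed time is printed as e.g. =9:51.27elapsed=):
<verbatim>
#!/bin/bash
# Sketch: collect the wall-clock time of every job from the /usr/bin/time output
# appended to the *.stdout files written by the run script above.
# Assumes the default GNU time format, e.g. "123.45user 6.78system 9:51.27elapsed ...".
for f in Site.T3_CH_PSI-*/*.stdout; do
  elapsed=$(grep -o '[0-9:.]*elapsed' "$f" | tail -1 | sed 's/elapsed$//')
  echo -e "${f}\t${elapsed:-n/a}"
done | sort
</verbatim>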