10. 03. 2010 Testing the SEs
We want to test different SE setups and their behaviour under a realistic user workload. For this, we plan to use CMSSW_3_3_6 and some standard cfgs. The test will be performed running N=1,2,4,6,8,10 parallel jobs, monitoring the SE performance through dstat (on the WN) and dtrace (on the FS pool).
Choosing the cfg
The cfg must put a sizeable load on the pool and also be sensitive to the RAID configuration (random reads). One good candidate is the PAT producer, with standard CMSSW settings (application-only, cache=0). This can be seen from these preliminary tests (1=fs01, 2=fs05, 3=fs03):
- T3_CH_PSI-PAT-MultiTest-Std-dstat-NET_Rx.jpg:
It can be seen that for fs05 (raidz2) performance is degraded. You can also see that a lazy-download setup with cache=20 puts a much higher load on the network, but in bursts.
Nevertheless, for putting a high load a JetAnalyzer job is better suited (always in lazy-download configuration), as it requests data at a higher frequency (see
https://twiki.cern.ch/twiki/bin/view/Main/LSIOforAnalysisOperations-SingleTests#TestJPE and
https://twiki.cern.ch/twiki/pub/Main/LSIOforAnalysisOperations-SingleTests/T3_CH_PSI-JPE-20100220-1_2.png ). This should be less sensitive to the underlying RAID cfg (bigger read chunks, to be verified).
So, the proposal is to use:
- PAT jobs reading 2k events each (2 root files as source), ~10 mins duration each. This is to measure differences in the RAID setup
- JetAnalyzer jobs reading ~10k events each (~8 files) with lazy-download setup, ~5 mins each
The number of events depends on how long we want the job to run.
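As a rough cross-check of these numbers, a back-of-envelope sketch (not part of the test procedure) of the event rates the two job types must sustain:

```python
# Back-of-envelope event rates implied by the proposal above.
def event_rate(n_events, duration_s):
    """Average events per second a job must sustain."""
    return n_events / float(duration_s)

pat_rate = event_rate(2000, 10 * 60)    # PAT: 2k events in ~10 min
jpe_rate = event_rate(10000, 5 * 60)    # JetAnalyzer: 10k events in ~5 min
print("PAT: %.1f ev/s, JetAnalyzer: %.1f ev/s" % (pat_rate, jpe_rate))
```

The JetAnalyzer jobs read roughly ten times faster, which is why they are expected to stress the pool harder.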
Setting up the test
First of all, we need to place some test root files as source on the desired pool. We can just use one RECO file:
/store/mc/Summer09/QCD_Pt170-herwig/GEN-SIM-RECO/MC_31X_V3-v1/0002/549DB055-FF7D-DE11-BF9C-00221981BA9B.root
and replicate it in several copies on the pools. One idea is to adopt a naming convention for the files, such as test_${POOL}_${N}_${n}-${i}.root, where:
- ${POOL} is the pool where the file is
- ${N} is the number of total jobs running in parallel
- ${n} is the number of the job
- ${i} is the number of the file (from 1 to 8?)
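To see how many replicas this convention implies, here is a small illustrative helper (replica_names is made up for this sketch; the replication script below copies 4 files per job):

```python
# Expand the naming convention above into the full list of replica names.
# replica_names() is just an illustration, not part of the test scripts.
def replica_names(pools, runs, files_per_job=4):
    names = []
    for pool in pools:
        for N in runs:                      # total jobs running in parallel
            for n in range(1, N + 1):       # job number
                for i in range(1, files_per_job + 1):  # file number
                    names.append("test_%s_%d_%d-%d.root" % (pool, N, n, i))
    return names

# e.g. three pools, runs of 1, 4 and 8 parallel jobs, 4 files per job
names = replica_names(["t3fs07", "t3fs08", "t3fs09"], [1, 4, 8])
print(len(names))   # number of ~1GB copies to place on the SE
```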
Each file weighs ~1GB. A script for doing that is:
#!/bin/bash
MOTHER=/store/mc/Summer09/QCD_Pt170-herwig/GEN-SIM-RECO/MC_31X_V3-v1/0002/549DB055-FF7D-DE11-BF9C-00221981BA9B.root
SON=test.root
SRM=srm://t3se01.psi.ch:8443/srm/managerv2\?SFN=/pnfs/psi.ch/cms/trivcat
SEs=(t3fs07 t3fs08 t3fs09)
RUNS=(1 4 8)

# Fetch the source file once, then replicate it under the naming convention
srmcp -2 ${SRM}${MOTHER} file:////tmp/${SON}
for POOL in ${SEs[*]}; do
  for N in ${RUNS[*]}; do
    for n in `seq 1 ${N}`; do
      for i in 1 2 3 4; do
        srmcp -2 file:////tmp/${SON} ${SRM}/store/user/leo/TestBed/test_${POOL}_${N}_${n}-${i}.root
      done
    done
  done
done
Now, let's prepare a CMSSW project dir:
$ scram p CMSSW CMSSW_3_3_6
and get the JetAnalyzer code:
$ cd CMSSW_3_3_6/src
$ export SCRAM_ARCH=slc4_ia32_gcc345 && eval `scram ru -sh` && cmsenv
$ scp -r /shome/leo/Installations/Perf_3_3_6/CMSSW_3_3_6/src/RecoJets .
$ scram b
Create a directory for our tests (TestBed) and paste the two cfgs at the end of the page into two files called PAT_template.py and JPE_CHLD_CACHE20_template.py. A script will create the cfgs and run the jobs in parallel:
#!/bin/bash
CFG=PAT
DIR=Site.T3_CH_PSI-Label.HT-Setting.ParallelTest-Cfg.${CFG}-Set.
SEDIR=/store/user/leo/Test
SEs=(fs01 fs05)
RUNS=(1 2 4 6 8 10)

for N in ${RUNS[*]}; do mkdir -p ${DIR}${N}; done

for i in ${SEs[*]}; do
  for N in ${RUNS[*]}; do
    M=$(( ${N}-1 ))
    # Launch the first N-1 jobs in the background
    for n in `seq 1 ${M}`; do
      echo $n $N $i
      DIRNAME=${DIR}${N}
      FILENAME=${CFG}-${N}_${n}_${i}
      echo Config:${FILENAME}
      sed "s|<FILES>|'${SEDIR}/test_${i}_${N}_${n}-1.root','${SEDIR}/test_${i}_${N}_${n}-2.root','${SEDIR}/test_${i}_${N}_${n}-3.root','${SEDIR}/test_${i}_${N}_${n}-4.root'|g" ${CFG}_template.py > tmp.py
      sed "s|<OUTFILE>|out_${i}_${N}_${n}.root|g" tmp.py > ${FILENAME}.py
      dstat -T -c -d -m -n --nocolor --noheaders --output ${DIRNAME}/${FILENAME}_dstat.csv 30 > /dev/null &
      PID=$!
      echo $PID
      sleep 60 && ( /usr/bin/time cmsRun -j ${DIRNAME}/${FILENAME}.xml ${FILENAME}.py ) &> ${DIRNAME}/${FILENAME}.stdout && \
        echo PID is ${PID} && kill ${PID} &
    done
    # Run the N-th job in the foreground, so we wait for the whole set
    n=${N}
    DIRNAME=${DIR}${N}
    FILENAME=${CFG}-${N}_${n}_${i}
    echo Config:${FILENAME}
    sed "s|<FILES>|'${SEDIR}/test_${i}_${N}_${n}-1.root','${SEDIR}/test_${i}_${N}_${n}-2.root','${SEDIR}/test_${i}_${N}_${n}-3.root','${SEDIR}/test_${i}_${N}_${n}-4.root'|g" ${CFG}_template.py > tmp.py
    sed "s|<OUTFILE>|out_${i}_${N}_${n}.root|g" tmp.py > ${FILENAME}.py
    dstat -T -c -d -m -n --nocolor --noheaders --output ${DIRNAME}/${FILENAME}_dstat.csv 30 > /dev/null &
    PID=$!
    echo $PID
    sleep 60 && ( /usr/bin/time cmsRun -j ${DIRNAME}/${FILENAME}.xml ${FILENAME}.py ) &> ${DIRNAME}/${FILENAME}.stdout && \
      echo PID is ${PID} && kill ${PID}
    sleep 300
  done
done
This quick-and-dirty script creates a logging directory for each set of parallel jobs, then for each pool runs 1, 2, 4 ... 10 jobs in parallel, also saving the information retrieved from dstat. Before each run there is one minute of sleep to measure the initial conditions, and 300 s of sleep between different parallel sets for the same reason (and also to get a clean view of the Ganglia plots).
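The *_dstat.csv logs can then be post-processed offline. The exact CSV layout depends on the dstat version (real output has several preamble lines before the column header), so the sketch below parses a simplified sample with a 'recv' column rather than a real log:

```python
import csv
import io

# Average a numeric column from a dstat-style CSV.
# With real dstat output, strip the preamble lines before the header first.
def mean_column(text, column):
    rows = list(csv.DictReader(io.StringIO(text)))
    values = [float(r[column]) for r in rows]
    return sum(values) / len(values)

# Simplified stand-in for one of the *_dstat.csv files (30 s samples)
sample = "epoch,recv,send\n1268200000,1048576,2048\n1268200030,2097152,4096\n"
print(mean_column(sample, "recv") / 1024.0**2, "MiB/s average network receive")
```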
Cfg
PAT
from PhysicsTools.PatAlgos.patTemplate_cfg import *
process.load("PhysicsTools.PatAlgos.patSequences_cff")
from PhysicsTools.PatAlgos.tools.coreTools import *
restrictInputToAOD(process)
from PhysicsTools.PatAlgos.tools.tauTools import *
switchTo31Xdefaults(process)
from PhysicsTools.PatAlgos.tools.jetTools import *
switchJetCollection(process,
cms.InputTag('sisCone5CaloJets'),
doJTA = True,
doBTagging = True,
jetCorrLabel = ('SC5','Calo'),
doType1MET = True,
genJetCollection = cms.InputTag("sisCone5GenJets"),
doJetID = False,
jetIdLabel = "ak5"
)
process.p = cms.Path( process.patDefaultSequence)
process.source.fileNames = [ <FILES> ]
#process.AdaptorConfig = cms.Service("AdaptorConfig",
# tempDir=cms.untracked.string(""),
# cacheHint=cms.untracked.string("auto-detect"),
# readHint=cms.untracked.string("auto-detect")
# )
#process.source.cacheSize = cms.untracked.uint32(20*1024*1024)
process.maxEvents.input = 2000 ## (e.g. -1 to run on all events)
process.out.fileName = '<OUTFILE>' ## (e.g. 'myTuple.root')
process.out.outputCommands = cms.untracked.vstring('drop *' )
process.options.wantSummary = False ## (to suppress the long output at the end of the job)
process.MessageLogger.cerr.threshold = 'ERROR'
process.CPU = cms.Service("CPU",
useJobReport = cms.untracked.bool(True),
reportCPUProperties = cms.untracked.bool(True)
)
JPE_CHLD_CACHE20
import FWCore.ParameterSet.Config as cms
process = cms.Process("Ana")
process.load("FWCore.MessageService.MessageLogger_cfi")
process.maxEvents = cms.untracked.PSet(
input = cms.untracked.int32(10000)
)
process.source = cms.Source("PoolSource",
fileNames = cms.untracked.vstring( <FILES>)
)
process.calo = cms.EDAnalyzer("CaloJetPlotsExample",
JetAlgorithm = cms.string('iterativeCone5CaloJets'),
HistoFileName = cms.string('<OUTFILE>'),
NJets = cms.int32(2)
)
process.p = cms.Path(process.calo)
process.MessageLogger.cerr.FwkReport.reportEvery = 50000
process.AdaptorConfig = cms.Service("AdaptorConfig",
tempDir=cms.untracked.string(""),
cacheHint=cms.untracked.string("lazy-download"),
readHint=cms.untracked.string("auto-detect"))
process.source.cacheSize = cms.untracked.uint32(20*1024*1024)
--
LeonardoSala - 2010-03-10