
10.03.2010 Testing the SEs
We want to test different SE setups and their behaviour under a realistic user workload. For this we plan to use CMSSW_3_3_6 and some standard cfgs. The test will be performed running N = 1, 2, 4, 6, 8, 10 parallel jobs, monitoring the SE performance with dstat (on the WN) and dtrace (on the file server pool).
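For reference, a minimal sketch of the monitoring commands (the dstat options are the same ones used in the run script further down; the pool-side probes are assumptions, any equivalent ZFS/dtrace I/O view would do):
# on the worker node: sample CPU, disk, memory and network every 30 s into a CSV
dstat -T -c -d -m -n --nocolor --noheaders --output se_test_dstat.csv 30
# on the file server (Solaris): per-vdev throughput and a rough I/O size histogram
zpool iostat -v data1 30
dtrace -n 'io:::start { @sizes = quantize(args[0]->b_bcount); }'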
File server setup
- t3fs01: 4 * (9 disk raidz1) + 1 * (8 disk raidz1), disk brand: HITACHI HUA7250S
root@t3fs01.psi.ch # zpool status
  pool: data1
 state: ONLINE
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
data1 ONLINE 0 0 0
  raidz1 ONLINE 0 0 0
    c4t0d0 ONLINE 0 0 0
    c4t4d0 ONLINE 0 0 0
    c7t0d0 ONLINE 0 0 0
    c7t4d0 ONLINE 0 0 0
    c6t0d0 ONLINE 0 0 0
    c6t4d0 ONLINE 0 0 0
    c1t0d0 ONLINE 0 0 0
    c1t4d0 ONLINE 0 0 0
    c0t0d0 ONLINE 0 0 0
  raidz1 ONLINE 0 0 0
    c0t4d0 ONLINE 0 0 0
    c5t1d0 ONLINE 0 0 0
    c5t5d0 ONLINE 0 0 0
    c4t1d0 ONLINE 0 0 0
    c4t5d0 ONLINE 0 0 0
    c7t1d0 ONLINE 0 0 0
    c7t5d0 ONLINE 0 0 0
    c6t1d0 ONLINE 0 0 0
    c6t5d0 ONLINE 0 0 0
  raidz1 ONLINE 0 0 0
    c1t1d0 ONLINE 0 0 0
    c1t5d0 ONLINE 0 0 0
    c0t1d0 ONLINE 0 0 0
    c0t5d0 ONLINE 0 0 0
    c5t2d0 ONLINE 0 0 0
    c5t6d0 ONLINE 0 0 0
    c4t2d0 ONLINE 0 0 0
    c4t6d0 ONLINE 0 0 0
    c7t2d0 ONLINE 0 0 0
  raidz1 ONLINE 0 0 0
    c7t6d0 ONLINE 0 0 0
    c6t2d0 ONLINE 0 0 0
    c6t6d0 ONLINE 0 0 0
    c1t2d0 ONLINE 0 0 0
    c1t6d0 ONLINE 0 0 0
    c0t2d0 ONLINE 0 0 0
    c0t6d0 ONLINE 0 0 0
    c5t3d0 ONLINE 0 0 0
    c5t7d0 ONLINE 0 0 0
  raidz1 ONLINE 0 0 0
    c4t3d0 ONLINE 0 0 0
    c4t7d0 ONLINE 0 0 0
    c7t3d0 ONLINE 0 0 0
    c7t7d0 ONLINE 0 0 0
    c6t3d0 ONLINE 0 0 0
    c6t7d0 ONLINE 0 0 0
    c1t3d0 ONLINE 0 0 0
    c1t7d0 ONLINE 0 0 0
spares
  c0t3d0 AVAIL
  c0t7d0 AVAIL

errors: No known data errors
- t3fs05: 4 * (11 disk raidz2), disk brand: HITACHI HUA7250S
bash-4.0# zpool status
  pool: data1
 state: ONLINE
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
data1 ONLINE 0 0 0
  raidz2 ONLINE 0 0 0
    c5t0d0 ONLINE 0 0 0
    c5t4d0 ONLINE 0 0 0
    c8t0d0 ONLINE 0 0 0
    c8t4d0 ONLINE 0 0 0
    c7t0d0 ONLINE 0 0 0
    c7t4d0 ONLINE 0 0 0
    c1t0d0 ONLINE 0 0 0
    c1t4d0 ONLINE 0 0 0
    c0t0d0 ONLINE 0 0 0
    c0t4d0 ONLINE 0 0 0
    c6t1d0 ONLINE 0 0 0
  raidz2 ONLINE 0 0 0
    c6t5d0 ONLINE 0 0 0
    c5t1d0 ONLINE 0 0 0
    c5t5d0 ONLINE 0 0 0
    c8t1d0 ONLINE 0 0 0
    c8t5d0 ONLINE 0 0 0
    c7t1d0 ONLINE 0 0 0
    c7t5d0 ONLINE 0 0 0
    c1t1d0 ONLINE 0 0 0
    c1t5d0 ONLINE 0 0 0
    c0t1d0 ONLINE 0 0 0
    c0t5d0 ONLINE 0 0 0
  raidz2 ONLINE 0 0 0
    c6t2d0 ONLINE 0 0 0
    c6t6d0 ONLINE 0 0 0
    c5t2d0 ONLINE 0 0 0
    c5t6d0 ONLINE 0 0 0
    c8t2d0 ONLINE 0 0 0
    c8t6d0 ONLINE 0 0 0
    c7t2d0 ONLINE 0 0 0
    c7t6d0 ONLINE 0 0 0
    c1t2d0 ONLINE 0 0 0
    c1t6d0 ONLINE 0 0 0
    c0t2d0 ONLINE 0 0 0
  raidz2 ONLINE 0 0 0
    c0t6d0 ONLINE 0 0 0
    c6t3d0 ONLINE 0 0 0
    c6t7d0 ONLINE 0 0 0
    c5t3d0 ONLINE 0 0 0
    c5t7d0 ONLINE 0 0 0
    c8t3d0 ONLINE 0 0 0
    c8t7d0 ONLINE 0 0 0
    c7t3d0 ONLINE 0 0 0
    c7t7d0 ONLINE 0 0 0
    c1t3d0 ONLINE 0 0 0
    c1t7d0 ONLINE 0 0 0
spares
  c0t3d0 AVAIL
  c0t7d0 AVAIL

errors: No known data errors
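For completeness, this is roughly the zpool create syntax behind such layouts (a sketch with shortened vdevs and placeholder device names, not the actual commands used on t3fs01/t3fs05):
# raidz1 vdevs plus hot spares (t3fs01-like); repeat the raidz1 keyword once per disk group
zpool create data1 \
  raidz1 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 \
  raidz1 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
  spare  c2t0d0 c2t1d0
# raidz2 vdevs (t3fs05-like) are built the same way, with raidz2 instead of raidz1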
Choosing the cfg
The chosen cfg must put a sizeable load on the pool and also be sensitive to the RAID configuration (random reads). One good candidate is the PAT producer with standard CMSSW I/O settings (application-only, cache=0). This can be seen from these preliminary tests (1=fs01, 2=fs05, 3=fs03):
- T3_CH_PSI-PAT-MultiTest-Std-dstat-NET_Rx.jpg:
It can be seen that for fs05 (raidz2) the performance is degraded. One can also see that lazy-download with a 20 MB cache puts a much higher load on the network, but in bursts.
Nevertheless, for producing a high load a JetAnalyzer job is better suited (again in lazy-download configuration), as it requests data at a higher frequency (see
https://twiki.cern.ch/twiki/bin/view/Main/LSIOforAnalysisOperations-SingleTests#TestJPE
and
https://twiki.cern.ch/twiki/pub/Main/LSIOforAnalysisOperations-SingleTests/T3_CH_PSI-JPE-20100220-1_2.png ). It should be less sensitive to the underlying RAID cfg (bigger read chunks; to be verified).
So, the proposal is to use:
- PAT jobs reading 2k events each (2 root files as source), ~10 min each, to measure differences between RAID setups
- JetAnalyzer jobs reading ~10k events each (~8 files) with the lazy-download setup, ~5 min each
The number of events depends on how long we want each job to run.
Setting up the test
First of all, we need to place some test root files on the desired pool. We can just use one RECO file:
/store/mc/Summer09/QCD_Pt170-herwig/GEN-SIM-RECO/MC_31X_V3-v1/0002/549DB055-FF7D-DE11-BF9C-00221981BA9B.root
and replicate it in several copies on the pools. One idea is to use a naming convention like test_${POOL}_${N}_${n}-${i}.root, where:
- ${POOL} is the pool where the file is
- ${N} is the number of total jobs running in parallel
- ${n} is the number of the job
- ${i} is the number of the file (from 1 to 8?)
Each file weighs ~1 GB. A script for doing that is:
MOTHER=/store/mc/Summer09/QCD_Pt170-herwig/GEN-SIM-RECO/MC_31X_V3-v1/0002/549DB055-FF7D-DE11-BF9C-00221981BA9B.root
SON=test.root
SEs=(t3fs07 t3fs08 t3fs09)
RUNS=(1 4 8)

# fetch the source file once, then replicate it following the naming convention above
srmcp -2 srm://t3se01.psi.ch:8443/srm/managerv2\?SFN=/pnfs/psi.ch/cms/trivcat/${MOTHER} file:////tmp/${SON}

for POOL in ${SEs[*]}; do
  for N in ${RUNS[*]}; do
    for n in `seq 1 ${N}`; do
      for i in 1 2 3 4; do
        srmcp -2 file:////tmp/${SON} srm://t3se01.psi.ch:8443/srm/managerv2\?SFN=/pnfs/psi.ch/cms/trivcat/store/user/leo/TestBed/test_${POOL}_${N}_${n}-${i}.root
      done
    done
  done
done
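A quick check that all the replicas are actually in place can be done through the SRM interface (a hedged sketch; srmls from the same SRM client suite as srmcp is assumed to be available, with the endpoint syntax used above):
# list the test directory and count the copies
srmls srm://t3se01.psi.ch:8443/srm/managerv2\?SFN=/pnfs/psi.ch/cms/trivcat/store/user/leo/TestBed/ | grep -c test_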
The pools are populated one at a time, by putting all the file servers except the chosen one into read-only mode (for Derek: which command?). Now let's prepare a CMSSW project dir:
$ scram p CMSSW CMSSW_3_3_6
and get the JetAnalyzer code:
$ cd CMSSW_3_3_6/src
$ export SCRAM_ARCH=slc4_ia32_gcc345 && eval `scram ru -sh` && cmsenv
$ scp -r /shome/leo/Installations/Perf_3_3_6/CMSSW_3_3_6/src/RecoJets .
$ scram b
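Before launching the full matrix it is worth checking that one of the test LFNs resolves through the local trivial file catalogue and opens from a WN (a sketch using edmFileUtil; the file name is one of the copies created by the script above):
# print the PFN the site catalogue maps the LFN to
edmFileUtil -d /store/user/leo/TestBed/test_t3fs07_1_1-1.root
# open the file via the configured protocol and print a short summary
edmFileUtil /store/user/leo/TestBed/test_t3fs07_1_1-1.root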
Create a directory for our tests (TestBed) and paste the two cfgs at the end of this page into two files called PAT_template.py and JPE_CHLD_CACHE20_template.py. A script will create the cfgs and run the jobs in parallel:
#!/bin/bash
CFG=PAT
DIR=Site.T3_CH_PSI-Label.HT-Setting.ParallelTest-Cfg.${CFG}-Set.
SEDIR=/store/user/leo/TestBed   # must match the destination used by the copy script above
SEs=(fs01 fs05)
RUNS=(1 2 4 6 8 10)

# one logging directory per parallel-set size
for N in `seq 1 10`; do mkdir ${DIR}${N}; done

for i in ${SEs[*]}; do
  for N in ${RUNS[*]}; do
    M=$(( ${N}-1 ))
    # launch the first N-1 jobs in the background
    for n in `seq 1 ${M}`; do
      echo $n $N $i
      DIRNAME=${DIR}${N}
      FILENAME=${CFG}-${N}_${n}_${i}
      echo Config:${FILENAME}
      # all four input files are passed as LFNs
      cat ${CFG}_template.py | sed "s|<FILES>|\'${SEDIR}/test_${i}_${N}_${n}-1.root\',\'${SEDIR}/test_${i}_${N}_${n}-2.root\',\'${SEDIR}/test_${i}_${N}_${n}-3.root\',\'${SEDIR}/test_${i}_${N}_${n}-4.root\'|g" > tmp.py
      cat tmp.py | sed "s|<OUTFILE>|out_${i}_${N}_${n}.root|g" > ${FILENAME}.py
      dstat -T -c -d -m -n --nocolor --noheaders --output ${DIRNAME}/${FILENAME}_dstat.csv 30 > /dev/null &
      PID=$!
      echo $PID
      sleep 60 && ( /usr/bin/time cmsRun -j ${DIRNAME}/${FILENAME}.xml ${FILENAME}.py ) &> ${DIRNAME}/${FILENAME}.stdout && \
        echo PID is ${PID} && kill ${PID} &
    done
    # the N-th job runs in the foreground, so the script waits for it before starting the next set
    n=${N}
    DIRNAME=${DIR}${N}
    FILENAME=${CFG}-${N}_${n}_${i}
    echo Config:${FILENAME}
    cat ${CFG}_template.py | sed "s|<FILES>|\'${SEDIR}/test_${i}_${N}_${n}-1.root\',\'${SEDIR}/test_${i}_${N}_${n}-2.root\',\'${SEDIR}/test_${i}_${N}_${n}-3.root\',\'${SEDIR}/test_${i}_${N}_${n}-4.root\'|g" > tmp.py
    cat tmp.py | sed "s|<OUTFILE>|out_${i}_${N}_${n}.root|g" > ${FILENAME}.py
    dstat -T -c -d -m -n --nocolor --noheaders --output ${DIRNAME}/${FILENAME}_dstat.csv 30 > /dev/null &
    PID=$!
    echo $PID
    sleep 60 && ( /usr/bin/time cmsRun -j ${DIRNAME}/${FILENAME}.xml ${FILENAME}.py ) &> ${DIRNAME}/${FILENAME}.stdout && \
      echo PID is ${PID} && kill ${PID}
    sleep 300
  done
done
This ugly script (no time to do it better) creates a logging directory for each set of parallel jobs, then for each pool runs 1, 2, 4, ..., 10 jobs in parallel, also saving the information gathered by dstat. Before each run there is one minute of sleep to measure the initial conditions, and 300 s of sleep between different parallel sets for the same reason (and also to get a clean view in the Ganglia plots).
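For the record, a sketch of how the numbers in the tables below can be extracted afterwards: the storage metrics come from the cmsRun framework job reports (the -j XML files, assuming the usual <Metric Name=... Value=.../> layout), the host-level values from the dstat CSVs. Column positions and header length are assumptions to be adapted; the CSV name is just one example job.
# per-job storage metrics from the framework job reports
grep -o 'Name="tstoragefile-read-actual-total-msecs" Value="[^"]*"' Site.T3_CH_PSI-*/*.xml
# average network receive rate from a dstat CSV (recv is the next-to-last column; skip the CSV header lines)
awk -F, 'NR>7 { sum+=$(NF-1); n++ } END { if (n) print sum/n }' Site.T3_CH_PSI-*/PAT-8_1_fs01_dstat.csv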
Cfg
PAT
from PhysicsTools.PatAlgos.patTemplate_cfg import *
process.load("PhysicsTools.PatAlgos.patSequences_cff")
from PhysicsTools.PatAlgos.tools.coreTools import *
restrictInputToAOD(process)
from PhysicsTools.PatAlgos.tools.tauTools import *
switchTo31Xdefaults(process)
from PhysicsTools.PatAlgos.tools.jetTools import *
switchJetCollection(process,
                    cms.InputTag('sisCone5CaloJets'),
                    doJTA            = True,
                    doBTagging       = True,
                    jetCorrLabel     = ('SC5','Calo'),
                    doType1MET       = True,
                    genJetCollection = cms.InputTag("sisCone5GenJets"),
                    doJetID          = False,
                    jetIdLabel       = "ak5"
                    )
process.p = cms.Path( process.patDefaultSequence)
process.source.fileNames = [ <FILES> ]
#important, as the files are actually the same
process.source.duplicateCheckMode = cms.untracked.string('noDuplicateCheck')
#process.AdaptorConfig = cms.Service("AdaptorConfig",
# tempDir=cms.untracked.string(""),
# cacheHint=cms.untracked.string("auto-detect"),
# readHint=cms.untracked.string("auto-detect")
# )
#process.source.cacheSize = cms.untracked.uint32(20*1024*1024)
process.maxEvents.input = 2000 ## (e.g. -1 to run on all events)
process.out.fileName = '<OUTFILE>' ## (e.g. 'myTuple.root')
process.out.outputCommands = cms.untracked.vstring('drop *' )
process.options.wantSummary = False ## (to suppress the long output at the end of the job)
process.MessageLogger.cerr.threshold = 'ERROR'
process.CPU = cms.Service("CPU",
    useJobReport = cms.untracked.bool(True),
    reportCPUProperties = cms.untracked.bool(True)
)
JPE_CHLD_CACHE20
import FWCore.ParameterSet.Config as cms

process = cms.Process("Ana")
process.load("FWCore.MessageService.MessageLogger_cfi")

process.maxEvents = cms.untracked.PSet(
    input = cms.untracked.int32(10000)
)

process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring( <FILES> )
)
#important, as the files are actually the same
process.source.duplicateCheckMode = cms.untracked.string('noDuplicateCheck')

process.calo = cms.EDAnalyzer("CaloJetPlotsExample",
    JetAlgorithm  = cms.string('iterativeCone5CaloJets'),
    HistoFileName = cms.string('<OUTFILE>'),
    NJets         = cms.int32(2)
)

process.p = cms.Path(process.calo)

process.MessageLogger.cerr.FwkReport.reportEvery = 50000

process.AdaptorConfig = cms.Service("AdaptorConfig",
    tempDir=cms.untracked.string(""),
    cacheHint=cms.untracked.string("lazy-download"),
    readHint=cms.untracked.string("auto-detect"))
process.source.cacheSize = cms.untracked.uint32(20*1024*1024)
Results
Three settings have been tested on the new Sun Thor servers:
- fs07: 5 * (9 disk raidz2)
- fs08: 4 * (11 disk raidz2)
- fs09: 5 * (9 disk raidz1)
No differences between the three settings have been noticed: the fs05 issue must be due to something else.
PAT
| | PAT t3fs07 8 | PAT t3fs08 8 | PAT t3fs09 8 |
| Success | 100.0% (8 / 8) | 100.0% (8 / 8) | 100.0% (8 / 8) |
| CpuPercentage | 67.25 +- 3.60 | 62.25 +- 6.42 | 61.12 +- 5.69 |
| ExeTime | 370.66 +- 26.56 | 407.89 +- 46.64 | 412.87 +- 47.15 |
| SysTime | 14.46 +- 0.86 | 14.03 +- 0.63 | 13.89 +- 1.09 |
| UserTime | 235.59 +- 5.99 | 238.67 +- 7.03 | 237.44 +- 5.95 |
| dstat-CPU_Idle | 75.63 +- 12.39 | 75.36 +- 11.81 | 75.59 +- 11.57 |
| dstat-CPU_Sys | 1.31 +- 4.11 | 1.30 +- 3.93 | 1.28 +- 3.89 |
| dstat-CPU_User | 22.80 +- 12.53 | 23.07 +- 12.19 | 22.87 +- 11.98 |
| dstat-CPU_Wait | 0.01 +- 0.02 | 0.01 +- 0.01 | 0.01 +- 0.01 |
| dstat-DISK_Read | 12.96 +- 49.24 | 9.19 +- 35.59 | 7.09 +- 27.69 |
| dstat-MEM_Buff | 12960582.55 +- 62505.48 | 13321728.00 +- 61166.32 | 13622508.31 +- 56475.68 |
| dstat-MEM_Cached | 393981887.50 +- 726126.03 | 397896352.00 +- 767088.32 | 402788572.55 +- 809128.94 |
| dstat-MEM_Free | 14351637753.95 +- 1109783685.54 | 14397225120.00 +- 1097276377.21 | 14389759432.86 +- 1096122636.08 |
| dstat-MEM_Used | 2048876544.00 +- 1109566195.82 | 1999013568.00 +- 1097075940.95 | 2001286254.28 +- 1095945351.33 |
| dstat-NET_Rx | 15005.01 +- 10105.73 | 14760.10 +- 10027.20 | 14605.70 +- 9828.05 |
| dstat-NET_Tx | 328.09 +- 189.88 | 317.39 +- 183.78 | 316.66 +- 182.22 |
| READ | PAT t3fs07 8 | PAT t3fs08 8 | PAT t3fs09 8 |
| _custom-read-actual-MB_per_operation_ | 8.61e-03 +- 0.00e+00 | 8.61e-03 +- 0.00e+00 | 8.61e-03 +- 0.00e+00 |
| tstoragefile-read-actual-total-megabytes | 141.00 +- 0.00 | 141.00 +- 0.00 | 141.00 +- 0.00 |
| tstoragefile-read-total-megabytes | 141.00 +- 0.00 | 141.00 +- 0.00 | 141.00 +- 0.00 |
| _dcap-read-total-megabytes_ | 141.00 +- 0.00 | 141.00 +- 0.00 | 141.00 +- 0.00 |
| tstoragefile-read-actual-total-msecs | 105301.30 +- 20406.54 | 137884.17 +- 40249.28 | 144060.64 +- 41293.75 |
| tstoragefile-read-total-msecs | 105321.21 +- 20407.19 | 137904.55 +- 40249.75 | 144081.04 +- 41294.11 |
| _dcap-read-total-msecs_ | 105277.89 +- 20405.69 | 137859.70 +- 40248.57 | 144036.34 +- 41293.40 |
| tstoragefile-read-actual-num-operations | 16373.00 +- 0.00 | 16373.00 +- 0.00 | 16373.00 +- 0.00 |
| tstoragefile-read-actual-num-successful-operations | 16373.00 +- 0.00 | 16373.00 +- 0.00 | 16373.00 +- 0.00 |
| tstoragefile-read-num-operations | 16373.00 +- 0.00 | 16373.00 +- 0.00 | 16373.00 +- 0.00 |
| _dcap-read-num-operations_ | 16373.00 +- 0.00 | 16373.00 +- 0.00 | 16373.00 +- 0.00 |
| tstoragefile-read-num-successful-operations | 16373.00 +- 0.00 | 16373.00 +- 0.00 | 16373.00 +- 0.00 |
| _dcap-read-num-successful-operations_ | 16373.00 +- 0.00 | 16373.00 +- 0.00 | 16373.00 +- 0.00 |
JPE
Four jobs were enough to saturate the bandwidth; no differences between the settings:
Multinode test
- A big load test has been performed, running 8 parallel jobs on each of wn22-28, i.e. 56 jobs in total against each fs0X. No differences have been noticed.
- The apparent difference for fs09 is due to the fact that not all the jobs started properly... anyway, no difference can be seen after repeating the test:
- The same also holds for the PAT jobs:
--
LeonardoSala - 2010-03-10
