CMS data ordering by Rucio
CMS uses
Rucio
for data management across sites. Our Tier-3 has the Rucio name
T3_CH_PSI; the Swiss Tier-2 is correspondingly named
T2_CH_CSCS.
Users can submit their own
requests
through Rucio. Our site data managers want to retain the ability to manage rules, but user rules can only be managed by the global CMS Rucio administrators. We therefore use our scripts to duplicate each such request into a new request under the special site account
t3_ch_psi_local_users; the original rule request is denied once the new rule has been created. Alternatively, you can simply send us (admin mailing list
cms-tier3@lists.psi.ch) a list of data sets or blocks together with the intended
expiry date, and we will order them on your behalf.
- You must specify an expiry date. Rules without an expiry date will not be accepted. Please pick the earliest date that works for you.
- Transfers are fast: TBs can be transferred to the Tier-3 within hours if the data are on disk somewhere on the Grid (more waiting time is needed if the data first have to be staged from tape). Data can therefore be refetched later if needed.
- Ideally, just specify 1-3 months.
- The Tier-3 has limited storage (~1.5 PB) and most of it is consumed by user data. Please try to limit the volume that you bring to the Tier-3.
- The Tier-2 is reasonably well connected (just 5 ms latency), so you could also bring some data there and process it remotely from the Tier-3 or from other Grid sites; the Tier-2 provides better bandwidth to other sites than the Tier-3.
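The 1-3 month guideline can be turned into a concrete date with a few lines of Python. This is a minimal sketch, not part of our tooling: the helper names `add_months` and `suggest_expiry` are made up for illustration, the `YYYY-MM-DD` output format is assumed from the `#EXPIRY` example on this page, and the hard check on 1-3 months is stricter than the guideline itself.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month,
    so 31 January plus one month gives the last day of February.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def suggest_expiry(today: date, months: int = 3) -> str:
    """Format an expiry date `months` from `today` as YYYY-MM-DD.

    Assumption of this sketch: values outside 1-3 months are rejected.
    """
    if not 1 <= months <= 3:
        raise ValueError("please choose between 1 and 3 months")
    return add_months(today, months).isoformat()


if __name__ == "__main__":
    # Print a suggested expiry date two months from today.
    print(suggest_expiry(date.today(), months=2))
```

If the data turn out to be needed for longer, they can be requested again, as noted above.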
You will receive an email from us containing the rule IDs that have been generated on your behalf for the
t3_ch_psi_local_users account.
Rucio rule deletion campaign (mostly OBSOLETE in 2024 - was for anonymous data without expiry dates from PhEDEx days)
This initially requires some work, since many data sets still derive from the PhEDEx days and many of the early rules were entered into the system without any expiration time.
In the folder
/t3home/T3-INFO/rucio-delete/
you will find files listing the Rucio rules selected for deletion. An email announcing the deletion campaign will specify when the deletions are scheduled to run. Each line contains a
rule name
and a
data identifier
separated by a space. Example:
head /t3home/T3-INFO/rucio-delete/rucio-sync_t3_ch_psi-20221115.txt
7ae06b76de8d44f188c97b193cc1840a cms:/WJetsToLNu_HT-400To600_TuneCP5_13TeV-madgraphMLM-pythia8/RunIIFall17NanoAOD-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/NANOAODSIM#32c93a5d-7c9d-4d3d-b769-0f68dae58e53
50e0b0f9e5914a41bbbf8746a122f74e cms:/WJetsToLNu_HT-400To600_TuneCP5_13TeV-madgraphMLM-pythia8/RunIIFall17NanoAOD-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/NANOAODSIM#5bc24552-5745-11e8-84f1-02163e018fca
c499fb4bd6fb48ab8a93dc083fce2c42 cms:/WJetsToLNu_HT-400To600_TuneCP5_13TeV-madgraphMLM-pythia8/RunIIFall17NanoAOD-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/NANOAODSIM#c9778e32-5456-11e8-84f1-02163e018fca
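To find out which of your data are affected, such a list can be parsed line by line. The following is a minimal sketch that assumes the two-column, whitespace-separated layout shown in the example; `parse_deletion_list` and `select_dids` are hypothetical helper names, not part of Rucio or of our scripts.

```python
from typing import List, Tuple


def parse_deletion_list(text: str) -> List[Tuple[str, str]]:
    """Split each line into a (rule, data identifier) pair."""
    entries = []
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:  # skip blank or malformed lines
            entries.append((parts[0], parts[1].strip()))
    return entries


def select_dids(entries: List[Tuple[str, str]], pattern: str) -> List[str]:
    """Return the data identifiers whose name contains `pattern`."""
    return [did for _rule, did in entries if pattern in did]


if __name__ == "__main__":
    # One line copied from the example listing; in practice, read the file
    # named in the announcement email and pass its content instead.
    sample = (
        "7ae06b76de8d44f188c97b193cc1840a "
        "cms:/WJetsToLNu_HT-400To600_TuneCP5_13TeV-madgraphMLM-pythia8/"
        "RunIIFall17NanoAOD-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/"
        "NANOAODSIM#32c93a5d-7c9d-4d3d-b769-0f68dae58e53\n"
    )
    for did in select_dids(parse_deletion_list(sample), "/WJetsToLNu_HT-400To600"):
        print(did)
```

The selected data identifiers are exactly the strings that would go into a keep file, one per line.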
We will create new rules belonging to the special site account
t3_ch_psi_local_users for data you want to keep. To tell us which data you want to keep, please do the following:
- Create a folder
RUCIO-KEEP
in your home directory:
mkdir ~/RUCIO-KEEP
- In your
RUCIO-KEEP
folder, place files consisting of a header line defining the expiry date, an optional comment line, and then one line per data identifier. Do not use your own user-scope container
definitions; use exactly the data sets from the cms:
scope as listed in the source files. Example:
#EXPIRY: 2022-12-24
#COMMENT: I need these data for my analysis XYZ
cms:/MuonEG/Run2017F-31Mar2018-v1/NANOAOD#102d6373-d6ab-4bb4-8e4a-b1f3f42a6dbf
cms:/SingleElectron/Run2017C-31Mar2018-v1/NANOAOD#c837e60a-487d-11e8-8c33-02163e01877e
cms:/Tau/Run2017C-31Mar2018-v1/NANOAOD#4a97700c-3bb9-44ab-8d18-bf4a795fbf3b
Note: each such file will be turned into a single new rule, so there can only be a single expiry date and a single comment per file. Please use exactly the date format shown (YYYY-MM-DD).
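A file in this layout can also be generated programmatically. The sketch below is illustrative only: `render_keep_file` is a hypothetical helper, and the layout (one `#EXPIRY:` line, an optional `#COMMENT:` line, then one data identifier per line) is inferred from the example.

```python
from datetime import datetime
from typing import Iterable, Optional


def render_keep_file(expiry: str, dids: Iterable[str], comment: Optional[str] = None) -> str:
    """Build the text of one keep file (one file becomes one rule)."""
    # strptime raises ValueError for an invalid date;
    # strftime re-emits the date zero-padded as YYYY-MM-DD.
    expiry = datetime.strptime(expiry, "%Y-%m-%d").strftime("%Y-%m-%d")
    lines = [f"#EXPIRY: {expiry}"]
    if comment:
        # Collapse whitespace so the comment stays on a single line.
        lines.append("#COMMENT: " + " ".join(comment.split()))
    lines.extend(dids)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_keep_file(
        "2022-12-24",
        ["cms:/Tau/Run2017C-31Mar2018-v1/NANOAOD#4a97700c-3bb9-44ab-8d18-bf4a795fbf3b"],
        comment="I need these data for my analysis XYZ",
    ), end="")
```

The returned text can then be saved under any file name inside the RUCIO-KEEP folder; this page does not prescribe a naming scheme.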
User-scope containers will not be accepted, since we need to manage all data under the same Rucio account and we only have a single quota for that account. Users could add datasets to their containers at any later date, but we cannot put a quota on a container. This is a regrettable limitation of how CMS uses Rucio.
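Before the deadline it may be worth checking a keep file against the rules described here. The following sketch is an assumption about what such a check could look like, not the script we actually run: `check_keep_file` is a hypothetical name, and the checks (exactly one `#EXPIRY:` header in `YYYY-MM-DD` form, at most one `#COMMENT:` line, every data identifier in the `cms:` scope) are read off the text above.

```python
import re
from datetime import datetime
from typing import List

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_keep_file(text: str) -> List[str]:
    """Return a list of problems; an empty list means the file looks fine."""
    expiries, comments, dids = [], [], []
    for line in (raw.strip() for raw in text.splitlines()):
        if not line:
            continue
        if line.startswith("#EXPIRY:"):
            expiries.append(line[len("#EXPIRY:"):].strip())
        elif line.startswith("#COMMENT:"):
            comments.append(line)
        else:
            dids.append(line)

    problems = []
    if len(expiries) != 1:
        problems.append(f"expected exactly one #EXPIRY line, found {len(expiries)}")
    for value in expiries:
        try:
            if not DATE_RE.fullmatch(value):
                raise ValueError(value)
            datetime.strptime(value, "%Y-%m-%d")  # rejects dates such as 2022-13-40
        except ValueError:
            problems.append(f"expiry date {value!r} is not a valid YYYY-MM-DD date")
    if len(comments) > 1:
        problems.append("only a single #COMMENT line is allowed per file")
    if not dids:
        problems.append("no data identifiers listed")
    for did in dids:
        if not did.startswith("cms:"):
            problems.append(f"{did!r} is not in the cms: scope")
    return problems


if __name__ == "__main__":
    example = "#EXPIRY: 2022-12-24\ncms:/Tau/Run2017C-31Mar2018-v1/NANOAOD#4a97700c-3bb9-44ab-8d18-bf4a795fbf3b\n"
    print(check_keep_file(example) or "looks fine")
```

Returning all problems at once, rather than stopping at the first, lets a file be fixed in a single pass.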
When the deadline for deletion arrives, we will collect the files in your
~/RUCIO-KEEP
folders and generate Rucio requests on behalf of the
t3_ch_psi_local_users account. Only when the new rules are in place will we delete the original rules.