How to efficiently search into /pnfs
To quickly search for
/pnfs
files and dirs and their metadata the T3 users can directly query two
PostgreSQL views with
the limitation that the information presented there will be in the worst case 12h old ; this limitation is absolutely indifferent in basically all the practical cases and
dc_find
will run faster than searching in
/pnfs
by
cd
,
ls
,
find
; so we encourage the T3 users to pay a try to
dc_find
dc_find
dc_find
needs the T3 user password and it has many options because of its flexibility but most of them have a
default value that can be left as it is.
$ dc_find -h
usage: dc_find [-h] [-v] [-p PATH] [-s NAME] [-as ATIME_START] [-ae ATIME_END]
[-cs CTIME_START] [-ce CTIME_END] [-min SIZE_MIN] [-max SIZE_MAX]
[-d DEPTH] [-u UID] [-g GID] [-oat | -oct | -ou | -og | -os]
[-oa] [-od] [-l LIMIT] [-f PNFS_FILE_OUTPUT] [--db DBNAME]
[--dbuser DBUSER] [--dbhost DBHOST] [--dbport DBPORT]
The tool partially replicates the 'find' command in a dCache environment
optional arguments:
-h, --help show this help message and exit
-v, --verbose verbose
-p PATH, --pnfspath PATH
The /pnfs dir to be used to start the search; if not specified and the user is inside a /pnfs dir then the search will start from there.
-s NAME, --string NAME
A string to be searched among the /pnfs files and dirs names; default '*'
-as ATIME_START, --access-time-start ATIME_START
files or dirs accessed after this time; default "1999-01-01 00:00:00"
-ae ATIME_END, --access-time-end ATIME_END
files or dirs accessed before this time; default "2999-01-01 00:00:00"
-cs CTIME_START, --creation-time-start CTIME_START
files or dirs created after this time; default "1999-01-01 00:00:00"
-ce CTIME_END, --creation-time-end CTIME_END
files or dirs created before this time; default "2999-01-01 00:00:00"
-min SIZE_MIN, --min-size-MB SIZE_MIN
files bigger than this size; default 0
-max SIZE_MAX, --max-size-MB SIZE_MAX
files smaller than this size; default 1000000
-d DEPTH, --max-depth DEPTH
stop the search at this depth; default 1000
-u UID, --uid UID files or dirs that belong to this uid; default '*'
-g GID, --gid GID files or dirs that belong to this gid; default '*'
-oat, --order-by-access-time
-oct, --order-by-creation-time
-ou, --order-by-uid
-og, --order-by-gid
-os, --order-by-size
-oa, --order-ascendant
-od, --order-descendant
-l LIMIT, --limit-result LIMIT
-f PNFS_FILE_OUTPUT, --file PNFS_FILE_OUTPUT
output file; default dc_find.pnfs.txt
--db DBNAME the database name; default chimera
--dbuser DBUSER the database user name used to authenticate; default your Linux account name
--dbhost DBHOST the database host; default t3dcachedb03.psi.ch
--dbport DBPORT the database port; default 5432
examples:
dc_find --file /tmp/dc_find.out -p /pnfs/psi.ch/cms/trivcat/store/user/bianchi/ --string t-channel_TuneZ2star --order-by-access-time
dc_find --string .root -p /pnfs/ --order-by-uid -l 100
dc_find --string .root -p /pnfs/ --access-time-start "2014-01-01 00:00:00" -l 100
Follows a
dc_find
example that involves an arbitrary
/pnfs
dir where the user entered by
cd
:
$ pwd
/pnfs/psi.ch/cms/trivcat/store/user/ursl
$ dc_find --string .root --order-by-size --order-desc --limit 1000 --file $HOME/dc_find.out
Password:
$ head $HOME/dc_find.out
+-------+-----+-----+-----+----------------------------------+----------------------------------+------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
| depth | MB | uid | gid | ctime | atime | mode | adler32 | /pnfs | pools |
+-------+-----+-----+-----+----------------------------------+----------------------------------+------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
| 11 | 422 | 584 | 538 | 2013-11-08 17:16:27.816000+01:00 | 2014-01-10 15:41:58.263000+01:00 | 644 | ce688b67 | /pnfs/psi.ch/cms/trivcat/store/user/yangyong/data/reco/v6/TTH_HToBB_M-125_8TeV-pythia6Summer12_DR53X-PU_S10_START53_V7A-v1AODSIM/analysis.TTH_HToBB_M-125_8TeV-pythia6Summer12_DR53X-PU_S10_START53_V7A-v1AODSIM_3_1_nxT.root | t3fs14_cms_8 |
| 11 | 482 | 584 | 538 | 2013-11-08 17:16:27.695000+01:00 | 2014-01-13 10:49:45.009000+01:00 | 644 | f50c2466 | /pnfs/psi.ch/cms/trivcat/store/user/yangyong/data/reco/v6/TTH_HToBB_M-125_8TeV-pythia6Summer12_DR53X-PU_S10_START53_V7A-v1AODSIM/analysis.TTH_HToBB_M-125_8TeV-pythia6Summer12_DR53X-PU_S10_START53_V7A-v1AODSIM_13_1_HsS.root | t3fs14_cms_11 |
...
View v_pnfs
The advanced T3 users can directly use
psql to interact with the
PostgreSQL view
v_pnfs
that reports each
/pnfs
file and dir; we highlighted the
columns that are interesting for these users:
We list some common
psql queries, but
it's up to the T3 users to write, save and regularly run their specific queries.
Ho many GB am I globally using in /pnfs
?
BASH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where iuid=$(id -u);"
TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where iuid=`id -u`;"
Ho many GB is my group globally using in /pnfs
?
BASH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=$(id -g);"
TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=`id -g`;"
Ho many GB is my group using in an arbitrary /pnfs
dir, e.g. /pnfs/psi.ch/cms/trivcat/store/*
?
BASH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=$(id -g) and path like '/pnfs/psi.ch/cms/trivcat/store/%';"
TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=`id -g` and path like '/pnfs/psi.ch/cms/trivcat/store/%';"
Ho many GB is my group using in /pnfs/psi.ch/cms/trivcat/store/*
that weren't read since 2014-01-01 ? You can specify whatever
iatime
date.
BASH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=$(id -g) and path like '/pnfs/psi.ch/cms/trivcat/store/%' and iatime <= '2014-01-01'; "
TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=`id -g` and path like '/pnfs/psi.ch/cms/trivcat/store/%' and iatime <= '2014-01-01'; "
How to save in a file all the /pnfs
files involved in the previous query? Here the files list is going to be saved in
$PWD/pnfsfiles
.
BASH $ psql -o $PWD/pnfsfiles -d chimera -h t3dcachedb03.psi.ch -A -q --command="select path,isize as bytes,iatime,iuid,igid from v_pnfs where igid=$(id -g) and path like '/pnfs/psi.ch/cms/trivcat/store/%' and iatime <= '2014-01-01'; "
TCSH $ psql -o $PWD/pnfsfiles -d chimera -h t3dcachedb03.psi.ch -A -q --command="select path,isize as bytes,iatime,iuid,igid from v_pnfs where igid=`id -g` and path like '/pnfs/psi.ch/cms/trivcat/store/%' and iatime <= '2014-01-01'; "
$ head -1 pnfsfiles
path|bytes|iatime|iuid|igid
Ho many TB are the T3 groups using in /pnfs/psi.ch/cms/trivcat/store/*
that weren't read since 2014-01-01 ?
BASH/TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000/1000) as TB,igid from v_pnfs where path like '/pnfs/psi.ch/cms/trivcat/store/%' and iatime < '2014-01-01' group by igid order by TB desc ;"
TB | igid
-----+------
176 | 500 # common data
129 | 533 # ethz-susy
23 | 536 # uniz-bphys
19 | 531 # ethz-ewk
17 | 534 # psi-bphys
8 | 538 # uniz-pixel
5 | 532 # ethz-higgs
3 | 530 # ethz-bphys
0 | 535 # psi-pixel
0 | 537 # uniz-higgs
Who did upload more GB during the last month ( without considering files that were deleted ) ?
BASH/TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB,iuid from v_pnfs where ictime > date_trunc('day', NOW() - interval '1 month') group by iuid order by GB desc ;"
Password:
GB | iuid
------+------
7401 | 521 # ursl
4451 | 607 # mwang
3540 | 585 # ...
2587 | 584
2466 | 603
...
How to list the /pnfs
files such that .. Here we retrieve a list of files satisfying many different constraints in parallel; a this stage the SQL code should be self-explanatory.
BASH/TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select path,iatime,isize from v_pnfs where path like '/pnfs/psi.ch/cms/trivcat/store/user/ursl/%.gz' and iatime < '2014-03-01' and iatime > '2014-01-01' order by iatime limit 10 ;"
Password:
path | iatime | isize
-------------------------------------------------------------------------------------+----------------------------+------------
/pnfs/psi.ch/cms/trivcat/store/user/ursl/strebeli/ggHyy/store/mcanlo_store_0.tar.gz | 2014-02-27 13:33:15.222+01 | 1127148238
/pnfs/psi.ch/cms/trivcat/store/user/ursl/strebeli/ggHyy/store/mcanlo_store_1.tar.gz | 2014-02-27 13:34:17.122+01 | 189441386
/pnfs/psi.ch/cms/trivcat/store/user/ursl/strebeli/ggHyy/store/mcanlo_store_3.tar.gz | 2014-02-27 13:35:12.52+01 | 950718190
/pnfs/psi.ch/cms/trivcat/store/user/ursl/strebeli/ggHyy/store/mcanlo_store_4.tar.gz | 2014-02-27 13:35:36.329+01 | 947247447
(4 rows)
As the queries get complex and recurring is better to save them in a file: so here we prepare a
query file and we run it:
$ cat query
select trunc(sum(isize)/1000/1000/1000) as GB,
ictime::timestamp::date as day
from v_pnfs
where path like '%PU_S10_START53_V7D-v3%root'
and ictime > '2011-01-01'
group by day
order by day asc;
BASH/TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q -f query
gb | day
------+------------
1071 | 2013-02-01
4130 | 2013-02-02
3316 | 2013-02-03
1190 | 2013-02-04
43 | 2013-11-11
46 | 2013-11-12
63 | 2013-11-18
292 | 2013-11-19
28 | 2013-11-20
Which are the /pnfs
dirs where have been uploaded more GB during the last month ?
Here we combine together the expressing power of SQL with the
regular expressions; the metadata of a
/pnfs
dir is
512
bytes so by excluding the rows with isize =
512
we're basically considering only the
/pnfs
files. The expression
[^\/]*$ matches the file name that will be erased by the
regexp_replace.
$ cat query
select trunc(sum(isize)/1000/1000/1000) as GB,
regexp_replace(path, '[^\/]*$', '') as dir
from v_pnfs
where ictime > date_trunc('day', NOW() - interval '1 month')
and isize != 512
group by dir
order by GB desc;
BASH/TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q -f query
gb | dir
------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1995 | /pnfs/psi.ch/cms/trivcat/store/user/yangyong/data/reco/tree/v5_13/
1246 | /pnfs/psi.ch/cms/trivcat/store/user/jngadiub/Thea/QCD_defaultCones/
1042 | /pnfs/psi.ch/cms/trivcat/store/user/mwang/data/SingleElectron/EDBR_PATtuple_edbr_wh_20140210_SingleElectron_Run2012D-22Jan2013-v1/c2d529e1c78e50623ca40825abf53f99/
902 | /pnfs/psi.ch/cms/trivcat/store/user/mwang/data/SingleElectron/EDBR_PATtuple_edbr_wh_20140210_SingleElectron_Run2012C-22Jan2013-v1/c2d529e1c78e50623ca40825abf53f99/
Which are the /pnfs
dirs where I/my group have/has uploaded more GB during the last month ?
Add to the previous query either this contraint
and iuid = YOUR_UID
or this contraint:
and igid = YOUR_GID
View v_pnfs_top_dirs
NOTE The simplest ways to consult the preordered
/pnfs
dirs list consists of:
The view
v_pnfs_top_dirs
reports each
/pnfs
dirs ( not the files )
preordered from the biggest dirs to the smallest dirs:
typically a T3 user will extract from the view
v_pnfs_top_dirs
all the dirs involving his/her
uid
, or his/her
gid
, by:
BASH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select dir_du_gb,path,ictime,iatime from v_pnfs_top_dirs where iuid=$(id -u) ;"
TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select dir_du_gb,path,ictime,iatime from v_pnfs_top_dirs where iuid=`id -u` ;"
dir_du_gb | path | ictime | iatime
-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+----------------------------
3979 | /pnfs/psi.ch/cms/trivcat/store/data/Run2012D/DoubleElectron/RAW-RECO/ZElectron-22Jan2013-v1/10000 | 2013-11-15 17:48:01.372+01 | 2013-11-14 11:13:55.999+01
3976 | /pnfs/psi.ch/cms/trivcat/store/data/Run2011A/MuOnia/AOD/08Nov2011-v1/0001 | 2012-12-07 21:24:37.456+01 | 2012-12-07 21:24:37.456+01
3964 | /pnfs/psi.ch/cms/trivcat/store/data/Run2011A/MuOnia/AOD/08Nov2011-v1/0000 | 2012-12-07 21:24:37.456+01 | 2012-12-07 21:24:37.456+01
3953 | /pnfs/psi.ch/cms/trivcat/store/data/Run2012C/DoubleElectron/RAW-RECO/ZElectron-22Jan2013-v1/20000 | 2013-06-01 13:56:01.777+02 | 2013-05-30 18:28:18.34+02
3935 | /pnfs/psi.ch/cms/trivcat/store/data/Run2012B/MuOnia/AOD/13Jul2012-v1/0000 | 2012-11-30 17:15:09.289+01 | 2012-11-12 19:55:24+01
3899 | /pnfs/psi.ch/cms/trivcat/store/data/Run2012B/MuOnia/AOD/13Jul2012-v1/0001 | 2012-11-30 17:15:21.191+01 | 2012-11-11 11:56:39+01
3877 | /pnfs/psi.ch/cms/trivcat/store/data/Run2011B/MuOnia/AOD/19Nov2011-v1/0000 | 2012-12-08 23:27:58.153+01 | 2012-12-08 23:27:58.153+01
or a query based on a string present in the dir name, e.g.
*Run2011A* :
BASH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select dir_du_gb,path,ictime,iatime from v_pnfs_top_dirs where path like '%Run2011A%' and iuid=$(id -u);"
TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select dir_du_gb,path,ictime,iatime from v_pnfs_top_dirs where path like '%Run2011A%' and iuid=`id -u`;"
Concluding, by regularly paying attention the view v_pnfs_top_dirs
all the T3 users are kindly invited to proactively delete their big and unnecessary /pnfs
dirs: you can also simply contact us by e-mail to get removed a /pnfs
dir and we'll remove it on your behalf.