Searching into /pnfs by dc_find
To search into
/pnfs
the T3 users can use
dc_find
with
the limitation that the information retrieved will be in the worst case 24h old ;
dc_find
will run faster than searching into
/pnfs
by the usual Linux commands
cd
,
ls
,
find
and it offers several
--order-by-*
options.
dc_find
dc_find
needs the T3 user password and it offers many options with a reasonable
default value that can be safely left as it is.
$ dc_find -h
usage: dc_find [-h] [-v] [-p PATH] [-s NAME] [-as ATIME_START] [-ae ATIME_END]
[-cs CTIME_START] [-ce CTIME_END] [-min SIZE_MIN] [-max SIZE_MAX]
[-d DEPTH] [-u UID] [-g GID] [-oat | -oct | -ou | -og | -os]
[-oa] [-od] [-l LIMIT] [-f PNFS_FILE_OUTPUT] [--db DBNAME]
[--dbuser DBUSER] [--dbhost DBHOST] [--dbport DBPORT]
dc_find partially replicates the 'find' command in a dCache environment
optional arguments:
-h, --help show this help message and exit
-v, --verbose verbose
-p PATH, --pnfspath PATH
The /pnfs dir to be used to start the search; if not specified and the user is inside a /pnfs dir then the search will start from there.
-s NAME, --string NAME
A string to be searched among the /pnfs files and dirs names; default '*'
-as ATIME_START, --access-time-start ATIME_START
files or dirs accessed after this time; default "1999-01-01 00:00:00"
-ae ATIME_END, --access-time-end ATIME_END
files or dirs accessed before this time; default "2999-01-01 00:00:00"
-cs CTIME_START, --creation-time-start CTIME_START
files or dirs created after this time; default "1999-01-01 00:00:00"
-ce CTIME_END, --creation-time-end CTIME_END
files or dirs created before this time; default "2999-01-01 00:00:00"
-min SIZE_MIN, --min-size-MB SIZE_MIN
files bigger than this size; default 0
-max SIZE_MAX, --max-size-MB SIZE_MAX
files smaller than this size; default 1000000
-d DEPTH, --max-depth DEPTH
stop the search at this depth; default 1000
-u UID, --uid UID files or dirs that belong to this uid; default '*'
-g GID, --gid GID files or dirs that belong to this gid; default '*'
-oat, --order-by-access-time
-oct, --order-by-creation-time
-ou, --order-by-uid
-og, --order-by-gid
-os, --order-by-size
-oa, --order-ascending
-od, --order-descending
-l LIMIT, --limit-result LIMIT
-f PNFS_FILE_OUTPUT, --file PNFS_FILE_OUTPUT
output file; default dc_find.pnfs.txt
--db DBNAME the database name; default chimera
--dbuser DBUSER the database user name used to authenticate; default your Linux account name
--dbhost DBHOST the database host; default t3dcachedb03.psi.ch
--dbport DBPORT the database port; default 5432
dc_find examples:
dc_find --file /tmp/dc_find.out -p /pnfs/psi.ch/cms/trivcat/store/user/bianchi/ --string t-channel_TuneZ2star --order-by-access-time
dc_find --string .root -p /pnfs/ --order-by-uid -l 100
dc_find --string .root -p /pnfs/ --access-time-start "2014-01-01 00:00:00" -l 100
Following a
dc_find
example involving a generic
/pnfs
dir where the T3 user entered by
cd
:
$ pwd
/pnfs/psi.ch/cms/trivcat/store/user/ursl
$ dc_find --string .root --order-by-size --order-desc --limit 1000 --file $HOME/dc_find.out
Password:
$ head $HOME/dc_find.out
+-------+-----+-----+-----+----------------------------------+----------------------------------+------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
| depth | MB | uid | gid | ctime | atime | mode | adler32 | /pnfs | pools |
+-------+-----+-----+-----+----------------------------------+----------------------------------+------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
| 11 | 422 | 584 | 538 | 2013-11-08 17:16:27.816000+01:00 | 2014-01-10 15:41:58.263000+01:00 | 644 | ce688b67 | /pnfs/psi.ch/cms/trivcat/store/user/yangyong/data/reco/v6/TTH_HToBB_M-125_8TeV-pythia6Summer12_DR53X-PU_S10_START53_V7A-v1AODSIM/analysis.TTH_HToBB_M-125_8TeV-pythia6Summer12_DR53X-PU_S10_START53_V7A-v1AODSIM_3_1_nxT.root | t3fs14_cms_8 |
| 11 | 482 | 584 | 538 | 2013-11-08 17:16:27.695000+01:00 | 2014-01-13 10:49:45.009000+01:00 | 644 | f50c2466 | /pnfs/psi.ch/cms/trivcat/store/user/yangyong/data/reco/v6/TTH_HToBB_M-125_8TeV-pythia6Summer12_DR53X-PU_S10_START53_V7A-v1AODSIM/analysis.TTH_HToBB_M-125_8TeV-pythia6Summer12_DR53X-PU_S10_START53_V7A-v1AODSIM_13_1_HsS.root | t3fs14_cms_11 |
...
View v_pnfs
The advanced T3 users can also directly query by the CLI
psql the
PostgreSQL view
v_pnfs
used by
dc_find
; we highlighted the
columns that are usually interesting for these users:
We list some common
psql queries, but
it's up to the T3 users to write, save and regularly run their specific queries.
How many GB am I globally using in /pnfs
?
BASH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where iuid=$(id -u);"
TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where iuid=`id -u`;"
How many GB is my group globally using in /pnfs
?
BASH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=$(id -g);"
TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=`id -g`;"
How many GB is my group using in an arbitrary /pnfs
dir, e.g. /pnfs/psi.ch/cms/trivcat/store/*
?
BASH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=$(id -g) and path like '/pnfs/psi.ch/cms/trivcat/store/%';"
TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=`id -g` and path like '/pnfs/psi.ch/cms/trivcat/store/%';"
How many GB is my group using in /pnfs/psi.ch/cms/trivcat/store/*
that weren't read since 2014-01-01 ? You can specify whatever
iatime
date.
BASH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=$(id -g) and path like '/pnfs/psi.ch/cms/trivcat/store/%' and iatime <= '2014-01-01'; "
TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=`id -g` and path like '/pnfs/psi.ch/cms/trivcat/store/%' and iatime <= '2014-01-01'; "
How to save in a file all the /pnfs
files involved in the previous query? Here the files list is going to be saved in
$PWD/pnfsfiles
.
BASH $ psql -o $PWD/pnfsfiles -d chimera -h t3dcachedb03.psi.ch -A -q --command="select path,isize as bytes,iatime,iuid,igid from v_pnfs where igid=$(id -g) and path like '/pnfs/psi.ch/cms/trivcat/store/%' and iatime <= '2014-01-01'; "
TCSH $ psql -o $PWD/pnfsfiles -d chimera -h t3dcachedb03.psi.ch -A -q --command="select path,isize as bytes,iatime,iuid,igid from v_pnfs where igid=`id -g` and path like '/pnfs/psi.ch/cms/trivcat/store/%' and iatime <= '2014-01-01'; "
$ head -1 pnfsfiles
path|bytes|iatime|iuid|igid
How many TB are the T3 groups using in /pnfs/psi.ch/cms/trivcat/store/*
that weren't read since 2014-01-01 ?
BASH/TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000/1000) as TB,igid from v_pnfs where path like '/pnfs/psi.ch/cms/trivcat/store/%' and iatime < '2014-01-01' group by igid order by TB desc ;"
TB | igid
-----+------
176 | 500 # common data
129 | 533 # ethz-susy
23 | 536 # uniz-bphys
19 | 531 # ethz-ewk
17 | 534 # psi-bphys
8 | 538 # uniz-pixel
5 | 532 # ethz-higgs
3 | 530 # ethz-bphys
0 | 535 # psi-pixel
0 | 537 # uniz-higgs
Who did upload more GB during the last month ( without considering files that were deleted ) ?
BASH/TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB,iuid from v_pnfs where ictime > date_trunc('day', NOW() - interval '1 month') group by iuid order by GB desc ;"
Password:
GB | iuid
------+------
7401 | 521 # ursl
4451 | 607 # mwang
3540 | 585 # ...
2587 | 584
2466 | 603
...
How to list the /pnfs
files such that .. Here we retrieve a list of files satisfying many different constraints in parallel; a this stage the SQL code should be self-explanatory.
BASH/TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select path,iatime,isize from v_pnfs where path like '/pnfs/psi.ch/cms/trivcat/store/user/ursl/%.gz' and iatime < '2014-03-01' and iatime > '2014-01-01' order by iatime limit 10 ;"
Password:
path | iatime | isize
-------------------------------------------------------------------------------------+----------------------------+------------
/pnfs/psi.ch/cms/trivcat/store/user/ursl/strebeli/ggHyy/store/mcanlo_store_0.tar.gz | 2014-02-27 13:33:15.222+01 | 1127148238
/pnfs/psi.ch/cms/trivcat/store/user/ursl/strebeli/ggHyy/store/mcanlo_store_1.tar.gz | 2014-02-27 13:34:17.122+01 | 189441386
/pnfs/psi.ch/cms/trivcat/store/user/ursl/strebeli/ggHyy/store/mcanlo_store_3.tar.gz | 2014-02-27 13:35:12.52+01 | 950718190
/pnfs/psi.ch/cms/trivcat/store/user/ursl/strebeli/ggHyy/store/mcanlo_store_4.tar.gz | 2014-02-27 13:35:36.329+01 | 947247447
(4 rows)
As the queries get complex and recurring is better to save them in a file: so here we prepare a
query file and we run it:
$ cat query
select trunc(sum(isize)/1000/1000/1000) as GB,
ictime::timestamp::date as day
from v_pnfs
where path like '%PU_S10_START53_V7D-v3%root'
and ictime > '2011-01-01'
group by day
order by day asc;
BASH/TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q -f query
gb | day
------+------------
1071 | 2013-02-01
4130 | 2013-02-02
3316 | 2013-02-03
1190 | 2013-02-04
43 | 2013-11-11
46 | 2013-11-12
63 | 2013-11-18
292 | 2013-11-19
28 | 2013-11-20
Which are the /pnfs
dirs where have been uploaded more GB during the last month ?
Here we combine together the expressing power of SQL with the
regular expressions; the metadata of a
/pnfs
dir is
512
bytes so by excluding the rows with isize =
512
we're basically considering only the
/pnfs
files. The expression
[^\/]*$ matches the file name that will be erased by the
regexp_replace.
$ cat query
select trunc(sum(isize)/1000/1000/1000) as GB,
regexp_replace(path, '[^\/]*$', '') as dir
from v_pnfs
where ictime > date_trunc('day', NOW() - interval '1 month')
and isize != 512
group by dir
order by GB desc;
BASH/TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q -f query
gb | dir
------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1995 | /pnfs/psi.ch/cms/trivcat/store/user/yangyong/data/reco/tree/v5_13/
1246 | /pnfs/psi.ch/cms/trivcat/store/user/jngadiub/Thea/QCD_defaultCones/
1042 | /pnfs/psi.ch/cms/trivcat/store/user/mwang/data/SingleElectron/EDBR_PATtuple_edbr_wh_20140210_SingleElectron_Run2012D-22Jan2013-v1/c2d529e1c78e50623ca40825abf53f99/
902 | /pnfs/psi.ch/cms/trivcat/store/user/mwang/data/SingleElectron/EDBR_PATtuple_edbr_wh_20140210_SingleElectron_Run2012C-22Jan2013-v1/c2d529e1c78e50623ca40825abf53f99/
Which are the /pnfs
dirs where I/my group have/has uploaded more GB during the last month ?
Add to the previous query either this contraint
and iuid = YOUR_UID
or this contraint:
and igid = YOUR_GID
View v_pnfs_top_dirs
The view
v_pnfs_top_dirs
reports each
/pnfs
dirs ( not the files )
already preordered from the biggest dir to the smallest dir:
The simplest ways to use
v_pnfs_top_dirs
are:
A T3 user can also extract from the view
v_pnfs_top_dirs
all the dirs involving his/her
uid
, or his/her
gid
, for instance by:
BASH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select totmb,path,ictime,iatime from v_pnfs_top_dirs where iuid=$(id -u) ;"
TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select totmb,path,ictime,iatime from v_pnfs_top_dirs where iuid=`id -u` ;"
totmb | path | ictime | iatime
-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+----------------------------
3979 | /pnfs/psi.ch/cms/trivcat/store/data/Run2012D/DoubleElectron/RAW-RECO/ZElectron-22Jan2013-v1/10000 | 2013-11-15 17:48:01.372+01 | 2013-11-14 11:13:55.999+01
3976 | /pnfs/psi.ch/cms/trivcat/store/data/Run2011A/MuOnia/AOD/08Nov2011-v1/0001 | 2012-12-07 21:24:37.456+01 | 2012-12-07 21:24:37.456+01
3964 | /pnfs/psi.ch/cms/trivcat/store/data/Run2011A/MuOnia/AOD/08Nov2011-v1/0000 | 2012-12-07 21:24:37.456+01 | 2012-12-07 21:24:37.456+01
3953 | /pnfs/psi.ch/cms/trivcat/store/data/Run2012C/DoubleElectron/RAW-RECO/ZElectron-22Jan2013-v1/20000 | 2013-06-01 13:56:01.777+02 | 2013-05-30 18:28:18.34+02
3935 | /pnfs/psi.ch/cms/trivcat/store/data/Run2012B/MuOnia/AOD/13Jul2012-v1/0000 | 2012-11-30 17:15:09.289+01 | 2012-11-12 19:55:24+01
3899 | /pnfs/psi.ch/cms/trivcat/store/data/Run2012B/MuOnia/AOD/13Jul2012-v1/0001 | 2012-11-30 17:15:21.191+01 | 2012-11-11 11:56:39+01
3877 | /pnfs/psi.ch/cms/trivcat/store/data/Run2011B/MuOnia/AOD/19Nov2011-v1/0000 | 2012-12-08 23:27:58.153+01 | 2012-12-08 23:27:58.153+01
or a query based on a string present in the dir name, e.g.
*Run2011A* :
BASH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select totmb,path,ictime,iatime from v_pnfs_top_dirs where path like '%Run2011A%' and iuid=$(id -u);"
TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select totmb,path,ictime,iatime from v_pnfs_top_dirs where path like '%Run2011A%' and iuid=`id -u`;"
The group
uniz-higgs
requested these two queries to identify and remove their group files not accessed during the last 3 months; each T3 group can obviously run them too;
psql -q -t -d chimera -h t3dcachedb03.psi.ch -q --command="select iatime::timestamp::date,iuid,igid,totmb,path from v_pnfs_top_dirs where igid=$(id -g) and iatime < ( now() - interval '3 month' ) order by iatime asc ;"
psql -q -t -d chimera -h t3dcachedb03.psi.ch -q --command="select iatime::timestamp::date,iuid,igid,totmb,path from v_pnfs_top_dirs where igid=$(id -g) and iatime < ( now() - interval '3 month' ) order by totmb desc ;"
By regularly paying attention the view v_pnfs_top_dirs
all the T3 users are kindly invited to proactively delete their big and unnecessary /pnfs
dirs by HowToAccessSe#gfalFS
Relevant and readable only by the T3 admins NodeTypedCacheDB#t3users