Fast searching into /pnfs by dc_find NEW

To search into /pnfs the T3 users can use dc_find with the constraint that the information retrieved will be in the worst case 24h old ; this limitation is absolutely indifferent in basically all the practical cases and dc_find will run faster than searching into /pnfs by the usual Linux commands cd, ls, find .

dc_find

dc_find requests the T3 user password and it has many options because of its flexibility but most of them have a reasonable default value that can be left as it is.
$ dc_find -h
usage: dc_find [-h] [-v] [-p PATH] [-s NAME] [-as ATIME_START] [-ae ATIME_END]
             [-cs CTIME_START] [-ce CTIME_END] [-min SIZE_MIN] [-max SIZE_MAX]
             [-d DEPTH] [-u UID] [-g GID] [-oat | -oct | -ou | -og | -os]
             [-oa] [-od] [-l LIMIT] [-f PNFS_FILE_OUTPUT] [--db DBNAME]
             [--dbuser DBUSER] [--dbhost DBHOST] [--dbport DBPORT]

The tool partially replicates the 'find' command in a dCache environment

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         verbose
  -p PATH, --pnfspath PATH
                        The /pnfs dir to be used to start the search; if not specified and the user is inside a /pnfs dir then the search will start from there.
  -s NAME, --string NAME
                        A string to be searched among the /pnfs files and dirs names; default '*'
  -as ATIME_START, --access-time-start ATIME_START
                        files or dirs accessed after  this time; default "1999-01-01 00:00:00"
  -ae ATIME_END, --access-time-end ATIME_END
                        files or dirs accessed before this time; default "2999-01-01 00:00:00"
  -cs CTIME_START, --creation-time-start CTIME_START
                        files or dirs created  after  this time; default "1999-01-01 00:00:00"
  -ce CTIME_END, --creation-time-end CTIME_END
                        files or dirs created  before this time; default "2999-01-01 00:00:00"
  -min SIZE_MIN, --min-size-MB SIZE_MIN
                        files bigger  than this size; default       0
  -max SIZE_MAX, --max-size-MB SIZE_MAX
                        files smaller than this size; default 1000000
  -d DEPTH, --max-depth DEPTH
                        stop the search at this depth; default 1000
  -u UID, --uid UID     files or dirs that belong to this uid; default '*'
  -g GID, --gid GID     files or dirs that belong to this gid; default '*'
  -oat, --order-by-access-time
  -oct, --order-by-creation-time
  -ou, --order-by-uid
  -og, --order-by-gid
  -os, --order-by-size
  -oa, --order-ascending
  -od, --order-descending
  -l LIMIT, --limit-result LIMIT
  -f PNFS_FILE_OUTPUT, --file PNFS_FILE_OUTPUT
                        output file; default dc_find.pnfs.txt
  --db DBNAME           the database name; default chimera
  --dbuser DBUSER       the database user name used to authenticate; default your Linux account name
  --dbhost DBHOST       the database host; default t3dcachedb03.psi.ch
  --dbport DBPORT       the database port; default 5432

examples:
   dc_find --file /tmp/dc_find.out -p /pnfs/psi.ch/cms/trivcat/store/user/bianchi/ --string t-channel_TuneZ2star --order-by-access-time
   dc_find --string .root -p /pnfs/ --order-by-uid -l 100
   dc_find --string .root -p /pnfs/ --access-time-start "2014-01-01 00:00:00" -l 100
Follows a dc_find example that involves an arbitrary /pnfs dir where the user entered by cd:
$ pwd
/pnfs/psi.ch/cms/trivcat/store/user/ursl
$ dc_find --string .root --order-by-size --order-desc --limit 1000 --file $HOME/dc_find.out
Password: 
$ head $HOME/dc_find.out 
+-------+-----+-----+-----+----------------------------------+----------------------------------+------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
| depth |  MB | uid | gid | ctime                            | atime                            | mode | adler32  | /pnfs                                                                                                                                                                                                                                                       | pools         |
+-------+-----+-----+-----+----------------------------------+----------------------------------+------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
|    11 | 422 | 584 | 538 | 2013-11-08 17:16:27.816000+01:00 | 2014-01-10 15:41:58.263000+01:00 | 644  | ce688b67 | /pnfs/psi.ch/cms/trivcat/store/user/yangyong/data/reco/v6/TTH_HToBB_M-125_8TeV-pythia6Summer12_DR53X-PU_S10_START53_V7A-v1AODSIM/analysis.TTH_HToBB_M-125_8TeV-pythia6Summer12_DR53X-PU_S10_START53_V7A-v1AODSIM_3_1_nxT.root                               | t3fs14_cms_8  |
|    11 | 482 | 584 | 538 | 2013-11-08 17:16:27.695000+01:00 | 2014-01-13 10:49:45.009000+01:00 | 644  | f50c2466 | /pnfs/psi.ch/cms/trivcat/store/user/yangyong/data/reco/v6/TTH_HToBB_M-125_8TeV-pythia6Summer12_DR53X-PU_S10_START53_V7A-v1AODSIM/analysis.TTH_HToBB_M-125_8TeV-pythia6Summer12_DR53X-PU_S10_START53_V7A-v1AODSIM_13_1_HsS.root                              | t3fs14_cms_11 |
...

View v_pnfs

The advanced T3 users can directly use psql to interact with the PostgreSQL view v_pnfs that reports each /pnfs file and dir; we highlighted the columns that are interesting for these users:
depth ipnfsid path iname isize iuid igid ictime iatime imtime imode adler32 pools


We list some common psql queries, but it's up to the T3 users to write, save and regularly run their specific queries.

Ho many GB am I globally using in /pnfs ?

BASH $  psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where iuid=$(id -u);"
TCSH $  psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where iuid=`id -u`;"

Ho many GB is my group globally using in /pnfs?

BASH $  psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=$(id -g);"
TCSH $  psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=`id -g`;" 

Ho many GB is my group using in an arbitrary /pnfs dir, e.g. /pnfs/psi.ch/cms/trivcat/store/* ?

BASH $  psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=$(id -g) and path like '/pnfs/psi.ch/cms/trivcat/store/%';" 
TCSH $  psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=`id -g`  and path like '/pnfs/psi.ch/cms/trivcat/store/%';"

Ho many GB is my group using in /pnfs/psi.ch/cms/trivcat/store/* that weren't read since 2014-01-01 ?
You can specify whatever iatime date.

BASH $  psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=$(id -g) and path like '/pnfs/psi.ch/cms/trivcat/store/%' and iatime <= '2014-01-01'; "
TCSH $  psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=`id -g`  and path like '/pnfs/psi.ch/cms/trivcat/store/%' and iatime <= '2014-01-01'; "

How to save in a file all the /pnfs files involved in the previous query?
Here the files list is going to be saved in $PWD/pnfsfiles.

BASH $  psql -o $PWD/pnfsfiles -d chimera -h t3dcachedb03.psi.ch -A -q --command="select path,isize as bytes,iatime,iuid,igid from v_pnfs where igid=$(id -g) and path like '/pnfs/psi.ch/cms/trivcat/store/%' and iatime <= '2014-01-01'; "
TCSH $  psql -o $PWD/pnfsfiles -d chimera -h t3dcachedb03.psi.ch -A -q --command="select path,isize as bytes,iatime,iuid,igid from v_pnfs where igid=`id -g`  and path like '/pnfs/psi.ch/cms/trivcat/store/%' and iatime <= '2014-01-01'; "

$ head -1 pnfsfiles 
path|bytes|iatime|iuid|igid

Ho many TB are the T3 groups using in /pnfs/psi.ch/cms/trivcat/store/* that weren't read since 2014-01-01 ?

BASH/TCSH $  psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000/1000) as TB,igid from v_pnfs where path like '/pnfs/psi.ch/cms/trivcat/store/%' and iatime < '2014-01-01' group by igid order by TB desc ;" 

 TB  | igid
-----+------
 176 |  500 # common data
 129 |  533 # ethz-susy
  23 |  536 # uniz-bphys
  19 |  531 # ethz-ewk
  17 |  534 # psi-bphys
   8 |  538 # uniz-pixel
   5 |  532 # ethz-higgs
   3 |  530 # ethz-bphys
   0 |  535 # psi-pixel
   0 |  537 # uniz-higgs

Who did upload more GB during the last month ( without considering files that were deleted ) ?

BASH/TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB,iuid from v_pnfs where ictime > date_trunc('day', NOW() - interval '1 month') group by iuid order by GB desc ;"
Password: 
  GB  | iuid
------+------
 7401 |  521 # ursl
 4451 |  607 # mwang
 3540 |  585 # ...
 2587 |  584
 2466 |  603
 ...

How to list the /pnfs files such that ..
Here we retrieve a list of files satisfying many different constraints in parallel; a this stage the SQL code should be self-explanatory.

BASH/TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select path,iatime,isize from v_pnfs where path like '/pnfs/psi.ch/cms/trivcat/store/user/ursl/%.gz' and iatime < '2014-03-01' and iatime > '2014-01-01' order by iatime limit 10 ;"
Password: 
                                        path                                         |           iatime           |   isize    
-------------------------------------------------------------------------------------+----------------------------+------------
 /pnfs/psi.ch/cms/trivcat/store/user/ursl/strebeli/ggHyy/store/mcanlo_store_0.tar.gz | 2014-02-27 13:33:15.222+01 | 1127148238
 /pnfs/psi.ch/cms/trivcat/store/user/ursl/strebeli/ggHyy/store/mcanlo_store_1.tar.gz | 2014-02-27 13:34:17.122+01 |  189441386
 /pnfs/psi.ch/cms/trivcat/store/user/ursl/strebeli/ggHyy/store/mcanlo_store_3.tar.gz | 2014-02-27 13:35:12.52+01  |  950718190
 /pnfs/psi.ch/cms/trivcat/store/user/ursl/strebeli/ggHyy/store/mcanlo_store_4.tar.gz | 2014-02-27 13:35:36.329+01 |  947247447
(4 rows)

As the queries get complex and recurring is better to save them in a file: so here we prepare a query file and we run it:

$ cat query 
select trunc(sum(isize)/1000/1000/1000) as GB, 
       ictime::timestamp::date as day 
from   v_pnfs 
where  path like '%PU_S10_START53_V7D-v3%root'
       and ictime > '2011-01-01'
group by day
order by day asc;

BASH/TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q -f query
  gb  |    day     
------+------------
 1071 | 2013-02-01
 4130 | 2013-02-02
 3316 | 2013-02-03
 1190 | 2013-02-04
   43 | 2013-11-11
   46 | 2013-11-12
   63 | 2013-11-18
  292 | 2013-11-19
   28 | 2013-11-20

Which are the /pnfs dirs where have been uploaded more GB during the last month ?
Here we combine together the expressing power of SQL with the regular expressions; the metadata of a /pnfs dir is 512 bytes so by excluding the rows with isize = 512 we're basically considering only the /pnfs files. The expression [^\/]*$ matches the file name that will be erased by the regexp_replace.

 $ cat query 
select trunc(sum(isize)/1000/1000/1000) as GB,
       regexp_replace(path, '[^\/]*$', '') as dir
from   v_pnfs
where  ictime > date_trunc('day', NOW() - interval '1 month')
and    isize != 512
group by dir
order by GB desc;

BASH/TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q -f query
  gb  |                                                                                                     dir                                                                                                      
------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1995 | /pnfs/psi.ch/cms/trivcat/store/user/yangyong/data/reco/tree/v5_13/
 1246 | /pnfs/psi.ch/cms/trivcat/store/user/jngadiub/Thea/QCD_defaultCones/
 1042 | /pnfs/psi.ch/cms/trivcat/store/user/mwang/data/SingleElectron/EDBR_PATtuple_edbr_wh_20140210_SingleElectron_Run2012D-22Jan2013-v1/c2d529e1c78e50623ca40825abf53f99/
  902 | /pnfs/psi.ch/cms/trivcat/store/user/mwang/data/SingleElectron/EDBR_PATtuple_edbr_wh_20140210_SingleElectron_Run2012C-22Jan2013-v1/c2d529e1c78e50623ca40825abf53f99/
Which are the /pnfs dirs where I/my group have/has uploaded more GB during the last month ?
Add to the previous query either this contraint
 and iuid   = YOUR_UID
or this contraint:
 and igid   = YOUR_GID

View v_pnfs_top_dirs

NOTE The simplest ways to consult the preordered /pnfs dirs list consists of:

The view v_pnfs_top_dirs reports each /pnfs dirs ( not the files ) preordered from the biggest dirs to the smallest dirs:

totmb depth ipnfsid path iuid igid ictime iatime imtime

typically a T3 user will extract from the view v_pnfs_top_dirs all the dirs involving his/her uid, or his/her gid, by:

BASH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select totmb,path,ictime,iatime from v_pnfs_top_dirs where iuid=$(id -u) ;"
TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select totmb,path,ictime,iatime from v_pnfs_top_dirs where iuid=`id -u` ;"
 totmb |                                                                               path                                                                                |           ictime           |           iatime           
-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+----------------------------
      3979 | /pnfs/psi.ch/cms/trivcat/store/data/Run2012D/DoubleElectron/RAW-RECO/ZElectron-22Jan2013-v1/10000                                                                 | 2013-11-15 17:48:01.372+01 | 2013-11-14 11:13:55.999+01
      3976 | /pnfs/psi.ch/cms/trivcat/store/data/Run2011A/MuOnia/AOD/08Nov2011-v1/0001                                                                                         | 2012-12-07 21:24:37.456+01 | 2012-12-07 21:24:37.456+01
      3964 | /pnfs/psi.ch/cms/trivcat/store/data/Run2011A/MuOnia/AOD/08Nov2011-v1/0000                                                                                         | 2012-12-07 21:24:37.456+01 | 2012-12-07 21:24:37.456+01
      3953 | /pnfs/psi.ch/cms/trivcat/store/data/Run2012C/DoubleElectron/RAW-RECO/ZElectron-22Jan2013-v1/20000                                                                 | 2013-06-01 13:56:01.777+02 | 2013-05-30 18:28:18.34+02
      3935 | /pnfs/psi.ch/cms/trivcat/store/data/Run2012B/MuOnia/AOD/13Jul2012-v1/0000                                                                                         | 2012-11-30 17:15:09.289+01 | 2012-11-12 19:55:24+01
      3899 | /pnfs/psi.ch/cms/trivcat/store/data/Run2012B/MuOnia/AOD/13Jul2012-v1/0001                                                                                         | 2012-11-30 17:15:21.191+01 | 2012-11-11 11:56:39+01
      3877 | /pnfs/psi.ch/cms/trivcat/store/data/Run2011B/MuOnia/AOD/19Nov2011-v1/0000                                                                                         | 2012-12-08 23:27:58.153+01 | 2012-12-08 23:27:58.153+01

or a query based on a string present in the dir name, e.g. *Run2011A* :

BASH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select totmb,path,ictime,iatime from v_pnfs_top_dirs where path like '%Run2011A%' and iuid=$(id -u);"
TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select totmb,path,ictime,iatime from v_pnfs_top_dirs where path like '%Run2011A%' and iuid=`id -u`;"

By regularly paying attention the view v_pnfs_top_dirs all the T3 users are kindly invited to proactively delete their big and unnecessary /pnfs dirs by HowToAccessSe#gfalFS

PostgreSQL permissions

This is relevant and accessible only to the T3 admins NodeTypeDBServerChimera#t3users
Edit | Attach | Watch | Print version | History: r18 | r16 < r15 < r14 < r13 | Backlinks | Raw View | Raw edit | More topic actions...
Topic revision: r14 - 2014-12-09 - FabioMartinelli
 
  • Edit
  • Attach
This site is powered by the TWiki collaboration platform Powered by Perl This site is powered by the TWiki collaboration platformCopyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback