Tags:
dcache1Add my vote for this tag create new tag
view all tags

Searching into /pnfs by dc_find NEW

To search into /pnfs the T3 users can use dc_find with the limitation that the information retrieved will be in the worst case 24h old ;

dc_find will run faster than searching into /pnfs by the usual Linux commands cd, ls, find and it offers several --order-by-* options.

dc_find

dc_find needs the T3 user password and it offers many options with a reasonable default value that can be safely left as it is.
$ dc_find -h
usage: dc_find [-h] [-v] [-p PATH] [-s NAME] [-as ATIME_START] [-ae ATIME_END]
             [-cs CTIME_START] [-ce CTIME_END] [-min SIZE_MIN] [-max SIZE_MAX]
             [-d DEPTH] [-u UID] [-g GID] [-oat | -oct | -ou | -og | -os]
             [-oa] [-od] [-l LIMIT] [-f PNFS_FILE_OUTPUT] [--db DBNAME]
             [--dbuser DBUSER] [--dbhost DBHOST] [--dbport DBPORT]

dc_find partially replicates the 'find' command in a dCache environment

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         verbose
  -p PATH, --pnfspath PATH
                        The /pnfs dir to be used to start the search; if not specified and the user is inside a /pnfs dir then the search will start from there.
  -s NAME, --string NAME
                        A string to be searched among the /pnfs files and dirs names; default '*'
  -as ATIME_START, --access-time-start ATIME_START
                        files or dirs accessed after  this time; default "1999-01-01 00:00:00"
  -ae ATIME_END, --access-time-end ATIME_END
                        files or dirs accessed before this time; default "2999-01-01 00:00:00"
  -cs CTIME_START, --creation-time-start CTIME_START
                        files or dirs created  after  this time; default "1999-01-01 00:00:00"
  -ce CTIME_END, --creation-time-end CTIME_END
                        files or dirs created  before this time; default "2999-01-01 00:00:00"
  -min SIZE_MIN, --min-size-MB SIZE_MIN
                        files bigger  than this size; default       0
  -max SIZE_MAX, --max-size-MB SIZE_MAX
                        files smaller than this size; default 1000000
  -d DEPTH, --max-depth DEPTH
                        stop the search at this depth; default 1000
  -u UID, --uid UID     files or dirs that belong to this uid; default '*'
  -g GID, --gid GID     files or dirs that belong to this gid; default '*'
  -oat, --order-by-access-time
  -oct, --order-by-creation-time
  -ou, --order-by-uid
  -og, --order-by-gid
  -os, --order-by-size
  -oa, --order-ascending
  -od, --order-descending
  -l LIMIT, --limit-result LIMIT
  -f PNFS_FILE_OUTPUT, --file PNFS_FILE_OUTPUT
                        output file; default dc_find.pnfs.txt
  --db DBNAME           the database name; default chimera
  --dbuser DBUSER       the database user name used to authenticate; default your Linux account name
  --dbhost DBHOST       the database host; default t3dcachedb03.psi.ch
  --dbport DBPORT       the database port; default 5432

dc_find examples:
   dc_find --file /tmp/dc_find.out -p /pnfs/psi.ch/cms/trivcat/store/user/bianchi/ --string t-channel_TuneZ2star --order-by-access-time
   dc_find --string .root -p /pnfs/ --order-by-uid -l 100
   dc_find --string .root -p /pnfs/ --access-time-start "2014-01-01 00:00:00" -l 100
Following a dc_find example involving a generic /pnfs dir where the T3 user entered by cd:
$ pwd
/pnfs/psi.ch/cms/trivcat/store/user/ursl
$ dc_find --string .root --order-by-size --order-desc --limit 1000 --file $HOME/dc_find.out
Password: 
$ head $HOME/dc_find.out 
+-------+-----+-----+-----+----------------------------------+----------------------------------+------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
| depth |  MB | uid | gid | ctime                            | atime                            | mode | adler32  | /pnfs                                                                                                                                                                                                                                                       | pools         |
+-------+-----+-----+-----+----------------------------------+----------------------------------+------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
|    11 | 422 | 584 | 538 | 2013-11-08 17:16:27.816000+01:00 | 2014-01-10 15:41:58.263000+01:00 | 644  | ce688b67 | /pnfs/psi.ch/cms/trivcat/store/user/yangyong/data/reco/v6/TTH_HToBB_M-125_8TeV-pythia6Summer12_DR53X-PU_S10_START53_V7A-v1AODSIM/analysis.TTH_HToBB_M-125_8TeV-pythia6Summer12_DR53X-PU_S10_START53_V7A-v1AODSIM_3_1_nxT.root                               | t3fs14_cms_8  |
|    11 | 482 | 584 | 538 | 2013-11-08 17:16:27.695000+01:00 | 2014-01-13 10:49:45.009000+01:00 | 644  | f50c2466 | /pnfs/psi.ch/cms/trivcat/store/user/yangyong/data/reco/v6/TTH_HToBB_M-125_8TeV-pythia6Summer12_DR53X-PU_S10_START53_V7A-v1AODSIM/analysis.TTH_HToBB_M-125_8TeV-pythia6Summer12_DR53X-PU_S10_START53_V7A-v1AODSIM_13_1_HsS.root                              | t3fs14_cms_11 |
...

View v_pnfs

The advanced T3 users can also directly query by the CLI psql the PostgreSQL view v_pnfs used by dc_find; we highlighted the columns that are usually interesting for these users:

depth ipnfsid path iname isize iuid igid ictime iatime imtime imode adler32 pools


We list some common psql queries, but it's up to the T3 users to write, save and regularly run their specific queries.

How many GB am I globally using in /pnfs ?

BASH $  psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where iuid=$(id -u);"
TCSH $  psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where iuid=`id -u`;"

How many GB is my group globally using in /pnfs?

BASH $  psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=$(id -g);"
TCSH $  psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=`id -g`;" 

How many GB is my group using in an arbitrary /pnfs dir, e.g. /pnfs/psi.ch/cms/trivcat/store/* ?

BASH $  psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=$(id -g) and path like '/pnfs/psi.ch/cms/trivcat/store/%';" 
TCSH $  psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=`id -g`  and path like '/pnfs/psi.ch/cms/trivcat/store/%';"

How many GB is my group using in /pnfs/psi.ch/cms/trivcat/store/* that weren't read since 2014-01-01 ?
You can specify whatever iatime date.

BASH $  psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=$(id -g) and path like '/pnfs/psi.ch/cms/trivcat/store/%' and iatime <= '2014-01-01'; "
TCSH $  psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB from v_pnfs where igid=`id -g`  and path like '/pnfs/psi.ch/cms/trivcat/store/%' and iatime <= '2014-01-01'; "

How to save in a file all the /pnfs files involved in the previous query?
Here the files list is going to be saved in $PWD/pnfsfiles.

BASH $  psql -o $PWD/pnfsfiles -d chimera -h t3dcachedb03.psi.ch -A -q --command="select path,isize as bytes,iatime,iuid,igid from v_pnfs where igid=$(id -g) and path like '/pnfs/psi.ch/cms/trivcat/store/%' and iatime <= '2014-01-01'; "
TCSH $  psql -o $PWD/pnfsfiles -d chimera -h t3dcachedb03.psi.ch -A -q --command="select path,isize as bytes,iatime,iuid,igid from v_pnfs where igid=`id -g`  and path like '/pnfs/psi.ch/cms/trivcat/store/%' and iatime <= '2014-01-01'; "

$ head -1 pnfsfiles 
path|bytes|iatime|iuid|igid

How many TB are the T3 groups using in /pnfs/psi.ch/cms/trivcat/store/* that weren't read since 2014-01-01 ?

BASH/TCSH $  psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000/1000) as TB,igid from v_pnfs where path like '/pnfs/psi.ch/cms/trivcat/store/%' and iatime < '2014-01-01' group by igid order by TB desc ;" 

 TB  | igid
-----+------
 176 |  500 # common data
 129 |  533 # ethz-susy
  23 |  536 # uniz-bphys
  19 |  531 # ethz-ewk
  17 |  534 # psi-bphys
   8 |  538 # uniz-pixel
   5 |  532 # ethz-higgs
   3 |  530 # ethz-bphys
   0 |  535 # psi-pixel
   0 |  537 # uniz-higgs

Who did upload more GB during the last month ( without considering files that were deleted ) ?

BASH/TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select trunc(sum(isize)/1000/1000/1000) as GB,iuid from v_pnfs where ictime > date_trunc('day', NOW() - interval '1 month') group by iuid order by GB desc ;"
Password: 
  GB  | iuid
------+------
 7401 |  521 # ursl
 4451 |  607 # mwang
 3540 |  585 # ...
 2587 |  584
 2466 |  603
 ...

How to list the /pnfs files such that ..
Here we retrieve a list of files satisfying many different constraints in parallel; a this stage the SQL code should be self-explanatory.

BASH/TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select path,iatime,isize from v_pnfs where path like '/pnfs/psi.ch/cms/trivcat/store/user/ursl/%.gz' and iatime < '2014-03-01' and iatime > '2014-01-01' order by iatime limit 10 ;"
Password: 
                                        path                                         |           iatime           |   isize    
-------------------------------------------------------------------------------------+----------------------------+------------
 /pnfs/psi.ch/cms/trivcat/store/user/ursl/strebeli/ggHyy/store/mcanlo_store_0.tar.gz | 2014-02-27 13:33:15.222+01 | 1127148238
 /pnfs/psi.ch/cms/trivcat/store/user/ursl/strebeli/ggHyy/store/mcanlo_store_1.tar.gz | 2014-02-27 13:34:17.122+01 |  189441386
 /pnfs/psi.ch/cms/trivcat/store/user/ursl/strebeli/ggHyy/store/mcanlo_store_3.tar.gz | 2014-02-27 13:35:12.52+01  |  950718190
 /pnfs/psi.ch/cms/trivcat/store/user/ursl/strebeli/ggHyy/store/mcanlo_store_4.tar.gz | 2014-02-27 13:35:36.329+01 |  947247447
(4 rows)

As the queries get complex and recurring is better to save them in a file: so here we prepare a query file and we run it:

$ cat query 
select trunc(sum(isize)/1000/1000/1000) as GB, 
       ictime::timestamp::date as day 
from   v_pnfs 
where  path like '%PU_S10_START53_V7D-v3%root'
       and ictime > '2011-01-01'
group by day
order by day asc;

BASH/TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q -f query
  gb  |    day     
------+------------
 1071 | 2013-02-01
 4130 | 2013-02-02
 3316 | 2013-02-03
 1190 | 2013-02-04
   43 | 2013-11-11
   46 | 2013-11-12
   63 | 2013-11-18
  292 | 2013-11-19
   28 | 2013-11-20

Which are the /pnfs dirs where have been uploaded more GB during the last month ?
Here we combine together the expressing power of SQL with the regular expressions; the metadata of a /pnfs dir is 512 bytes so by excluding the rows with isize = 512 we're basically considering only the /pnfs files. The expression [^\/]*$ matches the file name that will be erased by the regexp_replace.

 $ cat query 
select trunc(sum(isize)/1000/1000/1000) as GB,
       regexp_replace(path, '[^\/]*$', '') as dir
from   v_pnfs
where  ictime > date_trunc('day', NOW() - interval '1 month')
and    isize != 512
group by dir
order by GB desc;

BASH/TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q -f query
  gb  |                                                                                                     dir                                                                                                      
------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1995 | /pnfs/psi.ch/cms/trivcat/store/user/yangyong/data/reco/tree/v5_13/
 1246 | /pnfs/psi.ch/cms/trivcat/store/user/jngadiub/Thea/QCD_defaultCones/
 1042 | /pnfs/psi.ch/cms/trivcat/store/user/mwang/data/SingleElectron/EDBR_PATtuple_edbr_wh_20140210_SingleElectron_Run2012D-22Jan2013-v1/c2d529e1c78e50623ca40825abf53f99/
  902 | /pnfs/psi.ch/cms/trivcat/store/user/mwang/data/SingleElectron/EDBR_PATtuple_edbr_wh_20140210_SingleElectron_Run2012C-22Jan2013-v1/c2d529e1c78e50623ca40825abf53f99/
Which are the /pnfs dirs where I/my group have/has uploaded more GB during the last month ?
Add to the previous query either this contraint
 and iuid   = YOUR_UID
or this contraint:
 and igid   = YOUR_GID

View v_pnfs_top_dirs

The view v_pnfs_top_dirs reports each /pnfs dirs ( not the files ) already preordered from the biggest dir to the smallest dir:
totmb depth ipnfsid path iuid igid ictime iatime imtime

The simplest ways to use v_pnfs_top_dirs are:

A T3 user can also extract from the view v_pnfs_top_dirs all the dirs involving his/her uid, or his/her gid, for instance by:

BASH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select totmb,path,ictime,iatime from v_pnfs_top_dirs where iuid=$(id -u) ;"
TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select totmb,path,ictime,iatime from v_pnfs_top_dirs where iuid=`id -u` ;"
 totmb |                                                                               path                                                                                |           ictime           |           iatime           
-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+----------------------------
      3979 | /pnfs/psi.ch/cms/trivcat/store/data/Run2012D/DoubleElectron/RAW-RECO/ZElectron-22Jan2013-v1/10000                                                                 | 2013-11-15 17:48:01.372+01 | 2013-11-14 11:13:55.999+01
      3976 | /pnfs/psi.ch/cms/trivcat/store/data/Run2011A/MuOnia/AOD/08Nov2011-v1/0001                                                                                         | 2012-12-07 21:24:37.456+01 | 2012-12-07 21:24:37.456+01
      3964 | /pnfs/psi.ch/cms/trivcat/store/data/Run2011A/MuOnia/AOD/08Nov2011-v1/0000                                                                                         | 2012-12-07 21:24:37.456+01 | 2012-12-07 21:24:37.456+01
      3953 | /pnfs/psi.ch/cms/trivcat/store/data/Run2012C/DoubleElectron/RAW-RECO/ZElectron-22Jan2013-v1/20000                                                                 | 2013-06-01 13:56:01.777+02 | 2013-05-30 18:28:18.34+02
      3935 | /pnfs/psi.ch/cms/trivcat/store/data/Run2012B/MuOnia/AOD/13Jul2012-v1/0000                                                                                         | 2012-11-30 17:15:09.289+01 | 2012-11-12 19:55:24+01
      3899 | /pnfs/psi.ch/cms/trivcat/store/data/Run2012B/MuOnia/AOD/13Jul2012-v1/0001                                                                                         | 2012-11-30 17:15:21.191+01 | 2012-11-11 11:56:39+01
      3877 | /pnfs/psi.ch/cms/trivcat/store/data/Run2011B/MuOnia/AOD/19Nov2011-v1/0000                                                                                         | 2012-12-08 23:27:58.153+01 | 2012-12-08 23:27:58.153+01

or a query based on a string present in the dir name, e.g. *Run2011A* :

BASH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select totmb,path,ictime,iatime from v_pnfs_top_dirs where path like '%Run2011A%' and iuid=$(id -u);"
TCSH $ psql -d chimera -h t3dcachedb03.psi.ch -q --command="select totmb,path,ictime,iatime from v_pnfs_top_dirs where path like '%Run2011A%' and iuid=`id -u`;"

The group uniz-higgs requested these two queries to identify and remove their group files not accessed during the last 3 months; each T3 group can obviously run them too;

psql -q -t -d chimera -h t3dcachedb03.psi.ch -q --command="select iatime::timestamp::date,iuid,igid,totmb,path from v_pnfs_top_dirs where igid=$(id -g) and iatime < ( now() - interval '3 month' ) order by iatime asc ;"
psql -q -t -d chimera -h t3dcachedb03.psi.ch -q --command="select iatime::timestamp::date,iuid,igid,totmb,path from v_pnfs_top_dirs where igid=$(id -g) and iatime < ( now() - interval '3 month' ) order by totmb desc ;"
By regularly paying attention the view v_pnfs_top_dirs all the T3 users are kindly invited to proactively delete their big and unnecessary /pnfs dirs by HowToAccessSe#gfalFS

PostgreSQL permissions

Relevant and readable only by the T3 admins NodeTypedCacheDB#t3users
Edit | Attach | Watch | Print version | History: r18 < r17 < r16 < r15 < r14 | Backlinks | Raw View | Raw edit | More topic actions
Topic revision: r18 - 2016-11-04 - FabioMartinelli
 
This site is powered by the TWiki collaboration platform Powered by Perl This site is powered by the TWiki collaboration platformCopyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback