Random Failures of the lcg-utils Data Management Commands
As seen on LCG-ROLLOUT:
Since the LFC-based tests became the official replica management (rm) tests in the
SFTs, our site (BIFI) was experiencing random failures in the lcg-utils
commands (lcg-cr, lcg-cp, lcg-rep, lcg-del), which reported the classic "Invalid
argument" error message.
After days of exhaustive debugging, we traced the problem to the
LFC-related environment variables (LCG_CATALOG_TYPE, LFC_HOST, LFC_HOME).
According to the CSH test of the SFTs, these variables were supposed to always be
set correctly, and studying the SFT sources (a mixture of Perl and shell
scripts) confirmed this. However, we do not know why, or under which
circumstances, these variables are sometimes not set properly, and when that
happens the commands fail.
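Because the only symptom is a generic "Invalid argument", it can help to check the variables explicitly before any lcg-utils command runs. The following is a minimal bash sketch under that assumption; check_lfc_env is a hypothetical helper, not part of lcg-utils or the SFT suite.

```shell
#!/bin/bash
# Hypothetical helper (not part of lcg-utils or the SFTs): report which of the
# LFC-related variables are empty or unset, and return non-zero if any are.
check_lfc_env() {
    local var missing=0
    for var in LCG_CATALOG_TYPE LFC_HOST LFC_HOME; do
        # ${!var:-} is bash indirect expansion: the value of the variable whose
        # name is stored in $var, or an empty string if it is unset.
        if [ -z "${!var:-}" ]; then
            echo "check_lfc_env: $var is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A job wrapper could then run `check_lfc_env || exit 1` just before lcg-cr, lcg-cp and the other commands. A missing variable would then be reported by name on stderr instead of appearing only as "Invalid argument".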
The recipe we applied to solve the problem is quite straightforward: on every WN,
create a script under /etc/profile.d/ that sets these variables for users mapped
to the dteamsgm account (the account used to run the official SFTs).
Vincenzo created such a script, called sft.sh, and placed it under /etc/profile.d/ on every WN:
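#!/bin/bash
# /etc/profile.d/sft.sh: sourced by sh/bash login shells on every WN.
# Set the LFC variables only for the account that runs the official SFTs.
if [ "$(whoami)" = "dteamsgm" ]
then
    export LCG_CATALOG_TYPE=lfc
    export LFC_HOST=lfc-dteam.cern.ch
    export LFC_HOME=/grid/dteam/SFT
fi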
This fixes the problem, and the datamgmt tests now light up green.
--
PeterKunszt - 03 Apr 2006