Symptoms
Summary: the "slapd" process won't start in a Xen DomU virtual machine
The slapd daemon underlying the BDII service does not start. This condition is not detected by the "/etc/init.d/bdii start" script (which exits with "OK" state), but it can be recognized in two ways:
1) no slapd instance is running; executing "ps -u edguser" shows a lone bdii-fwd process (and no slapd or bdii-update processes).
2) invocation of "/etc/init.d/bdii status" reports that no bdii-update process is running and the PID file is stale.
Trying to start the BDII slapd process from the command line with all debugging options set yields the following result:
# /usr/sbin/slapd -d 2048 -f /opt/bdii/var/2172/bdii-slapd.conf -h ldap://localhost:2172 -u edguser
@(#) $OpenLDAP: slapd 2.2.13 (Jul 26 2008 12:40:45) $
root@yort.fnal.gov:/mnt/src/4/BUILD/openldap-2.2.13/openldap-2.2.13/build-servers/servers/slapd
bdb_initialize: Sleepycat Software: Berkeley DB 4.2.52: (December 3, 2003)
bdb_initialize: Sleepycat Software: Berkeley DB 4.2.52: (December 3, 2003)
/opt/bdii/var/2172/bdii-slapd.conf: line 11: schema checking disabled! your mileage may vary!
bdb_db_init: Initializing BDB database
bdb_db_init: Initializing BDB database
bdb(o=grid): unable to initialize mutex: Function not implemented
bdb(o=grid): /opt/bdii/var/2172/__db.001: unable to initialize environment lock: Function not implemented
bdb_db_open: dbenv_open failed: Function not implemented (38)
backend_startup: bi_db_open(0) failed! (38)
bdb(o=grid): DB_ENV->lock_id_free interface requires an environment configured for the locking subsystem
Segmentation fault
Occurrences
As of 2008-11-03, the problem is reproducible every time with the OpenLDAP RPMs version 2.2.13-12.el4 in a Xen DomU running the 2.6.9-78.0.1.ELxenU SL4 kernel:
$ rpm -qa | fgrep ldap
openldap-2.2.13-12.el4
openldap-clients-2.2.13-12.el4
openldap-servers-2.2.13-12.el4
$ rpm -qa | fgrep kernel
kernel-xenU-2.6.9-78.0.1.EL
$ uname -a
Linux mon.lcg.cscs.ch 2.6.9-78.0.1.ELxenU #1 SMP Tue Aug 5 13:58:36 CDT 2008 i686 athlon i386 GNU/Linux
Observations
Apparently, the same problem has been mentioned in gLite bug #42089. It seems that:
1) the bug was only introduced in one of the latest gLite upgrades
2) the only solution is to recompile (see Martin Polak's comment in the above bug report):
a- either db4 and openldap
b- or the glibc (to add the Xen-friendly patches)
However, further email exchange with Martin Polak revealed that:
[...] There were two things that did help me:
I've built db and openldap from the tar.gz files which made it work
..however I was unable to "rebuild" working rpms for bdii with -mno-tls-direct-segrefs
what I did in the end was: switch back the database backend of bdii to ldbm
The last bdii update to the pps however seems to work correctly with the bdb backend
again.
Indeed, googling for the issue reveals little beyond vague hints that it is due to incompatibilities between the NPTL implementation of POSIX threads and Xen DomU kernels, with no actual indication of how to fix it.
Solution or Workaround
The only known workaround, at the moment, is to change the slapd backend to ldbm (which does not make use of the db4 libraries). This can be accomplished by changing all the lines in the file /opt/bdii/etc/glue-slapd.conf that contain "database bdb" to read "database ldbm" instead. The CfEngine file cf.bdii will do this edit automatically on all Xen DomUs that have the bdii RPM package installed.
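For hosts not managed by CfEngine, the same edit could be made by hand. The following is a minimal sketch, not the contents of cf.bdii: the function name switch_backend and the ".bak" backup suffix are assumptions made for illustration, and the configuration file path is passed in as an argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of the bdii package): rewrite every
# "database bdb" line of a slapd configuration file to "database ldbm".
switch_backend() {
    conf="$1"
    # Keep a copy of the original file next to it (suffix is an assumption).
    cp -p "$conf" "$conf.bak" || return 1
    # Only touch lines whose sole content is the "database bdb" directive;
    # everything else is copied through unchanged.
    sed 's/^\([[:space:]]*database[[:space:]][[:space:]]*\)bdb[[:space:]]*$/\1ldbm/' \
        "$conf.bak" > "$conf"
}
```

It would be invoked as "switch_backend /opt/bdii/etc/glue-slapd.conf", after which the BDII service needs to be started again for the change to take effect.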
Monitoring for this condition
Check for either one of two symptoms:
1) no slapd instance is running; executing "ps -u edguser" shows a lone bdii-fwd process (and no slapd or bdii-update processes).
2) invocation of "/etc/init.d/bdii status" reports that no bdii-update process is running and the PID file is stale.
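The first symptom lends itself to an automated check. The sketch below is one possible way to script it, assuming a Linux procps "ps" that accepts "-o comm="; the function names slapd_missing and check_bdii and the message texts are made up for this example.

```shell
#!/bin/sh
# Reads command names (one per line) on stdin and succeeds
# when none of them is exactly "slapd".
slapd_missing() {
    ! grep -qx 'slapd'
}

# Hypothetical check: lists the command names of all processes owned by
# edguser and warns if slapd is not among them.
check_bdii() {
    if ps -u edguser -o comm= | slapd_missing; then
        echo "WARNING: no slapd process running as edguser"
        return 1
    fi
    echo "OK: slapd is running as edguser"
}
```

Calling check_bdii from a cron job or a monitoring probe returns a non-zero status when the condition is present, including the case where edguser owns no processes at all.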
--
RiccardoMurri - 03 Nov 2008