Symptoms
Summary: too many pgsql connections, error 53300
The SE cannot be accessed through SRM.
- In t3se01:catalina.out:
createConnection(): Got exception org.postgresql.util.PSQLException, SQLState: 53300
- In t3dcachedb01:pg_sql:
FATAL: connection limit exceeded for non-superusers
Occurrences
At what times did this problem occur (used to estimate frequency):
Observations
The maximum number of connections is set by max_connections in /var/lib/pgsql/data/postgresql.conf. According to the PostgreSQL manual, raising that number may also require increasing the SysV parameter SEMMNI.
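To see how the configured limit relates to the kernel setting, a small sketch like the following may help. The config path is the one quoted above; the helper names are hypothetical, and the "one semaphore set per 16 connections" rule is an assumption based on the PostgreSQL kernel resources documentation (the exact formula varies by version, so treat the result as a rough lower bound).

```shell
#!/bin/sh
# Path taken from the observation above.
PGCONF=/var/lib/pgsql/data/postgresql.conf

# Print the configured max_connections value (uncommented lines only).
# Optional argument: alternative config file path.
configured_max_connections() {
    sed -n 's/^[[:space:]]*max_connections[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "${1:-$PGCONF}"
}

# Current kernel SEMMNI: fourth field of /proc/sys/kernel/sem (Linux only).
current_semmni() {
    awk '{ print $4 }' /proc/sys/kernel/sem
}

# Assumed rule of thumb: one semaphore set per 16 connections, rounded up.
semmni_needed() {
    echo $(( ($1 + 15) / 16 ))
}
```

For example, `semmni_needed "$(configured_max_connections)"` could be compared with `current_semmni` before raising max_connections.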
It is not clear why the limit of 100 connections is reached. Probably a certain number of transfers fail and leave hanging connections at the DB level, which pile up until the limit is reached. A Nagios plot will be created (Fabio) to monitor this pile-up effect.
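Until such a plot exists, the pile-up can be inspected by hand on the database host. The sketch below assumes that psql can connect locally as the postgres user (an assumption, not something stated on this page); the function names are hypothetical.

```shell
#!/bin/sh
# Integer percentage of the connection limit in use: usage_pct CURRENT MAX
usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

# Report current backends against max_connections, then break them down
# per database to see where connections accumulate.
# Assumes local access as the "postgres" user.
report_connections() {
    current=$(psql -U postgres -At -c "SELECT count(*) FROM pg_stat_activity;") || return 1
    max=$(psql -U postgres -At -c "SHOW max_connections;") || return 1
    echo "connections: $current / $max ($(usage_pct "$current" "$max")%)"
    psql -U postgres -c "SELECT datname, count(*) FROM pg_stat_activity GROUP BY datname ORDER BY 2 DESC;"
}
```

Running `report_connections` a few times during transfers should show whether the count grows steadily towards the limit.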
Solution or Workaround
A clean restart of dCache on se01, following the instructions in StartStopDcache. This has cured the issue every time.
Monitoring for this condition
A Nagios check on the number of DB connections is needed, so I have implemented one with check_postgres. The check is deployed on t3dcachedb01:
[root@t3dcachedb01 ~]# rpm -ql check_postgres
/usr/bin/check_postgres.pl
/usr/share/doc/check_postgres-2.12.0/check_postgres.pl.html
[root@t3dcachedb01 ~]# grep postgres /etc/nagios/nrpe.cfg
command[check_postgres_backends]=/usr/bin/check_postgres.pl --action=backends
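The command above runs with the plugin's default thresholds. As a sketch, the check can also be run by hand with explicit thresholds; the 80% and 90% values below are assumptions chosen for illustration (percentages are taken relative to max_connections), and the helper names are hypothetical.

```shell
#!/bin/sh
# Map a Nagios plugin exit code to its state name.
nagios_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Run the backends check manually with example thresholds
# (80% / 90% are illustrative values, not the deployed configuration).
run_backends_check() {
    /usr/bin/check_postgres.pl --action=backends --warning='80%' --critical='90%'
    rc=$?
    echo "state: $(nagios_state "$rc")"
}
```

If these thresholds prove useful, the same `--warning` and `--critical` options could be appended to the `command[check_postgres_backends]` line in /etc/nagios/nrpe.cfg.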
-- LeonardoSala - 2011-11-28