Symptoms
Summary: too many pgsql connections, error 53300
SE cannot be accessed through srm.
- In t3se01:catalina.out:
createConnection(): Got exception org.postgresql.util.PSQLException, SQLState: 53300
- In t3cachedb01:pg_sql:
FATAL: connection limit exceeded for non-superusers
Occurrences
At what times did this problem occur (used to estimate frequency):
Observations
Number of connections is set up in
/var/lib/pgsql/data/postgresql.conf: max_connections
. From the
manual it can be seen that
to raise that number it could be necessary to modify the
SysV parameter
SEMMNI
.
It is not clear why the 100 limit is reached. Probably, a certain number of transfers fails and leaves hanged connections on the db level, piling up until the limit is reached. A Nagios plot about this will be created (Fabio) to
monitor this pile-up effect.
Solution or Workaround
Workaround: A restart of dcache initially was the only fix we had.
Pablo Fernandez had submitted a ticket for this issue at CSCS and Gerd Behrmann provided the solution:
- SRM has a feature called "srmVacuum" in dCacheConfig file that must be set to false when you do the autovacuum with postgres (default on >v8.2).
- max_connections in postgres should be bigger than 100 (our case). He recommended to set it up to 1000. It seems we hit that limit during the vacuum.
# ipcs -sl
------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767
# or alternatively
# sysctl kernel.sem
kernel.sem = 250 32000 32 128
SysV var name |
Description |
requirement |
SEMMNI |
Maximum number of semaphore identifiers (i.e., sets) |
at least ceil((max_connections + autovacuum_max_workers + 4) / 16) |
SEMMNS |
Maximum number of semaphores system-wide |
ceil((max_connections + autovacuum_max_workers + 4) / 16) * 17 plus room for other applications |
SEMMSL |
Maximum number of semaphores per set |
at least 17 |
We added to
/etc/sysctl.conf
:
# 2011-12-06 Derek. Raise max number of arrays from 128 to 256
# for being able to use a higher number of postgres connections (<100)
# during vaccuuming we several times had hit that threshhold
kernel.sem="250 32000 32 256"
Monitoring for this condition
2 checks for the number of DB connections was added to Nagios by Fabio:
check1
check2
[root@t3dcachedb02 ~]# rpm -ql check_postgres
/usr/bin/check_postgres.pl
/usr/share/doc/check_postgres-2.12.0/check_postgres.pl.html
[root@t3dcachedb02 ~]# grep postgres /etc/nagios/nrpe.cfg
command[check_postgres_backends]=/usr/bin/check_postgres.pl --action=backends
command[check_postgres_active_connections]=/usr/bin/check_postgres.pl -H t3dcachedb02 -db chimera -u postgres --timeout=1000 --action=custom_query --critical='190' --warning='180' --valtype=integer --showtime=0 --showperf=1 --query="SELECT count(*) as result, count(*) as performance FROM pg_stat_activity;"