Tags: view all tags

IssueDcacheTooManySQLConnections

Symptoms
Occurrences
Observations
Solution or Workaround
Monitoring for this condition

Symptoms

Summary: too many pgsql connections, error 53300

SE cannot be accessed through srm.

In t3se01:catalina.out: createConnection(): Got exception org.postgresql.util.PSQLException, SQLState: 53300
In t3cachedb01:pg_sql: FATAL: connection limit exceeded for non-superusers

Occurrences

At what times did this problem occur (used to estimate frequency):

2011-11-27 ~ 22:40 according to the Nagios t3vm01 SRM checks
2013-02-27 ~ 14:00

2011-11-27 ~ 22:40 according to the Nagios t3vm01 SRM checks

2013-02-27 ~ 14:00

Observations

Number of connections is set up in /var/lib/pgsql/data/postgresql.conf: max_connections. From the manual it can be seen that to raise that number it could be necessary to modify the SysV parameter SEMMNI.

It is not clear why the 100 limit is reached. Probably, a certain number of transfers fails and leaves hanged connections on the db level, piling up until the limit is reached. A Nagios plot about this will be created (Fabio) to monitor this pile-up effect.

Solution or Workaround

Workaround: A restart of dcache initially was the only fix we had.

Pablo Fernandez had submitted a ticket for this issue at CSCS and Gerd Behrmann provided the solution:

SRM has a feature called "srmVacuum" in dCacheConfig file that must be set to false when you do the autovacuum with postgres (default on >v8.2).
max_connections in postgres should be bigger than 100 (our case). He recommended to set it up to 1000. It seems we hit that limit during the vacuum.

# ipcs -sl

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767

            # or alternatively
# sysctl kernel.sem

kernel.sem = 250        32000   32      128

SysV var name	Description	requirement
SEMMNI	Maximum number of semaphore identifiers (i.e., sets)	at least ceil((max_connections + autovacuum_max_workers + 4) / 16)
SEMMNS	Maximum number of semaphores system-wide	ceil((max_connections + autovacuum_max_workers + 4) / 16) * 17 plus room for other applications
SEMMSL	Maximum number of semaphores per set	at least 17

We added to /etc/sysctl.conf:

# 2011-12-06 Derek. Raise max number of arrays from 128 to 256
# for being able to use a higher number of postgres connections (<100)
# during vaccuuming we several times had hit that threshhold
kernel.sem="250        32000   32      256"

Monitoring for this condition

2 checks for the number of DB connections was added to Nagios by Fabio: check1 check2

[root@t3dcachedb02 ~]# rpm -ql check_postgres
/usr/bin/check_postgres.pl
/usr/share/doc/check_postgres-2.12.0/check_postgres.pl.html

[root@t3dcachedb02 ~]# grep postgres /etc/nagios/nrpe.cfg 
command[check_postgres_backends]=/usr/bin/check_postgres.pl --action=backends
command[check_postgres_active_connections]=/usr/bin/check_postgres.pl -H t3dcachedb02 -db chimera -u postgres --timeout=1000 --action=custom_query --critical='190' --warning='180' --valtype=integer  --showtime=0 --showperf=1 --query="SELECT count(*) as result, count(*) as performance FROM pg_stat_activity;"

IssueForm
Affected Service	dcache, srm
Symptom summary	too many pgsql connections, error 53300
Reason Understood	yes
Solution Exists	yes
Obsolete	no

Topic revision: r4 - 2013-02-27 - FabioMartinelli

CmsTier3

User Pages
Main Page
Policies

Physics Groups
Steering Board Meetings

Admin Pages
AdminArea
Cluster Specs