Tags:
create new tag
view all tags

IssueDcacheTooManySQLConnections

Symptoms

Summary: too many pgsql connections, error 53300

SE cannot be accessed through srm.

  • In t3se01:catalina.out: createConnection(): Got exception org.postgresql.util.PSQLException, SQLState: 53300
  • In t3cachedb01:pg_sql: FATAL:  connection limit exceeded for non-superusers

Occurrences

At what times did this problem occur (used to estimate frequency):

2011-11-27 ~ 22:40 according to the Nagios t3vm01 SRM checks
2013-02-27 ~ 14:00

Observations

Number of connections is set up in /var/lib/pgsql/data/postgresql.conf: max_connections. From the manual it can be seen that to raise that number it could be necessary to modify the SysV parameter SEMMNI.

It is not clear why the 100 limit is reached. Probably, a certain number of transfers fails and leaves hanged connections on the db level, piling up until the limit is reached. A Nagios plot about this will be created (Fabio) to monitor this pile-up effect.

Solution or Workaround

Workaround: A restart of dcache initially was the only fix we had.

Pablo Fernandez had submitted a ticket for this issue at CSCS and Gerd Behrmann provided the solution:

  • SRM has a feature called "srmVacuum" in dCacheConfig file that must be set to false when you do the autovacuum with postgres (default on >v8.2).
  • max_connections in postgres should be bigger than 100 (our case). He recommended to set it up to 1000. It seems we hit that limit during the vacuum.

# ipcs -sl

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767

            # or alternatively
# sysctl kernel.sem

kernel.sem = 250        32000   32      128

SysV var name Description requirement
SEMMNI Maximum number of semaphore identifiers (i.e., sets) at least ceil((max_connections + autovacuum_max_workers + 4) / 16)
SEMMNS Maximum number of semaphores system-wide ceil((max_connections + autovacuum_max_workers + 4) / 16) * 17 plus room for other applications
SEMMSL Maximum number of semaphores per set at least 17

We added to /etc/sysctl.conf:

# 2011-12-06 Derek. Raise max number of arrays from 128 to 256
# for being able to use a higher number of postgres connections (<100)
# during vaccuuming we several times had hit that threshhold
kernel.sem="250        32000   32      256"

Monitoring for this condition

2 checks for the number of DB connections was added to Nagios by Fabio: check1 check2

[root@t3dcachedb02 ~]# rpm -ql check_postgres
/usr/bin/check_postgres.pl
/usr/share/doc/check_postgres-2.12.0/check_postgres.pl.html

[root@t3dcachedb02 ~]# grep postgres /etc/nagios/nrpe.cfg 
command[check_postgres_backends]=/usr/bin/check_postgres.pl --action=backends
command[check_postgres_active_connections]=/usr/bin/check_postgres.pl -H t3dcachedb02 -db chimera -u postgres --timeout=1000 --action=custom_query --critical='190' --warning='180' --valtype=integer  --showtime=0 --showperf=1 --query="SELECT count(*) as result, count(*) as performance FROM pg_stat_activity;"

IssueForm
Affected Service dcache, srm
Symptom summary too many pgsql connections, error 53300
Reason Understood yes
Solution Exists yes
Obsolete no
Edit | Attach | Watch | Print version | History: r4 < r3 < r2 < r1 | Backlinks | Raw View | Raw edit | More topic actions
Topic revision: r4 - 2013-02-27 - FabioMartinelli
 
This site is powered by the TWiki collaboration platform Powered by Perl This site is powered by the TWiki collaboration platformCopyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback