<!-- keep this as a security measure: #uncomment if the subject should only be modifiable by the listed groups # * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup # * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup #uncomment this if you want the page only be viewable by the listed groups # * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup --> ---+!! %TOPIC% %TOC% ---++ Symptoms Summary: %FORMFIELD{"Symptom summary"}% SE cannot be accessed through srm. * In t3se01:catalina.out: =createConnection(): Got exception org.postgresql.util.PSQLException, SQLState: 53300= * In t3cachedb01:pg_sql: =FATAL: connection limit exceeded for non-superusers= ---++ Occurrences At what times did this problem occur (used to estimate frequency): | 2011-11-27 ~ 22:40 according to the [[https://t3nagios.psi.ch/nagios/cgi-bin/status.cgi?host=t3vm01][Nagios t3vm01 SRM checks]] | | 2013-02-27 ~ 14:00 | ---++ Observations <!-- #collect here the information which may help to better understand the state of the system or services, e.g. #log excerpts, strace output, etc. #this also may help to identify the problem if similar conditions arise again --> Number of connections is set up in =/var/lib/pgsql/data/postgresql.conf: max_connections=. From the [[http://www.postgresql.org/docs/8.2/static/runtime-config-connection.html][manual]] it can be seen that to raise that number it could be necessary to modify the SysV parameter =SEMMNI=. <strike>It is not clear why the 100 limit is reached. Probably, a certain number of transfers fails and leaves hanged connections on the db level, piling up until the limit is reached. A Nagios plot about this will be created (Fabio) to monitor this pile-up effect.</strike> ---++ Solution or Workaround Workaround: A restart of dcache initially was the only fix we had. Pablo Fernandez had submitted a ticket for this issue at CSCS and Gerd Behrmann provided the solution: * SRM has a feature called "srmVacuum" in dCacheConfig file that must be set to false when you do the autovacuum with postgres (default on >v8.2). * max_connections in postgres should be bigger than 100 (our case). He recommended to set it up to 1000. It seems we hit that limit during the vacuum. <pre> # ipcs -sl ------ Semaphore Limits -------- max number of arrays = 128 max semaphores per array = 250 max semaphores system wide = 32000 max ops per semop call = 32 semaphore max value = 32767 # or alternatively # sysctl kernel.sem kernel.sem = 250 32000 32 128 </pre> | SysV var name | Description | requirement | |SEMMNI |Maximum number of semaphore identifiers (i.e., sets) |at least ceil((max_connections + autovacuum_max_workers + 4) / 16) | |SEMMNS |Maximum number of semaphores system-wide |ceil((max_connections + autovacuum_max_workers + 4) / 16) * 17 plus room for other applications | |SEMMSL |Maximum number of semaphores per set |at least 17 | We added to =/etc/sysctl.conf=: <pre> # 2011-12-06 Derek. Raise max number of arrays from 128 to 256 # for being able to use a higher number of postgres connections (<100) # during vaccuuming we several times had hit that threshhold kernel.sem="250 32000 32 256" </pre> ---++ Monitoring for this condition <!-- #how can this condition be recognized automatically, if at all? --> 2 checks for the number of DB connections was added to Nagios by Fabio: [[https://t3nagios.psi.ch/nagios/cgi-bin/extinfo.cgi?type=2&host=t3dcachedb02&service=PostgreSQL+DBs+active+connections][check1]] [[https://t3nagios.psi.ch/nagios/cgi-bin/extinfo.cgi?type=2&host=t3dcachedb02&service=PostgreSQL+TOT+active+connections][check2]] <pre> [root@t3dcachedb02 ~]# rpm -ql check_postgres /usr/bin/check_postgres.pl /usr/share/doc/check_postgres-2.12.0/check_postgres.pl.html [root@t3dcachedb02 ~]# grep postgres /etc/nagios/nrpe.cfg command[check_postgres_backends]=/usr/bin/check_postgres.pl --action=backends command[check_postgres_active_connections]=/usr/bin/check_postgres.pl -H t3dcachedb02 -db chimera -u postgres --timeout=1000 --action=custom_query --critical='190' --warning='180' --valtype=integer --showtime=0 --showperf=1 --query="SELECT count(*) as result, count(*) as performance FROM pg_stat_activity;" </pre>
IssueForm
Affected Service
dcache, srm
Symptom summary
too many pgsql connections, error 53300
Reason Understood
yes
Solution Exists
yes
Obsolete
no
This topic: CmsTier3
>
WebHome
>
AdminArea
>
IssueDcacheTooManySQLConnections
Topic revision: r4 - 2013-02-27 - FabioMartinelli
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki?
Send feedback