Tags:
tag this topic
create new tag
view all tags
<!-- keep this as a security measure: #uncomment if the subject should only be modifiable by the listed groups # * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup # * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup #uncomment this if you want the page only be viewable by the listed groups # * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup --> ---+!! %TOPIC% %TOC% ---++ Symptoms Summary: %FORMFIELD{"Symptom summary"}% SE cannot be accessed through srm. * In t3se01:catalina.out: =createConnection(): Got exception org.postgresql.util.PSQLException, SQLState: 53300= * In t3cachedb01:pg_sql: =FATAL: connection limit exceeded for non-superusers= ---++ Occurrences At what times did this problem occur (used to estimate frequency): | 2011-11-27 ~ 22:40 according to the [[https://t3nagios.psi.ch/nagios/cgi-bin/status.cgi?host=t3vm01][Nagios t3vm01 SRM checks]] | | 2013-02-27 ~ 14:00 | ---++ Observations <!-- #collect here the information which may help to better understand the state of the system or services, e.g. #log excerpts, strace output, etc. #this also may help to identify the problem if similar conditions arise again --> Number of connections is set up in =/var/lib/pgsql/data/postgresql.conf: max_connections=. From the [[http://www.postgresql.org/docs/8.2/static/runtime-config-connection.html][manual]] it can be seen that to raise that number it could be necessary to modify the SysV parameter =SEMMNI=. <strike>It is not clear why the 100 limit is reached. Probably, a certain number of transfers fails and leaves hanged connections on the db level, piling up until the limit is reached. A Nagios plot about this will be created (Fabio) to monitor this pile-up effect.</strike> ---++ Solution or Workaround Workaround: A restart of dcache initially was the only fix we had. Pablo Fernandez had submitted a ticket for this issue at CSCS and Gerd Behrmann provided the solution: * SRM has a feature called "srmVacuum" in dCacheConfig file that must be set to false when you do the autovacuum with postgres (default on >v8.2). * max_connections in postgres should be bigger than 100 (our case). He recommended to set it up to 1000. It seems we hit that limit during the vacuum. <pre> # ipcs -sl ------ Semaphore Limits -------- max number of arrays = 128 max semaphores per array = 250 max semaphores system wide = 32000 max ops per semop call = 32 semaphore max value = 32767 # or alternatively # sysctl kernel.sem kernel.sem = 250 32000 32 128 </pre> | SysV var name | Description | requirement | |SEMMNI |Maximum number of semaphore identifiers (i.e., sets) |at least ceil((max_connections + autovacuum_max_workers + 4) / 16) | |SEMMNS |Maximum number of semaphores system-wide |ceil((max_connections + autovacuum_max_workers + 4) / 16) * 17 plus room for other applications | |SEMMSL |Maximum number of semaphores per set |at least 17 | We added to =/etc/sysctl.conf=: <pre> # 2011-12-06 Derek. Raise max number of arrays from 128 to 256 # for being able to use a higher number of postgres connections (<100) # during vaccuuming we several times had hit that threshhold kernel.sem="250 32000 32 256" </pre> ---++ Monitoring for this condition <!-- #how can this condition be recognized automatically, if at all? --> 2 checks for the number of DB connections was added to Nagios by Fabio: [[https://t3nagios.psi.ch/nagios/cgi-bin/extinfo.cgi?type=2&host=t3dcachedb02&service=PostgreSQL+DBs+active+connections][check1]] [[https://t3nagios.psi.ch/nagios/cgi-bin/extinfo.cgi?type=2&host=t3dcachedb02&service=PostgreSQL+TOT+active+connections][check2]] <pre> [root@t3dcachedb02 ~]# rpm -ql check_postgres /usr/bin/check_postgres.pl /usr/share/doc/check_postgres-2.12.0/check_postgres.pl.html [root@t3dcachedb02 ~]# grep postgres /etc/nagios/nrpe.cfg command[check_postgres_backends]=/usr/bin/check_postgres.pl --action=backends command[check_postgres_active_connections]=/usr/bin/check_postgres.pl -H t3dcachedb02 -db chimera -u postgres --timeout=1000 --action=custom_query --critical='190' --warning='180' --valtype=integer --showtime=0 --showperf=1 --query="SELECT count(*) as result, count(*) as performance FROM pg_stat_activity;" </pre>
IssueForm
Affected Service
dcache, srm
Symptom summary
too many pgsql connections, error 53300
Reason Understood
yes
Solution Exists
yes
Obsolete
no
E
dit
|
A
ttach
|
Watch
|
P
rint version
|
H
istory
: r4
<
r3
<
r2
<
r1
|
B
acklinks
|
V
iew topic
|
Ra
w
edit
|
M
ore topic actions
Topic revision: r4 - 2013-02-27
-
FabioMartinelli
CmsTier3
Log In
CmsTier3 Web
Create New Topic
Index
Search
Changes
Notifications
Statistics
Preferences
User Pages
Main Page
Policies
Monitoring Storage Space
Monitoring Slurm Usage
Physics Groups
Steering Board Meetings
Admin Pages
AdminArea
Cluster Specs
Home
Site map
CmsTier3 web
LCGTier2 web
PhaseC web
Main web
Sandbox web
TWiki web
CmsTier3 Web
Create New Topic
Index
Search
Changes
Notifications
RSS Feed
Statistics
Preferences
P
View
Raw View
Print version
Find backlinks
History
More topic actions
Edit
Raw edit
Attach file or image
Edit topic preference settings
Set new parent
More topic actions
Account
Log In
E
dit
A
ttach
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki?
Send feedback