<!-- keep this as a security measure:
#uncomment if the subject should only be modifiable by the listed groups
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup
#uncomment this if you want the page only be viewable by the listed groups
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup
-->

%TOC%

%ICON{arrowleft}% Go to [[CMSTier3Log1][previous page]] / [[CMSTier3Log3][next page]] of Tier3 site log %M%

---+ 16. 10. 2008 Out of memory problem impacting t3wn06

Ganglia and other services on t3wn06 have been impaired for a few days. It seems that an out-of-memory condition caused by a user process could have been the source of the problem. The log excerpt below shows the kernel OOM killer terminating a user process (=MarkovChains.ex=) and, seconds later, =gmond= (the Ganglia monitoring daemon) and =rpc.statd=.

   * Need to shield the system better (queue limits)
   * Need to establish a daily check of the sensors by the admins

Excerpts from =/var/log/messages=:

<pre %FILESTYLE%>
Oct 14 12:53:55 t3wn06 kernel: Node 0 HighMem: empty
Oct 14 12:53:55 t3wn06 kernel: Swap cache: add 510281, delete 510031, find 23/45, race 0+0
Oct 14 12:53:55 t3wn06 kernel: Free swap:            0kB
Oct 14 12:53:55 t3wn06 kernel: 4325376 pages of RAM
Oct 14 12:53:55 t3wn06 kernel: 220750 reserved pages
Oct 14 12:53:55 t3wn06 kernel: 67679 pages shared
Oct 14 12:53:55 t3wn06 kernel: 251 pages swap cached
Oct 14 12:53:55 t3wn06 kernel: Out of Memory: Killed process 18107 (MarkovChains.ex).
Oct 14 12:53:55 t3wn06 kernel: oom-killer: gfp_mask=0xd2
Oct 14 12:53:55 t3wn06 kernel: Mem-info:
.....
Oct 14 12:53:56 t3wn06 kernel: Free pages:       28232kB (0kB HighMem)
Oct 14 12:53:56 t3wn06 kernel: Active:2248797 inactive:1829427 dirty:0 writeback:0 unstable:0 free:7058 slab:4306 mapped:4077596 pagetables:9275
Oct 14 12:53:56 t3wn06 kernel: Node 0 DMA free:11648kB min:12kB low:24kB high:36kB active:0kB inactive:0kB present:16384kB pages_scanned:0 all_unreclaimable? yes
Oct 14 12:53:56 t3wn06 kernel: protections[]: 0 0 0
Oct 14 12:53:56 t3wn06 kernel: Node 0 Normal free:16584kB min:16612kB low:33224kB high:49836kB active:8995752kB inactive:7317196kB present:17285120kB pages_scanned:25577838 all_unreclaimable? yes
Oct 14 12:53:56 t3wn06 kernel: protections[]: 0 0 0
Oct 14 12:53:56 t3wn06 kernel: Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Oct 14 12:53:56 t3wn06 kernel: protections[]: 0 0 0
Oct 14 12:53:56 t3wn06 kernel: Node 0 DMA: 2*4kB 1*8kB 1*16kB 5*32kB 3*64kB 2*128kB 3*256kB 0*512kB 0*1024kB 1*2048kB 2*4096kB = 11648kB
Oct 14 12:53:56 t3wn06 kernel: Node 0 Normal: 0*4kB 1*8kB 0*16kB 2*32kB 2*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 4*4096kB = 16584kB
Oct 14 12:53:56 t3wn06 kernel: Node 0 HighMem: empty
Oct 14 12:53:56 t3wn06 kernel: Swap cache: add 510281, delete 510031, find 23/46, race 0+0
Oct 14 12:53:56 t3wn06 kernel: Free swap:            0kB
Oct 14 12:53:56 t3wn06 kernel: 4325376 pages of RAM
Oct 14 12:53:56 t3wn06 kernel: 220750 reserved pages
Oct 14 12:53:56 t3wn06 kernel: 67642 pages shared
Oct 14 12:53:56 t3wn06 kernel: 251 pages swap cached
Oct 14 12:53:56 t3wn06 kernel: Out of Memory: Killed process 18090 (489).
...
Oct 14 12:53:58 t3wn06 kernel: Out of Memory: Killed process 3503 (gmond).
...
Oct 14 12:53:59 t3wn06 kernel: Out of Memory: Killed process 9646 (rpc.statd).
...
Oct 16 08:29:43 t3wn06 kernel: statd: server localhost not responding, timed out
Oct 16 08:29:43 t3wn06 kernel: lockd: cannot monitor 192.33.123.26
Oct 16 08:29:43 t3wn06 kernel: lockd: failed to monitor 192.33.123.26
</pre>

-- Main.DerekFeichtinger - 16 Oct 2008

----------------

%ICON{arrowleft}% Go to [[CMSTier3Log1][previous page]] / [[CMSTier3Log3][next page]] of Tier3 site log %M%
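As a possible starting point for the daily check mentioned in this entry, here is a minimal Python sketch that scans syslog lines for OOM-killer messages like those in the excerpt from =/var/log/messages=. The names =OomKill= and =find_oom_kills= are hypothetical, and the regular expression assumes the exact message format seen on t3wn06; other kernel versions may word the message differently.

```python
import re
from typing import Iterable, List, NamedTuple


class OomKill(NamedTuple):
    """One 'Out of Memory: Killed process' entry (hypothetical helper type)."""
    timestamp: str
    host: str
    pid: int
    command: str


# Assumption: the line format matches the excerpt from t3wn06, e.g.
# "Oct 14 12:53:58 t3wn06 kernel: Out of Memory: Killed process 3503 (gmond)."
OOM_KILL_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r"Out of Memory: Killed process (?P<pid>\d+) \((?P<command>[^)]*)\)"
)


def find_oom_kills(lines: Iterable[str]) -> List[OomKill]:
    """Return every OOM kill found in the given syslog lines, in order."""
    kills = []
    for line in lines:
        match = OOM_KILL_RE.match(line.strip())
        if match:
            kills.append(
                OomKill(
                    timestamp=match["timestamp"],
                    host=match["host"],
                    pid=int(match["pid"]),
                    command=match["command"],
                )
            )
    return kills


# Small demonstration using lines copied from the excerpt.
sample = [
    "Oct 14 12:53:55 t3wn06 kernel: Out of Memory: Killed process 18107 (MarkovChains.ex).",
    "Oct 14 12:53:56 t3wn06 kernel: Free swap:            0kB",
    "Oct 14 12:53:58 t3wn06 kernel: Out of Memory: Killed process 3503 (gmond).",
]
for kill in find_oom_kills(sample):
    print(f"{kill.timestamp} {kill.host}: pid {kill.pid} ({kill.command}) killed by OOM killer")
```

To run this against a real log, the lines of an opened =/var/log/messages= file can be passed to =find_oom_kills= in place of =sample=. A kill of a system daemon such as =gmond= or =rpc.statd= would be a reasonable condition on which to alert the admins.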
Topic revision: r1 - 2008-10-16 - DerekFeichtinger
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.