Go to previous page / next page of Tier3 site log
16. 10. 2008 Out of memory problem impacting t3wn06
Ganglia and other services have been impacted for a few days. An out-of-memory condition caused by a user process appears to have been the source of the problem: the kernel OOM killer terminated the user process and then system daemons such as gmond and rpc.statd.
- Need to shield the system better (queue limits)
- Need to establish a daily check of the sensors by the admins
Excerpts from /var/log/messages:
Oct 14 12:53:55 t3wn06 kernel: Node 0 HighMem: empty
Oct 14 12:53:55 t3wn06 kernel: Swap cache: add 510281, delete 510031, find 23/45, race 0+0
Oct 14 12:53:55 t3wn06 kernel: Free swap: 0kB
Oct 14 12:53:55 t3wn06 kernel: 4325376 pages of RAM
Oct 14 12:53:55 t3wn06 kernel: 220750 reserved pages
Oct 14 12:53:55 t3wn06 kernel: 67679 pages shared
Oct 14 12:53:55 t3wn06 kernel: 251 pages swap cached
Oct 14 12:53:55 t3wn06 kernel: Out of Memory: Killed process 18107 (MarkovChains.ex).
Oct 14 12:53:55 t3wn06 kernel: oom-killer: gfp_mask=0xd2
Oct 14 12:53:55 t3wn06 kernel: Mem-info:
.....
Oct 14 12:53:56 t3wn06 kernel: Free pages: 28232kB (0kB HighMem)
Oct 14 12:53:56 t3wn06 kernel: Active:2248797 inactive:1829427 dirty:0 writeback:0 unstable:0 free:7058 slab:4306 mapped:4077596 pagetables:9275
Oct 14 12:53:56 t3wn06 kernel: Node 0 DMA free:11648kB min:12kB low:24kB high:36kB active:0kB inactive:0kB present:16384kB pages_scanned:0 all_unreclaimable? yes
Oct 14 12:53:56 t3wn06 kernel: protections[]: 0 0 0
Oct 14 12:53:56 t3wn06 kernel: Node 0 Normal free:16584kB min:16612kB low:33224kB high:49836kB active:8995752kB inactive:7317196kB present:17285120kB pages_scanned:25577838 all_unreclaimable? yes
Oct 14 12:53:56 t3wn06 kernel: protections[]: 0 0 0
Oct 14 12:53:56 t3wn06 kernel: Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Oct 14 12:53:56 t3wn06 kernel: protections[]: 0 0 0
Oct 14 12:53:56 t3wn06 kernel: Node 0 DMA: 2*4kB 1*8kB 1*16kB 5*32kB 3*64kB 2*128kB 3*256kB 0*512kB 0*1024kB 1*2048kB 2*4096kB = 11648kB
Oct 14 12:53:56 t3wn06 kernel: Node 0 Normal: 0*4kB 1*8kB 0*16kB 2*32kB 2*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 4*4096kB = 16584kB
Oct 14 12:53:56 t3wn06 kernel: Node 0 HighMem: empty
Oct 14 12:53:56 t3wn06 kernel: Swap cache: add 510281, delete 510031, find 23/46, race 0+0
Oct 14 12:53:56 t3wn06 kernel: Free swap: 0kB
Oct 14 12:53:56 t3wn06 kernel: 4325376 pages of RAM
Oct 14 12:53:56 t3wn06 kernel: 220750 reserved pages
Oct 14 12:53:56 t3wn06 kernel: 67642 pages shared
Oct 14 12:53:56 t3wn06 kernel: 251 pages swap cached
Oct 14 12:53:56 t3wn06 kernel: Out of Memory: Killed process 18090 (489).
...
Oct 14 12:53:58 t3wn06 kernel: Out of Memory: Killed process 3503 (gmond).
...
Oct 14 12:53:59 t3wn06 kernel: Out of Memory: Killed process 9646 (rpc.statd).
...
Oct 16 08:29:43 t3wn06 kernel: statd: server localhost not responding, timed out
Oct 16 08:29:43 t3wn06 kernel: lockd: cannot monitor 192.33.123.26
Oct 16 08:29:43 t3wn06 kernel: lockd: failed to monitor 192.33.123.26
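As a possible starting point for the daily check mentioned above, here is a minimal Python sketch that scans a syslog file for OOM-killer messages of the form seen in this excerpt. The function names (find_oom_kills, check_log) are hypothetical, and the pattern assumes the exact "Out of Memory: Killed process PID (NAME)." wording logged by this kernel; other kernel versions word the message differently, so the regular expression would need adjusting.

```python
import re

# Matches kernel lines of the form seen in the excerpt above, e.g.
#   Oct 14 12:53:55 t3wn06 kernel: Out of Memory: Killed process 18107 (MarkovChains.ex).
# Assumption: the message wording is exactly as logged on t3wn06.
OOM_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+kernel:\s+"
    r"Out of Memory: Killed process (?P<pid>\d+) \((?P<name>.*)\)\.\s*$"
)


def find_oom_kills(lines):
    """Return (timestamp, host, pid, process name) for every OOM kill found."""
    kills = []
    for line in lines:
        m = OOM_RE.match(line)
        if m:
            kills.append((m["ts"], m["host"], int(m["pid"]), m["name"]))
    return kills


def check_log(path="/var/log/messages"):
    """Print one line per OOM kill in the given log file and return the list."""
    with open(path, errors="replace") as fh:
        kills = find_oom_kills(fh)
    for ts, host, pid, name in kills:
        print(f"{ts} {host}: OOM killer terminated pid {pid} ({name})")
    return kills
```

Calling check_log() from a daily cron job and mailing any output to the admins would flag a node where daemons such as gmond or rpc.statd were killed, rather than leaving it to be noticed days later.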
--
DerekFeichtinger - 16 Oct 2008