Server randomly dying

Hey,

I've got a server that seems to be randomly dying and I can't figure out why. The server had been running fine for almost a year, up until around 2 weeks ago.

At random times of the day, the load just shoots up (usually to around 25, but sometimes as high as 70) with no clear indication as to why.

Yesterday the server went down completely. It still responded to pings, but nothing else responded. Nothing was logged at all during this period.
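
Since nothing gets logged when this happens, one thing that could be tried is a small sampler that records the load and any processes stuck in uninterruptible sleep (state D). The following is only a rough sketch, assuming Python 3 is available on the box (it is not part of a stock CentOS 4.2 install); the function names and the log path in the usage note are made up for illustration.

```python
import os
import time

def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into the 1, 5 and 15 minute loads."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])

def state_from_stat(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain spaces,
    # so take the first field after the last ")".
    return stat_line.rsplit(")", 1)[1].split()[0]

def blocked_pids(proc_dir="/proc"):
    """List the PIDs currently in uninterruptible sleep (state D)."""
    pids = []
    for name in os.listdir(proc_dir):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_dir, name, "stat")) as f:
                if state_from_stat(f.read()) == "D":
                    pids.append(int(name))
        except (OSError, IndexError):
            continue  # the process exited while it was being read
    return pids

def log_sample(log_path):
    """Append one timestamped line with the load and the blocked PIDs."""
    with open("/proc/loadavg") as f:
        load = parse_loadavg(f.read())
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    line = "%s load=%.2f/%.2f/%.2f blocked=%s\n" % (
        stamp, load[0], load[1], load[2], blocked_pids())
    with open(log_path, "a") as out:
        out.write(line)
        out.flush()
        os.fsync(out.fileno())  # push the line to disk straight away
```

Calling something like log_sample("/root/load-samples.log") once a minute from cron (the path is hypothetical) would at least leave a trail of what the load looked like in the run-up to a hang.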


Here is as much information about the server as I can find:
Processor Name: Intel(R) Pentium(R) 4 CPU 3.00GHz
Vendor ID: GenuineIntel
Processor Speed (MHz): 2999.720
Total Memory: 1018692 kB
Free Memory: 517700 kB
Total Swap Memory: 2104496 kB
Free Swap Memory: 1963960 kB
CentOS 4.2
Apache 1.3.34
DirectAdmin 1.26.3
Exim 4.51
MySQL 4.0.26


[root@blue]# top
10:43:47 up 13:56, 1 user, load average: 26.54, 10.34, 3.94
159 processes: 156 sleeping, 3 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 2.9% 0.0% 2.9% 0.9% 2.9% 90.0% 0.0%
Mem: 1018692k av, 1009364k used, 9328k free, 0k shrd, 9572k buff
791888k actv, 147816k in_d, 13336k in_c
Swap: 2104496k av, 1094076k used, 1010420k free 79900k cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
7879 root 17 0 3276 3276 2704 D 1.9 0.3 0:00 0 exim
5 root 15 0 0 0 0 SW 0.9 0.0 0:16 0 kscand
7787 root 15 0 1192 1192 760 R 0.9 0.1 0:00 0 top
7855 root 25 10 1656 1652 1136 D N 0.9 0.1 0:00 0 imapd
1 root 15 0 112 80 56 S 0.0 0.0 0:03 0 init
2 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 keventd
3 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0 ksoftirqd/0
6 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 bdflush
4 root 15 0 0 0 0 SW 0.0 0.0 0:05 0 kswapd
7 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kupdated
8 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 mdrecoveryd
16 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 scsi_eh_0
17 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 scsi_eh_1
21 root 15 0 0 0 0 SW 0.0 0.0 1:02 0 raid1d
22 root 15 0 0 0 0 SW 0.0 0.0 0:17 0 raid1syncd
23 root 15 0 0 0 0 DW 0.0 0.0 0:52 0 kjournald
77 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 khubd
2672 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kjournald
2673 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kjournald
3057 root 15 0 200 164 116 D 0.0 0.0 0:03 0 syslogd
3061 root 15 0 60 4 0 S 0.0 0.0 0:00 0 klogd
3087 rpc 19 0 76 4 0 S 0.0 0.0 0:00 0 portmap
3106 rpcuser 25 0 80 4 0 S 0.0 0.0 0:00 0 rpc.statd
3117 root 15 0 912 168 120 S 0.0 0.0 0:00 0 mdadm
3133 root RT 0 188 92 52 D 0.0 0.0 0:08 0 auditd
3215 root 15 0 612 236 144 S 0.0 0.0 0:00 0 cupsd
3236 root 16 0 248 4 0 S 0.0 0.0 0:00 0 sshd
3250 root 15 0 328 288 208 S 0.0 0.0 0:00 0 xinetd
3259 root 25 0 220 4 0 S 0.0 0.0 0:00 0 mysqld_safe
3288 mysql 15 0 23620 4848 1080 S 0.0 0.4 0:02 0 mysqld
3307 mysql 15 0 23620 4848 1080 S 0.0 0.4 0:03 0 mysqld
3308 mysql 20 0 23620 4848 1080 S 0.0 0.4 0:00 0 mysqld
3309 mysql 25 0 23620 4848 1080 S 0.0 0.4 0:00 0 mysqld
3310 mysql 25 0 23620 4848 1080 S 0.0 0.4 0:00 0 mysqld
3311 mysql 20 0 23620 4848 1080 S 0.0 0.4 0:00 0 mysqld
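
To make the header of that snapshot easier to read at a glance, here is a small sketch that pulls out the load averages, the CPU state percentages and the share of swap in use. It assumes Python 3 and the older top header layout shown above; the summarize function and TOP_HEADER name are made up for illustration, and the header lines are copied verbatim from the output.

```python
import re

# Header lines copied from the top output above.
TOP_HEADER = """\
10:43:47 up 13:56, 1 user, load average: 26.54, 10.34, 3.94
CPU states: cpu user nice system irq softirq iowait idle
total 2.9% 0.0% 2.9% 0.9% 2.9% 90.0% 0.0%
Swap: 2104496k av, 1094076k used, 1010420k free 79900k cached
"""

def summarize(header):
    """Return load averages, CPU state percentages and swap use in percent."""
    load_match = re.search(
        r"load average: ([\d.]+), ([\d.]+), ([\d.]+)", header)
    load = [float(x) for x in load_match.groups()]

    lines = header.splitlines()
    # Column names follow "CPU states: cpu"; the values follow "total".
    names_line = next(l for l in lines if l.startswith("CPU states:"))
    values_line = next(l for l in lines if l.startswith("total"))
    names = names_line.split()[3:]
    values = [float(v.rstrip("%")) for v in values_line.split()[1:]]
    cpu = dict(zip(names, values))

    swap_match = re.search(r"Swap: (\d+)k av, (\d+)k used", header)
    swap_total, swap_used = (int(x) for x in swap_match.groups())
    swap_used_pct = round(100.0 * swap_used / swap_total, 1)

    return {"load": load, "cpu": cpu, "swap_used_pct": swap_used_pct}

summary = summarize(TOP_HEADER)
print("1 min load:", summary["load"][0])
print("iowait %:", summary["cpu"]["iowait"])
print("swap used %:", summary["swap_used_pct"])
```

For this snapshot that gives a 1 minute load of 26.54 with the CPU at 90.0% iowait and roughly half of the swap in use.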

Anyone got any ideas?

Peter Verrill