Saturday, June 7, 2008

Updated: Ubuntu 7.10 and Gateway 6400 Server Stability Issues

Since upgrading my ancient Gateway 6400 server to Ubuntu 7.10 (Gutsy), the system has experienced a number of random lockups. I have been working on the problem for a solid week with limited success. I hope this article will be of some use to anyone experiencing similar problems with the latest Ubuntu server distribution.

My Gateway 6400 server hardware is as follows:
  • 2x 1GHz Intel Pentium III (Coppermine) CPUs
  • 1GB ECC memory
  • 1x LITEON LTN301 ATAPI IDE CD-ROM
  • 1x Python 06240-XXX 8160 DAT Drive
  • 4x IBM DPSS-318350N Fast Wide SCSI Drives (ServerWorks SCSI Controller)
  • Intel(R) PRO/100 NIC
  • RealTek RTL8139 NIC
  • Generic embedded VGA display
Prior to the upgrade, the system had been running Ubuntu 6.10 Server followed by 7.04. These were completely standard installs running LAMP, SVN, BIND and Postfix. The system was a console-only affair that did not include any GUI packages, and it had been running rock solid for a year, rebooting only for kernel upgrades.

Last week I ran a distribution upgrade from 7.04 to 7.10, which completed without any errors. I also carefully documented any configuration changes suggested by the upgrade process. After reviewing and implementing a short list of application configuration changes, I rebooted the system. I logged in and performed a quick scan of the system logs looking for any problems; there were none. Thinking that another upgrade had been successful, I logged off and didn't give the system another thought until...

I tried logging on to my web mail application the next morning and was greeted with "unable to establish connection to server". A quick ping of the server returned zip, nothing. The lockup was hard enough that I could not log in using the local keyboard and monitor (no virtual consoles either). A hard reset brought the system back up, and again the logs revealed nothing. The next several days brought totally random lockups, with the system staying up as long as two days and as short as a few hours. I added a cron job to monitor system memory and disabled all my server applications. There were no apparent memory leaks, and the system was still freezing at random intervals.
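For anyone who wants to run the same kind of check, here is a minimal sketch of a memory logger that cron could run every few minutes. It is not the script I actually used: the script name (memlog.py), the log path and the five-minute schedule are hypothetical. It simply reads /proc/meminfo and prints one timestamped line of the fields most useful for spotting a leak.

```python
#!/usr/bin/env python3
"""Hypothetical memory logger for cron (illustrative, not the original script).

Example crontab entry (assumed path and interval):
    */5 * * * * /usr/local/bin/memlog.py >> /var/log/memlog.log
"""
import sys
import time

# Fields from /proc/meminfo worth tracking when hunting for a slow leak.
FIELDS = ("MemTotal", "MemFree", "Buffers", "Cached", "SwapFree")


def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to their values (kB)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()  # e.g. ["1035000", "kB"] or ["0"]
        if parts:
            info[name.strip()] = int(parts[0])
    return info


def format_line(info, stamp):
    """Build one log line with the tracked fields; missing ones show as '?'."""
    values = " ".join(f"{k}={info.get(k, '?')}" for k in FIELDS)
    return f"{stamp} {values}"


def main():
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except OSError as err:
        # Not a Linux system, or /proc is unavailable: report and skip this run.
        print(f"memlog: cannot read /proc/meminfo: {err}", file=sys.stderr)
        return
    print(format_line(info, time.strftime("%Y-%m-%d %H:%M:%S")))


if __name__ == "__main__":
    main()
```

A steady decline in MemFree plus Buffers and Cached across many log entries would point to a leak; flat numbers, as in my case, suggest the problem lies elsewhere.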

I tweaked some of the kernel parameters I had used with this system in the past:
ro quiet splash agp=off acpi=off pci=noacpi apm=off
Once again this made no difference. Somewhere I ran across a lengthy thread regarding the stability of the kernel (2.6.22-14) that shipped with 7.10. I downloaded the latest stable kernel (2.6.23-8), and the server is now running on this version. Time will tell if this fixes my stability issues, and I will post an update with the results in the next few days.

Update - 12/01/2007: The kernel upgrade has not solved the stability issues with my server. This time the system went 6 days before locking up. I did a hard reset, the system came up, ran for about 2 minutes and locked again.

Update - 12/07/2007: Reinstalled the old 2.6.20-16 kernel that shipped with 7.04 (Feisty). This may be a solution, as it worked for at least one other Gutsy user. I will post an update with the results in the near future.

Update - 12/17/2007: 10 days of uptime without a single glitch! I believe I can confirm this as a kernel issue. Something has changed since the 2.6.20 kernel that causes hard locks on my server. Hopefully the Ubuntu kernel developers have a fix in the pipeline, as I am not the only one experiencing this problem.

Update - 6/7/2008: Since upgrading to Ubuntu 8.04 (Hardy), I am no longer experiencing any hard kernel locks on my server. The new 2.6.24 kernel appears to have addressed the problems I was having, allowing me to remove the old 2.6.20 kernel that I had installed as a temporary solution.