On Tuesday we moved our openSUSE 11.4 users to a VM copy of the new GNOME desktop and wiped the physical machine completely. We pulled the drives from the RAID array, put them back in a new order, and then created a new RAID (1+0) partition. Our customizations always live in /u and /u2, so after a few well-placed tarballs and tweaks we had a brand new machine with the same functionality as the old one.
Very sadly, the same issues have come back. I appreciate all of the ideas and tips left in the comments on prior posts. I do have a bit more information, though, and it's very odd. This server should handle 200 users easily, but once we get past about 20-30 users, and especially at 40, performance gets very poor.
Here is the new information: if I run vi /etc/services, the screen blinks for a few seconds before the file opens. If I copy /etc/services to /tmp/services and /home/services and open those copies in vi, they open immediately. So there seems to be some kind of contention or lock on the /etc directory, and that contention appears to be the core of the problem: many services are constantly reading those files, and they are somehow bottlenecked. If you have any ideas to assist, the bug report is here.
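To put numbers on the /etc versus /tmp difference rather than eyeballing the vi delay, something like the following could be run while the server is under load. This is only a minimal sketch, assuming Python is available on the box; the function names time_open and compare_locations are made up for illustration. It measures only a plain open-and-read, so if these timings come back equal while vi is still slow, the delay is probably in something else vi does (swap file creation, name lookups) rather than in reading the file itself.

```python
import os
import shutil
import time


def time_open(path, repeats=5):
    """Open and fully read `path` several times.

    Returns (fastest, slowest) wall-clock time in seconds.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        with open(path, "rb") as fh:
            fh.read()
        samples.append(time.perf_counter() - start)
    return min(samples), max(samples)


def compare_locations(source, copy_dirs, repeats=5):
    """Time reading `source` in place, then time a temporary copy in each
    directory of `copy_dirs`. The copies are removed afterwards.

    Returns a dict mapping each path to its (fastest, slowest) timing.
    """
    results = {source: time_open(source, repeats)}
    for directory in copy_dirs:
        # Hypothetical file name for the copy, chosen to avoid clobbering anything.
        target = os.path.join(directory, "open_timing_" + os.path.basename(source))
        shutil.copyfile(source, target)
        try:
            results[target] = time_open(target, repeats)
        finally:
            os.remove(target)
    return results
```

Calling compare_locations("/etc/services", ["/tmp", "/home"]) and printing the result would mirror the manual test above; it needs write access to the directories it copies into.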
Networking is still sub-optimal; note the dropped RX packets.
eth0      Link encap:Ethernet  HWaddr 00:1C:C4:93:DF:72
          inet addr:128.222.99.243  Bcast:128.222.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:12191295 errors:0 dropped:73239 overruns:0 frame:0
          TX packets:12731836 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5112551787 (4875.7 Mb)  TX bytes:10726315924 (10229.4 Mb)
          Interrupt:16 Memory:f8000000-f8012800

eth1      Link encap:Ethernet  HWaddr 00:1C:C4:93:DF:74
          inet addr:172.23.1.235  Bcast:172.23.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:73278 errors:0 dropped:238 overruns:0 frame:0
          TX packets:8009 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:15639151 (14.9 Mb)  TX bytes:827822 (808.4 Kb)
          Interrupt:17 Memory:fa000000-fa012800
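For a sense of scale, the dropped counts above can be turned into a drop rate per interface. The snippet below is a rough sketch in Python that parses only the old-style ifconfig layout shown here; the function name rx_drop_rates and the trimmed SAMPLE text are for illustration, and other ifconfig versions format these lines differently.

```python
import re

# Matches the RX counter line in the ifconfig layout pasted above.
RX_LINE = re.compile(r"RX packets:(\d+)\s+errors:(\d+)\s+dropped:(\d+)")
# Matches the first line of each interface block, e.g. "eth0 Link encap:Ethernet".
IFACE_LINE = re.compile(r"(\S+)\s+Link encap:")

# Interface headers and RX lines copied from the output above.
SAMPLE = """\
eth0 Link encap:Ethernet HWaddr 00:1C:C4:93:DF:72
RX packets:12191295 errors:0 dropped:73239 overruns:0 frame:0
eth1 Link encap:Ethernet HWaddr 00:1C:C4:93:DF:74
RX packets:73278 errors:0 dropped:238 overruns:0 frame:0
"""


def rx_drop_rates(ifconfig_text):
    """Return {interface: dropped / received} for each interface found."""
    rates = {}
    iface = None
    for line in ifconfig_text.splitlines():
        header = IFACE_LINE.match(line)
        if header:
            iface = header.group(1)
            continue
        counters = RX_LINE.search(line)
        if counters and iface is not None:
            packets = int(counters.group(1))
            dropped = int(counters.group(3))
            rates[iface] = dropped / packets if packets else 0.0
    return rates


for name, rate in rx_drop_rates(SAMPLE).items():
    print(f"{name}: {rate:.2%} of RX packets dropped")
```

On the numbers above this works out to about 0.60% on eth0 and 0.32% on eth1.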
There are a few more steps that we can take, including installing the 3.0 kernel to see if that helps. We might have to start looking at other distributions if we don't see progress soon.
Other projects continue: writing some code for the support portal, testing NX, looking at upgrading our Moin Wiki, and WiFi upgrades. I have also been poking at some ideas to improve NX performance to our thin clients and will blog about that if it pans out.