Wednesday, February 29, 2012

250 Concurrent Users, Tuning and Citrix

Over the last week the server loads have grown and it's interesting to see how well it's running GNOME and all of those sessions. The shot below is how it looks right around 250 concurrent users, and for the most part everything is working well. I've marked some areas in color.

I'm seeing a high level of communication between polkit and dbus (purple); I'm not sure what's happening here, but I'll try to sniff it with dbus-monitor and see exactly what's chewing CPU. This activity isn't slowing the machine down much.
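When I get a quiet moment, the plan is to let dbus-monitor run for a while and tally which interfaces are the chattiest. Here's a rough sketch of that tally, assuming dbus-monitor's default text output; the --system bus choice and the 15-row cutoff are just placeholders:

    #!/usr/bin/env python
    # Rough sketch: tally D-Bus traffic by interface so we can see what is
    # generating the polkit/dbus chatter. Assumes dbus-monitor's default
    # text output, which includes tokens like "interface=org.foo.Bar".
    import re
    import subprocess
    import collections

    counts = collections.Counter()
    proc = subprocess.Popen(["dbus-monitor", "--system"], stdout=subprocess.PIPE)

    try:
        for line in proc.stdout:
            line = line.decode("utf-8", "replace")
            match = re.search(r"interface=([\w.]+)", line)
            if match:
                counts[match.group(1)] += 1
    except KeyboardInterrupt:
        pass                      # stop the capture with Ctrl-C
    finally:
        proc.terminate()

    # Print the busiest interfaces seen during the capture.
    for interface, count in counts.most_common(15):
        print("%6d  %s" % (count, interface))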

A feature was added to evince-thumbnailer (newer than what ships with openSUSE 11.4) that terminates the thumbnailer after about 5 seconds if it can't finish. On certain PDFs it appears to either hang, or the file is so huge that it takes more than a few minutes to complete, and that causes a small spike in CPU. As users navigate, these thumbnails get cached, so we'll see fewer and fewer of them as time goes by.
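Until we pick up a build with that watchdog, the same idea can be approximated with a small wrapper around the thumbnailer. This is only a sketch: the 5-second limit mirrors the upstream behaviour described above, and the argument pass-through is an assumption (check the thumbnailer entry on your system for the real invocation):

    #!/usr/bin/env python
    # Stopgap wrapper: run evince-thumbnailer, but kill it if it hasn't
    # finished within TIMEOUT seconds.
    import os
    import signal
    import subprocess
    import sys
    import time

    TIMEOUT = 5   # seconds, matching the upstream watchdog described above

    proc = subprocess.Popen(["evince-thumbnailer"] + sys.argv[1:])
    deadline = time.time() + TIMEOUT

    while proc.poll() is None:
        if time.time() > deadline:
            os.kill(proc.pid, signal.SIGKILL)   # give up on the stuck/huge PDF
            sys.exit(1)
        time.sleep(0.1)

    sys.exit(proc.returncode)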

Host-based Citrix sessions are chewing CPU as I've mentioned previously; the canvas repaints are expensive. As marked in red, you can see that collectively they're using a good amount of CPU. More information on this project below.



The server has 64GB of memory, and right around 250 users it's beginning to have to write some cache files to disk, which causes very short pauses a few times a day. We're going to throw in a few more memory sticks, and that should allow us to run closer to 300. This isn't a serious problem; it only happens for a few seconds, a few times a day, and only under very heavy load.

The users have discovered ways to leave "dead" gdm child processes behind that don't halt on their own. I suspect people are doing things like powering off thin clients with sessions still running, starting to log in and then turning off the thin client, or letting the server sever their connections because they never log off at night (our server kicks users off after 13 hours). I'm writing a little script that will run nightly and remove these processes; a rough sketch of the idea is below. They aren't affecting speed, but they probably are consuming a bit of memory.
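The heuristic for "dead" is still being worked out; the sketch below assumes that a gdm process that has been reparented to init and has been around longer than our 13-hour session limit is safe to remove. Treat the name pattern and the age cutoff as placeholders:

    #!/usr/bin/env python
    # Nightly cleanup sketch: find gdm session processes that have been
    # orphaned (reparented to init) and have been around for many hours,
    # then terminate them. The "orphaned + older than MAX_AGE" test is an
    # assumption about what identifies a dead session -- adjust as needed.
    import os
    import re
    import signal
    import time

    MAX_AGE = 14 * 3600           # a bit longer than our 13-hour cutoff
    PATTERN = re.compile(r"gdm")  # matches gdm-session-worker, etc.
    HZ = os.sysconf("SC_CLK_TCK")

    def boot_time():
        with open("/proc/stat") as f:
            for line in f:
                if line.startswith("btime"):
                    return int(line.split()[1])

    BTIME = boot_time()

    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/stat" % pid) as f:
                data = f.read()
        except IOError:
            continue                               # process exited mid-scan
        comm = data[data.index("(") + 1:data.rindex(")")]
        fields = data[data.rindex(")") + 2:].split()
        ppid = int(fields[1])
        start = BTIME + int(fields[19]) / HZ       # starttime is in clock ticks
        age = time.time() - start
        if PATTERN.search(comm) and ppid == 1 and age > MAX_AGE:
            print("killing %s (%s), age %.1f hours" % (pid, comm, age / 3600.0))
            os.kill(int(pid), signal.SIGTERM)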

As mentioned, I'm making progress with running Citrix locally on the thin clients. When a user picks an MS Windows application that uses Citrix, a signal is passed to the thin client, and Citrix is launched there and forms a connection to the Windows server. Once this is handed off, no additional resources are consumed by the GNOME server, and the users have a compressed stream directly to the thin client; no more X11 traffic running over the network. Early testing already indicates this will run faster. I'll mount a section of memory for the Citrix cache so they aren't hammering the flash drives of the thin clients, which should give them optimal speeds.
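The memory-backed cache is just a tmpfs mount that the thin client image will set up at boot. Something along these lines, with the path and size as placeholders rather than the real values from our build:

    #!/usr/bin/env python
    # Boot-time sketch for the thin clients: back the Citrix client cache
    # with tmpfs so it never touches the flash drive. The path and size
    # are hypothetical.
    import os
    import subprocess

    CACHE_DIR = "/var/cache/citrix"   # hypothetical cache location
    SIZE = "64m"                      # hypothetical cap; it's RAM, keep it modest

    if not os.path.isdir(CACHE_DIR):
        os.makedirs(CACHE_DIR)

    # Only mount if nothing is already mounted there.
    if subprocess.call(["mountpoint", "-q", CACHE_DIR]) != 0:
        subprocess.check_call(
            ["mount", "-t", "tmpfs", "-o", "size=%s,mode=1777" % SIZE,
             "tmpfs", CACHE_DIR])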

Other concurrent projects: Connecting LibreOffice to our infrastructure and testing it fully, reviewing Evolution crashes and working with Novell on fixes.

Tuesday, February 21, 2012

GNOME Migration Almost Complete, New Projects

It's been a busy but productive few weeks. We have slowly been flashing more and more thin clients with the latest release of our customized software, and it's going very well. This morning the upgrades were performed on the Police department; they are the last major group to bring over to the new servers. Today we hit 210 concurrent GNOME users, and the "top" is below. It's very interesting to watch how memory is handled as the load grows, and it's doing a great job. It tries to keep user processes and active disk IO in memory. When physical memory gets low, it starts writing disk IO out and then takes cache memory away and uses it for user processes. Very cool, and the server is running like a champ! Anyone who, in years past, ever had to tune SCO Unix on Intel hardware and recompile the kernel over and over again will appreciate how this all works. :)
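For anyone who wants to watch the same balancing act on their own server, the numbers are all in /proc/meminfo. A quick sketch of the kind of sampling loop I keep running in a terminal (the field list and interval are arbitrary):

    #!/usr/bin/env python
    # Watch the balance the kernel strikes between user processes and the
    # page cache as load grows, by sampling /proc/meminfo once a minute.
    import time

    FIELDS = ("MemTotal", "MemFree", "Cached", "SwapFree")

    def meminfo():
        values = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, rest = line.split(":", 1)
                if key in FIELDS:
                    values[key] = int(rest.split()[0]) / 1024   # kB -> MB
        return values

    while True:   # stop with Ctrl-C
        info = meminfo()
        print("  ".join("%s: %d MB" % (k, info[k]) for k in FIELDS))
        time.sleep(60)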



I marked in red that we are seeing higher CPU usage on Citrix sessions, which are still running on the host; the canvas repaints of Citrix appear "expensive". So I have opened a new R&D project to begin developing our next thin client upgrade, and this one will feature a local Citrix client. This will move all of that CPU activity to the local workstation and drop the server load. Running clients locally is always done with caution, because it means all updates (and exploit fixes) have to be pushed to 500+ devices instead of being applied once on the server. But in this case the benefit is just too great -- we have another 1-2 years of Citrix usage ahead of us.

Our thin client build currently runs on the HP 5725, 5735 and 5745 thin clients, and I spent some time last week getting it to recognize and run inside VMware Player. NX/NoMachine does a great job with compression, but for the few PCs we have on our high-speed fiber optic network there isn't really a reason to use this technology. The server needlessly has to run a whole X session when it could be offloaded to the native (Windows, Mac) operating systems. So we are going to experimentally put a few test users on this new build and see how it works. Almost all of our users are on thin clients, but there are maybe 25 PCs and Macs on the network for various functions, and they'll benefit from this change. They'd also get local RDP (and, in the future, Citrix) sessions, which will run faster than the current host-based design.

All in all, it's been a great few weeks. Next up for me: Working with Novell on some technical advances in Evolution and also moving ahead with converting from OpenOffice to LibreOffice.

Wednesday, February 08, 2012

135 Concurrent Users And Growing

So we're into the migration process to the new GNOME server and are hitting concurrent user loads of around 135. We pushed the thin client updates Monday night and found one tiny regression Tuesday. It was quickly fixed, we re-pushed to that department, and then upgraded yet another department last night. So far no major issues in scaling. The upgrade process will conclude on the 23rd; lots more devices to go. Once you get past the initial tuning issues, it's amazing how well this stuff scales.

For those interested, here is top running with the 135 users....very nice.

Monday, February 06, 2012

And One More Desktop Change

This is really a small change, but I thought I'd publish it in case it's useful to anyone else for deployments. Users really, really struggle with three areas: file types, file sizes and folder locations. I've blogged many times about steps taken to make interaction with files simple and not require file managers. Now that the major technology pieces are finished for us to go live, I can get back to some ideas that have been in my head. I had developed something like this for use with our USB sticks on the thin clients, and brought it over to the GNOME desktop. When you double-click on an image now, it allows you to change the size before it's placed into the clipboard. They can shoot in 10 megapixels all they want, and with a single click make it "email friendly" for delivery. Having people go into GIMP to reduce the image and then copy and paste is just way too many steps.

In the shot below the tux picture is double-clicked and the MIME helper bar appears. By selecting "OriginalSize" the image is then pasted into Evolution at 1024x768. This is the default.



However, if they want to copy it at 320x240, all they do is select that size in the ComboBox and put it into the clipboard; Python reduces it automatically, and it goes right into Evolution.
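The resize step itself only takes a few lines of Python. A simplified sketch of the idea, assuming PIL and PyGTK; the real helper has more plumbing around the MIME bar and its ComboBox:

    #!/usr/bin/env python
    # Simplified sketch of the resize-and-copy step. The target size is
    # just a function argument here instead of coming from the ComboBox.
    import os
    import tempfile

    import gtk
    import Image   # PIL

    def copy_resized(path, size=(1024, 768)):
        img = Image.open(path)
        img.thumbnail(size, Image.ANTIALIAS)    # shrink, keeping aspect ratio
        tmp = os.path.join(tempfile.mkdtemp(), "clipboard.png")
        img.save(tmp, "PNG")

        pixbuf = gtk.gdk.pixbuf_new_from_file(tmp)
        clipboard = gtk.clipboard_get()
        clipboard.set_image(pixbuf)             # now it pastes straight into Evolution
        clipboard.store()

    # e.g. copy_resized("/home/user/tux.png", (320, 240)) for "email friendly"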



It's nice to have some time to make these changes, perhaps it's the calm before the storm :)

GNOME Desktop, Away We Go

After prepping and testing the new GNOME desktop, we have closed up all of the beta code and it's being moved into production. The new thin client release has been moved onto our FOG server, and we'll be pushing it out to all users starting today. 60 of the 525 devices will get the new code today, be disconnected from the old GNOME server and be moved to the new one. That should net us about 50 more concurrent users to stress test the server. Right now the machine is running great at around 100 users, and I think it will absorb another 50 with no problems. Additional departments will be touched Tuesday, Wednesday and Thursday nights.

In other news, it looks like SuseCON is being held this fall in Orlando. That's only about 1 1/2 hours from Largo, so it seems like a good place to meet more of you.

I am a person who LOVES data. I love to accumulate it, do research, and see what types of things people are doing on the servers. I have been writing data on certain trigger events into flat CSV files and starting to connect that data to our support portal. When the infrastructure is in place and fleshed out, I'll probably move it to sqlite or something. The user detail screen has been modified to show at a glance all information concerning our employees. The left edge shows their information and active sessions. The right side shows this new data.
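The collection side is deliberately simple for now: each trigger event just appends a row to a per-day CSV. Roughly like this, with the column names and log directory as illustrations rather than the real portal schema:

    #!/usr/bin/env python
    # Rough shape of the event logger: each trigger event appends one row
    # to a flat per-day CSV file.
    import csv
    import os
    import time

    LOG_DIR = "/var/log/desktop-events"   # hypothetical location

    def log_event(user, host, event, detail=""):
        day = time.strftime("%Y-%m-%d")
        path = os.path.join(LOG_DIR, "events-%s.csv" % day)
        new_file = not os.path.exists(path)
        with open(path, "a") as f:
            writer = csv.writer(f)
            if new_file:
                writer.writerow(["timestamp", "user", "host", "event", "detail"])
            writer.writerow([time.strftime("%H:%M:%S"), user, host, event, detail])

    # e.g. log_event("jsmith", "tc-0142", "double_click", "evolution.desktop")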

In the UI below, I'm showing all of the Alerts. This includes double- and triple-clicking on icons (which often indicates some kind of problem), logging off the server with software still running (which yields OpenOffice recovery dialogs on the next login), instances of forcing software to quit, and software crashes. It's very cool watching this all in real time, and we're already finding users having problems who never called. One bad cable can ruin a user's entire perception of the servers...and with these tools we should be able to help locate issues.



The next tab shows all authentications, device names and the technology used. Very often it's difficult to know exactly where a user is in our buildings and how they're currently connected. It's now all available at a glance.



And we are collecting data on every *.desktop icon clicked from GNOME. Users very often describe software to you vaguely, and with hundreds of icons it's hard to tell what they're actually running. Now we can see it all at a glance.
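For anyone wondering how clicks like these can be captured: one straightforward approach (a hypothetical sketch, not necessarily exactly how our build does it) is to wrap each .desktop file's Exec= line with a tiny logger that records the launch and then hands off to the real program:

    #!/usr/bin/env python
    # Hypothetical launch logger: a .desktop file's Exec= line is changed
    # from "Exec=evolution" to "Exec=launch-logger evolution.desktop evolution",
    # so every click gets recorded before the real program starts.
    import os
    import sys
    import time

    LOG = os.path.expanduser("~/.desktop-clicks.csv")   # illustrative location

    desktop_name = sys.argv[1]    # e.g. "evolution.desktop"
    command = sys.argv[2:]        # the real Exec= command line

    with open(LOG, "a") as f:
        f.write("%s,%s,%s\n" % (time.strftime("%Y-%m-%d %H:%M:%S"),
                                os.environ.get("USER", "unknown"),
                                desktop_name))

    os.execvp(command[0], command)   # hand off to the real application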



Exciting times as all of this technology is pushed live and it's my hope it lives up to its potential.