Tuesday, March 27, 2012

Offloading To Thin Clients

When we work with outside vendors, it's always interesting to find out their misconceptions about what a "thin client" is and how it's run. When we call in issues, they always want to know the operating system of the thin client, even though for the most part it doesn't come into play. Once you log into a GNOME server with XDMCP, all you are doing is remotely displaying the software. This concept for some reason seems to baffle people. Debian is on the thin client, but we aren't running Debian-based GNOME.

In years past we always ran 100% of the software on the server and it's worked great. But in the last year we offloaded the RDP/Rdesktop client with success. The software to which we connect with RDP is obviously still host based, but now we run the client on the local workstation. In the case of RDP the users got a huge performance gain.

I'm reviewing all possible ideas for what should be offloaded to make use of distributed CPUs, while still maintaining the low costs of centralized servers. I have mentioned in the past that for sure we'll be doing the same thing with the ICA/Citrix client. The shot below was taken from our current GNOME server and you can see that wfica (the Citrix client) chews CPU as the canvas is repainted. The server is certainly not taxed and we could get by running it in this manner, but I have a mind that tinkers and tunes, and this really should be offloaded. Maybe a part of me wants to see 250 users running at 1% busy. :) For sure, it will increase the responsiveness and crispness of their UI interaction. ICA will communicate with the Citrix server directly from the thin client instead of using X11 to deliver the presentation.

A few times a month, our users will stumble into a page that just does not play well over remote display. It's usually Flash and for some reason the player has problems and it just cannot keep up. I have always suspected that the video was encoded at a very high frame rate, but never have taken the time to verify that fact. As part of this whole process of reviewing offloading certain functions I have experimentally loaded Firefox 11 and Flash on the physical thin client. I wondered how it would work having access to the local video card.

I modified the master thin client to accept a request for starting a browser from the server side. Those with this new build can click on an icon on the GNOME server, as pictured below.

Firefox then runs locally. I didn't really know what to expect, and the results were interesting. Firefox was slower to start on the thin client than over the network from the big servers. I know that many people have complained about thin client speeds at other organizations, and I think this shows why: software should be tested both host based and local, and the right fit deployed. The local version of Firefox didn't have the crisp response time in the pulldown menus and UI interaction, though it was certainly usable.

The big test was playing Flash content. My testing indicates that videos do not play faster locally, and in fact might be a bit slower. In the shot below you can see "top" running on the thin client, and Flash is just hammering it. It's clear that Flash will consume as much CPU as it can get. I was testing on the older HP 5725 thin clients; once I pack up another beta build, I'll test it on the 5745s, which I believe will provide a slightly better experience. But all in all I don't see any advantage to this design. Flash and Firefox constantly need upgrades, and the devices would probably have to be updated every few weeks. I can upgrade the server in just minutes and it's deployed for all users. We'd need a strong case for offloading browsers to the local device, and right now I'm not seeing it being worthwhile.

Current Projects: Installed and testing LibreOffice 3.5.2 and one by one crossing off items on my list for the next thin client release.

Friday, March 23, 2012

Software Portal Changes Pushed Live

With increasing workloads, I have been doing everything possible to increase what we can do remotely, to eliminate having to send IT staff to users' desks. Very often a big part of the problem is not knowing exactly what equipment they have. So I have reworked the "Thin Client Detail" screen in our support portal to give us more information than ever. All of these changes will also allow me to begin adding new features for the end users.

I've always posted lots of pictures; I think they make it easy to understand how it's working. I'm sure that the UI could be better, but it's working and will improve. When you bring up a thin client device, the portal gets the server side configuration files and displays what it *thinks* should be configured on the workstation. This is done because very often the devices are powered off, and now we'll be able to see how a device is configured even if it's not currently running. The status line indicates that the information is based on configuration files and not the physical device. The REFRESH button is enabled in this case because the portal has done a ping and detected that the device is powered on.

Once REFRESH is pressed, it physically queries the thin client and obtains as much information as possible. It detects version, function and other settings. It then also polls the Xserver and detects the resolution. One great new feature is that the EDID is read and the exact make of the monitor is displayed. VG1930wm is an old ViewSonic monitor.
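For readers curious how a monitor model can come out of an EDID, here is a minimal sketch. It is not the portal's code; it only assumes that the raw 128-byte EDID base block has already been fetched from the thin client somehow. The function names are made up for illustration. The model name lives in one of four 18-byte descriptors and is tagged 0xFC, and the three-letter manufacturer ID is packed into bytes 8 and 9.

```python
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])
DESCRIPTOR_OFFSETS = (54, 72, 90, 108)   # four 18-byte descriptors
MONITOR_NAME_TAG = 0xFC


def check_edid(edid):
    """Raise ValueError unless this looks like an EDID base block."""
    if len(edid) < 128 or edid[:8] != EDID_HEADER:
        raise ValueError("not a valid EDID base block")


def edid_manufacturer(edid):
    """Decode the three-letter manufacturer ID packed into bytes 8-9."""
    check_edid(edid)
    word = (edid[8] << 8) | edid[9]
    # Three 5-bit fields, where 1 means 'A', 2 means 'B', and so on.
    return "".join(
        chr(((word >> shift) & 0x1F) + ord("A") - 1) for shift in (10, 5, 0)
    )


def edid_monitor_name(edid):
    """Return the monitor name descriptor text, or None if absent."""
    check_edid(edid)
    for offset in DESCRIPTOR_OFFSETS:
        desc = edid[offset:offset + 18]
        # Display descriptors start with three zero bytes, then a tag.
        if desc[0:3] == b"\x00\x00\x00" and desc[3] == MONITOR_NAME_TAG:
            text = desc[5:18].decode("ascii", errors="replace")
            # The name ends at a newline and is padded with spaces.
            return text.split("\n")[0].strip()
    return None
```

With a block like the one from the monitor above, `edid_monitor_name` would return the model string and `edid_manufacturer` the vendor code, which a portal could then map to a friendly vendor name.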

The next tab obtains the rest of the settings and detects their version. If they are running a Beta release of the thin client OS this is clearly indicated; otherwise it just displays a thumbs up symbol indicating they are current.

NX creates an interesting issue: the server thinks they are on a thin client, but it's really a virtual Xserver. So in this case the UI clearly shows that it's NX and then tries to find their connection IP.

Here is another feature now implemented. Some users have the ability to rotate their monitors as they desire. So when you enter the detail screen it detects that the monitor is configured in this manner (Rotate) but doesn't know how it's currently being used.

However when the REFRESH button is pressed, it does a query and detects the current orientation and displays it accordingly along with color depth and the exact cable (HDMI) being used. Being able to see the cable will allow us to upgrade users from VGA as time allows.

Another new feature that is wonderful for us is knowing the exact capabilities of the monitor. Is it configured for its optimal resolution, or running in the wrong aspect ratio? Does the monitor support a higher resolution? The portal queries xrandr, gets all of the supported resolutions and displays them as a tooltip when you hover your mouse over the monitor. Previously support would have to check this by physically going to the workstation.
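As an illustration of the xrandr query, here is a small sketch that parses the plain text output of `xrandr` into a list of modes per connected output. It is a guess at one workable approach, not the portal's implementation; the function names are hypothetical. In that output, `*` marks the mode in use and `+` marks the monitor's preferred mode.

```python
import re

# Indented mode lines look like "   1440x900      59.89*+  74.98".
MODE_RE = re.compile(r"^\s+(\d+)x(\d+)\S*\s+(.*)$")


def parse_xrandr(text):
    """Map each connected output name to its list of supported modes."""
    outputs = {}
    current = None
    for line in text.splitlines():
        if line and not line[0].isspace():
            # Header line such as "VGA-1 connected 1440x900+0+0 ...".
            parts = line.split()
            current = None
            if len(parts) >= 2 and parts[1] == "connected":
                current = parts[0]
                outputs[current] = []
            continue
        match = MODE_RE.match(line)
        if current and match:
            rates = match.group(3)
            outputs[current].append({
                "width": int(match.group(1)),
                "height": int(match.group(2)),
                "active": "*" in rates,
                "preferred": "+" in rates,
            })
    return outputs


def runs_at_preferred(modes):
    """True when the active mode is also the monitor's preferred one."""
    return any(m["active"] and m["preferred"] for m in modes)


def tooltip(modes):
    """One resolution per line, suitable for a hover tooltip."""
    return "\n".join("%dx%d" % (m["width"], m["height"]) for m in modes)
```

A support screen could flag any device where `runs_at_preferred` is false, which covers both the "wrong aspect ratio" and the "could go higher" questions in one check.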

The last major feature that will come is the ability to get a quick thumbnail of their session and a few other cosmetic fixes...but all in all it's working well. Centralized support has been a real time saver for us.

Other projects this week: Prepped Firefox 11 for release, tested Flash 11.2 RC, checked some pages that are having problems with sound, QA'd LibreOffice, reviewed all Evolution crashers and tried to organize them into groups, and reviewed log files to look for errant networking cables and bad UPSs.

Next week: The next thin client OS feature set will be started now that we have the host infrastructure to support it.

Monday, March 19, 2012

Infrastructure Upgrades

As I have mentioned previously, the new GNOME servers are live and are considered "in production". I have been doing a small amount of final tuning, but really they're running beautifully. The big areas of customization beyond base OpenSuse were 1) File cleanups; lots of software leaves behind files, mostly in /tmp. A single user would not have any problems, but with hundreds of users it becomes unwieldy. 2) Flushing cache; each morning I force a sync and then flush the cache. Certain processes seem to leak and this frees lots of memory for user processes each day. 3) User process cleanups; certain software packages leave behind errant or stray processes. No big deal with one person, but they add up quickly with many users.
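The file cleanup item can be sketched roughly as follows. This is an illustration under assumptions, not the cron job running on these servers: the age threshold, the function names and the dry-run default are all invented for the example.

```python
import os
import stat
import time


def stale_files(root, max_age_days, now=None):
    """Yield regular files under root not modified for max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)   # lstat: never follow symlinks
            except OSError:
                continue                # file vanished while scanning
            if stat.S_ISREG(info.st_mode) and info.st_mtime < cutoff:
                yield path


def remove_stale(root, max_age_days, dry_run=True):
    """Delete stale files; with dry_run=True only report them."""
    removed = []
    for path in list(stale_files(root, max_age_days)):
        if not dry_run:
            try:
                os.remove(path)
            except OSError:
                continue                # in use or already gone
        removed.append(path)
    return removed
```

Running it with `dry_run=True` first makes it easy to review what would be deleted before trusting it with hundreds of users' leftovers.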

I'm back on infrastructure upgrades again. In the time since the last specifications were drawn up, we've had a lot of new requests. Now that we're pushing all thin client updates from the server, I have started the process of simplifying the screens that appear on the thin clients. They only display for a few seconds on the first reboot after an update and are seen by neither employees nor IT staff. They would only be used for troubleshooting.

The UI work has been done mostly in our "support portal" software. The screens below are my works in progress, please no UI nazi type comments. :) I'm placing widgets and working through the flow, but it's starting to work. One new requirement is the ability to have dual monitors, each in its own resolution and possibly in a different orientation. Previously, we would only allow dual screens in the same resolution. This requires a change in the UI for configuring them, and the thin clients had to be modified to understand how to accept all of these new settings. The shot below shows the thin client screen, as seen by our support staff, that allows them to configure monitors. It will eventually poll the devices too and obtain information about the types of cables used and the manufacturer of the hardware.
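To make the mixed resolution and orientation requirement concrete, here is a hedged sketch of how a set of per-monitor settings could be turned into a single `xrandr` command line. The settings format and the function name are hypothetical, and the output names in the usage note are examples only; the real thin client build may apply settings quite differently.

```python
ROTATIONS = {"normal", "left", "right", "inverted"}


def xrandr_args(monitors):
    """Build an xrandr argument list placing monitors left to right.

    monitors: list of dicts with 'output', 'mode' and optional 'rotate'.
    """
    args = ["xrandr"]
    previous = None
    for mon in monitors:
        rotate = mon.get("rotate", "normal")
        if rotate not in ROTATIONS:
            raise ValueError("unknown rotation: %r" % rotate)
        args += ["--output", mon["output"],
                 "--mode", mon["mode"],
                 "--rotate", rotate]
        if previous is not None:
            # Each monitor sits to the right of the one before it.
            args += ["--right-of", previous]
        previous = mon["output"]
    return args
```

For example, a landscape 1440x900 monitor on `VGA-0` next to a portrait 1280x1024 monitor on `DVI-0` would produce one command that sets both modes, rotates the second output and places it to the right of the first. The list can be handed to `subprocess.run` as is.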

The Configuration tab will obtain information about the thin client and allow you to configure its function/purpose. It will now also allow you to configure a local RDP application. This feature will allow users to connect right to specialized point of sale type software without first connecting to the GNOME desktop. Some of our sites have users that move around frequently between devices and assist citizens, and this will help them in that process. I'm also exploring ideas to indicate to our support staff whether the thin client is on the latest release. The mockup shows a thumbs down.

We're storing lots of data concerning alerts and authentications and now when you review a thin client, it shows you all of this activity regardless of specific user. This will help us find trends where a certain device is having problems. Pinched networking cables? Bad UPS? We should be able to find it easier with this data.
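A trend check of this kind can be as simple as counting events per device. The sketch below assumes a made-up record layout (a `device` name and a `kind` of event); the actual alert and authentication tables will look different.

```python
from collections import Counter


def devices_over_threshold(events, kinds, threshold):
    """Return (device, count) pairs with at least `threshold` events
    of the given kinds, busiest device first."""
    counts = Counter(e["device"] for e in events if e["kind"] in kinds)
    flagged = [(dev, n) for dev, n in counts.items() if n >= threshold]
    # Sort by count descending, then by name for a stable order.
    return sorted(flagged, key=lambda pair: (-pair[1], pair[0]))
```

A device that shows up repeatedly for connection drops, regardless of who was logged in, is a good candidate for a cable or UPS check.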

On the thin client side, after a reboot with an operating system upgrade, the UI will display the settings that were pushed from the server for 10 seconds and then reboot and use them. This is done in the wee hours of the morning and is never seen.

This very simple screen shows the current thin client settings for the monitors:

Here the simple thin client UI has received the settings for running a local RDP application:

This is the new UI that users see when the thin clients are powered on. As before, they are able to log into two GNOME desktop servers (A and B). The circled space shows the local RDP application, which displays when configured. If they don't have access to an application from their workstation, this area grays out.

Other things that I have done in the new thin client build:

+ Disabled XZap (Alt-Control-Backspace).
+ Fixed an issue with HP 5745s where the CPU was running through the code so fast that some data was not being saved correctly on update.
+ Continued adding support for this to work on VMware.

I pushed an early alpha release out for testing by the end users and now am going to continue adding the rest of the features that we wanted to include in the upgrade.

I also have continued to QA LibreOffice with some beta testers and tracked and reviewed Evolution crashers on SLED 11.

Tuesday, March 13, 2012

Projects And Starting New Thin Client Release

I'm back in the office after an extended weekend and right back into projects. All of the new GNOME servers ran great while I was gone. I have started to add a few more scripts and crons to tune it further based on how it's running and processes that are being left behind by various user techniques. For the most part I consider the migration to the new desktop/GNOME server to be finished and I have already started moving into new areas.

I added a bit of code, run after authentication, that polls the user's thin client and asks it to send a version number string back to the server. This will allow us to see if any devices were missed during the recent upgrades. With 525 thin clients in multiple buildings, there are always times when a handful will escape an upgrade for whatever reason. Now we'll see them. This data displays in real time in our support portal software.
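Once the version strings are collected, spotting the stragglers is a small comparison. The sketch below uses invented device names and version strings and a hypothetical function name; it only shows the shape of the check.

```python
def missed_upgrades(known_devices, reported, expected):
    """Map each out-of-date or silent device to what it reported.

    known_devices: iterable of device names we expect to hear from.
    reported: dict of device name -> version string it sent back.
    expected: the version string of the current release.
    """
    problems = {}
    for device in sorted(known_devices):
        version = reported.get(device)
        if version is None:
            problems[device] = "no report"   # never logged in since upgrade
        elif version != expected:
            problems[device] = version
    return problems
```

Devices that have not reported at all are listed separately from devices on an old version, since the first group may simply be powered off.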

We continue to beta test LibreOffice with a small group of users and we're making progress on testing and the QA process. There were some settings that were changed in OpenOffice years ago, and you kind of forget about some of them and then have to make changes. We'll build a standard template of settings for users which will be pushed into $HOME on first launch. From there, they'll be responsible for their own settings. We have found a few issues, but so far nothing major. Bug reports are being filed and it's moving along.

I have started to gather more detailed information on Evolution crashes based on the backtraces we're auto-generating. I wrote a little script that tries to group them together based on the crash location and we're finding that many of them are the same issue. A few well placed patches in SLED 11 should fix many of them at once.
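The grouping idea can be sketched like this. It is not the script mentioned above; it assumes gdb-style backtraces with `#N` frame lines and keys each crash on the first frame that is not generic abort or assertion plumbing. The skip list and function names are assumptions.

```python
import re
from collections import defaultdict

# Matches "#0  0x00007f... in func (" as well as "#2  func (".
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^\s(]+)\s*\(")

# Frames that say how the process died, not where the bug is.
GENERIC_FRAMES = {"raise", "abort", "??", "g_log", "g_logv",
                  "g_assertion_message", "g_assertion_message_expr"}


def crash_location(trace):
    """Return the first meaningful function name in a backtrace."""
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group(1) not in GENERIC_FRAMES:
            return match.group(1)
    return "unknown"


def group_crashes(traces):
    """Group backtraces by crash location, largest group first."""
    groups = defaultdict(list)
    for trace in traces:
        groups[crash_location(trace)].append(trace)
    return sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
```

Keying on a single frame is crude, and two different bugs can share one. Using the top two or three meaningful frames as the key would be a natural refinement.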

I have started the process of creating the next release of our thin client software. This upgrade will improve the end user experience and also benefit IT in lowering support calls. This will be QA'd for several months before being deployed and it's still at an early stage. Here is a list of features that will be implemented:

+ Disable X-Zap; Alt-Control-Backspace resets your Xserver and we think that users are finding this by accident. No reason for it to be enabled, all it does is kick them off the server.
+ Fix configuration issue with HP 5745 thin clients; the latest HP thin clients have an odd timing issue related to receiving a FOG update that is corrected by powering off the device and rebooting. I want to hunt this down because it's causing support to have to touch a small percentage of devices after update.
+ Local Citrix: We still have at least a year left of using Citrix and right now it's running on our GNOME server. Offloading this will drop CPU cycles used on the hosts greatly and give the users a faster experience. This replicates what was done in the last release with RDP which has worked out very well.
+ Fix Xclients bug: Users are finding certain techniques that allow them to drop off the server and leave xclients running, which interferes with their logging back in again. Sometimes the cause is just a failing UPS and a power dip -- it's enough to return them to the system chooser, but the host reconnects and their software continues to run. I'll check for clients and xkill them. Right now they have to reboot when this happens.
+ Rdesktop 1.7.1; it's out and includes some bug fixes.
+ USB Device scanning; we're going to try and use lsusb to look for devices they have plugged in that might affect our updates and cause them problems. Some people apparently are bringing in their own pointer and keyboard devices. We want to be aware of them when troubleshooting problems.
+ Local applications; We want to allow a kiosk mode to be available to start up a Windows application via RDP. This will allow them to log into point of sale software without having to log fully into GNOME.
+ VM support; the thin clients will run in VMPlayer and understand that infrastructure. This will allow us to replace NX running on computers on our high speed network. NX will of course continue to be used where we don't have enough bandwidth.
+ Photo management improvements; the simple UI that allows them to move photos into our software will be improved and be more robust.
+ Monitor support; We'll allow for monitors of different resolutions side by side, and also for a portrait monitor to work side by side with a landscape one. Until now this was disabled in order to keep the configurations simple and consistent. But there are some needs for this design.
+ The thin clients will connect to our time server and sync clocks at boot. Previously this was not a big deal because all software was host based, but we want the local apps to know the right time.
+ Local email; the thin clients will email us when they have certain problems. If the Xserver crashes, it will grab a copy of the log files before they're deleted.
+ Detecting power off; if the users power off their thin clients while logged into GNOME we really don't know they have done this. The button will detect server connection and log this activity.
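For the USB device scanning item in the list above, a first pass might parse `lsusb` output and report anything not on a list of expected hardware. This is only a sketch: the allowlist contents and function names are placeholders, not our real inventory.

```python
import re

# Typical line: "Bus 002 Device 003: ID 046d:c077 Logitech, Inc. Mouse"
LSUSB_RE = re.compile(
    r"^Bus (\d+) Device (\d+): ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\s*(.*)$"
)


def parse_lsusb(text):
    """Return a list of (vendor_id, product_id, description) tuples."""
    devices = []
    for line in text.splitlines():
        match = LSUSB_RE.match(line.strip())
        if match:
            vendor, product, description = match.group(3, 4, 5)
            devices.append((vendor.lower(), product.lower(), description))
    return devices


def unexpected_devices(text, expected_ids):
    """Devices whose 'vendor:product' ID is not in expected_ids."""
    return [
        dev for dev in parse_lsusb(text)
        if "%s:%s" % (dev[0], dev[1]) not in expected_ids
    ]
```

The thin client could send the result of `unexpected_devices` along with its other status data, so a user-supplied keyboard or pointer shows up in the portal before anyone starts troubleshooting.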

I'm looking forward to these changes and the challenges ahead in implementing them.

Friday, March 02, 2012

LibreOffice QA Creates Better User Interaction

I have been testing LibreOffice 3.5.1 on our server with anticipation of a migration from OpenOffice in the next 30 days. The users are going to like the advancements made in the code base since OpenOffice 3.3 and the migration should go pretty painlessly.

This project has given me time to revisit one of the drawbacks of server based computing from the user perspective -- control over cycling power and resetting software. There are many drawbacks to running software at their desk, but one thing it does is allow them to "reboot" the computer. If LibreOffice locks on a file, all child processes thereafter are dead as well and the end user cannot "fix" this issue. On a PC they'd reboot. Doing something with a process killer is beyond the scope of what most people can do. They don't know what to kill, or how to pick out and troubleshoot errant processes. When a user requests a document while another document is already open, the server really can't tell if the first process is running correctly. We also have to consider that we are only staffed from 7am until 5pm with people that can assist with these matters.

The previous design was to give them a dialog when a second document opened and ask them if they wanted to kill the previous document. Clunky, but required to give off hour users the ability to fix their sessions. My mindset since OpenOffice went live was that I needed something like notify-send that had a pushbutton trigger on a timer. With resources allocated to installing LibreOffice, I created my own with Python/Glade. Instead of something intrusive, they get a popup in the lower left corner when they launch a second document. If everything is working fine, they leave the dialog alone and it closes after 5 seconds. If the first process is locked, they can click on [ Terminate All LibreOffice Sessions ] and it does exactly that.
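The action behind a "terminate all" button could look something like the sketch below. This is not the Python/Glade dialog itself, only the process handling such a button might call. It assumes the LibreOffice processes show up under the command name `soffice.bin`, which should be checked against the actual install, and the function names are made up.

```python
import os
import signal
import subprocess

# Assumption: command names as shown by ps for LibreOffice processes.
OFFICE_COMMANDS = {"soffice.bin"}


def office_pids(ps_output):
    """Pick LibreOffice PIDs out of 'ps -o pid=,comm=' style output."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[1].strip() in OFFICE_COMMANDS:
            pids.append(int(fields[0]))
    return pids


def terminate_office_sessions(user):
    """Send SIGTERM to every LibreOffice process owned by user."""
    result = subprocess.run(
        ["ps", "-u", user, "-o", "pid=,comm="],
        capture_output=True, text=True, check=False,
    )
    pids = office_pids(result.stdout)
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass    # process exited between the listing and the kill
    return pids
```

A truly locked process may ignore SIGTERM, so a real handler would probably wait a moment and follow up with SIGKILL for anything still alive.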

As time allows, I'm going to see if I can file a feature request so that child processes of LibreOffice can "poll" the parent to see if it is alive and then reset themselves if required. This would solve these issues and remove the dialogs completely.

I have been revisiting other dialogs and messages on the servers and have made changes. Many of these changes have been made because we are now tracking more information about how users are reacting to and using our dialogs. I'll blog more about that in the coming days.