Tuesday, July 31, 2007

ClickOnce - or: how to instafuck your whole installed base with one click of the mouse!

Fuck me that was fucked.

We are rolling out a new version of a program which is updated via ClickOnce. We currently (luckily) have the new version on only about 12 of 80 machines - the rest are using the old, manually updated software. I am actually pretty glad that we had today's problem today and not in a couple of days' time when the whole thing was rolled out - _that_ would have been a major shit fight.

This morning I got a request for changes. Nothing huge. Thought it would be a nice demonstration for the live update feature of the new system. 10 minutes to code. Took my time testing, 1 hour. Published the changes onto the beta site, let the test machines auto-update. Cool. I sign off on the mods, the client gives the go-ahead for live. I publish on the live server. Sweet.

Now I keep a test machine connected up to the live publication just so I can have a last minute test to make sure all is well. I go and run the client on that machine. The auto-update barfs. Crap. I go to look in the program group for the program. The machine freezes. Crap crap. I reboot and try again. No go. Now I am starting to get a little worried. I connect up to one of the client's live machines which is not being used at the moment and let the auto-update run. Barf. Crap crap crap. Machine freezes. Fuck. I try to uninstall the program, the item does not disappear from the control panel. I try again. Freeze. Reboot. Uninstall. Okay. Now I reinstall the new version, and it works.

However the calls have started to come into the call centre... “We logged out and logged in again and it did its auto-update and now the computer is broken”. Fuuuck. So the technicians get to work some overtime uninstalling the broken update then reinstalling it from scratch, and I get to do some overtime to work out why the fuck it did not work.

After 30 or so re-deployments in different configurations I find out that some kind soul has installed Windows Installer 3.1 on my testbed. I add it as a prerequisite so it gets installed first - now the auto-update works. So that is why my initial deployment test did not fail - there seems to be some kind of problem with the default version of Windows Installer, and I didn't see it because the testbed was already using a later version. I go check in with the technicians and we run a test on a machine that was not yet fucked up - installed Windows Installer 3.1, then let the auto-update run - no problemo. Fuck. I wanna break someone's fingers!
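A check that would have flagged this before the publish is a plain diff of what is installed on the testbed against what is installed on a live machine. Here is a minimal Python sketch of that idea. The function name `environment_drift` and the inventories are made up for illustration, and how you actually collect the installed versions from each machine is left out entirely.

```python
def environment_drift(testbed, live):
    """Return {component: (testbed_version, live_version)} for every mismatch.

    Both arguments are plain dicts mapping a component name to a version
    string. A component missing on one side shows up with None on that side.
    """
    drift = {}
    for name in sorted(set(testbed) | set(live)):
        testbed_version = testbed.get(name)
        live_version = live.get(name)
        if testbed_version != live_version:
            drift[name] = (testbed_version, live_version)
    return drift


if __name__ == "__main__":
    # Hypothetical inventories. The post only says the testbed had
    # Windows Installer 3.1; the live value below is a placeholder.
    testbed = {"Windows Installer": "3.1", "Some Runtime": "2.0"}
    live = {"Windows Installer": "older (placeholder)", "Some Runtime": "2.0"}

    for name, (on_testbed, on_live) in environment_drift(testbed, live).items():
        print(f"{name}: testbed={on_testbed} live={on_live}")
```

Anything this prints is a difference worth explaining before trusting a test result from the testbed.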

So after an hour of downtime for > 10% of the system I know what we have to do - re-do the install from scratch with Windows Installer 3.1 as a prerequisite. We have to start again because Windows Installer 3.1 has to be installed as Administrator. I am not going to be popular with the technicians...

I used to like ClickOnce. But now I am not so sure. It is very difficult to integrate into a responsible deployment strategy. When the whole system is deployed that way, how can you test a new update? I have never seen a case where the re-install (rather than the auto-update) does not work, so a clean install that works tells you nothing about whether the auto-update will work. So imagine that all the machines are deployed from the live site - the auto-update is kind of all or nothing, you can't point a few machines at a different server to make sure the update works. I guess you could copy the live site to a test site, install from there, then update the test site and let an update go through. There is also no rollback. Once you let the update go, it's gone, baby.
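The "copy the live site to a test site" idea is at least easy to script. Below is a rough Python sketch that replaces a test folder with an exact copy of the live publish folder, so a test machine can install the current live version from it before the update gets published there. `mirror_site` and the folder names are hypothetical. One caveat it does not handle: the copied manifests may still point at the live location for updates, in which case the copy would need republishing for the test location first.

```python
import shutil
import tempfile
from pathlib import Path


def mirror_site(live_dir, test_dir):
    """Replace test_dir with an exact copy of live_dir.

    Returns the sorted relative paths of the files that were copied.
    """
    live = Path(live_dir)
    test = Path(test_dir)
    if not live.is_dir():
        raise FileNotFoundError(f"live site not found: {live}")
    # Start clean so files left over from an earlier test cannot linger.
    if test.exists():
        shutil.rmtree(test)
    shutil.copytree(live, test)
    return sorted(
        path.relative_to(test).as_posix()
        for path in test.rglob("*")
        if path.is_file()
    )


if __name__ == "__main__":
    # Demo against throwaway folders; the file name is a stand-in.
    with tempfile.TemporaryDirectory() as root:
        live = Path(root) / "live"
        live.mkdir()
        (live / "app.manifest.txt").write_text("version 1")
        for copied in mirror_site(live, Path(root) / "test"):
            print("copied", copied)
```

This only copies files on the server side. It does nothing about rollback for machines that have already taken an update.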

Just goes to show you - you must have a test machine that is identical to live. Well, I'm off to padlock up the test machine... where did I put those thumbscrews?
