VMware ESX/ESXi 3.5 Update 2 users around the globe are experiencing huge problems with their infrastructure. A bug in the latest version of the VMware enterprise virtualization software might render thousands of virtual machines, if not more, unbootable.
This incredible situation is caused by a licensing bug that basically expires all licenses on August 12th, 2008. When a VMware ESX license expires like this, VMotion and DRS no longer work and, perhaps more serious in the short term, you will not be able to power on any VMs that are powered off. Rebooting a running VM from inside the guest OS should work fine, but do not try a virtual power cycle on any running VMs.
VMware is working on a fix that should be available for download by noon PST on Aug 13th.
VMware KB 1006716 should have more info on the issue, but is not available at the time of this post. Perhaps the VMware KB is running on a VM hosted on an ESX/ESXi 3.5 Update 2 host itself? Most likely it's due to traffic issues though.
A suggested workaround for the issue is to disable NTP on the ESX hosts and manually set the date to some date before the 12th. I would not recommend this unless you really have to, as you might get into a world of hurt if your VMware Tools are set to sync time with the host server. Imagine a Windows 2008 Server Domain Controller that suddenly gets its clock set to an earlier date. Active Directory does not enjoy this.
For now, if you are running ESX/ESXi 3.5 Update 2, keep your VMs running and do not power them off. Hopefully you don't have critical VMs turned off anyway, so the impact is likely to be small unless you have a critical need for DRS/HA. For those of you who haven't upgraded to ESX/ESXi 3.5 Update 2 yet, wait until the new installation media is available. I downloaded the media kits yesterday in preparation for my upgrade here at work, and I'm very happy I didn't do the upgrade last night like I planned to.
Update 1:
VMware has issued a statement outlining their current status and what to expect in the time to come. They also promise an update every two hours.
The problem is caused by a build timeout that was mistakenly left enabled for the release build.
The statement also announces an express patch:
VMware Engineering has isolated the root cause and is working to produce an express patch for impacted customers today. The target timeframe is 6pm, August 12, 2008 PST.
For full details, read the whole update at the source.
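To illustrate how a leftover build timeout can knock out perfectly valid licenses, here is a minimal Python sketch. This is not VMware's code, and I have no insight into how their check is actually implemented. The names `BUILD_EXPIRY`, `license_is_valid` and `can_power_on`, as well as the made-up license expiry date, are my own assumptions for the sake of the example.

```python
from datetime import datetime

# Hypothetical cutoff, chosen only to match the date reported in this post.
BUILD_EXPIRY = datetime(2008, 8, 12)


def license_is_valid(now, license_expiry, build_timeout_enabled):
    """Return True if the license should be accepted at time `now`."""
    # A beta-style build timeout overrides the real license once the
    # cutoff has passed, no matter how long the license itself is valid.
    if build_timeout_enabled and now >= BUILD_EXPIRY:
        return False
    return now < license_expiry


def can_power_on(now, license_expiry, build_timeout_enabled=True):
    """In this sketch, powering on a VM simply requires a valid license."""
    return license_is_valid(now, license_expiry, build_timeout_enabled)


# Made-up license that is still valid well past the cutoff.
real_license_expiry = datetime(2010, 1, 1)

# The day before the cutoff: power on is allowed.
print(can_power_on(datetime(2008, 8, 11, 23, 59), real_license_expiry))  # True
# Just after the cutoff: the valid license is treated as expired.
print(can_power_on(datetime(2008, 8, 12, 0, 1), real_license_expiry))    # False
# Same moment with the build timeout disabled, as a release build should be.
print(can_power_on(datetime(2008, 8, 12, 0, 1), real_license_expiry,
                   build_timeout_enabled=False))                        # True
```

A check like this would also explain why setting the host clock back to a date before the 12th works as a workaround: the comparison only ever looks at what the host believes the current time to be.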
Update 2:
The promised express patches have now been released. More details and downloads are available at vmware.com.
New ISOs and tarballs are to be released sometime this week, so if you are planning an upgrade, hold your horses a few more days.
VMware CEO Paul Maritz has also issued an official statement on the problems customers have experienced, and how VMware is handling it.
Update 3:
New tarballs and ISOs have now also been released and are available at the link above.