vmaware.net goes live

My new site vmaware.net has finally gone live. This is my new digital playground for all things virtualization and related topics.

Not much content yet, but that is due to change pretty quickly. :)

---

Update: Well, it turns out vm-aware.net was already occupied by someone with interests very similar to my own, so I've renamed the site to vNinja.net to avoid confusion. Enjoy!

July 21, 2010 at 12:12pm | 1 Comment

New Photo Blog: blog.opticalpork.com

In an effort to clean up my online presence, I've set up a new site, blog.opticalpork.com. This site will, from now on, be my main photography/portfolio site.

This is the first of several planned changes in how and where I publish my online content; more stuff to come later on!

June 6, 2010 at 12:55am | 0 Comments

Tape is dead, long live tape or how water can ruin your weekend

A little while ago, disaster struck. What seemed like a normal day at work suddenly turned into a frenzy unlike anything I have experienced before.

What happened? We realized something was wrong when we lost contact with one of our non-virtualized servers.
I couldn’t contact it at all; it had just vanished from the face of our network.

My natural reaction was to run into our server room, to check what had happened. I figured it would be a power supply failure, or NIC failure.

Boy, was I wrong.

It turns out that a plastic pipe going through the wall, providing shielding for the power cables that provide power for the outdoor unit of the cooling system, led water straight into the server room. When I entered the server room, and heard splish-splashy sounds as soon as my feet hit the floor, I immediately grabbed a bucket and held it under the aforementioned pipe. While I stood there, trying to do some damage control, several other people rushed to my assistance.

As soon as there were enough hands on deck trying to get rid of the water, I grabbed the file server and brought it downstairs for some open heart surgery.

It’s a well-known fact that water and servers don’t really mix that well. Even less so when the water in question flows down the walls in your server room, right on top of your main file server. That’s right: water, meet server.

Of course, the very last of our non-rack based servers was located in a straight line below the pipe. Everything else was fine; the rack servers aren’t located directly on the floor, nor is anything else. We did have a good 2cm of water on the floor, but that wasn’t enough to hit the rack servers or UPS’s.

So, what was the end result? One pretty dead server. It did try to get our hopes up, and initially it succeeded.

At first, things looked good. I removed the HDDs and the power supplies, opened the cabinet and looked for water damage. The power supplies seemed to have gotten a bit wet, which is probably why the server went MIA in the first place. Other than that, everything looked good. I still had some hope that the data on the HDDs was undamaged. Since I had removed the HDDs, I tried powering on the server. And, yay, it started up, went through the BIOS OK and generally seemed like a happy little server again.

I let it run for a while with no apparent errors or hiccups, so I decided to try and boot it with the disks in it again. At first, the RAID controller complained that its logical drive(s) were missing, but that was expected after I had started it without the drives in it. I tried setting the logical drive to online, but then it complained about missing information. My next move was to copy the RAID/logical drive information from the drives to the controller, and that worked perfectly. The server rebooted, and started without problems. I let it run for a while with no problems whatsoever. It seemed we had caught a lucky break and could continue running.

Sadly that was not the case, as it only lasted a good 20 minutes before the server died completely, breaking the RAID as a result. The drives died, the power supply died, and our inventory is now one physical file server smaller.

Next, restore from backup. Like most small companies/IT departments, we do backups to tape. We even have a pretty decent LTO3 based changer, and we run Tivoli Storage Manager as our backup software. As this was a physical server that was due to be replaced with a VM, we decided to restore its data to a new pre-provisioned VM. That should be a breeze, right?

As anyone who has attempted to restore large amounts of data from a tape library will attest, things can, and will, fail. Tapes can go bad, drives can go nuts and changers can decide that they don’t want to change anymore. We experienced two of the above:

  • Bad Tape
    One of the tapes we were going to recover data from was broken, and we could not recover data from it. Thankfully TSM lets us have a copypool of tapes, so we did work around it by collecting the replacement tape from that pool.
  • Nutty Drive
    Drive 2 in the changer decided that after the initial restore job, a small subset of critical data, it wouldn’t play ball anymore. Now, TSM only uses one drive at a time to restore data with, but it does use the other drive in the changer to prepare the next tape with. So, we were reduced to all the action happening on one drive, which of course means that the restore time was significantly increased.

In the end, we were 100% successful in recovering the data from our latest backup set, but the entire restore process took us close to 56 hours in total. We restored nearly 1 000 000 files, and that sheer number of files increased the restore time by a huge amount.

Of course, in hindsight this whole mess could pretty easily have been avoided, on several different levels:

  • The pipe should not have been able to lead water directly into the server room.
    When we do risk assessments, do we identify problems like this? I for one did not see this one coming, and I’ve practically lived in that server room the last few years.
  • We should have installed some sort of water detection system in the server room.
    This might not have prevented the server crash, but we could potentially have identified that water was present and been able to shut down the server before it fried.
  • Why was the server still located on the floor?
    The fileserver should have been virtualized a long time ago, and plans were in place to do so. In fact, the VM that should replace it was already provisioned and semi-configured.

The most significant thing we could have done, before disaster struck, was to have a proper disaster recovery site in place. Irony has it that we got the quote on the hardware from HP, and software from Veeam, on Tuesday, two days before “the incident”. We have the DR location in place, and the lease contracts have been signed. We even have 100Mbit direct access to the DR site being installed as we speak. If this had happened a month or two from now, we would have been up and running through the whole ordeal. Of course, it could not have happened at a worse time, but when would something like this be well timed, really?

Now, we were already in the process of getting a DR site in place, so both IT and Management knew about the need for a secondary location. What surprised us though, was the sheer amount of files we had to restore from tape, and how much time it took. 56 hours is an extremely long time, especially when you are looking at restore jobs...
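
To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch based on the two rounded figures mentioned above (nearly 1 000 000 files, close to 56 hours); the function names are made up for illustration and have nothing to do with TSM itself.

```python
def restore_rate(files: int, hours: float) -> float:
    """Average number of files restored per second over the whole job."""
    return files / (hours * 3600)


def estimated_hours(files: int, files_per_second: float) -> float:
    """Hours needed to restore `files` at a given average rate."""
    return files / files_per_second / 3600


# Rounded figures from the incident: ~1 000 000 files in ~56 hours.
rate = restore_rate(1_000_000, 56)
print(f"Average restore rate: {rate:.2f} files/s")

# Hypothetical what-if: the same average rate with twice as many files.
print(f"2 000 000 files at that rate: {estimated_hours(2_000_000, rate):.0f} hours")
```

That works out to roughly five files per second on average, which is why the number of files matters at least as much as the raw amount of data when estimating restore time.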

This means that our DR site setup won’t be based on tape backups. We can’t rely on tape as the primary medium for restores; it simply takes too long and is too error prone for us to base our business on. The fact of the matter is that even small businesses now have so many files and so much critical data floating around that tape just isn’t feasible anymore. Don’t get me wrong, I’m glad we had tape backups, as we don’t really have the storage space available to do disk based backups right now.

As soon as the DR site is up and running, tape is dead as far as I’m concerned.

I’ll outline our DR site setup later, when we have it in place, but I’m definitely looking into using Virtual Tape Libraries (VTL) with dedup built-in for the new setup. And of course, snapshot based VM backups using Veeam Backup and Replication to the DR location, you know, for those really critical VMs that we can’t live without.

I for one will have backups everywhere from now on.

March 18, 2010 at 12:27am | 1 Comment

HP Proliant ML 115 G5, Windows Server 2008 and nvstor.sys

I initially bought a HP Proliant ML 115 server as a cheap test/lab server for VMware vSphere and miscellaneous rollout projects at work, but all of a sudden I needed it for some other project that required that I install Windows Server 2008 directly on the hardware itself.

As is the story with most HP Proliant servers, you should install it with the tools that HP provides. In the case of the ML 115, you can't use the normal SmartStart setup; instead you use its little cousin, the Easy Set-up CD.

The installation started fine, after running through the initial HP wizard, but when the time came to actually get the installation started it went all blue screened on me, complaining about nvstor.sys.
I knew that the Windows 2008 installation medium doesn't include support for the built-in nVidia NFP3400 SATA storage controller in RAID mode, but I wasn't running a RAID based setup on it anyway so that shouldn't cause the problem.

Next I tried installing Windows Server 2008 without using the Easy Set-up CD, in other words just plain old booting from the Windows Server 2008 installation CD, and initially it seemed like it was running OK. That's until it just stopped at 0% progress at the "Expanding files" section of the installation.

So, there I was. Using the HP tools, the installation ends in a big old BSOD; using the "native" Windows Server 2008 installation, it just stops without any indication of what might be wrong.

As it turns out, the solution was pretty weird. The HDD shipped with the server causes the problem (160GB NHP SATA). I have no idea how, but replacing it with another SATA drive and starting the installation again, with the Easy Set-up CD, fixed it.

The HDD shipped with the server makes the installation of Windows Server 2008 crash; replacing it with a "generic" Western Digital AV-GP 1.5TB SATA drive lets me install without problems.

Obviously the nvstor.sys driver shipped with Windows Server 2008 has problems with some drives, but not all. Imagine that: a cheap server that can run VMware ESX/ESXi right out of the box can't run Windows Server 2008 with the HDD it came shipped with.

Now, how weird is that? Note that this wasn't tested with Windows Server 2008 R2, so the nvstor.sys file shipped with that version might not have the same problem. Also, I did not try loading newer nVidia drivers during the Windows installation procedure, because a) when using the Easy Set-up CD you don't get the option to load third party drivers, and b) after I figured out that changing the HDD helped, I didn't want to try another manual installation.

Remind me again, why don't we just virtualize everything? In this instance, it would actually be easier (and quicker!) to install ESXi on the bare metal hardware, create a VM and install Windows Server 2008 in that instead of installing Windows Server 2008 on the hardware directly. How the world has indeed changed.

Update 10. March 2010:

After finishing the installation, I did run into another problem that quite possibly is also related to the nvstor.sys driver. Windows would fail to create partitions if the total amount of space used by the partitions exceeded approximately 1TB.

Upgrading the server to Windows Server 2008 R2 fixed this issue, and I was able to utilize the full disk. This leads me to think that had I installed Server 2008 R2 from the get-go I would not have seen the installation issues with the original drive at all.

March 9, 2010 at 10:33pm | 1 Comment

Windows 7 USB/DVD Download Tool Re-Released

Microsoft has re-released the previously revoked Windows 7 USB/DVD Download Tool. This time around, it's GPL licensed and comes with source code.

The tool had previously been released and then revoked after Microsoft was made aware that the tool, developed by a third party, included GPL licensed code in the compiled binary.

Personally I'm happy that the tool is available again, and that Microsoft "did the right thing ®" and released it with the proper license.

December 10, 2009 at 3:12pm | 0 Comments
