Monday, March 23, 2009

HA VMs and the Failover Clustering wizard

Here is a gotcha that just came out of the Hyper-V forums.

I have been assisting a person with a problem he was having with his VMs having to fail over as a group. We walked through the most common cause: the VMs sharing a single storage LUN. But the issue persisted. So we went backward.

In the end it was all about the process of making a VM Highly Available in the Failover Clustering manager.

Now, the process has been well documented and blogged about - so I am not going to show screen shots or outline the process. But I will mention the gotcha.

Here is a rule of thumb that we crusty clustering folks don't think about anymore. We just do it, and because of that I didn't even consider the root of the issue that this individual ran into.

Here is the rule of thumb: One run of the Failover Cluster "add resource" wizard equals one Highly Available resource.

You might think, well, yeah. For those of us who began with clustering under NT4, this was a requirement. However, with the ease of use of the new wizard in Failover Cluster Manager, I just don't consciously think about it anymore.

Why is this an issue?

Ah, here is the scenario.

I open the wizard to make a number of VMs Highly Available. To save some time, I add multiple VMs and complete the wizard. I think: great, I am done.

Now, on the backside, Failover Clustering has actually just grouped all of these VM resources together as a single Highly Available entity that is composed of many workloads.

This is where Failover Clustering takes over, and Hyper-V is just an engine.

Failover Clustering deals in 'workloads.'

In the workload world, that can be a website, plus a COM service, plus a LUN, plus a database server - all distinct entities, but all dependent upon each other. In Failover Clustering you would set these up as a single workload so that they come online in a specific order.
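To make that concrete, here is a tiny, purely illustrative Python sketch (the resource names and functions are mine, not anything from the Failover Clustering API) that models that kind of dependency chain and the order in which the pieces come online:

# A toy model of the workload described above: a website, a COM service,
# a database server, and the LUN they all depend on, brought online
# dependencies-first. Resource names are illustrative only.
dependencies = {
    "website":         ["COM service", "database server"],
    "COM service":     ["LUN"],
    "database server": ["LUN"],
    "LUN":             [],
}

def online_order(resource, deps, order=None):
    """Walk the dependency graph and list the order resources come online."""
    order = order if order is not None else []
    for dep in deps[resource]:
        online_order(dep, deps, order)
    if resource not in order:
        order.append(resource)
    return order

print(online_order("website", dependencies))
# -> ['LUN', 'COM service', 'database server', 'website']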

When we talk VMs, Failover Clustering only has a rule that says, 'Oh, a VM is composed of a configuration file, plus a VHD, plus the volume the VHD resides on,' and it is the logic in the wizard that takes care of setting all three of these items up as a single Highly Available workload.
(Go ahead; take a look at the details of a Highly Available VM).

The point is that each VM must be set up individually. One run of the wizard = one VM.
If you add multiple VMs at the same time, Failover Clustering considers them components of the same workload and will keep them together, failing them over between hosts as a single unit.

This can lead to all kinds of confusion, such as: why can't I have each VM on its own host? Why can't I fail them over individually?
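To picture the difference, here is a small, purely illustrative Python model (none of these names come from the actual Failover Clustering or Hyper-V interfaces) of one wizard run containing three VMs versus three separate wizard runs:

# A "group" here stands in for the unit that Failover Clustering moves
# between hosts. The VM and host names are made up for the example.
from dataclasses import dataclass, field

@dataclass
class Group:
    name: str
    vms: list = field(default_factory=list)
    owner: str = "HostA"

def run_wizard(vm_names):
    """One run of the wizard -> one group containing every VM you added."""
    return Group(name=" & ".join(vm_names), vms=list(vm_names))

def fail_over(group, new_owner):
    group.owner = new_owner          # every VM in the group moves together
    return group

# One wizard run with three VMs: a single group, a single failover unit.
combined = run_wizard(["Web01", "Web02", "Web03"])
fail_over(combined, "HostB")
print(combined)                      # all three VMs now live on HostB

# Three separate wizard runs: three groups that can move independently.
separate = [run_wizard([name]) for name in ("Web01", "Web02", "Web03")]
fail_over(separate[0], "HostB")      # only Web01 moves
print([(g.name, g.owner) for g in separate])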

A shared LUN can cause the same behavior - when multiple VMs share a single LUN they are forced to move together - but that was not the issue here.
When might this be something that you want to do?

You would want to do this if you had VMs that were related, required each other, or required an internal virtual network to communicate.

One example is an IIS server that must sit behind a separate VM firewall. In this case, make them fail over together as a single workload.

In the end, I hope this helps broaden the understanding of how Hyper-V extends and depends on other Windows services to provide features.

Saturday, March 14, 2009

Migration types de-mystified

Recently I have been trying to help folks out with understanding the infrastructure required for various types of migration using SCVMM and Hyper-V.

There is Network Migration, Quick Migration, SAN Migration, and soon - Live Migration.

Most people get confused when they start talking about infrastructure and what is required for each to work. Then someone mentions the VDS Hardware Provider and Windows Storage Server, and the discussion usually goes to 'Why do I require Storage Server? I don't get it.'

My quick and dirty response is:

The hitch is SAN Migration. Quick / Live Migration is easy - Failover Clustering does that.

SAN Migration requires that you have a SAN and SAN management software that hooks into VDS (the Virtual Disk Service).

You install the SAN management agent on the SCVMM server.

This is where most folks begin talking about Storage Server, as it comes with a VDS Hardware Provider (a SAN management agent) and is therefore VDS capable.

Like I mentioned - Quick / Live Migration is easy - it is built in, it is SAN Migration that requires infrastructure.

Now, let me get into greater detail about each type of migration and which product performs it. (Yes, these are marketing terms.)

Network Migration and SAN Migration are specific to SCVMM.

Network Migration is the act of copying a VM over the wire, using BITS between two points (either between the Library and a Host, or a Host and a Host).

SAN Migration is the act of moving a VM between two points by detaching and reattaching a LUN (Library / Host or Host / Host).

The requirements for SAN Migration are: one VM per LUN; the SAN must be manageable by SCVMM through VDS Hardware Provider software installed on the SCVMM server; and all entities must be able to talk to the SAN. SAN Migration generally involves Fibre Channel SAN connections.
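If it helps, here is a conceptual Python sketch of the difference between the two SCVMM migrations. Nothing in it touches SCVMM, BITS, or VDS - the print statements just narrate the work each approach does, as I understand it at a high level:

def network_migration(vm, source, destination):
    # Network Migration: the VM's files are copied over the wire with BITS.
    print(f"BITS copy of {vm}'s VHD and configuration from {source} to {destination}")
    print(f"Register {vm} at {destination} and remove it from {source}")

def san_migration(vm, lun, source, destination):
    # SAN Migration: no file copy; the LUN holding the VM is re-presented.
    print(f"Via the VDS hardware provider, unmask LUN {lun} from {source}")
    print(f"Mask LUN {lun} to {destination}")
    print(f"Register {vm} at {destination}; the VHD never crosses the network")

network_migration("VM01", "Library", "HostA")
san_migration("VM01", "LUN-07", "HostA", "HostB")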

Quick Migration and Live Migration are specific to Hyper-V.

Both use Failover Clustering to move a VM between two Hosts. (The SCVMM Library is not involved at all). All the requirements of Failover Clustering apply (shared storage, similar config, similar hardware, etc.).

In both cases the VM must be managed by Failover Clustering (which is included in all flavors of Hyper-V), also referred to as making the VM Highly Available.

Quick Migration is available with v1 of Hyper-V. When a Quick Migration is triggered, Failover Clustering saves the state of the VM, moves ownership of the VM and its storage to the failover host, then restores the VM from its saved state on the new host.
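As a conceptual outline (this just narrates the steps above, it does not talk to Failover Clustering; the VM and host names are placeholders):

def quick_migrate(vm, source, destination):
    print(f"{vm}: saving state on {source} (the VM is briefly unavailable)")
    print(f"{vm}: moving ownership of the VM and its storage to {destination}")
    print(f"{vm}: restoring the saved state on {destination}")

quick_migrate("VM01", "HostA", "HostB")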

Live Migration will be available in the R2 release of Hyper-V. It is very similar to Quick Migration except that the VM is not saved - it is running during the entire operation. I am not going to go into the details in this post.

I hope that helps a few folks clear up the confusion between the terms, the high-level technical details, and the infrastructure that you might need.

Friday, March 13, 2009

Damn Small Linux as a VM for Capacity Testing a virtual management system

Here is a little something that I did over a year ago.

My objective was to look at virtual machine management systems (such as SCVMM and Virtual Center) and to consider how many VMs they could manage.

The potential issues that you can run into are many.

1) Overhead on the management system just because of the number of VMs.
2) Limits to the number of VMs on a single host.
3) Host storage

And this is without actually powering on a single VM.

Now, begin to consider performing actions on virtual machines and we get into more complexity:
1) How many VMs can a host support (the RAM implications)?
2) Can we deploy more VMs than can be booted?
3) How long do the boot operations take?

And I could keep going.

When I was first challenged with this, I tried to make it as simple as possible: a single VM with an operating system installed to a virtual disk (I would concern myself with PXE VMs later on). And I only had three servers (one of which I needed for the management server, so I couldn't use it as a host).

I began looking at various operating systems (Windows, Linux, etc.). All were ruled out by the size of the virtual disk they required. However, I did find one really good candidate, and there are a couple of different ways that you can set it up.

This is where Damn Small Linux comes in.

By default DSL boots from an ISO, detects the hardware, and runs.
The only hitches now are installation and RAM.

In this scenario the DSL VM can be created in two different ways.

1) As many VMs sharing a common boot ISO file. (one boot ISO per host).
2) As many single VMs with unique virtual disks.

In my case I began with option 1 but later switched to option 2.

In the case of option 2, there is a special option to install DSL to the hard disk. I found that here:
http://www.damnsmalllinux.org/f/topic-3-5-18888-0.html

And through trial and error I learned that you can set the RAM on a DSL VM as low as 24 MB and the VM will still boot. This gave me many possibilities. My requirement was that the VM be able to boot and run without crashing. The VM ran better at 36 MB of RAM and really well at 128 MB. But for density, I went as low as I could.
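To give a feel for why those numbers matter, here is a quick back-of-the-envelope Python calculation using the RAM settings from this post. The host figures are made-up examples, not my actual lab hardware:

host_ram_mb = 16 * 1024          # hypothetical host with 16 GB of RAM
host_reserve_mb = 2 * 1024       # rough reserve for the parent partition

for vm_ram_mb in (24, 36, 128):  # the DSL settings I tried
    vms = (host_ram_mb - host_reserve_mb) // vm_ram_mb
    print(f"{vm_ram_mb} MB per VM -> roughly {vms} VMs per host")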

What did this end up giving me?
(other than claims that - I did it!)

This gave me an environment with a virtualization manager server and 1500 virtual machines.
At the time that I did this, just the act of having that many VMs caused a great deal of overhead on the management system.

And I also learned that there was another system that only allowed 355 VMs per host (an undocumented feature) and the host would 'fall silent' as soon as you deployed VM 356 - the host didn't crash, but its internal management daemon did, and then refused to start. I rebuilt a few hosts before I narrowed down that problem.

Wednesday, March 11, 2009

Full Hyper-V Patch List

Here is a simple and short one that I think many will find useful.
Live Search threw this at me today.

The Comprehensive List of Hyper-V Updates:
http://technet.microsoft.com/en-us/library/dd430893.aspx

Wednesday, March 4, 2009

Hyper-V and VLANs

Hyper-V supports VLAN tagging. It has been brought to my attention that some folks might not be familiar with this concept, so let's cover the basics of VLAN tagging, how it works, and how it relates to Hyper-V.

If you are not already familiar with VLAN tagging, it can be described as a way to segment traffic on your network. VLAN is just an abbreviation for Virtual LAN. Is this specific to Hyper-V? No. VLAN tagging has been around for a while now, and it was originally implemented on physical switches and routers before it first appeared in virtual switches.

A Virtual LAN is accomplished by adding a VLAN tag (an 802.1Q tag) to the header of an Ethernet frame. In most environments this action is actually performed by a network switch.

Before switch and router vendors implemented VLAN tags, it was more common to achieve traffic isolation through subnetting (creating different subnets) or through physical traffic isolation (hard-wired isolation).

The important thing to know about VLAN tags is that for them to work, all of your networking infrastructure must know how to handle them - the tag travels with the traffic and affects how it is forwarded.

Therefore, your routers and your switches must know how to evaluate a VLAN tag in order to determine where to route a particular packet.

There are many ways to implement VLAN tags in an infrastructure - the most common is to apply a VLAN tag to a switch port. In this model the switch adds the VLAN tag to all traffic that flows through that port.

Just like physical switches, the virtual switches within Hyper-V can do this the same way.

In the virtual machine setting dialog you can assign a VLAN tag to the virtual machine. What this does is actually apply that tag to the port of the virtual switch in the same way a physical switch does.

The only difference is that in the virtual world, the setting moves with the VM so you might not realize that the VLAN tag is actually a setting that is used by the virtual switch that runs within Hyper-V.
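If you are curious what 'adding a tag' physically means, here is a small Python sketch (my own illustration, nothing Hyper-V specific) that inserts an 802.1Q tag into an untagged Ethernet frame the way a VLAN-aware switch port does on ingress:

import struct

def tag_frame(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert an 802.1Q VLAN tag into an untagged Ethernet frame.

    Frame layout: 6-byte destination MAC | 6-byte source MAC | EtherType | payload.
    The 4-byte tag (TPID 0x8100 plus a 16-bit TCI) goes right after the source MAC.
    """
    tpid = 0x8100                                 # marks the frame as 802.1Q tagged
    tci = (priority << 13) | (vlan_id & 0x0FFF)   # 3-bit priority, 1-bit DEI (0), 12-bit VLAN ID
    return frame[:12] + struct.pack("!HH", tpid, tci) + frame[12:]

# Example: tag a minimal IPv4 frame for VLAN 20.
untagged = bytes(6) + bytes(6) + struct.pack("!H", 0x0800) + b"payload"
print(tag_frame(untagged, vlan_id=20).hex())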

Now, can this be used to route traffic between virtual machines? Yes. Even if those virtual machines reside on different hosts.

(If you have VMs on the same host that need to talk privately, use an internal or private network.)

However, when the VM traffic leaves your Hyper-V host, it is traveling on the physical wire. When it passes through a physical switch or router, those devices must know how to deal with traffic that carries a VLAN tag.

This is where the physical and virtual networking must support each other. If you decide to implement VLAN tags, then you are doing it across both the virtual and physical network devices.

Do you have to use VLAN tags? Not at all.

Like I mentioned, before VLAN tags we used subnetting or physical isolation - all of those approaches can still be used.

Be sure to talk to your networking counterparts before you begin going down the road of VLAN tags as it can get complex pretty quickly.