How Many VHDs Can I Store in a Single CSV Drive in Windows Server 2008 R2 Failover Clusters?

This question has come up in recent email conversations between the Cluster MVPs and the Microsoft Cluster team. There are several concerns, such as:

  • A single disk in Windows uses a single command queue.
  • If several VHDs reside on the same CSV, they all share that disk's command queue, and performance may suffer significantly.
  • Several disk-intensive VMs can together demand more than the disk subsystem is able to deliver.
  • A dirty shutdown may force a chkdsk, and all of the VMs on that CSV will be unavailable until the chkdsk completes.
  • Microsoft has tested up to 384 VMs per host (assuming the host has the resources to support them), so running more than that on a single host may be a support problem. I am sure a host can handle more; it just has not been tested yet.
  • Another limit discussed at TechEd 2010 is a maximum of 1,000 VMs per cluster.

So, what do we need to do?

  1. Know the IOPS requirements of each VM, and allocate VMs across the CSVs so that each one receives the resources it needs to run smoothly.
  2. Just as you would profile your applications' performance on physical hardware, you need to know their requirements before combining them as VMs on the same host.
  3. Don’t forget about your VSS backups and your Snapshots.
  4. Test, test, and test some more, as each implementation will be different.
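To make the first step more concrete, here is a minimal Python sketch of one way to spread VMs across CSVs by IOPS. It uses a first-fit-decreasing heuristic, which is my own choice for illustration rather than anything recommended by the cluster team, and it assumes you have already measured a peak IOPS figure for each VM and an IOPS budget for the LUN behind each CSV. The function name place_vms, the Csv class, and the sample VM names and numbers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Csv:
    """A Cluster Shared Volume with an assumed (measured by you) IOPS budget."""
    name: str
    iops_budget: int
    vms: list = field(default_factory=list)
    used_iops: int = 0


def place_vms(vm_iops, csv_budgets):
    """Hypothetical helper: first-fit-decreasing placement of VMs onto CSVs.

    vm_iops: dict of VM name -> peak IOPS measured for that VM (assumed input).
    csv_budgets: dict of CSV name -> IOPS its LUN can sustain (assumed input).
    Returns (list of Csv objects, list of VM names that did not fit anywhere).
    """
    csvs = [Csv(name, budget) for name, budget in csv_budgets.items()]
    unplaced = []
    # Place the most demanding VMs first so they are not left without room.
    for vm, iops in sorted(vm_iops.items(), key=lambda kv: kv[1], reverse=True):
        for csv in csvs:
            if csv.used_iops + iops <= csv.iops_budget:
                csv.vms.append(vm)
                csv.used_iops += iops
                break
        else:
            # No CSV has enough remaining IOPS for this VM.
            unplaced.append(vm)
    return csvs, unplaced


if __name__ == "__main__":
    vms = {"sql01": 1200, "exch01": 900, "web01": 150, "web02": 150, "file01": 300}
    csvs, unplaced = place_vms(vms, {"CSV1": 1500, "CSV2": 1500})
    for csv in csvs:
        print(csv.name, csv.vms, csv.used_iops)
    # CSV1 ['sql01', 'file01'] 1500
    # CSV2 ['exch01', 'web01', 'web02'] 1200
    print("unplaced:", unplaced)  # unplaced: []
```

Any VM that ends up in the unplaced list is a signal that you need another CSV (or faster storage) before that VM goes into production, which is exactly the situation the testing step is meant to catch.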

Several of the experts have recommended 8-10 VMs per CSV as a rule of thumb, but the IO profile of each application running on the VMs will make a huge difference.
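A count-based rule of thumb can hide an IOPS problem: ten light VMs may be fine while four heavy ones are not. The hypothetical audit_csv function below is a sketch, not a supported tool; it flags a CSV when it exceeds either an assumed maximum VM count (10, the top of the rule-of-thumb range) or an assumed IOPS budget. The sample IOPS figures are illustrative only.

```python
# Assumed threshold: the top of the 8-10 VMs per CSV rule of thumb.
RULE_OF_THUMB_MAX_VMS = 10


def audit_csv(name, vm_iops, iops_budget, max_vms=RULE_OF_THUMB_MAX_VMS):
    """Hypothetical check that returns a list of warnings for one CSV.

    vm_iops: list of peak IOPS figures, one per VM stored on the CSV.
    iops_budget: assumed IOPS the CSV's LUN can sustain (measure your own).
    """
    warnings = []
    if len(vm_iops) > max_vms:
        warnings.append(f"{name}: {len(vm_iops)} VMs exceeds rule of thumb of {max_vms}")
    total = sum(vm_iops)
    if total > iops_budget:
        warnings.append(f"{name}: {total} IOPS exceeds budget of {iops_budget}")
    return warnings


if __name__ == "__main__":
    # Ten light VMs pass; four heavy VMs fail on IOPS despite the lower count.
    print(audit_csv("CSV1", [50] * 10, iops_budget=2000))  # []
    print(audit_csv("CSV2", [800] * 4, iops_budget=2000))
    # ['CSV2: 3200 IOPS exceeds budget of 2000']
```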

Question: "How many VMs can we put on a CSV?"

Answer: "No more than 384 for support reasons, but it really depends on the VMs, the hardware being used, and the performance SLAs for your individual applications running on the VMs."

UPDATE: At TechEd 2010, the cluster team released some new information. Basically, they reiterated the information discussed above, and they also added some new support constraints:

Number of Nodes in Cluster         Max VMs per Node   Max VMs in Cluster
2 Nodes (1 active + 1 failover)    384                384
3 Nodes (2 active + 1 failover)    384                768
4 Nodes (3 active + 1 failover)    333                1000
5 Nodes (4 active + 1 failover)    250                1000
6 Nodes (5 active + 1 failover)    200                1000
7 Nodes (6 active + 1 failover)    166                1000
8 Nodes (7 active + 1 failover)    142                1000
9 Nodes (8 active + 1 failover)    125                1000
10 Nodes (9 active + 1 failover)   111                1000
11 Nodes (10 active + 1 failover)  100                1000
12 Nodes (11 active + 1 failover)  90                 1000
13 Nodes (12 active + 1 failover)  83                 1000
14 Nodes (13 active + 1 failover)  76                 1000
15 Nodes (14 active + 1 failover)  71                 1000
16 Nodes (15 active + 1 failover)  66                 1000
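The numbers in the table appear to follow from the two limits mentioned earlier, 384 VMs per host and 1,000 VMs per cluster, with one node held in reserve for failover. The Python sketch below is my reading of that arithmetic, not an official Microsoft formula: it divides the cluster cap among the active nodes (rounding down) and caps the result at the per-host limit. The function name vm_limits and its failover_nodes parameter are hypothetical.

```python
# Limits taken from the text above; the formula combining them is an assumption.
PER_HOST_LIMIT = 384   # VMs per host tested by Microsoft
CLUSTER_LIMIT = 1000   # VMs per cluster discussed at TechEd 2010


def vm_limits(nodes, failover_nodes=1):
    """Return (max VMs per node, max VMs in cluster) for an N+1 style cluster.

    Assumes the cluster cap is spread across the active nodes only, so the
    reserved failover node(s) can absorb a failed node's VMs.
    """
    active = nodes - failover_nodes
    if active < 1:
        raise ValueError("need at least one active node")
    per_node = min(PER_HOST_LIMIT, CLUSTER_LIMIT // active)
    per_cluster = min(CLUSTER_LIMIT, PER_HOST_LIMIT * active)
    return per_node, per_cluster


if __name__ == "__main__":
    # Reproduce the table for 2 through 16 nodes.
    for n in range(2, 17):
        per_node, per_cluster = vm_limits(n)
        print(f"{n:>2} nodes: {per_node:>3} per node, {per_cluster:>4} in cluster")
```

Note that the cluster column is the cap, not the product of the other two columns: with 4 nodes, 3 × 333 is 999, yet the table (and this sketch) lists 1000.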

Again, as the number of VMs per node and per cluster grows, it becomes even more important to understand the resource requirements of our VMs and to keep a close eye on IOPS, since IOPS is the biggest constraint we face.

This entry was posted in Clustering.