VMware high availability

With the massive number of servers running SQL Server within virtual machines (VMs), it's critically important for DBAs to understand the high availability options available when SQL Server is running within a VM. When VMware's vSphere platform is configured within a cluster at the host level, vSphere provides a certain degree of high availability, without any high availability configurations within the guest OS or SQL Server. Because the high availability options are provided by the vSphere host, high availability is provided for all versions and editions of SQL Server, from the Express edition to the Enterprise edition and from SQL Server 6.5 to SQL Server 2012.

After discussing the level of protection provided by the vSphere platform, I'll discuss the high availability features that make this protection possible. They include vMotion, vSphere Distributed Resource Scheduler (DRS), vSphere HA, Fault Tolerance, and maintenance mode.

Understanding the Level of Protection

The beauty of the vSphere platform is that every VM within the vSphere cluster gets the same level of protection automatically. However, the level of protection that's provided when running SQL Server within a VM isn't designed to handle all potential failures. For example, if the guest OS needs to be rebooted, the vSphere platform's host-level features won't keep SQL Server online during the reboot.

Where the vSphere platform excels is in protecting the VM from host failures and host restarts. It'll bring the SQL Server instance back up within minutes, although it won't protect SQL Server from terminated transactions: any transaction that was in progress when the host failed is lost and has to be resubmitted.

Balancing Workloads with vMotion

The Enterprise and Enterprise Plus editions of vSphere include the ability to balance workloads between physical hosts. This helps ensure that the VMs running within the vSphere environment have as much CPU and memory available as the cluster can provide. To accomplish this, vSphere automatically triggers the vMotion process to move VMs from one host to another, based on the current CPU and memory workload on each server. This behind-the-scenes process is invisible to the software running within the VM being moved.

Failing Over Manually and Automatically with vSphere DRS and vSphere HA

VMs that are running under vSphere can be failed over manually or automatically. Manual failovers between hosts involve manually selecting the VM and selecting the host to which it should be moved. Automatic failover can be triggered by resource constraints at the host level as well as by the failure of the host.

Manual failover. Manual failover is achieved by right-clicking the VM from vCenter and selecting the Migrate option from the context menu. The wizard that appears will give you several options, including the ability to move the VM to another host. When the wizard is complete, the hosts will complete the vMotion process. When the VM is migrated, there's no downtime or service interruption. Transactions that are in progress within the SQL Server instance will continue to be processed, without any transactions being rolled back or any error messages being sent to the end users or the application.

Automatic failover. There are two types of automatic failover within the vSphere platform. The first is the automatic rebalancing of workloads between physical host servers. This is done through the DRS system. DRS can be configured for a variety of automation sensitivities, from conservative to aggressive. When DRS is configured to the conservative setting, normal server-load monitoring remains enabled, but the DRS system doesn't take any action based on this information. When DRS is configured to the aggressive setting, the DRS system constantly monitors the CPU and memory loads on all the servers in the vCenter farm. If one of the hosts is using most or all of the host's resources, the DRS system will automatically migrate the other VMs running on that host to other machines. The end result is that the DRS system will attempt to isolate the VM that's using all of the host's CPU and memory resources. If the VM can't be isolated, as many VMs as needed will be removed from the host in order to release the CPU and memory pressure on the impacted host.
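To make the rebalancing idea concrete, here's a toy model in Python. It isn't VMware's DRS algorithm, whose internals aren't covered here. The Host class, the rebalance function, the abstract load units, and the 80 percent threshold are all assumptions chosen for illustration. The sketch leaves the busiest VM where it is and moves its neighbors, smallest first, to whichever other host would stay at or under the threshold after the move.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: float                           # abstract resource units (hypothetical)
    vms: dict = field(default_factory=dict)   # VM name -> load in the same units

    def utilization(self) -> float:
        return sum(self.vms.values()) / self.capacity


def rebalance(hosts, threshold=0.8):
    """Move VMs off any host above `threshold`; return the list of moves."""
    moves = []
    for source in hosts:
        # Keep the busiest VM in place and migrate its neighbors, smallest first.
        while source.utilization() > threshold and len(source.vms) > 1:
            vm = min(source.vms, key=source.vms.get)
            load = source.vms[vm]
            # A target qualifies only if it stays at or under the threshold.
            targets = [
                h for h in hosts
                if h is not source
                and (sum(h.vms.values()) + load) / h.capacity <= threshold
            ]
            if not targets:
                break  # nowhere to go; the pressure on this host remains
            target = min(targets, key=Host.utilization)
            target.vms[vm] = source.vms.pop(vm)
            moves.append((vm, source.name, target.name))
    return moves


hosts = [
    Host("esx1", 100, {"sql01": 70, "web01": 15, "web02": 10}),
    Host("esx2", 100, {"app01": 20}),
]
for vm, src, dst in rebalance(hosts):
    print(f"migrate {vm}: {src} -> {dst}")
```

In this example, the two small web VMs are moved off esx1, which leaves the heavily loaded sql01 VM isolated on its own host.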

The second type of automatic failover protects against physical server failures. This portion of the vSphere system, named vSphere HA, automatically restarts VMs in the event that the server running the VMs goes offline unexpectedly. Within the vCenter configuration, administrators can set priority levels so that the VMs are brought online in a specific order. That way, the most critical servers are the first machines to come back online, with less important servers being brought online last or left offline. Machines such as Active Directory (AD) domain controllers (DCs) and database servers should be the first machines to come back online. Next, machines running Internet-facing applications such as web servers should be brought online. Back-end processing servers and virtual workstations should be the last machines to come back online.

With this automatic VM restart functionality, the VM will go offline when a physical server fails. Typically, this downtime will only be a few minutes, but the amount of downtime varies, depending on the number of VMs on the host and how the automatic restart feature is configured.
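The effect of restart priorities can be sketched in a few lines of Python. This is only an illustration of the ordering idea, not how vSphere HA is implemented. The restart_order function, the numeric ranking, and the VM names are hypothetical; the priority names match the restart priority levels that vSphere HA offers.

```python
# Hypothetical ranking: lower numbers restart first.
PRIORITY_RANK = {"High": 0, "Medium": 1, "Low": 2}


def restart_order(vm_priorities):
    """Return VM names in restart order, leaving out VMs marked Disabled."""
    eligible = [vm for vm, p in vm_priorities.items() if p != "Disabled"]
    # Sort by priority rank, then by name so the order is deterministic.
    return sorted(eligible, key=lambda vm: (PRIORITY_RANK[vm_priorities[vm]], vm))


failed_host_vms = {
    "dc01": "High",        # domain controller
    "sql01": "High",       # database server
    "web01": "Medium",     # Internet-facing web server
    "batch01": "Low",      # back-end processing server
    "vdi07": "Disabled",   # virtual workstation left offline
}
print(restart_order(failed_host_vms))
# ['dc01', 'sql01', 'web01', 'batch01']
```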

Ensuring Outage-less Failover with Fault Tolerance

Fault Tolerance is another high availability feature in vSphere. When using the Fault Tolerance feature, a VM is run on two hosts at the same time in lockstep with each other. If there's a failure on the host running the active copy of the VM, the passive machine running in lockstep takes over processing for the VM. This high availability feature allows a VM to remain online, even when the host running the VM fails.

There are several very specific requirements for using the Fault Tolerance feature. For example, the VM is limited to a single virtual CPU (vCPU) and specific versions of the BIOS are required. (The requirements vary based on the vSphere version.) However, if the VM meets these requirements, this is an excellent way to keep the VM online in the event of a server failure.

Using the Maintenance Mode

One of the great features of vSphere is that VMs don't need to be manually moved from one physical host to another when a physical host needs maintenance (e.g., a reboot or patching). What makes this possible is the vSphere system's maintenance mode. When a host server is placed into maintenance mode, the host will automatically transfer all of its VMs to other hosts within the farm. After all the VMs have been migrated to another host, the host's transition from running mode to maintenance mode is complete. At that point, the host won't allow any VMs to be run on it.

To place a physical host into maintenance mode, you simply right-click the server from the vSphere management client, then select Enter Maintenance Mode from the context menu. The vCenter client will ask if the VMs should be evacuated from the host. Clicking the Yes button will cause the vMotion process to move the VMs to other servers in the farm. After the maintenance on the host is done, you must manually remove the system from maintenance mode and place it back into an operational state.
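The sequence of events in maintenance mode can be modeled with a short Python sketch. The data layout, the function names, and the "fewest VMs wins" placement rule are assumptions made for illustration. The real placement decisions are made by vSphere, not by logic like this.

```python
def enter_maintenance_mode(hosts, name):
    """Evacuate every VM from hosts[name], then flag the host as in maintenance."""
    source = hosts[name]
    # Only hosts that aren't in maintenance themselves can receive VMs.
    targets = [h for n, h in hosts.items() if n != name and not h["maintenance"]]
    if source["vms"] and not targets:
        raise RuntimeError("no host is available to receive the VMs")
    while source["vms"]:
        vm = source["vms"].pop()
        # Send each VM to the target that's currently running the fewest VMs.
        min(targets, key=lambda h: len(h["vms"]))["vms"].append(vm)
    # The transition completes only after the last VM has left the host.
    source["maintenance"] = True


def exit_maintenance_mode(hosts, name):
    """Returning the host to service is a separate, manual step."""
    hosts[name]["maintenance"] = False


farm = {
    "esx1": {"maintenance": False, "vms": ["sql01", "web01", "web02"]},
    "esx2": {"maintenance": False, "vms": []},
    "esx3": {"maintenance": False, "vms": ["app01"]},
}
enter_maintenance_mode(farm, "esx1")
print(farm)
```

Note that exit_maintenance_mode doesn't move any VMs back. In this model, as in the procedure described above, taking the host out of maintenance mode only makes it eligible to run VMs again.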

Configuring vSphere High Availability

All the high availability settings can be found within the vSphere client application. This vSphere management tool is used for managing and configuring the entire vSphere platform. To access the high availability management settings, you need to be in the Hosts and Clusters view in the vSphere application. To get to that view, you need to select Home on the navigation bar, then select Hosts and Clusters from within the Inventory section. The navigation bar will then look like the one shown in Figure 1.

Figure 1: Checking the Navigation Bar to Make Sure You're in the Hosts and Clusters View

Configuring DRS. You need to configure DRS from within the properties of the specific cluster. To edit a cluster's properties, go to the Hosts and Clusters view. Locate the cluster from the object list on the right side of the application. Right-click the cluster and select Edit Settings from the context menu. In the Settings dialog box, select vSphere DRS, as shown in Figure 2. You can then configure the amount of automation that should be used by the DRS system as well as how conservative or aggressive the migration threshold should be.

Figure 2: Configuring DRS

Under vSphere DRS in the Settings dialog box, there are links to the DRS Groups Manager, Rules, Virtual Machine Options, and Power Management pages. You use the DRS Groups Manager page when you want to configure VMs into groups. The DRS Groups Manager feature is used to bind specific VMs to a specific host. Typically, this feature wouldn't be used unless a VM requires that the host has a specific characteristic (e.g., a device that's physically connected to the host).

On the Rules page, you can configure specific rules for DRS. You can create three different kinds of rules:

  • Rules to keep specific VMs on different hosts at all times. You typically use this kind of rule to ensure that the different nodes in a virtual server cluster (e.g., SQL Server failover cluster) are always running on different hosts.
  • Rules to force VMs to exist on the same host. You typically use this kind of rule to minimize the network traffic between different VMs, such as between an application server and a database server.
  • Rules to bind specific VMs to a specific host using DRS groups. You configure the DRS groups on the DRS Groups Manager page.
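The first two kinds of rules are easy to reason about with a small checker. The following Python sketch assumes a hypothetical placement dictionary that maps each VM to its current host. The rule_violations function and the rule labels are illustrative names, not part of any VMware API.

```python
def rule_violations(placement, keep_apart=(), keep_together=()):
    """Return the rules that the current VM-to-host placement breaks."""
    problems = []
    for group in keep_apart:
        hosts = [placement[vm] for vm in group]
        # A keep-apart rule is broken if any two VMs in the group share a host.
        if len(set(hosts)) < len(hosts):
            problems.append(("keep_apart", tuple(group)))
    for group in keep_together:
        # A keep-together rule is broken if the group spans more than one host.
        if len({placement[vm] for vm in group}) > 1:
            problems.append(("keep_together", tuple(group)))
    return problems


placement = {
    "sqlnode1": "esx1",
    "sqlnode2": "esx1",   # both failover cluster nodes ended up on one host
    "app01": "esx2",
    "db01": "esx2",
}
print(rule_violations(
    placement,
    keep_apart=[("sqlnode1", "sqlnode2")],
    keep_together=[("app01", "db01")],
))
# [('keep_apart', ('sqlnode1', 'sqlnode2'))]
```

The example shows why the keep-apart rule matters for a SQL Server failover cluster: with both nodes on esx1, a single host failure would take down the whole cluster.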

On the Virtual Machine Options page, you can configure the amount of automation you want the vMotion process to use when moving VMs. Typically, nothing on this page needs to be configured other than to ensure that the automation is enabled.

On the Power Management page, you can configure the automatic shutting down and powering up of hosts to save on power costs. During periods of low resource utilization on the hosts, VMs are consolidated into a few hosts, then the hosts with no VMs are powered down. When the workload increases again, the hosts are automatically powered back up and the VMs are moved to the hosts that were just powered up to ensure that enough physical resources are available to the VMs.
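A rough way to think about the consolidation is to ask how few hosts could carry the current workload. The Python sketch below computes a simple lower bound. The function name, the abstract load units, and the 80 percent utilization ceiling are assumptions for illustration; the sketch ignores how individual VMs actually fit onto hosts, so a real cluster might need more hosts than it reports.

```python
import math


def min_hosts_needed(vm_loads, host_capacity, max_utilization=0.8):
    """Lower bound on the number of powered-on hosts for the given VM loads."""
    usable_per_host = host_capacity * max_utilization
    # At least one host stays on, even when the total load is tiny.
    return max(1, math.ceil(sum(vm_loads) / usable_per_host))


overnight = [10, 15, 5, 20]   # quiet period
daytime = [70, 60, 40]        # busy period
print(min_hosts_needed(overnight, host_capacity=100))  # 1
print(min_hosts_needed(daytime, host_capacity=100))    # 3
```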

Configuring vSphere HA restart priority. Configuring the VMs' automatic restart priority is done from the Settings dialog box shown in Figure 2. Under vSphere HA, you need to click the Virtual Machine Options link. On the Virtual Machine Options page, you can set each VM to a restart priority of Disabled, Low, Medium, High, or Use Cluster Setting. If you add new VMs and their priorities differ from the cluster setting, you need to manually adjust the restart priority settings.
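The relationship between the per-VM setting and the cluster setting can be expressed as a small lookup. This Python sketch is illustrative only: the function name and the sample VMs are hypothetical, and the only details taken from the product are the five option names listed above.

```python
VALID_SETTINGS = {"Disabled", "Low", "Medium", "High", "Use Cluster Setting"}


def effective_priorities(vm_settings, cluster_default):
    """Resolve each VM's restart priority, applying the cluster default where asked."""
    resolved = {}
    for vm, setting in vm_settings.items():
        if setting not in VALID_SETTINGS:
            raise ValueError(f"unknown restart priority for {vm}: {setting}")
        if setting == "Use Cluster Setting":
            resolved[vm] = cluster_default
        else:
            resolved[vm] = setting   # a per-VM override wins over the cluster default
    return resolved


settings = {"sql01": "High", "web01": "Use Cluster Setting", "vdi07": "Disabled"}
print(effective_priorities(settings, cluster_default="Medium"))
# {'sql01': 'High', 'web01': 'Medium', 'vdi07': 'Disabled'}
```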

Configuring Fault Tolerance. To enable the Fault Tolerance feature for a specific VM, right-click the VM, select Fault Tolerance from the context menu, and select Turn On Fault Tolerance on the pop-out menu, as shown in Figure 3.

Figure 3: Configuring Fault Tolerance

The warning shown in Figure 4 will then appear. This warning tells you that by enabling the Fault Tolerance feature for the VM, any disks that were thin provisioned will be completely zeroed out, causing the VM to take up the full amount of space for which it was configured. In addition, the VM will be reconfigured with a memory reservation so that the VM will always have the memory available when the VM restarts. After clicking the Yes button in the warning dialog box, the Fault Tolerance feature will be enabled for that VM.

Figure 4: Understanding the Warning When Using Fault Tolerance
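Before turning on Fault Tolerance, it's worth estimating how much extra datastore space the warning implies. The Python sketch below assumes you already know each thin-provisioned disk's provisioned size and its current usage; the function name and the sample sizes are made up for illustration.

```python
def extra_space_gb(thin_disks):
    """Additional space consumed once thin disks are expanded to full size.

    thin_disks: list of (provisioned_gb, used_gb) pairs, one per thin disk.
    """
    return sum(provisioned - used for provisioned, used in thin_disks)


# A hypothetical VM with a 100GB OS disk (35GB used) and a 500GB data disk (120GB used).
print(extra_space_gb([(100, 35), (500, 120)]))  # 445
```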

The vSphere High Availability Advantages

Using vSphere's built-in high availability options has its advantages. Many organizations have small applications (e.g., server management applications, server monitoring applications) that use SQL Server Express databases. The Express edition has no high availability features by default. By hosting such applications within a VM in an enterprise-class vSphere cluster, they can have high availability. Similarly, vSphere's built-in high availability options can be of benefit to organizations that don't have the budget for high-end SQL Server editions for implementation projects or that don't have the budget for software upgrades or edition changes of older applications. In all these cases, vSphere's high availability options will keep the applications' services online. However, there might be failed transactions, just like there might be if you used SQL Server's native high availability solutions.

From the DBA's point of view, the advantage of using vSphere's built-in high availability options is that there's a single server running a standalone instance of SQL Server. Because the high availability configurations are handled at the host level of the virtualization environment, the DBA doesn't need to deal with more complex high availability solutions, some of which are available only in the Enterprise edition of SQL Server.