Availability Groups – a hybrid of SQL Server Database Mirroring and Windows Failover Clustering – are becoming the de-facto standard for implementation of high availability solutions for Microsoft SQL Server. While the learning curve can be steep it’s also a fairly short learning curve. This article does not purport to discuss the nuances of Availability Groups (AGs) nor the processes for installing and configuring “AG’s” but rather focuses on introducing you to the Dynamic Management Objects (DMOs) useful in assessing the architecture and state of your AGs using the SQL Server Dynamic Management Views.

The Dynamic Management Objects we’ll be reviewing in this article are categorized into two areas: those focused on the server level of granularity and those that drill down into the specific state of databases inside the Availability Groups hosted in your environments.  Regardless of scope you must be granted VIEW SERVER STATE permissions in order to view results from any of these DMV’s.


Server Scoped

  • sys.dm_hadr_cluster
  • sys.dm_hadr_cluster_members
  • sys.dm_hadr_cluster_networks
  • sys.dm_hadr_instance_node_map
  • dm_hadr_name_id_map
  • dm_hadr_availability_replica_cluster_nodes
  • dm_hadr_availability_replica_cluster_states
  • dm_hadr_availability_replica_states
  • dm_hadr_availability_group_states

Database Scoped

  • dm_hadr_database_replica_states
  • dm_hadr_database_replica_cluster_states

These DMVs are consistent across SQL Server versions 2012 through 2016. SQL Server 2016 introduced two new DMV’s pertaining to the new feature/method of seeding replicas without a backup/restore process:

  • dm_hadr_physical_seeding_stats
  • dm_hadr_physical_seeding_stats

We are not covering these now but will do so in a future article where I show how to use this functionality to avoid the process of backup and restore to seed a replica – similar to how SQL Azure creates replicas on AG creation already.

We will break these down showing an overview of what each DMV provides in terms of information (columns returned) and some of the questions that each answers individually or with the assistance of other DMVs or System Catalog Views.  Some of these objects are only good for resolving object ID’s (ag_id, group_id, replica_id, etc.) to a logical name but most serve valid purposes for helping to identify synchronization state, AG health, architecture, configuration, and more. You should not feel limited by the graphical user interface offerings (GUI) inside SQL Server Management Studio. Thanks to the SQL Server Dynamic Management Views you have countless options for building your own monitoring scripts and dashboards for dealing with Availability Groups in your SQL Server environments.

sys.dm_hadr_cluster

Type: Dynamic Management View

Columns returned:

  • cluster_name
  • quorum_type
  • quorum_type_desc
  • quorum_state
  • quorum_state_desc

This DMV provides quorum information for each node in a WSFC that is supporting an AG. If the node has quorum it will display quorum information for that node. If there is no quorum no results are returned.

sys.dm_hadr_cluster_members

Type: Dynamic Management View

Columns returned:

  • member_name
  • member_type
    • 0 = Cluster Node
    • 1 = Disk Witness
    • 2 = File Share Witness
  • member_type_description
  • member_state
    • 0 = Offline
    • 1 = Online
  • member_state_desc
  • number_of_quorum_votes

sys.dm_hadr_cluster_members provides information about all the nodes in the clustering ecosystem that builds the foundations for your Availability Groups. It identifies the server name of each participating node, the type of participant (node or witness), and the state of the participant. The columns of primary interest are going to be provided through the following query – not all columns provide value.

                              
SELECT member_name
     , member_type_desc
     , member_state_desc
FROM sys.dm_hadr_cluster_members;

Be warned though, if the Windows Server Failover Cluster (WSFC) node that is participating in the WSFC doesn’t have quorum then you’ll not see a row returned for the node.

dm_hadr_cluster_networks

Type: Dynamic Management View

Columns returned:

  • member_name
  • network_sub_ip
  • network_subnet_ipv4_mask
  • network_subnet_prefix_length
  • is_ipv4

This Dynamic Management View returns networking metadata for each WSFC cluster member participating in an availability group's subnet configuration. While limited in usefulness this DMV does allow you to validate the network virtual IP that is configured for each availability replica and lets you know if it’s on an IPv4 or IPv6 subnet and whether the network is public or (hopefully) private. Based upon the standpoint of amount of rows you can expect to return for a given cluster and the purpose of queries running against this DMV to be used for general networking metadata a SELECT * query for this DMV is perfectly acceptable:

                              
SELECT *
FROM sys.dm_hadr_cluster_networks;

Due to cardinality the primary key for this DMV is a composite key of member_name, network_subnet_ip, and network_subject_name_prefix_length.

sys.dm_hadr_instance_node_map

Type: Dynamic Management View

Columns returned:

  • ag_resource_id
  • instance_name
  • node_name

This DMV exposes the instance name and WSFC node name for all instances participating in the WSFC. You’ll also see the ag_resource_id assigned to the instance. This allows you to relate a node name and SQL instance name back to any DMV exposing ag_resource_id for rationalizing the ag_resource_id into something meaningful for identification. You’ll likely never query this DMV directly, but rather as part of a larger query using it to provide a naming context when otherwise only an ag_resource_id exists.

Additionally, as Microsoft points out in their own documentation,  “this dynamic management view is useful for detecting an availability group with multiple availability replicas that are hosted on the same WSFC node, which is an unsupported configuration that could occur after an FCI failover if the availability group is incorrectly configured.”

That condition is identifiable through the following query:

                              
SELECT node_name
FROM sys.dm_hadr_instance_node_map
GROUP BY node_name
HAVING COUNT(ag_resource_id) > 1;

dm_hadr_name_id_map

Type: Dynamic Management View

Columns returned:

  • ag_name
  • ag_id
  • ag_resource_id
  • ag_group_id

This DMV exists solely to allow for linking and name resolution should the Availability Group be renamed. Just like logins and users in SQL are associated with a sid that uniquely identifies the entity (rather than the login name or user name identifying the entity) so is the case with the ag_id. Likewise as you’ll see in subsequent DMVs later in this article resources and groups that comprise the AG ecosystem – while having a name property that can be changed – also have an underlying ID value that cannot be changed and will always be used to maintain relationships between the various objects and levels of granularity in the AG. This DMV is used throughout joins that afford relationships between many of the DMVs discussed in this article. It will be rare that you would query this DMV directly but common to use it as an intermediate join between two or more DMVs. The resource_id and group_id map back to WSFC resources and resource groups.

It should be noted – because it’s confusing – that ag_id is associated with the identity of the Availability Group whereas the ag_group_id identifies the WSFC Group Id of the Availability Group.

dm_hadr_availability_replica_cluster_nodes

Type: Dynamic Management View

Columns returned:

  • group_name
  • replica_server_name
  • node_name

Returns a row for every availability replica (regardless of join state) of the Availability Groups in the Windows Server Failover Cluster.  Since the AG’s id is not exposed in this DMV you have to join to sys.dm_hadr_availability_replica_cluster_nodes via sys.dm_hadr_availability_replica_cluster_states.replica_server_name or sys. dm_hadr_availability_replica_cluster_states.node_name. Likewise you can also join to dm_hadr_availability_group_states.primary_replica on replica_server_name to get the group_id for the Availability Group or to sys.dm_hadr_name_id_map.ag_name = dm_hadr_availability_replica_cluster_nodes.group_name to also return the ag_id value. This is one area where Microsoft was not consistent in it’s Dynamic Management Object / System Catalog View naming conventions. In one DMV you see a column called ag_name – in another it’s referred to as group_name.  You can use any of these joins in reverse to humanize an exposed ag_id, say in sys.dm_hadr_availability_group_states, to an ag_name.

dm_hadr_availability_replica_cluster_states

Type: Dynamic Management View

Columns returned:

  • replica_id
  • replica_server_name
  • group_id
  • join_state
  • join_state_desc

 This DMV is most useful when needing to rationalize the replica_id in various DMVs back to a meaningful name – in this case the server name for the replica as represented in the replica_server_name column. An example of using this DMV is provided under dm_hadr_availability_group_states, which is covered further in this article.

dm_hadr_availability_replica_states

Type: Dynamic Management View

Columns returned:

  • replica_id
  • group_id
  • is_local
  • role
  • role_desc
  • operational_state
  • operational_state_desc
  • connected_state
  • connected_state_desc
  • recovery_health
  • recovery_health_desc
  • synchronization_health
  • synchronization_health_desc
  • last_connect_error_number
  • last_connect_error_description
  • last_connect_error_timestamp

 

This DMV returns a row showing the state of each local and remote availability replica in the same Availability Group as the local replica against which the query is executed. It’s best to execute this against the host for the primary replica in order to assure the most complete picture possible of the state of all replicas participating in the AG. If you execute queries on sys.dm_hadr_availability_replica_states against the host for any of the secondary replicas you’ll only return local information for the availability group (the secondary node being queried against.)

To return qualified replica state information for an AG:

                              
SELECT hadrNIM.ag_name
, hadrARCS.replica_server_name
     , hadrARS.is_local
     , hadrARS.role_desc
     , hadrARS.operational_state_desc
     , hadrARS.connected_state_desc
     , hadrARS.recovery_health_desc
     , hadrARS.synchronization_health_desc
     , hadrARCS.join_state_desc
FROM sys.dm_hadr_availability_replica_states AS hadrARS
INNER JOIN sys.dm_hadr_availability_replica_cluster_states AS hadrARCS
          ON hadrARS.replica_id = hadrARCS.replica_id
     INNER JOIN sys. dm_hadr_name_id_map AS hadrNIM
          ON hadrARS.group_id = hadrNIM.ag_id;

dm_hadr_availability_group_states

Type: Dynamic Management View

Columns returned:

  • group_id
  • primary_replica
  • primary_recovery_health
  • secondary_recovery_health
  • primary_recovery_health_desc
  • secondary_recovery_health_desc,
  • synchronization_health,
  • synchronization_health_desc

sys.dm_hadr_availability_group_states provides health information for the AG as well as for the node you’re executing your query against as shown in the following query:

                              
SELECT ag_name
     , primary_replica
     , primary_recovery_health_desc
     , secondary_recovery_health_desc
     , synchronization_health_desc
FROM sys.dm_hadr_availability_group_states AS hadrAGS
     INNER JOIN sys.dm_hadr_name_id_map AS hadrNIM
          ON hadrAGS.group_id = hadrNIM.ag_id;

 

DATABASE SCOPED LEVEL AG-CENTRIC DMVs

dm_hadr_database_replica_states

Type: Dynamic Management View

Columns returned:

  • database_id
  • group_id
  • replica_id
  • group_database_id
  • is_local
  • synchronization_state
  • synchronization_state_desc
  • is_commit_participant
  • synchronization_health
  • synchronization_health_desc
  • database_state
  • database_state_desc
  • is_suspended
  • suspend_reason
  • suspend_reason_desc
  • recovery_lsn
  • truncation_lsn
  • last_sent_lsn
  • last_sent_time
  • last_received_lsn
  • last_received_time
  • last_hardened_lsn
  • last_hardened_time
  • last_redone_lsn
  • last_redone_time
  • log_send_queue_size
  • log_send_rate
  • redo_queue_size
  • redo_rate
  • filestream_send_rate
  • end_of_log_lsn
  • last_commit_lsn
  • last_commit_time
  • low_water_mark_for_ghosts

sys.dm_hadr_database_replica_states returns a row for each database participating in an Availability Group and provides state information on both the primary and secondary replicas.  On a secondary replica, this view returns a row for every secondary database on the server instance. On the primary replica, this view returns a row for each primary database and an additional row for the corresponding secondary database. So expect two rows returned for each database when executing on a primary replica but only one – and for the values associated with the secondary replica of the database – when executed against the secondary replica’s host. Also, Microsoft makes clear that the values of the end_of_log_lsn, last_hardened_lsn, last_received_lsn, last_sent_lsn, recovery_lsn, and truncation_lsn columns are not actual log sequence numbers (LSNs). Instead these are actually a log-block ID padded with zeroes. Log blocks are the physical containers for those Virtual Log Files (or VLF’s) that you keep hearing about. Of the LSN values returned by sys.dm_hadr_database_replica_states, only last_redone_lsn is a real LSN.

The following query provides a snapshot of database-level health for all AGs on a SQL instance, rationalizing identifying IDs for all objects to their name:

                              
SELECT hadrNIM.ag_name
     , hadrARCS.replica_server_name
     , hadrDRCS.[database_name]
     , hadrDRS.is_local
     , hadrDRS.synchronization_state_desc
     , hadrDRS.synchronization_health_desc
     , hadrDRS.database_state_desc
     , hadrDRS.is_suspended
     , hadrDRS.suspend_reason_desc
     , hadrDRS.last_sent_time
     , hadrDRS.last_received_time
     , hadrDRS.last_hardened_time
     , hadrDRS.last_redone_time
     , hadrDRS.last_commit_time
     , hadrDRS.log_send_queue_size
     , hadrDRS.log_send_rate
FROM sys.dm_hadr_database_replica_states AS hadrDRS
     INNER JOIN sys.dm_hadr_name_id_map AS hadrNIM
          ON hadrDRS.group_id = hadrNIM.ag_id
     INNER JOIN sys.dm_hadr_database_replica_cluster_states AS hadrDRCS
          ON hadrDRS.group_database_id = hadrDRCS.group_database_id
     INNER JOIN sys.dm_hadr_availability_replica_cluster_states AS hadrARCS
          ON hadrDRS.replica_id = hadrARCS.replica_id
ORDER BY hadrNIM.ag_name
     , hadrDRCS.[database_name]
     , hadrDRS.is_local;

 

dm_hadr_database_replica_cluster_states

Type: Dynamic Management View

Columns returned:

  • replica_id
  • group_database_id
  • database_name
  • is_failover_ready
  • is_pending_secondary_suspend
  • is_database_joined
  • recovery_lsn
  • truncation_lsn

This DMV is scoped at the database level as was sys.dm_hadr_database_replica_states but is focused solely on failover readiness. It provides information for each database participating in an AG and its corresponding failover preparedness state expressed as a bit value of 1 (ready) and 0 (not ready).

At any time you can get a glimpse of those databases participating in an AG that are not prepared for failover by querying for is_failover_ready value of 0:

                              
SELECT *
FROM sys.dm_hadr_database_replica_cluster_states
WHERE is_failover_ready = 0;

 

Identify replicas causing the log_reuse_wait_desc value of “AVAILABILITY_REPLICA” when you see transaction log size(s) spin out of control because truncation can’t occur:

 

                              
SELECT hadrARCS.replica_server_name
     , hadrDRCS.database_name
FROM sys.dm_hadr_database_replica_cluster_states AS hadrDRCS
     INNER JOIN sys.dm_hadr_availability_replica_cluster_states AS hadrARCS
          ON hadrDRCS.replica_id = hadrARCS.replica_id
     INNER JOIN
          (
               SELECT group_database_id
               FROM sys.dm_hadr_database_replica_states
               WHERE is_local = 0
                     AND synchronization_state_desc != 'SYNCHRONIZED'
          ) AS hadrDRS
               ON hadrDRCS.group_database_id = hadrDRS.group_database_id
     INNER JOIN sys.databases AS D
          ON hadrDRCS.database_name = D.name
WHERE D.log_reuse_wait_desc = 'AVAILABILITY_REPLICA'
     AND hadrDRCS.is_database_joined = 1;

 

Conclusion

Availability Groups are a powerful replacement to SQL Server Database Mirroring and though complex in configuration and creation once configured you do have insights into their state thanks not only to the GUI objects inside SQL Server Management Studio – but also through Transact SQL and the SQL Server Dynamic Management Views as well.  I hope you enjoyed this overview. Going forward expect to see how we can use each of these DMOs together and separately to assess and solve your high availability implementations in SQL Server 2016... and beyond.