AlwaysOn Availability Groups and SQL Server Jobs, Part 17: Creating Jobs to Check on Synchronization

In my previous post, I provided a high-level outline of the core logic (and rationale behind that logic) that would be needed to set up regular synchronization checks on SQL Server Agent Jobs for servers where AlwaysOn Availability Groups have been deployed. In this post, I’ll walk through the steps--and the code--needed to setup those checks.

Step 1: Start with Linked Servers

In order for the logic outlined in this post to work, you’ll need to set up Linked Server definitions between all of the servers hosting your AlwaysOn Availability Groups. In Post #15 I outlined some key concerns around Setting Up Linked Servers, which you’ll have to address before any of the code or logic below will work.

Step 2: Define UDF(s) to Determine Primary Replica Host

During checkups it will be important to know which server (or instance) is currently hosting the Primary Replica(s) for a given Availability Group.

As per post #11, A Fork in the Road, if you’re using the option/technique where you’ll be enabling/disabling jobs depending upon whether they’re on the same server as the Primary Replica(s), this logic will be indispensable. On the other hand, if you’re just using if/else logic within your job steps, you’ll still want to know which server is hosting the Primary Replica(s) in your target Availability Group when running synchronization checks so that you can simplify checks and run them only on/from the primary, as outlined below.

Technically speaking, there are two ways to check for "primacy"of your Availability Groups: by database name (for example, “Is the suchAndSuch database hosted on this server currently the Primary?”) or by the name of the Availability Group (“Is the blahBlah Availability Group currently hosting Primary Replicas on this server?”). Scripts for both techniques are listed below.

USE master;
GO

IF OBJECT_ID('dbo.fn_hadr_database_is_primary','FN') IS NOT NULL
        DROP FUNCTION dbo.fn_hadr_database_is_primary;
GO

CREATE FUNCTION dbo.fn_hadr_database_is_primary (@DBName sysname)
RETURNS bit
AS
        BEGIN
                DECLARE @description sysname;
               
                SELECT
                        @description =  hars.role_desc
                FROM
                        sys.DATABASES d
                        INNER JOIN sys.dm_hadr_availability_replica_states hars ON d.replica_id = hars.replica_id
                WHERE
                        database_id = DB_ID(@DBName);
       
                IF @description = 'PRIMARY'
                        RETURN 1;
       
                RETURN 0;
        END;
GO

USE master;
GO

IF OBJECT_ID('dbo.fn_hadr_group_is_primary','FN') IS NOT NULL
        DROP FUNCTION dbo.fn_hadr_group_is_primary;
GO

CREATE FUNCTION dbo.fn_hadr_group_is_primary (@AGName sysname)
RETURNS bit
AS
        BEGIN
               
                DECLARE @PrimaryReplica sysname;

                SELECT @PrimaryReplica = hags.primary_replica
                FROM
                        sys.dm_hadr_availability_group_states hags
                        INNER JOIN sys.availability_groups ag ON ag.group_id = hags.group_id
                WHERE
                        ag.name = @AGName;

                IF UPPER(@PrimaryReplica) =  UPPER(@@SERVERNAME)
                        RETURN 1; -- primary

                RETURN 0; -- not primary
               
        END;
GO

The naming "convention" around both of these UDFs is patterned after that of sys.fn_hadr_backup_is_preferred_replica.  As with that UDF, if you specify the name of the @DBName or @AGName parameters above incorrectly, you’ll get a FALSE result instead of an error (which is by design).

Step 3: Deploy a Sproc to Check Sever-Level Jobs

With Linked Servers in place, it’s now time to check up on Server-Level Jobs--or SQL Server Agent Jobs that operate at the Server Level or which target "server level" details (such as checking available disk space, cycling the error logs or truncating msdb history). This logic will also (likely) be used to checkup on backup routines.

The idea behind this code is pretty simple: It just requests details about SQL Server Agent Jobs on one server, grabs details about Jobs on another server, and then compares relevant bits of information to see if there are any differences. Or, at least, that’s what it’s doing at a high level.

USE master;
GO


IF OBJECT_ID('dbo.dba_SyncCheckupForInstanceLevelJobs','P') IS NOT NULL
        DROP PROC dbo.dba_SyncCheckupForInstanceLevelJobs
GO

CREATE PROC dbo.dba_SyncCheckupForInstanceLevelJobs
        @MailProfileName        sysname = NULL, -- e.g., 'General'
        @OperatorName           sysname = NULL, -- e.g., 'Alerts',
        @PrimaryAGAfinity       sysname = NULL, -- see comments above.
        @IgnoredJobs            nvarchar(MAX) = '',
        @ConsoleOnly            bit     = 0             -- output only to console - don't email alerts (for debugging/manual execution, etc.)
AS
        SET NOCOUNT ON;

        IF @PrimaryAGAfinity IS NOT NULL BEGIN
                IF (SELECT dbo.fn_hadr_group_is_primary(@PrimaryAGAfinity)) = 0 BEGIN
                        PRINT 'Server is Not Primary.'
                        RETURN 0;
                END
        END

        -- if we're not manually running this, make sure the server is the primary:
        IF @ConsoleOnly = 0 BEGIN -- if we're not running a 'manual' execution - make sure we have all parameters:
                IF ISNULL(@MailProfileName, '') = '' BEGIN
                        RAISERROR('Procedure expects parameter ''@MailProfileName'' which was not provided.', 16, 4);
                        RETURN -2;
                END

                IF ISNULL(@OperatorName, '') = '' BEGIN
                        RAISERROR('Procedure expects parameter ''@OperatorName'' which was not provided.', 16, 4);
                        RETURN -3;
                END
        END;

        CREATE TABLE #IgnoredJobs (
                job_name sysname
        );

        -- single-line / in-line 'split' function:
        DECLARE @DeserializedJobs nvarchar(MAX) = N'SELECT ' + REPLACE(REPLACE(REPLACE(N'''{0}''','{0}',@IgnoredJobs), ',', ''','''), ',', ' UNION SELECT ');

        INSERT INTO #IgnoredJobs
        EXEC(@DeserializedJobs);
       
        DECLARE @LocalServerName sysname = @@SERVERNAME;
        DECLARE @RemoteServerName sysname;
        SET @RemoteServerName = (SELECT TOP 1 name FROM PARTNER.master.sys.servers WHERE server_id = 0);

        ----------------------------------------------------------------------------
        -- Start by checking all jobs that aren't 'mapped' to categories representing their HA name...
        DECLARE @AvailabilityGroupDatabases TABLE (
                name sysname NOT NULL
        );

        INSERT INTO @AvailabilityGroupDatabases
        SELECT name FROM sys.availability_groups

        UNION
        SELECT name FROM PARTNER.master.sys.availability_groups;
       
        CREATE TABLE #LocalJobs (
                job_id uniqueidentifier,
                name sysname,
                [enabled] tinyint,
                [description] nvarchar(512),
                start_step_id INT,
                owner_sid varbinary(85),
                notify_level_email INT,
                notify_email_operator_id INT
        );

        CREATE TABLE #RemoteJobs (
                job_id uniqueidentifier,
                name sysname,
                [enabled] tinyint,
                [description] nvarchar(512),
                start_step_id INT,
                owner_sid varbinary(85),
                notify_level_email INT,
                notify_email_operator_id INT
        );

        INSERT INTO #LocalJobs
        SELECT
                sj.job_id,
                sj.name,
                sj.[enabled],
                sj.[description],
                sj.start_step_id,
                sj.owner_sid,
                sj.notify_level_email,
                sj.notify_email_operator_id
        FROM
                msdb.dbo.sysjobs sj
                INNER JOIN msdb.dbo.syscategories sc ON sj.category_id = sc.category_id
        WHERE
                -- categoryName of the Job in question != an AG Group Name
                UPPER(sc.name) NOT IN (SELECT UPPER(name) FROM @AvailabilityGroupDatabases);

        INSERT INTO #RemoteJobs
        SELECT
                sj.job_id,
                sj.name,
                sj.[enabled],
                sj.[description],
                sj.start_step_id,
                sj.owner_sid,
                sj.notify_level_email,
                sj.notify_email_operator_id
        FROM
                PARTNER.msdb.dbo.sysjobs sj
                INNER JOIN PARTNER.msdb.dbo.syscategories sc ON sj.category_id = sc.category_id
        WHERE
                UPPER(sc.name) NOT IN (SELECT UPPER(name) FROM @AvailabilityGroupDatabases);

        CREATE TABLE #Divergence (
                name sysname,
                diff_type sysname
        );

        INSERT INTO #Divergence
        SELECT
                name,
                'Job Exists on ' + @LocalServerName + ' Only.'
        FROM
                #LocalJobs
        WHERE
                name NOT IN (SELECT name FROM #RemoteJobs);

        INSERT INTO #Divergence
        SELECT
                name,
                'Job Exists on ' + @RemoteServerName + ' Only.'
        FROM
                #RemoteJobs
        WHERE
                name NOT IN (SELECT name FROM #LocalJobs);

        INSERT INTO #Divergence
        SELECT
                lj.name,
                'Job-Level Differences (enabled, start-step, notification, etc)'
        FROM
                #LocalJobs lj
                INNER JOIN #RemoteJobs rj ON rj.name = lj.name
        WHERE
                lj.[enabled] != rj.[enabled]
                OR lj.start_step_id != rj.start_step_id
                OR lj.notify_email_operator_id != rj.notify_email_operator_id
                OR lj.notify_level_email != rj.notify_level_email;

        CREATE TABLE #LocalJobSteps (
                step_id INT,
                [checksum] INT
        );

        CREATE TABLE #RemoteJobSteps (
                step_id INT,
                [checksum] INT
        );

        CREATE TABLE #LocalJobSchedules (
                schedule_name sysname,
                [checksum] INT
        );

        CREATE TABLE #RemoteJobSchedules (
                schedule_name sysname,
                [checksum] INT
        );

        -- Now, (matching job by job) go through and check schedules and job steps:
        DECLARE checker CURSOR FAST_FORWARD FOR
        SELECT
                lj.job_id local_job_id,
                rj.job_id remote_job_id,
                lj.name
        FROM
                #LocalJobs lj
                INNER JOIN #RemoteJobs rj ON lj.name = rj.name;

        DECLARE @LocalJobId uniqueidentifier, @RemoteJobId uniqueidentifier, @JobName sysname;
        DECLARE @LocalCount INT, @RemoteCount INT;

        OPEN checker;
        FETCH NEXT FROM checker INTO @LocalJobId, @RemoteJobId, @JobName;

        WHILE @@FETCH_STATUS = 0 BEGIN
       
                -- check jobsteps first:
                DELETE FROM #LocalJobSteps;
                DELETE FROM #RemoteJobSteps;

                INSERT INTO #LocalJobSteps
                SELECT
                step_id,
                BINARY_CHECKSUM(step_name + subsystem + command + STR(on_success_action) + STR(on_fail_action) + database_name) [detail]
                FROM msdb.dbo.sysjobsteps
                WHERE job_id = @LocalJobId;

                INSERT INTO #RemoteJobSteps
                SELECT
                step_id,
                BINARY_CHECKSUM(step_name + subsystem + command + STR(on_success_action) + STR(on_fail_action) + database_name) [detail]
                FROM PARTNER.msdb.dbo.sysjobsteps
                WHERE job_id = @RemoteJobId;

                SELECT @LocalCount = COUNT(*) FROM #LocalJobSteps;
                SELECT @RemoteCount = COUNT(*) FROM #RemoteJobSteps;

                IF @LocalCount != @RemoteCount
                        INSERT INTO #Divergence VALUES (@JobName, 'Different Job Step Counts between Servers');
                ELSE BEGIN
                        INSERT INTO #Divergence
                        SELECT
                                @JobName,
                                'Different Job Step Details between Servers'
                        FROM
                                #LocalJobSteps ljs
                                INNER JOIN #RemoteJobSteps rjs ON rjs.step_id = ljs.step_id
                        WHERE  
                                ljs.[checksum] != rjs.[checksum];
                END;

                -- Now Check Schedules:
                DELETE FROM #LocalJobSchedules;
                DELETE FROM #RemoteJobSchedules;

                INSERT INTO #LocalJobSchedules
                SELECT
                        ss.name,
                        BINARY_CHECKSUM(ss.[enabled] + ss.freq_type + ss.freq_interval + ss.freq_subday_type +
                        ss.freq_subday_interval + ss.freq_relative_interval + ss.freq_recurrence_factor +
                        ss.active_start_date + ss.active_end_date + ss.active_start_date + ss.active_end_time) [details]
                FROM
                        msdb.dbo.sysjobschedules sjs
                        INNER JOIN msdb.dbo.sysschedules ss ON ss.schedule_id = sjs.schedule_id
                WHERE
                        sjs.job_id = @LocalJobId;


                INSERT INTO #RemoteJobSchedules
                SELECT
                        ss.name,
                        BINARY_CHECKSUM(ss.[enabled] + ss.freq_type + ss.freq_interval + ss.freq_subday_type +
                        ss.freq_subday_interval + ss.freq_relative_interval + ss.freq_recurrence_factor +
                        ss.active_start_date + ss.active_end_date + ss.active_start_date + ss.active_end_time) [details]
                FROM
                        PARTNER.msdb.dbo.sysjobschedules sjs
                        INNER JOIN PARTNER.msdb.dbo.sysschedules ss ON ss.schedule_id = sjs.schedule_id
                WHERE
                        sjs.job_id = @RemoteJobId;

                SELECT @LocalCount = COUNT(*) FROM #LocalJobSchedules;
                SELECT @RemoteCount = COUNT(*) FROM #RemoteJobSchedules;

                IF @LocalCount != @RemoteCount
                        INSERT INTO #Divergence VALUES (@JobName, 'Different Job Schedule Counts between Servers');
                ELSE BEGIN
                        INSERT INTO #Divergence
                        SELECT
                                @JobName,
                                'Different Schedule Details between Servers.'
                        FROM
                                #LocalJobSchedules ljs
                                INNER JOIN #RemoteJobSchedules rjs ON rjs.schedule_name = ljs.schedule_name
                        WHERE
                                ljs.[checksum] != rjs.[checksum];

                END;

                FETCH NEXT FROM checker INTO @LocalJobId, @RemoteJobId, @JobName;
        END;

        CLOSE checker;
        DEALLOCATE checker;


        IF(SELECT COUNT(*) FROM #Divergence WHERE name NOT IN(SELECT job_name FROM #IgnoredJobs)) > 0 BEGIN

                DECLARE @subject sysname = 'SQL Server Agent Job Synchronization Problems';
                DECLARE @crlf CHAR(2) = CHAR(13) + CHAR(10);
                DECLARE @tab CHAR(1) = CHAR(9);
                DECLARE @message nvarchar(MAX) = 'Problems detected with the following SQL Server Agent Jobs: '
                + @crlf;

                SELECT
                        @message = @message + @tab + name + ': ' + @crlf + @tab + @tab + diff_type + @crlf
                FROM
                        #Divergence
                WHERE
                        name NOT IN (SELECT job_name FROM #IgnoredJobs WHERE job_name != '')
                ORDER BY
                        NAME;

                SELECT @message += @crlf + @crlf + 'Jobs can be synchronized by scripting them on the Primary and running scripts on the Seconary.'
                        + @crlf + @tab + 'To Script Multiple Jobs at once, SSMS > SQL Server Agent Jobs > F7 -> then shift/ctrl + click to select multiple jobs simultaneously.'

                IF @ConsoleOnly = 1 BEGIN
                        -- just Print out details:
                        PRINT 'SUBJECT: ' + @subject;
                        PRINT 'BODY: ' + @crlf + @message;

                  END
                ELSE BEGIN
                        -- send a message:
                        EXEC msdb..sp_notify_operator
                                @profile_name = @MailProfileName,
                                @name = @OperatorName,
                                @subject = @subject,
                                @body = @message;
                END;
        END;

        DROP TABLE #LocalJobs;
        DROP TABLE #RemoteJobs;
        DROP TABLE #Divergence;
        DROP TABLE #LocalJobSteps;
        DROP TABLE #RemoteJobSteps;
        DROP TABLE #LocalJobSchedules;
        DROP TABLE #RemoteJobSchedules;
        DROP TABLE #IgnoredJobs;

        RETURN 0;

Behind the scenes there’s a bit more going on. For example, there’s an @PrimaryAGAfinity parameter, which is used to specify "affinity" or correlation with/against a specified Availability Group being hosted on your servers. The idea is that if you specify this value (for example, “MyProductionDbs”) and SQL Server a.) detects that AG is defined on the host where this code is run and b.) that AG is currently not hosting Primary Replicas, then this checkup logic will exit because it will assume that we’ll be doing the checkups on the Server hosting the Primary Replica(s).

Likewise, there are parameters for a MailProfileName (or the name of the Mail Profile to use if there are synchronization issues detected), as well as the name of the Operator to notify.

Finally, there’s also an @IngoredJobs parameter, where you can comma-delimit a list of Job names that you wish to ignore (or not have checked for synchronization), and an @ConsoleOnly flag, which you can use for running/testing this sproc within SSMS--where it won’t bother sending an email alert as it would when scheduled.

Step 4: Deploy a Sproc to Check on AlwaysOn Availability Group Jobs

Another key thing that the sproc above does, though, is IGNORE any SQL Server Agent Job where the name of the Job Category assigned to that Job matches the name of an Availability Group on the host--in keeping with the convention outlined in Post #12, Issues with Enabling and Disabling Batch Jobs. As such, we’ll need additional, follow-up, logic that checks on AG-Level Jobs and their synchronization state.

Ultimately, this logic is pretty similar to the logic above, in that we’ll grab job details from one server and compare them against the details on another server. Only, in this case, we’ll only be grabbing details for jobs that "belong" to a specific SQL Server Agent Job Category (that is, to a given Availability Group), and we’ll, optionally, be looking to see if we need to "toggle" the Enabled/Disabled status for these jobs--based upon where they’re currently hosted and whether they’re on the same host as the Primary Replica.

Code is as follows:

USE master;
GO

IF OBJECT_ID('dbo.dba_SyncCheckupForAGLevelJobs','P') IS NOT NULL
        DROP PROC dbo.dba_SyncCheckupForAGLevelJobs;
GO

CREATE PROC dbo.dba_SyncCheckupForAGLevelJobs
        @AGName                         sysname,                                -- Required
        @JobsTableDBName        sysname,                                -- Required (tells us which DB the list of enabled/disabled jobs is in).
        @MailProfileName        sysname = NULL,                 -- optional - only needed if @ConsoleOnly = 0
        @OperatorName           sysname = NULL,                 -- as per above
        @IgnoredJobs            nvarchar(MAX) = NULL,   -- jobs to ignore (in the category with name of MirroredDB)
        @ConsoleOnly            bit = 0 -- output details to console (1), or send alerts/output via email (0).
AS     
        SET     NOCOUNT ON;

        IF ISNULL(@AGName, '') = '' BEGIN
                RAISERROR('Procedure expects parameter ''@AGName'' which was not provided.', 16, 4);
                RETURN -1;
        END

        IF ISNULL(@JobsTableDBName, '') = '' BEGIN
                RAISERROR('Procedure expects parameter ''@JobsTableDBName'' which was not provided.', 16, 4);
                RETURN -2;
        END

        IF @ConsoleOnly = 0 BEGIN -- if we're not running a 'manual' execution - make sure we have all parameters:

                IF ISNULL(@MailProfileName, '') = '' BEGIN
                        RAISERROR('Procedure expects parameter ''@MailProfileName'' which was not provided.', 16, 4);
                        RETURN -5;
                END

                IF ISNULL(@OperatorName, '') = '' BEGIN
                        RAISERROR('Procedure expects parameter ''@OperatorName'' which was not provided.', 16, 4);
                        RETURN -6;
                END


        END;

        -- Start by running the Job that syncs/updates Job Enabled/Disabled Status first:
        EXEC master.dbo.dba_EnsureAGLevelJobEnabledStatuses @AGName, @JobsTableDBName;

        -- Remaining checks/reporting only needs to be run from/against the Primary - so bail if that's not this server.
        IF(SELECT master.dbo.fn_hadr_group_is_primary(@AGName)) = 0 BEGIN
                PRINT 'Server is Not Primary';
                RETURN 0;
        END

        CREATE TABLE #IgnoredJobs (
                job_name sysname
        );

        -- single-line / in-line 'split' function:
        DECLARE @DeserializedJobs nvarchar(MAX) = N'SELECT ' + REPLACE(REPLACE(REPLACE(N'''{0}''','{0}',@IgnoredJobs), ',', ''','''), ',', ' UNION SELECT ');

        INSERT INTO #IgnoredJobs
        EXEC(@DeserializedJobs);

        CREATE TABLE #LocalJobs (
                job_id uniqueidentifier,
                name sysname,
                [enabled] tinyint,
                [description] nvarchar(512),
                start_step_id INT,
                owner_sid varbinary(85),
                notify_level_email INT,
                notify_email_operator_id INT
        );

        CREATE TABLE #RemoteJobs (
                job_id uniqueidentifier,
                name sysname,
                [enabled] tinyint,
                [description] nvarchar(512),
                start_step_id INT,
                owner_sid varbinary(85),
                notify_level_email INT,
                notify_email_operator_id INT
        );

        INSERT INTO #LocalJobs
        SELECT
                sj.job_id,
                sj.name,
                sj.[enabled],
                sj.[description],
                sj.start_step_id,
                sj.owner_sid,
                sj.notify_level_email,
                sj.notify_email_operator_id
        FROM
                msdb.dbo.sysjobs sj
                INNER JOIN msdb.dbo.syscategories sc ON sj.category_id = sc.category_id
        WHERE
                UPPER(sc.name) = UPPER(@AGName);

        INSERT INTO #RemoteJobs
        SELECT
                sj.job_id,
                sj.name,
                sj.[enabled],
                sj.[description],
                sj.start_step_id,
                sj.owner_sid,
                sj.notify_level_email,
                sj.notify_email_operator_id
        FROM
                PARTNER.msdb.dbo.sysjobs sj
                INNER JOIN PARTNER.msdb.dbo.syscategories sc ON sj.category_id = sc.category_id
        WHERE
                UPPER(sc.name) = UPPER(@AGName);

        -- Now start comparing differences:
        CREATE TABLE #Divergence (
                name sysname,
                diff_type nvarchar(300)
        );

        INSERT INTO #Divergence
        SELECT
                name,
                'Exists on Primary Server Only'
        FROM
                #LocalJobs
        WHERE
                name NOT IN (SELECT name FROM #RemoteJobs);

        INSERT INTO #Divergence
        SELECT
                name,
                'Exists on Secondary Server Only'
        FROM
                #RemoteJobs
        WHERE
                name NOT IN (SELECT name FROM #LocalJobs);

        -- check meta-data properties:
        INSERT INTO #Divergence
        SELECT
                lj.name,
                'Job-Level Differences (owner, start-step, notification, etc)'
        FROM
                #LocalJobs lj
                INNER JOIN #RemoteJobs rj ON rj.name = lj.name
        WHERE
                lj.owner_sid != rj.owner_sid
                OR lj.start_step_id != rj.start_step_id
                OR lj.notify_email_operator_id != rj.notify_email_operator_id
                OR lj.notify_level_email != rj.notify_level_email;


        -- For Job Enabled Status we have two checks:
        --              A: No Jobs that have a Job.CategoryName = @AGName should be enabled on the SECONDARY server.
        --              B: we can't infer that on the PRIMARY server. So, instead, if Job.CategoryName = @AGName and
        --                      enabled/disabled != Enabled bit-flag from the <AGNAME>_JobEnabledStates table, then we need
        --                      to raise an error because a job that should be enabled is NOT. (Or vice versa.)
       
        -- A) Check for jobs on the remote server:
        INSERT INTO #Divergence
        SELECT
                name,
                'Job Enabled on SECONDARY SERVER (Jobs should only be enabled on PRIMARY).'
        FROM
                #RemoteJobs
        WHERE
                enabled = 1;

        -- B) Check for jobs on the local server:
        DECLARE @Jobs TABLE (
                JobName sysname NOT NULL,
                [Enabled] bit NOT NULL
        )

        DECLARE @SQL nvarchar(MAX) = N'SELECT JobName, [Enabled] FROM ' + @JobsTableDBName + '.dbo.[' + @AGName + '_JobEnabledStates];'

        INSERT INTO @Jobs
        EXEC(@SQL);

        -- Report on Jobs that are not documented:
        INSERT INTO #Divergence
        SELECT
                lj.name,
                'Job Enabled/Disabled Status is not currently defined. Please check job Enabled/Disabled status then run "EXEC master.dbo.dba_DocumentJobEnabledStatuses ''' + @AGName + ''';" to document Job Enabled/Disabled state.'
        FROM
                #LocalJobs lj
        WHERE
                lj.name NOT IN (SELECT JobName FROM @Jobs);

        -- Report on Jobs that are Enabled but shouldn't be:
        INSERT INTO #Divergence
        SELECT
                lj.name,
                'Job is ENABLED on Primary Replica Server but ' + @JobsTableDBName + '.dbo.[' + @AGName + '_JobEnabledStates] specifies that Job should be DISABLED. (Update Table or Job Status to correct this issue.)'
        FROM
                #LocalJobs lj
                LEFT OUTER JOIN @Jobs j ON lj.name = j.JobName
        WHERE
                ISNULL(j.[Enabled],0) = 0 AND lj.[enabled] = 1; -- job is listed/defaulted to disabled, but is currently enabled

        -- Report on Jobs that are Disabled but shouldn't be:
        INSERT INTO #Divergence
        SELECT
                lj.name,
                'Job is DISABLED on Primary Replica Server but ' + @JobsTableDBName + '.dbo.[' + @AGName + '_JobEnabledStates] specifies that Job should be _ENABLED_ (Running). (Update Table or Job Status to correct this issue.)'
        FROM
                #LocalJobs lj
                LEFT OUTER JOIN @Jobs j ON lj.name = j.JobName
        WHERE
                ISNULL(j.[Enabled], 0) = 1 AND lj.[enabled] = 0;  -- job is listed/defaulted to enabled, but is currently disabled


        -- Now, (matching job by job) go through and check schedules and job steps:
        CREATE TABLE #LocalJobSteps (
                step_id INT,
                [checksum] INT
        );

        CREATE TABLE #RemoteJobSteps (
                step_id INT,
                [checksum] INT
        );

        CREATE TABLE #LocalJobSchedules (
                schedule_name sysname,
                [checksum] INT
        );

        CREATE TABLE #RemoteJobSchedules (
                schedule_name sysname,
                [checksum] INT
        );

        DECLARE checker CURSOR FAST_FORWARD FOR
        SELECT
                lj.job_id local_job_id,
                rj.job_id remote_job_id,
                lj.name
        FROM
                #LocalJobs lj
                INNER JOIN #RemoteJobs rj ON lj.name = rj.name;

        DECLARE @LocalJobId uniqueidentifier, @RemoteJobId uniqueidentifier, @JobName sysname;
        DECLARE @LocalCount INT, @RemoteCount INT;

        OPEN checker;
        FETCH NEXT FROM checker INTO @LocalJobId, @RemoteJobId, @JobName;

        WHILE @@FETCH_STATUS = 0 BEGIN
       
                -- check jobsteps first:
                DELETE FROM #LocalJobSteps;
                DELETE FROM #RemoteJobSteps;

                INSERT INTO #LocalJobSteps
                SELECT
                step_id,
                BINARY_CHECKSUM(step_name + subsystem + command + STR(on_success_action) + STR(on_fail_action) + database_name) [detail]
                FROM msdb.dbo.sysjobsteps
                WHERE job_id = @LocalJobId;

                INSERT INTO #RemoteJobSteps
                SELECT
                step_id,
                BINARY_CHECKSUM(step_name + subsystem + command + STR(on_success_action) + STR(on_fail_action) + database_name) [detail]
                FROM PARTNER.msdb.dbo.sysjobsteps
                WHERE job_id = @RemoteJobId;

                SELECT @LocalCount = COUNT(*) FROM #LocalJobSteps;
                SELECT @RemoteCount = COUNT(*) FROM #RemoteJobSteps;

                IF @LocalCount != @RemoteCount
                        INSERT INTO #Divergence VALUES (@JobName, 'Different Job Step Counts between Servers');
                ELSE BEGIN
                        INSERT INTO #Divergence
                        SELECT
                                @JobName,
                                'Different Job Step Details between Servers'
                        FROM
                                #LocalJobSteps ljs
                                INNER JOIN #RemoteJobSteps rjs ON rjs.step_id = ljs.step_id
                        WHERE  
                                ljs.[checksum] != rjs.[checksum];
                END;

                -- Now Check Schedules:
                DELETE FROM #LocalJobSchedules;
                DELETE FROM #RemoteJobSchedules;

                INSERT INTO #LocalJobSchedules
                SELECT
                        ss.name,
                        BINARY_CHECKSUM(ss.[enabled] + ss.freq_type + ss.freq_interval + ss.freq_subday_type +
                        ss.freq_subday_interval + ss.freq_relative_interval + ss.freq_recurrence_factor +
                        ss.active_start_date + ss.active_end_date + ss.active_start_date + ss.active_end_time) [details]
                FROM
                        msdb.dbo.sysjobschedules sjs
                        INNER JOIN msdb.dbo.sysschedules ss ON ss.schedule_id = sjs.schedule_id
                WHERE
                        sjs.job_id = @LocalJobId;


                INSERT INTO #RemoteJobSchedules
                SELECT
                        ss.name,
                        BINARY_CHECKSUM(ss.[enabled] + ss.freq_type + ss.freq_interval + ss.freq_subday_type +
                        ss.freq_subday_interval + ss.freq_relative_interval + ss.freq_recurrence_factor +
                        ss.active_start_date + ss.active_end_date + ss.active_start_date + ss.active_end_time) [details]
                FROM
                        PARTNER.msdb.dbo.sysjobschedules sjs
                        INNER JOIN PARTNER.msdb.dbo.sysschedules ss ON ss.schedule_id = sjs.schedule_id
                WHERE
                        sjs.job_id = @RemoteJobId;

                SELECT @LocalCount = COUNT(*) FROM #LocalJobSchedules;
                SELECT @RemoteCount = COUNT(*) FROM #RemoteJobSchedules;

                IF @LocalCount != @RemoteCount
                        INSERT INTO #Divergence VALUES (@JobName, 'Different Job Schedule Counts between Servers');
                ELSE BEGIN
                        INSERT INTO #Divergence
                        SELECT
                                @JobName,
                                'Different Schedule Details between servers.'
                        FROM
                                #LocalJobSchedules ljs
                                INNER JOIN #RemoteJobSchedules rjs ON rjs.schedule_name = ljs.schedule_name
                        WHERE
                                ljs.[checksum] != rjs.[checksum];

                END;

                FETCH NEXT FROM checker INTO @LocalJobId, @RemoteJobId, @JobName;
        END;

        CLOSE checker;
        DEALLOCATE checker;

        IF(SELECT COUNT(*) FROM #Divergence WHERE name NOT IN(SELECT job_name FROM #IgnoredJobs)) > 0 BEGIN

                DECLARE @subject sysname = 'SQL Server Agent Job Synchronization Problems';
                DECLARE @crlf CHAR(2) = CHAR(13) + CHAR(10);
                DECLARE @tab CHAR(1) = CHAR(9);
                DECLARE @message nvarchar(MAX) = 'Problems detected with the following SQL Server Agent Jobs: '
                + @crlf;

                SELECT
                        @message = @message + @tab + name + ': ' + @crlf + @tab + @tab + diff_type + @crlf
                FROM
                        #Divergence
                WHERE
                        name NOT IN (SELECT job_name FROM #IgnoredJobs WHERE job_name != '')
                ORDER BY
                        NAME;

                SELECT @message += @crlf + @crlf + 'Jobs can be synchronized by scripting them on the Primary and running scripts on the Seconary.'
                        + @crlf + @tab + 'To Script Multiple Jobs at once, SSMS > SQL Server Agent Jobs > F7 -> then shift/ctrl + click to select multiple jobs simultaneously.'

                IF @ConsoleOnly = 1 BEGIN
                        -- just Print out details:
                        PRINT 'SUBJECT: ' + @subject;
                        PRINT 'BODY: ' + @crlf + @message;

                  END
                ELSE BEGIN
                        -- send a message:
                        EXEC msdb..sp_notify_operator
                                @profile_name = @MailProfileName,
                                @name = @OperatorName,
                                @subject = @subject,
                                @body = @message;
                END;
        END;

        DROP TABLE #LocalJobs;
        DROP TABLE #RemoteJobs;
        DROP TABLE #Divergence;
        DROP TABLE #LocalJobSteps;
        DROP TABLE #RemoteJobSteps;
        DROP TABLE #LocalJobSchedules;
        DROP TABLE #RemoteJobSchedules;
        DROP TABLE #IgnoredJobs;

        RETURN 0;
GO

If you decided to implement the Enable/Disable approach to managing ‘Batch Jobs’ (as outlined in Post #12) the code above will work as-is and will try to Enable/Disable jobs on your servers as needed--by making a call to the following sproc (which makes use of the Jobs State Table discussed and outlined in Post #13):

USE master;
GO

IF OBJECT_ID('dbo.dba_EnsureAGLevelJobEnabledStatuses','P') IS NOT NULL
        DROP PROC dbo.dba_EnsureAGLevelJobEnabledStatuses
GO

CREATE PROC dbo.dba_EnsureAGLevelJobEnabledStatuses
        @AGName                         sysname, -- name of the AG to failover.
        @JobsTableDBName        sysname -- name of the db with the jobs-enabled-details table
AS
        SET NOCOUNT ON;

        IF(SELECT master.dbo.fn_hadr_group_is_primary(@AGName)) = 0 BEGIN
               
                -- If we're not on the PRIMARY server, then just make sure all jobs with categoryName = @AGName are disabled
                DECLARE disabler CURSOR FAST_FORWARD FOR
                SELECT
                        j.job_id,
                        j.name
                FROM
                        msdb.dbo.sysjobs  j
                        INNER JOIN msdb.dbo.syscategories c ON c.category_id = j.category_id
                WHERE
                        j.[enabled] = 1 AND
                        c.name = @AGName;

                DECLARE @JobId uniqueidentifier;
                DECLARE @JobName sysname;

                OPEN disabler;
                FETCH NEXT FROM disabler INTO @JobId, @JobName;
                WHILE @@FETCH_STATUS = 0 BEGIN

                        EXEC msdb.dbo.sp_update_job
                                @job_id = @JobId,
                                @enabled = 0;

                        PRINT 'Disabled Job: [' + @JobName + '] ON Non-Primary Server.';

                        FETCH NEXT FROM disabler INTO @JobId, @JobName;
                END;

                CLOSE disabler;
                DEALLOCATE disabler;

          END ;

       
        ELSE BEGIN
               
                -- otherwise, if we ARE on the primary, then we should ENABLE jobs that need to be enabled.
                DECLARE @Error sysname;

                DECLARE @Jobs TABLE (
                        JobName sysname NOT NULL,
                        [Enabled] bit NOT NULL
                );

                DECLARE @SQL nvarchar(MAX) = N'SELECT JobName, [Enabled] FROM ' + @JobsTableDBName + '.dbo.[' + @AGName + '_JobEnabledStates];'

                INSERT INTO @Jobs
                EXEC(@SQL);

                DECLARE enabler CURSOR FAST_FORWARD FOR
                SELECT
                        JobName
                FROM
                        @Jobs
                WHERE
                        [Enabled] = 1;

                OPEN enabler;
                FETCH NEXT FROM enabler INTO @JobName;

                WHILE @@FETCH_STATUS = 0 BEGIN
               
                        SELECT @JobId = job_id FROM msdb.dbo.sysjobs WHERE name = @JobName;

                        IF @JobId IS NULL BEGIN
                                SET @Error = 'Unable to Locate (and set ENABLED) on Job: ' + @JobName + '.';
                                THROW 50001, @Error, 1;
                                RETURN -1; -- we're done. (should we send out an email alert? probably)
                        END
                       
                        EXEC msdb.dbo.sp_update_job
                                @job_id = @JobId,
                                @enabled = 1; -- enable the job here...

                        FETCH NEXT FROM enabler INTO @JobName;

                END;

                CLOSE enabler;
                DEALLOCATE enabler;

        END;

        RETURN 0;
GO

However, if you choose to go with the IF/ELSE logic embedded in your Batch Jobs, then you’ll want to disable (that is, comment-out) the code in the ‘parent’ sproc that attempts these operations. Likewise, it should be called out that all of the code in this post is for a scenario where you’ve just got TWO servers in your Availability Group (hence the notion of calls to a ‘Partner’ Linked Server, as per post #15). If you’ve got more than two servers in your topology, the basic logic outlined in these posts will work--but you’ll need to modify it accordingly (which we’ll look at a tiny bit in a future post).

Step 5: Create a SQL Server Agent Job to Execute Regular Checkups

With code in place, you’re now ready to set up a Job that’ll execute regular synchronization checks. Basic logic for setting up this job is that you’ll want to set up a recurring schedule, then add a single job step to run ‘Server Level’ checks, and then add in an additional Job Step for each AlwaysOn Availability Group that you want to check up on. So, if you’ve got one Availability Group on your servers, you’d end up with a Job with two Job Steps; whereas if you have three AGs, you’ll end up with a Job with four Job Steps (one for Server-Level checkups and a distinct job step, or call to dba_SyncCheckupForAGLevelJobs for each of your Availability Groups).

In my next post, we’ll take a more concrete look at a sample/example Synchronization Job--and discuss some options and best practices for scheduling and review how these checks will "behave" in the "wild."

Please or Register to post comments.

What's Practical SQL Server?

Practical advice, insight, and help for core SQL Server considerations.

Contributors

Michael K. Campbell

Michael K. Campbell is a contributing editor for SQL Server Pro and Dev Pro and is an ASPInsider. Michael is the president of OverAchiever Productions, a consultancy dedicated to technical evangelism...
Blog Archive

Sponsored Introduction Continue on to (or wait seconds) ×