Setting Up Additional Checks to Ensure Regular Transaction Log Backups

There’s simply no way to overstate the importance of regular Transaction Log Backups. Not only do they help protect from disaster, but regular execution of T-Log backups on Full (and Bulk-Logged) Recovery databases helps keep thing “fit and trim”. Most of the time, setting up a Notification for when T-Log Backup Jobs fail is enough to let you know when something goes wrong.

However, I have seen a couple of edge cases where this simply wasn’t enough. Typically this has been in multi-tenant systems and/or in systems with a high volume of transactions where any of the following things can or might have happened:

  • The UNC path or destination that the MIRROR TO clause (an Enterprise Edition only feature that lets you save backups to two locations (i.e., locally and to an off-box secondary location)) is pointing to is having issues and has slowed things down enough that while backups haven’t exactly failed, they’re no longer running on a regular schedule and it might therefore have been an hour or more since some databases have been adequately backed up.
  • Some third party SQL Server Backup Solutions have a GREAT ‘retry’ feature that lets you specify that if a backup fails for some reason, it should be retried N times with X amount of seconds to wait between each try. This is a great feature – unless you’ve run into problems, have too long of a wait between retries, and have GOBs of retries going on against multiple databases – at which point your backup job won’t actually ‘fail’ but can end up being seemingly ‘stalled’ for hours at a time. (e.g., if I’ve got a job set up to backup T-Logs against 20 databases every 10 minutes, and there’s some sort of problem executing the backups ‘retry’ attempts against a single database might end up taking 20 minutes – then the job will move on to the next database and so on – until you’ve spent hours with retry logic slowly sabotaging your backup chain before it’ll ever raise an error or alert)

In cases like this, and where it’s even more essential that T-Log backups are happening regularly, I’ve created a simple script that can be used to checkup on both the job that runs your backups (to make sure it has successfully completed within the last N minutes) and to checkup on the last backup dates of your FULL recovery databases.

The one obvious limitation with this job is that it runs via the SQL Server Agent – which means that if the SQL Server Agent (which is handling your backups) somehow crashes, then this ‘watchdog’ won’t be able to run either. (That said, in years of working with SQL Server I’ve never seen the SQL Server Agent seriously crash or unable to run – well, except when some enterprising soul explicitly shut it down. So if any of this is a concern, you might want to try and run this script via some OTHER scheduling mechanism than simply dropping it into a new SQL Server Agent Job and setting up a regular schedule for it to run under.)

Otherwise, the script is pretty simple. Just specify how far back you’d like your checks to run, and the name of the Job that’s handling your T-Log backups. (If you’ve got multiple jobs you could easily tweak this to account for an IN() clause.) Then, specify the name of an operator and the Database Mail Profile to use when sending alerts and this script will do the rest - once you schedule it for regular execution.

SET NOCOUNT ON;

-- Variables:
DECLARE @NumberOfPastMinutesToCheck int = 30;
DECLARE @LogBackupJobName sysname = N'Customer Databases - TLOG';

-- Alerting details:
DECLARE @OperatorName sysname = 'Alerts';
DECLARE @DbMailProfileName sysname = 'General';

------------------------------------------------------------------------------
-- Logic/Implementation:

DECLARE @JobID uniqueidentifier;
SELECT @JobID = job_id
FROM msdb..sysjobs
WHERE
        name = @LogBackupJobName;


DECLARE @Subject nvarchar(200);
DECLARE @Body nvarchar(200);

-- Start with review of Backup Job - check to make sure it has
--              run (successfully) within the last @NumberOfPastMinutesToCheck
DECLARE @ago datetime = DATEADD(n, - @NumberOfPastMinutesToCheck, GETDATE());

DECLARE @DateAsInt int;
DECLARE @TimeAsInt int;

SELECT @DateAsInt = CAST(CONVERT(varchar(20), @ago, 112) AS int);
SELECT @TimeAsInt = CAST(REPLACE(CONVERT(varchar(20), @ago, 108),':','') AS int);

DECLARE @states TABLE (
        run_date int,
        run_time int,
        run_status int
);

INSERT INTO @states
SELECT
        run_date,
        run_time,
        run_status
FROM
        msdb..sysjobhistory
WHERE
        run_date >= @DateAsInt
        AND run_time >= @TimeAsInt
        AND job_id = @JobId
        AND step_id = 0 -- job outcome (only)
        AND run_status = 1; -- succeeded (only)

DECLARE @successCounts int;
SELECT @successCounts = COUNT(*) FROM @states;

IF @successCounts < 1 BEGIN
        SET @Subject =
                '[SQL Server Backups] Transaction Log Backup Failure'
        SET @Body = 'The Transaction Backup Job (with ID '
                + CAST(@JobId AS varchar(36))
                + ' has not successfully completed in the last '
                + CAST(@NumberOfPastMinutesToCheck AS varchar(20))
                + ' minutes.';

        EXEC msdb..sp_notify_operator
                @profile_name = @DbMailProfileName,
                @name = @OperatorName,
                @subject = @Subject,
                @body = @Body;
END

------------------------------------------------------------------
-- Now check to see if any databases haven't
--              been backed up in last @NumberOfPastMinutesToCheck:

DECLARE @databases TABLE (
        [name] sysname
);

INSERT INTO @databases
SELECT
        name
FROM
        master.sys.databases
WHERE  
        recovery_model_desc = 'FULL'
        AND name NOT IN ('model')
        AND name NOT IN (SELECT
                DISTINCT database_name
                FROM msdb.dbo.backupset
                WHERE [type] = 'L' AND backup_finish_date > @ago);

DECLARE @failedDbs nvarchar(2000) = '';

SELECT @failedDbs = @failedDbs + name + ','
        FROM @databases ORDER BY name;

IF @failedDbs != '' BEGIN
        SET @Subject = '[SQL Server Backups] Transaction Log Backup Failure';
        SET @Body = 'The following databases have not been backed up within the last '
                + CAST(@NumberOfPastMinutesToCheck AS varchar(20)) + ' minutes: '
                + CHAR(13) + CHAR(10) + CHAR(9) + @failedDbs + '.';

        EXEC msdb..sp_notify_operator
                @profile_name = @DbMailProfileName,
                @name = @OperatorName,
                @subject = @Subject,
                @body = @Body;
END
GO

 

Discuss this Blog Entry 2

on Sep 24, 2015

There is a free maintenance tool, YourSqlDba, that aside doing all backups (complete and recurrent logs backups) ensure that a given set of databases are still in full recovery mode and hence have log backups.

This policy is enforced through as a parameter that can be specify multiple like filters and not like parameter on databases names.

YourSqlDba initial setup configures database mail for its maintenance, and send messages for this policy violation as well any other failure in doing backups (log included) or other maintenance actions.

See YourSqlDba on YourSqlDba.CodePlex.com

on Dec 4, 2015

Alternate way to monitor those - we are running a simple script against multiple 100+ servers to check SQL Error logs for backup failure errors and more...

Please or Register to post comments.

What's Practical SQL Server?

Practical advice, insight, and help for core SQL Server considerations.

Contributors

Michael K. Campbell

Michael K. Campbell is a contributing editor for SQL Server Pro and Dev Pro and is an ASPInsider. Michael is the president of OverAchiever Productions, a consultancy dedicated to technical evangelism...
Blog Archive

Sponsored Introduction Continue on to (or wait seconds) ×