Deadlocks have received a large amount of coverage—in a wide variety of resources.
As such, the point of this post isn’t to call out anything special about debugging, tracking, or correcting deadlocks. Instead, this post is focused on a single concern: raising or triggering email (or other) alerts when a deadlock occurs on a given server—or, in other words, simply being (proactively) alerted every time a deadlock has occurred on one of your servers. In my mind, such an arrangement can be a great thing to implement in situations where you THINK that deadlocks might be happening or where you’ve heard from a few users that they’re occurring and where many of the applications using your servers aren’t set up to adequately raise alerts or errors when they encounter deadlocks.
Of course, setting up alerts for deadlocks should be pretty easy, right? All you have to do is go into the SQL Server Agent, set up an alert for Error 1205, set up who to email/alert when it happens, and you’re done, right?
Well, unfortunately, setting up alerts for deadlocks isn’t that simple. For starters, while Error 1205 is the most common error number associated with a deadlock, Errors 1211 and 3928 are also associated with deadlocks within SQL Server (though Error 3928 is reserved for a particularly nasty/non-typical issue relating to deadlocks). So. There’s that issue for starters (i.e., if you want to be thorough, you’ll have to set up alerts for all three error numbers.)
Then there’s the fact that once you set up an alert for, say, a 1205 error, and then generate a deadlock on your system, the alert won’t actually fire.
To test this out, you can do the following:
First, create the alert and set up some sort of response. You can do that by running the following script, for example:
Or you can do this by creating a new alert from within SQL Server Management Studio, as the following three screenshots indicate:
Then, once you’ve got the alerts set up, you can easily create a deadlock on your server by creating two simple tables (in a test/meddling database somewhere) :
Then, with these bogus tables created, you’ll have to open up TWO different query windows (which I’ll call Window 1 and Window 2).
In Window 1 paste and run the following:
And in Window 2 paste and run this:
At this point you've got both tables A and B blocked—by two different threads. To create a deadlock, all you need to do is have both of these threads try to 'reach over' and grab the resource that their opposite thread has already locked.
So, paste the following into Window 1, and run (just) it:
And you'll see that it doesn't complete—or that the query stays as 'Executing….'
Now, paste the following into Window 2, and execute (just) it as well:
Then, within about 5 seconds, one of the windows will be picked as the deadlock victim, raise a 1205, and the other will complete without any hint of a problem.
And, now, for good measure: go run a ROLLBACK command in the window that didn’t get picked as the deadlock victim – to clear your long-running transaction.
Then either go check your email or look at the Properties > History of the 1205 Alert you just created and note that you didn’t get an email and that the History tab shows that your alert has never fired—even though you JUST raised a 1205 error on the Server. And, the crazy thing is that you’ll find plenty of people complaining about this particular issue in various forums all over the internet. The interesting thing is though, that in all the forum posts I came across not one had an answer that actually solved the problem of why these alerts won’t raise or get sent when they happen.
After struggling a bit on my own with this problem (i.e., why my servers wouldn't send alerts on Error 1205) and after making a few passes at various forum posts online, I finally came to the conclusion that the issue with 1205 errors is that they're not logged. Stated differently, when these errors do occur on a given SQL Server, NOTHING within SQL Server writes the fact that they occurred out to the log. And, since the log doesn't get written to, the Alert (for 1205s, for example) can't ever be triggered or raised.
Sadly, a couple of forum posts I found online made incorrect mention to needing to have TRACEFLAG 3604 and/or 3605 enabled—which is just plain bad advice. A few others mentioned the need to turn on Trace Flags 1204 and 1222. That's not bad advice, per se, but it also doesn’t solve the problem of raising alerts when deadlocks occur. (That said, if you're interested in having SQL Server drop verbose logging information into the logs whenever a deadlock occurs, then I recommend taking a peek at this tutorial over on MSSQLTips.)
Truth is, trace flags are NOT the answer here. Yes, many trace flags can/will control what gets logged under various circumstances, but to really solve this problem you have to think a bit about how SQL Server actually raises and manages errors. (Or, at least, that’s that I had to do to solve this problem.) And since SQL Server can be run, natively, in a bunch of different languages and since there are a ton of different messages (or specific errors that it can raise), that's where master.sys.messages comes into play—as it's the internal table that SQL Server uses both for 'out of the box' error messages and any custom error messages you may have deployed to your server.
And, as it stands, there's a column in sys.messages that determines (or controls) whether or not a given error is LOGGED when it occurs or not: is_event_logged. So, logically speaking, if you want to be able to raise events on Error 1205 (for example), you just need to toggle the is_event_logged flag to true (or 1) for all 1205 messages. And while changing any default/out-of-the-box behavior for SQL Server is something I always think good-and-hard about doing each and every time I do it, there's (in fact) an easy way to make these changes: sp_altermessage. (And, just so we're clear: using sp_altermessage isn't quite as vile as something like ad-hoc system updates, but it's easily something we could consider close to a 'gateway' script—so I'm not advocating careless use of this functionality.)
So, for example, to modify 1205 errors, you’d just run the following:
Then, once that's done (and you’ve modified 1211 and/or 3928 messages IF you want to), then you can go ahead and re-run the deadlock test defined above (and remember to ROLLBACK or COMMIT your non-victimized thread when you're done), you'll see that your custom 1205 alert fires like it should.