Following up on my previous post, when it comes to off-box backups there are really only two (well, three) main reasons you’d want them:
First: Redundancy. As I pointed out in my last post: if you’re only keeping backups and data on the same server or hardware, then you’re DOING IT WRONG. From an elementary Disaster Recovery (DR) standpoint you always need a copy of your backups ‘mirrored’ to at least one other location. Drives can fail, RAID controllers can fail and take drives/data with them, and a host of other REALLY UGLY things can happen to data stored on a single host/server. Without off-box backups, you’re a sitting duck – so the reason to have them, in this case, is simple redundancy. (And, as I pointed out in my last post, an additional benefit of ‘off-box’ backups is that you can commonly store these redundant copies on less-expensive (and more voluminous) storage, where you can typically keep backups longer than you could on your primary server. But again, as I mentioned in my last post, you want to make sure you’re keeping backups locally on your primary server as well – to help avoid incurring the cost of pulling backups over the wire WHEN you need to recover.) So, in this case, think redundancy of your data.
Second: Closely related to the first reason for keeping backups in off-box locations is the simple fact that sometimes entire servers fail. A Windows Update, or the addition of a driver, might render a box completely non-responsive – in which case, trying to get backups off of that box, or its RAIDed HDs, is going to be nothing short of a nightmare. That, and the point of this series of posts is to describe how to effectively set up ‘lukewarm’ standby servers. So, knowing that you need a redundant location for your backups, and knowing that you might also need a redundant HOST or server for your databases, this second reason plays very well into the case for off-box backups.
Third: Also very closely related to the notion of failure (are you noticing a trend?) is that IF you can lose RAID arrays and/or entire servers, it’s also possible that you might lose entire data-centers. Natural Disasters and a host of other nasty things can and do happen to data. Yes, these kinds of scenarios are pretty rare. But what happens if all of your data is either lost due to fire/flood or ends up being submerged or inaccessible for 2 weeks? To address cases like this, organizations need what I like to call a ‘smoke and rubble contingency plan’ where they’re keeping some sort of backups off-site.
How you manage to get your backups off-box really depends upon a huge number of factors, such as how far you’re pushing data (are you pushing it to a local server in the same office/subnet, or is it going off-site to the cloud or another data-center?), and what kinds of infrastructure and connectivity you have at your disposal. Security of the data (i.e., who can access your backups and/or potentially peek at them going over the wire) is another essential concern. So too, of course, is the question of just how important this data is in the first place – which you determine by ascertaining how long the business can operate WITHOUT access to this data and how expensive it is to the business to LOSE some of this data once it’s been recorded. (Figuring out and quantifying those details is exactly what clearly established RTOs and RPOs are for.)
For simple, local, backups that achieve basic data and host redundancy, you’ll just need simple backups. Destinations can range from simple File Shares where ‘dumb’ copies of backups are securely stored, clear on up to stand-by servers running SQL Server (or where SQL Server could be installed in a pinch) as a means of firing up a failover option should primary hardware fail. (And again, there are a HOST of other options that will provide you with MUCH better recovery-time in the case of disaster – such as High Availability solutions like Log Shipping, Mirroring, and (in some cases) Replication – but I’m not talking about those here. Instead, I’m talking about extending basic DR/Redundancy practices into the notion of giving yourself an additional option for failover if/when that makes more sense than launching into full-blown HA solutions. Or, of course, in cases where you want coverage IN ADDITION to your high-powered HA solution.)
To copy backups from your primary host to your backup/redundancy locations, you can use a host of different technologies and solutions. Many solutions for this kind of simple file transfer are available directly from within Windows itself – such as XCOPY/RoboCopy and even Distributed File System Replication. It’s also possible to use a host of different utilities and third-party offerings that ‘wrap’ XCOPY/RoboCopy commands via GUIs and so on. Personally, I’m a huge fan of SyncBack. It’s dirt cheap to deploy out into the wild, and I’ve been using it for years on my local network for backup purposes. I’ve also got it set up with a couple of clients, both as a means of copying files for redundancy and as a means of aggregating files into centralized areas so that off-site backup solutions can then push them off-site. I also like how well SyncBack interacts with Volume Shadow Copy – a big win in environments where there’s a lot of disk activity going on (even though the use of shadow-copy incurs more resource usage, it does so to help avoid the contention problems that might otherwise ‘break’ or crash backup-copy operations).
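If you’d rather skip third-party tools, even a simple scheduled script that shells out to RoboCopy will do the trick. Here’s a minimal sketch of what that might look like – the paths are purely hypothetical placeholders, and this assumes Python is handy on the box (a plain batch file calling robocopy with the same switches works just as well):

    import subprocess
    import sys

    # Hypothetical paths -- substitute your own backup folder and off-box share.
    SOURCE = r"D:\SQLBackups"
    DESTINATION = r"\\BACKUPHOST\SQLBackups\SERVER01"

    # robocopy switches: /E = include subfolders, /Z = restartable copies,
    # /R:2 /W:5 = retry twice with a 5-second wait, /NP = no per-file progress
    # (keeps the log readable), /LOG+: = append to a running log file.
    result = subprocess.run([
        "robocopy", SOURCE, DESTINATION, "*.bak", "*.trn",
        "/E", "/Z", "/R:2", "/W:5", "/NP",
        r"/LOG+:D:\SQLBackups\offbox_copy.log",
    ])

    # robocopy exit codes below 8 mean success (with or without new copies made);
    # 8 and above mean at least one file failed to copy.
    sys.exit(0 if result.returncode < 8 else 1)

Schedule something like that via Task Scheduler to fire right after your nightly backup jobs complete, and you’ve got a dirt-simple way of getting copies off-box.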
Whichever technology you end up using, just make sure you keep tabs on a few things – above all, that your copy jobs are actually running and completing, and that the backups landing in your redundant location are regularly checked and verified.
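As a quick sketch of that kind of sanity check (again, the paths here are just hypothetical placeholders), a small script can compare what’s sitting on the primary against what actually made it to the redundant location and flag anything that’s missing or mismatched:

    import os

    # Hypothetical locations -- adjust for your own environment.
    SOURCE = r"D:\SQLBackups"
    DESTINATION = r"\\BACKUPHOST\SQLBackups\SERVER01"

    problems = []
    for name in os.listdir(SOURCE):
        if not name.lower().endswith((".bak", ".trn")):
            continue
        src = os.path.join(SOURCE, name)
        dst = os.path.join(DESTINATION, name)
        if not os.path.exists(dst):
            problems.append("missing off-box copy: " + name)
        elif os.path.getsize(src) != os.path.getsize(dst):
            problems.append("size mismatch: " + name)

    # Wire this into whatever alerting you already have (email, monitoring, etc.).
    if problems:
        print("\n".join(problems))
    else:
        print("All local backups are present (and the same size) off-box.")

It’s crude, but it’s the difference between knowing your copies are landing and merely hoping they are.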
To achieve data-center redundancy, you’ll need to push your data off-site. And, depending upon the size of your data (or backups) this is where things can get tricky, complex, and even ugly.
Larger Organizations with Multiple Sites
If you’re a larger business or enterprise you very well may have SAN replication technology and/or dark fiber stretching between data-centers. When possible, you’ll obviously want to use this connectivity and synchronization as a means of achieving off-box backup. The problem, of course, is that it can become ‘political’ or complicated. So, the only thing I can advocate here is to acknowledge that some data (or backups) is more important than other data, and you may have to adopt a ‘tiered’ approach to off-site synchronization. (And don’t forget that RPOs and RTOs – along with SLAs – can be a great tool in wading through political concerns.) That, and IF you find that you’re in a situation where there’s lots of politics and lots of chefs in the kitchen in terms of orchestrating the synchronization of off-box backups, then not only does regular testing and validation of your backups continue to make sense (as always), but you should really undertake an aggressive policy of regularly checking/verifying backups in your remote locations as well – as it’s entirely too easy for something outside of your control or area of expertise to ruin those backups, and regularly checking them is the only way to determine their validity.
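One easy check to automate here is backup ‘freshness’ – is the newest backup sitting in the remote location still inside your RPO window? A rough sketch (the remote path and the 24-hour window are just assumptions for illustration):

    import os
    import time

    # Hypothetical remote location and RPO window -- substitute your own.
    REMOTE = r"\\DR-SITE\SQLBackups\SERVER01"
    RPO_HOURS = 24

    backups = [
        os.path.join(REMOTE, name)
        for name in os.listdir(REMOTE)
        if name.lower().endswith((".bak", ".trn"))
    ]

    if not backups:
        print("ALERT: no backups found in the remote location at all.")
    else:
        newest = max(os.path.getmtime(path) for path in backups)
        age_hours = (time.time() - newest) / 3600.0
        if age_hours > RPO_HOURS:
            print("ALERT: newest remote backup is %.1f hours old." % age_hours)
        else:
            print("OK: newest remote backup is %.1f hours old." % age_hours)

Run that on a schedule from somewhere OTHER than the primary server, and you’ll know within hours – not weeks – when someone else’s change quietly breaks your off-site copies.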
Regular Tape Backups (Be Skeptical)
Many organizations take regular, taped, backups on a nightly or weekly (or whatever) basis and then ship them offsite to be stored in a vault somewhere. These kinds of backups ARE viable in many cases as part of a smoke-and-rubble contingency plan. However, there are a couple of things to be aware of.
For Small to Medium Businesses with just a single data-center, the creation of smoke-and-rubble backups can be problematic. Traditionally, these companies have only had the option of regular, off-site, taped backups hosted or managed by third parties (as I’ve just described above). And, in far too many cases, these backups are simply not tested adequately to provide the kinds of protection needed and, instead, far too often just become superstitious rituals performed regularly – but with no benefit.
Which, in turn, is why ‘the Cloud’ has become such a great option of late. The main concerns to deal with when it comes to pushing off-site backups into the cloud are security, sizing, and throughput. And, in my experience, throughput is the biggest problem, because it’s going to be hard, for example, for many organizations to push (say) 100GB of backups up into the cloud on a daily basis – simply because they’re not going to have the upstream needed for that.
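It’s worth doing the back-of-the-napkin math here before committing to any cloud target. The bandwidth and overhead figures below are just assumptions for illustration:

    # Back-of-the-napkin upload time: how long does a day's worth of backups
    # take to push over a given upstream connection? (All figures illustrative.)
    backup_gb_per_day = 100      # daily FULL/DIFFERENTIAL + log backups
    upstream_mbps = 10           # upstream bandwidth, in megabits per second
    efficiency = 0.8             # assume only ~80% of the pipe is usable in practice

    hours = (backup_gb_per_day * 8 * 1024) / (upstream_mbps * efficiency) / 3600
    print("%d GB over a %d Mbps upstream: roughly %.1f hours" %
          (backup_gb_per_day, upstream_mbps, hours))

With those (made-up) numbers the answer is roughly 28 hours – i.e., more than a day to push up a single day’s worth of backups, which is exactly the kind of situation I’m describing above.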
That, and not all cloud-backup offerings are the same – to the point where some simply can’t keep up with large-ish amounts of data even when there’s a large enough amount of upstream. Over the past few years I’ve worked with a number of different cloud-backup offerings to help keep client backups synchronized off-site. And while JungleDisk used to do a great job of merely ‘exposing’ Amazon S3 storage as a local drive (which could then be used with something like SyncBack), I’ve found that in the last few years it simply can NOT keep up with decent demands for synchronization – to the point where it couldn’t keep up with 30GB of changed files/backups in a single day. Granted, you can only push so much data through a ‘straw’ at a time – and upstream is your biggest concern here. But, in terms of that same 30GB of ‘churn’ per day, whereas that would literally take LONGER than a day to push up to S3 servers via JungleDisk, I found that DropBox (of all things) was able to push up that exact same data in about 3-4 hours. (Well, the bulk of it – in the form of FULL/DIFFERENTIAL backups taken early in the morning – with the log-file backups then cruising up all day long; whereas JungleDisk would take > 1 day to push up the FULL/DIFFERENTIAL backups and never get to the T-Log backups.) And this, of course, was on the exact same hardware and systems.
So, if you’re going to look into off-box backups using the cloud, there are a couple of VERY key things to look into – above all, whether the offering can actually keep up with your daily churn over the upstream you have, and whether your data is encrypted both on the wire and at rest.
Of all of the cloud solutions that I’ve used over the years, one stands head and shoulders above the rest: DropBox. Which, frankly, is irritating. Because DropBox is a CONSUMER-targeted solution that comes with a few limitations. First and foremost, it’s got small plans (50GB/100GB) – though it does offer ‘collaborative’ plans for teams with great storage. The bigger issue, though, is that DropBox is NOT designed to run unattended. Instead, it’s specifically designed to run as a user-level process that kicks in when a user logs in to their machine. Not at all what you want in a server-level backup – especially when this process terminates when you log off (or doesn’t spin up until an end-user logs in after a server is rebooted). Happily, there ARE work-arounds for this, but they’re a bit ugly to use. That said, I’ve got a couple of clients using DropBox, and the big benefits that this solution provides are that it’s cheap, it’s fast, and it’s VERY easy to get at uploaded files once they’re in the cloud. (One of my tasks for the next year is to test out how SyncBack 6.0 works in terms of its integration with S3 – because I’m guessing that it might end up being the absolute best option out there, given how fast I’ve found S3’s servers to be, the fact that you CAN encrypt your data when it’s stored on S3 (or when pushing data up), the fact that you can get at data stored on S3 servers from basically anywhere, and because I’ve found SyncBack to be so reliable in the past.)
Of course, as with everything else, merely PICKING a backup solution isn’t enough. If the backup solution is going to be worth anything, you’ll have to regularly test whether or not you’re able to pull down data and set it up as a failover source. And in that regard you’ll either be able to get the data back up, or you won’t. But even if you ARE able to get it back up and running, you then need to make sure that doing so is in accordance with your RPOs and RTOs. So, in my next post I’ll look at ways to put this redundant data quickly to use in the case of a disaster or emergency.