This is a pleasant reminder to check your backups. I don’t mean, “Hey, did the backup run last night? Yes? Then all is well.” That’s slightly better than nothing, but not really what you’re checking for. Instead, you’re determining your ability to return a system to a known state by verifying your backups regularly.
Backups are a key part of disaster recovery, where modern disasters include ransomware, catastrophic public cloud failures, and asset exposure through accidentally posted secrets.
For folks in IT operations such as network engineers, systems to be concerned about include network devices such as routers, switches, firewalls, load balancers, and VPN concentrators. Public cloud network artifacts also matter. Automation systems matter, too. And don’t forget about special systems like policy engines, SDN controllers, wifi controllers, network monitoring, AAA, and…you get the idea.
Don’t confuse resiliency with backup.
When I talk about backups, I’m talking about having known good copies of crucial data that exist independently of the systems they normally live on.
- Distributed storage is not backup.
- A cluster is not backup.
- An active/active application delivery system spread over geographically diverse data centers is not backup.
The points above are examples of distributed computing. Distributed computing is not about recovering from a system-wide failure or fundamental data corruption. Distributed computing is about maintaining application availability in the face of elastic demand and component failure. This is not backup.
Are you backing up the correct things?
Systems change over time. What you configured the backup software to grab three years ago might not be sufficient today. Are you grabbing the correct device configurations? For all devices that matter? Including the devices you turned up just last month?
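One way to spot coverage gaps is to compare your device inventory against what the backup job actually captured. The sketch below is a minimal Python illustration, not a feature of any particular backup product. It assumes one config file per device, named after the device with a `.cfg` extension. The function names, directory layout, and device names are all hypothetical.

```python
from pathlib import Path


def backed_up_names(backup_dir, suffix=".cfg"):
    """Collect device names from files like <device>.cfg in backup_dir.

    Assumption: the backup job writes one file per device, named after it.
    """
    return {p.name[: -len(suffix)] for p in Path(backup_dir).glob(f"*{suffix}")}


def missing_backups(inventory, backed_up):
    """Devices in the inventory that have no backup at all."""
    return sorted(set(inventory) - set(backed_up))


def orphaned_backups(inventory, backed_up):
    """Backups for devices that are no longer in the inventory."""
    return sorted(set(backed_up) - set(inventory))


# Hypothetical device names, for illustration only.
inventory = ["core-rtr-01", "edge-fw-01", "dist-sw-07"]
backed_up = {"core-rtr-01", "edge-fw-01", "old-sw-99"}

print(missing_backups(inventory, backed_up))   # ['dist-sw-07']
print(orphaned_backups(inventory, backed_up))  # ['old-sw-99']
```

In practice the inventory would come from whatever you treat as your source of truth, and `backed_up_names()` would be pointed at the real backup location. The device you turned up last month is exactly the kind of thing `missing_backups()` would surface.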
If you’re treating infrastructure as code, is version control sufficient? Have you prepared for a version control system failure? After all, that’s where your golden configs reside. What would happen if, say, your GitHub repo was wiped out?
If you run a private ops server of your own with scripts or playbooks on it, is that precious data being backed up? Or is it all in a VM on your laptop representing hours and hours of work you could never recreate if something went sideways? You could at least take a snapshot and copy it offsite now and then.
Is your retention schedule correct?
It’s nice to have last night’s backup. But what if you need last week’s backup? Or last month’s? Bad things can happen to data, and you might be backing up garbage without realizing it. Have you reviewed your retention schedule to know how many backups over what period of time you have access to?
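To answer the “how many backups over what period” question, it can help to summarize the restore points you actually have. The following is a rough Python sketch under the assumption that you can list the timestamps of your existing backups. The function name and the sample dates are made up for illustration.

```python
from datetime import datetime, timedelta


def summarize_restore_points(timestamps, now):
    """Summarize available backups: count, oldest, newest, and gaps.

    timestamps: datetimes of existing backups (assumed to be obtainable
    from your backup tool or from file modification times).
    """
    ordered = sorted(timestamps)
    if not ordered:
        return {"count": 0, "oldest": None, "newest": None,
                "age_of_newest": None, "largest_gap": None}
    # Time between consecutive backups; a large gap means missed runs.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return {
        "count": len(ordered),
        "oldest": ordered[0],
        "newest": ordered[-1],
        "age_of_newest": now - ordered[-1],
        "largest_gap": max(gaps, default=timedelta(0)),
    }


# Hypothetical restore points, for illustration only.
points = [datetime(2024, 1, 1), datetime(2024, 1, 2), datetime(2024, 1, 5)]
summary = summarize_restore_points(points, now=datetime(2024, 1, 6))
print(summary["count"], summary["age_of_newest"], summary["largest_gap"])
```

A large `age_of_newest` suggests the job quietly stopped running, and a large `largest_gap` shows a stretch of time you could not restore to.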
It could also be that the retention parameters you defined back in the day are no longer valid due to policy changes. Maybe regulations impacting your company have changed. Maybe there are new SecOps guidelines for data retention that impact your backup routine.
Yet another consideration is storage space, cheap though it is, being wasted with needless ancient backups. You probably don’t need five years of dailies. It’s good to clean up once in a while, and then set the retention schedule to something that cleans up after itself so you don’t have the “five years of dailies” problem again.
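A retention schedule that cleans up after itself might look something like the Python sketch below. It keeps the most recent dailies, plus the newest backup from each of the last few ISO weeks and calendar months, and marks everything else for pruning. The counts (7, 4, 12) and function names are assumptions chosen for illustration. Your policy or regulatory requirements should drive the real numbers, and most backup tools have their own retention settings that you would use instead of rolling your own.

```python
from datetime import date


def select_backups_to_keep(backup_dates, daily=7, weekly=4, monthly=12):
    """Return the set of backup dates to retain.

    Keeps the newest `daily` backups, the newest backup in each of the
    last `weekly` ISO weeks, and the newest backup in each of the last
    `monthly` calendar months. Defaults are illustrative, not advice.
    """
    ordered = sorted(set(backup_dates), reverse=True)  # newest first
    keep = set(ordered[:daily])

    newest_per_week = {}
    newest_per_month = {}
    for d in ordered:
        # setdefault keeps the first date seen, which is the newest,
        # because we iterate from newest to oldest.
        newest_per_week.setdefault(tuple(d.isocalendar())[:2], d)
        newest_per_month.setdefault((d.year, d.month), d)

    keep.update(sorted(newest_per_week.values(), reverse=True)[:weekly])
    keep.update(sorted(newest_per_month.values(), reverse=True)[:monthly])
    return keep


def select_backups_to_prune(backup_dates, **policy):
    """Everything not selected by the retention policy, oldest first."""
    keep = select_backups_to_keep(backup_dates, **policy)
    return sorted(set(backup_dates) - keep)


# Hypothetical example: one backup per day for January and February 2024.
dates = [date(2024, 1, d) for d in range(1, 32)]
dates += [date(2024, 2, d) for d in range(1, 30)]
kept = select_backups_to_keep(dates, daily=3, weekly=2, monthly=2)
print(sorted(kept))
```

Running the prune function in a report-only mode first, before deleting anything, is a sensible way to confirm the schedule does what you think it does.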
Will your backup survive a natural disaster?
One reason we back up is to recover from fires, floods, hurricanes, earthquakes, solar flares, extraterrestrial attack, and the like. If all copies of your backup are in the same physical location as what you’re backing up, that’s no good. Get some copies off site. These days, that probably means a secure location in the cloud.
Note the word “secure”. Please no more crucial data in unsecured S3 buckets, okay?
Can you access the backup media?
Recovering from a system failure is not the time to realize you can’t access the location of the backup data. It’s a good fire drill to go through now and then to be sure you know two things.
- Where the backup data lives. You forget these things if you haven’t thought about them in months.
- How to access that backup data. Maybe you’re backing up to a shared S3 bucket, but credentials have changed and no one told you. Maybe the Dropbox folder shared with your team is no longer syncing for you because your access was removed.
If it’s been months or years since you’ve thought about this, assume nothing.
Can you decrypt the backup?
If your backup is encrypted, can you decrypt it? Kind of a big deal, and not something you want to have to think about when under the gun, especially if being asked to supply a key or passphrase unexpectedly.
If you’re on vacation, can someone else recover from backup?
This point is really about your emergency runbook, because of course you have one of those. The runbook explains how to recover a variety of device types from backup. Is that runbook up to date? If you’re not entirely sure where the runbook even is, then no, it’s not up to date.
Perhaps once a quarter, it’s a good idea to review your emergency runbook. Follow the procedure as listed. Given the instructions in the book, can you successfully recover a device from backup?
When’s the last time you did a backup health check?
Backups are crucially necessary and incredibly boring all at the same time. We almost never need backups, and so they tend to fall down the task list next to “update interface descriptions to the new standard” and “write the new standard for interface descriptions”. Yet, when disaster strikes, the most important thing in the world might be recovering from that backup data.
Have you thought about backups lately? Maybe it’s time to bump them up the priority list. Oh. Backups weren’t on the priority list? I see…I see.
This post was inspired by my own impromptu backup review of a web server I operate. I discovered that one of my backup file repositories hadn’t been updated since 2018, and the retention schedule was not what I had thought it was. I say this to my shame, but that’s kind of the point. These things are easy to forget when the sun is out and everything is millions of peaches.
If you have your own backup tips or horror stories, you can share them with me @ecbanks on Twitter or via the Packet Pushers Podcast Network Slack channel. Maybe we’ll record a podcast together about it.