Rebuilding and Fake Catastrophes

A coworker stopped by earlier. It seems our service management team is planning the annual disaster recovery (DR) exercise, and somehow his name got thrown around for something I’d been involved with previously. I haven’t dealt with DR myself in years, though, not since the job change. But the discussion sparked a memory that it seems I never shared.

Back in the day, the company had hired a consultant whose purpose was to help us get all our disaster documentation together. This included gathering the “how to rebuild” recovery documents, and also codifying how long they would take to execute. Because our systems are so thoroughly interdependent, most of my stuff depended on other systems being online first, and I had a lot of trouble explaining this.

But the guy sat down with me to go over this. “How long would it take,” he asked, “to rebuild the portal?”

“Well, it depends. Just what kind of disaster are we talking about? And how close to today does it need to be?”

“Like if (datacenter) went away. No backups, nothing, what does it take to get it back exactly like it is today.”

“Well if PeopleSoft went away, and Active Directory went away, then there’d be no way for user accounts or roles to be provisioned so nobody could log in…”

“Ok skip that.”

“So, what, (datacenter) got nuked but somehow the PeopleSoft servers are still standing?”

“No, but… well let’s just skip that.”

“Ok. Well then we have to consider that the decade of content, images, text, documents, etc. that the Corporate Communications team has loaded into here would need to be recreated, from scratch…”

“Wait, why?”

“… because that’s what the portal is? They log in, to view the content. Without either of those it’s nothing. So they’d need to recreate all of that which would probably be a task for at least six months straight.”

“Ok so what if we just say the portal went away.”

“So somehow the portal servers were targeted, and literally nothing else was harmed or damaged in the process, but it also happens to have gotten the backup system? We’re getting a little specific here…”

DR is important, but the arbitrary scenarios that the non-technical people come up with to test it are ridiculous.