In the wee hours of one morning last October, Joe Peyton, Assistant Vice President of IT Operations at Redwood Credit Union (RCU), got a distressing phone call from his CIO.
“He told me his neighborhood was being evacuated,” Joe recalls. Wildfires were threatening Santa Rosa — not just the CIO’s home, but RCU’s headquarters, too.
Prepare for the worst, hope for the best
Joe, his CIO, and all available RCU IT staff sprang into action. By 5:30 a.m., the RCU CEO, the executive team, and a dozen staff had gathered at the credit union’s branch in a neighboring city about nine miles south of RCU’s Santa Rosa HQ site.
Although information was sketchy, the team knew RCU’s HQ building remained online, but for how long? The disaster for which RCU’s team had so dutifully prepared had arrived on their doorstep.
“We immediately made two critical decisions,” recalls Joe. “First, we prepared to fail over to our disaster recovery site. Second, we agreed that until our headquarters building was out of commission, we’d use its infrastructure to run our operations. We wanted to be ready for a failover but actually do it only as a last resort.”
This decision was made with RCU’s member community in mind, because even the smoothest failover process requires some downtime, inevitably disrupting members’ ability to make purchases and access ATMs.
“We were determined,” says Joe, “not to add more stress to an already very difficult time for so many.”
Embers that got the adrenaline pumping
This gutsy call required above-and-beyond performance from both RCU’s IT staff and the people at Quest, RCU’s disaster recovery (DR) site partner.
Even though failover can be accomplished virtually, Joe hoped to run RCU’s operations from the branch. And the fastest way to accomplish that required some distinctly non-virtual human actions — i.e., physically retrieving as much equipment as possible from RCU’s Santa Rosa HQ building and bringing it nine miles south to the branch site.
But could Joe and his colleagues even get to the Santa Rosa location? “We decided to find out by going as far as we could,” he says.
Most traffic was fleeing in the opposite direction, so they made good time, reaching Santa Rosa by about six in the morning.
“It was still dark,” Joe remembers. “The fires were quite visible. We could see about a quarter-mile through the smoke, but it was the wind whipping the ash and embers around that got the adrenaline pumping.”
As RCU’s CIO stood firewatch on the headquarters roof, the team quickly grabbed everything they could — several key servers, some network gear, PCs, laptops, tablets, files — then stashed it all into several trucks and hurried back on the road south. By 7:30 a.m., they’d safely made the nine-mile return to the branch.
Quick response from Quest’s HABC
Soon after, Joe began his journey to the Quest High Availability Business Center (HABC) failover site about 120 miles away.
“I called our Quest account rep, Justin Trammell, at 9 a.m. to let him know I was on the road with a truckload of equipment,” Joe says. “I told Justin that if we had to fail over, we’d need help spinning up our gear and our DR processes. Justin’s response was, ‘OK, we’ll take care of you’ — and they did.”
Within 15 minutes, Joe notes, he received a call from Quest CIO Mike Dillon. Meanwhile, Justin alerted the Quest team, and Quest’s Josh Orchekowski was appointed operations point person for coordinating resources.
“An hour after I informed Josh that we’d require assistance racking in servers as well as expertise in VMware and NetApp, he called to say Quest was standing by. I pulled up to the site at noon, we had a short conference call with some Quest experts, and they took it from there while I worked on other things. I’d check in every thirty minutes and they’d ping me with any questions.”
“The only way we could have done what we did”
Within a few hours, the HABC site was ready to receive a failover, and by the second day, RCU would have been able to run its entire infrastructure from Quest’s HABC site.
“Being able to access those IT resources from Quest was a game-changer,” says Joe. “It was the only way we could have done what we did. We lacked sufficient staff and resources to simultaneously prepare for a full system failover and to keep operations running remotely in real time from a branch. Each of those efforts would have required all the RCU resources we could muster.”
“Quest’s people put their lives on hold,” says Joe. “They went truly above and beyond. I was astonished at the resources made available to us — it was phenomenal.”
RCU’s headquarters building survived the fires, as did the CIO’s house (sadly, however, some 23 RCU employees lost their homes). Though critical systems were brought up to run in parallel and some ancillary services were rerouted at Quest’s HABC, RCU never required a full failover.
A new view of DR
Even so, the experience has reshaped RCU’s view of disaster recovery.
“We used to believe a warm-site failover with tiered recovery windows was sufficient,” Joe explains. “But we’ve learned through first-hand experience that we need active-active failover with all of our services running in multiple locations.
“More importantly, we now regard DR as a core component of everything we do. It’s not a project add-on; it’s part of the initial discussion, just like a rollout. We don’t use DR review sessions to determine whether and how we should recover something — that’s already built into our initial planning. We now use DR reviews to address strategic issues.”
‘We will be here for you’
While the ordeal provided RCU a better understanding of its DR operations priorities, affirmed its trusted relationship with Quest, and highlighted its IT staff’s stellar work ethic, it also renewed RCU’s community commitment. This is reflected in the outpouring of aid for the North Bay Fire Relief Fund, established by RCU in partnership with other community members.
The response to the Fund, says Joe, has been deeply heartening. “We are thrilled to be able to look the community in the eye and say, ‘We will be here for you.’”