A Case Study: When Do We Need a Disaster Recovery Plan (DRP)?
Last week, while working on a college assignment for an Operating Systems course, I came across a very interesting question. I found it interesting because the assignment from my lecturer wasn’t only about textbook theory; it called for a deeper analysis involving data, architecture, and system security.
The question is:
“A large company has 300 employees and 10 branch offices (both domestic and abroad). The company has one server in each branch, and its headquarters has 3 servers. Because the data is very important and must always be accessible, you, as an IT expert, are asked to analyze this problem and advise the company on how to ensure that its data can always be accessed and its servers are always active.”
Pretty interesting, isn’t it?
When doing an analysis, it helps to break the problem down into smaller questions. I started with the big picture and then moved on to more specific questions:
1. What components make up the system architecture?
2. How do the components of the architecture work together to provide access to the data?
3. What preventive measures can be taken so that the data can always be accessed?
Then I answered the questions one by one.
What we know
a. Employees: 300 people
b. Branches: 10 offices (domestic & abroad)
c. Number of servers:
— 1 server for 1 branch, total = 10 servers
— At headquarters, total = 3 servers
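To keep the numbers straight, the inventory above can be modeled as a small data structure. This is a minimal Python sketch: the site names (`HQ`, `Branch-1` to `Branch-10`) are hypothetical placeholders, and only the counts come from the case. It also makes one weakness visible: every branch runs a single server, so each branch is a single point of failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str     # hypothetical site label, not from the assignment
    servers: int  # number of servers at the site (from the case)


def build_sites() -> list:
    """Headquarters with 3 servers plus 10 branches with 1 server each."""
    sites = [Site("HQ", 3)]
    sites += [Site(f"Branch-{i}", 1) for i in range(1, 11)]
    return sites


def total_servers(sites: list) -> int:
    """Sum of servers across all sites."""
    return sum(s.servers for s in sites)


def single_server_sites(sites: list) -> list:
    """Sites where one hardware failure takes the whole site offline."""
    return [s.name for s in sites if s.servers == 1]


sites = build_sites()
print(total_servers(sites))             # 13
print(len(single_server_sites(sites)))  # 10
```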
How we resolve it
To answer the first question, I sketched a simple architecture for the case above.
- The architecture needs a dedicated server that acts as a backup server. In the diagram we created below, the Data Center is the location of the data backup server. The backup server can be on-premises (our own physical server) or in the cloud (e.g., AWS, Microsoft Azure).
- The architecture also includes a Load Balancer that distributes requests across the central servers at headquarters, reducing the risk of a bottleneck when traffic is high.
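The load-balancing idea can be illustrated with a round-robin balancer that skips servers marked as down. This is a minimal Python sketch, not how any particular product works: the server names `central-1` to `central-3` are hypothetical, and round-robin is just one common distribution strategy among several.

```python
import itertools


class RoundRobinBalancer:
    """Round-robin load balancer that skips servers marked unhealthy (sketch)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        # Called when a health check fails for this server.
        self.healthy.discard(server)

    def mark_up(self, server):
        # Called when a previously failed server passes its health check again.
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Try each server at most once per call so we never loop forever.
        for _ in range(len(self.servers)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        return None  # every central server is down


lb = RoundRobinBalancer(["central-1", "central-2", "central-3"])
print([lb.next_server() for _ in range(4)])
# ['central-1', 'central-2', 'central-3', 'central-1']
lb.mark_down("central-2")
print([lb.next_server() for _ in range(3)])
# ['central-3', 'central-1', 'central-3']
```

Returning `None` when no central server is healthy is deliberate: that is exactly the situation the DR drill has to cover.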
Once the architecture is implemented, we need to run a DR Drill (Disaster Recovery Drill) to test server-down scenarios. For example, if all of the central servers go down, the branch office servers must be able to connect directly to the Data Center / Backup Server.
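The drill scenario can be expressed as a routing decision that each branch server makes. This Python sketch is illustrative only: `resolve_target`, `run_drill`, and the server names are hypothetical, and a real deployment would typically handle failover through network or DNS configuration rather than a single function.

```python
from typing import Optional

BACKUP_DC = "data-center-backup"  # hypothetical name for the Data Center / Backup Server


def resolve_target(central_status: dict, backup_up: bool) -> Optional[str]:
    """Pick where a branch server should send its requests.

    central_status maps each central (headquarters) server name to True if it is up.
    """
    up = sorted(name for name, is_up in central_status.items() if is_up)
    if up:
        return up[0]      # normal path: a healthy central server
    if backup_up:
        return BACKUP_DC  # failover path exercised in the DR drill
    return None           # total outage: escalate according to the DRP


def run_drill() -> dict:
    """Simulate the drill scenario from the text: all central servers are down."""
    all_up = {"central-1": True, "central-2": True, "central-3": True}
    all_down = {"central-1": False, "central-2": False, "central-3": False}
    return {
        "normal": resolve_target(all_up, True),
        "all_central_down": resolve_target(all_down, True),
        "everything_down": resolve_target(all_down, False),
    }


print(run_drill())
# {'normal': 'central-1', 'all_central_down': 'data-center-backup', 'everything_down': None}
```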
Things to note:
- Recovery Time Objective (RTO). The server must be back up within a predefined time limit. For example, if the RTO is 6 hours, the server must be up (or at least accessible) within 6 hours of the outage so that business processes can continue.
- To implement the scheme above, clear configuration documentation is required for the applications, servers, networks, and so on.
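The RTO example can be turned into a simple check, for instance when reviewing the results of a DR drill. This is a minimal Python sketch assuming the 6-hour RTO from the example above; the helper names and the outage timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

RTO = timedelta(hours=6)  # example RTO from the text


def recovery_time(outage_start: datetime, service_restored: datetime) -> timedelta:
    """How long the service was unavailable."""
    return service_restored - outage_start


def meets_rto(outage_start: datetime, service_restored: datetime, rto: timedelta = RTO) -> bool:
    """True if the service came back within the RTO."""
    return recovery_time(outage_start, service_restored) <= rto


def deadline(outage_start: datetime, rto: timedelta = RTO) -> datetime:
    """Latest moment by which the service must be back up."""
    return outage_start + rto


start = datetime(2024, 1, 15, 8, 0)  # hypothetical outage start
print(deadline(start))                                   # 2024-01-15 14:00:00
print(meets_rto(start, datetime(2024, 1, 15, 13, 30)))  # True
print(meets_rto(start, datetime(2024, 1, 15, 15, 0)))   # False
```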
Based on the points above, the way to handle this case (which counts as a disaster) is to put these measures into a Disaster Recovery Plan (DRP). As one of my reading sources, Computer Weekly, puts it:
“Disaster Recovery (DR) initiatives provide strategies and procedures that can help organizations protect investments in IT systems and infrastructure. The essential mission for disaster recovery is to return IT operations to an acceptable level of performance as quickly as possible following a disruptive event.”
We can also look at the picture below, taken from the international standard ISO/IEC 27031:2011, developed by the International Organization for Standardization (ISO) together with the International Electrotechnical Commission (IEC). Under this standard, the IT disaster recovery process follows a standard process flow based on the ISO Plan-Do-Check-Act (PDCA) model.
Summary
A Disaster Recovery Plan (DRP) maps out the processes for returning to normal business operations, reconstructing or salvaging critical files and equipment, and handling other critical matters. It also serves as a guide for all managers and employees during and after a disaster. The key elements of the plan fall into three categories:
- Elements common to all parts of the plan;
- Elements relating to the resumption of business operations; and
- Elements relating to the reconstruction or salvage of important company documents.