High Availability With DRBD

by Greg Eckert and Walter Lucero

High Availability (HA) is the practice of keeping business services running, typically to a target of 99.999% uptime, even when a component fails. The success of your business depends on having access to your data. If you are unable to conduct business, your customers won’t wait long before going somewhere else. We’re all familiar with system outages: “The server is down”; “the e-mail won’t load”; “our payment system isn’t working right now”. Companies need to resolve continuity problems like these, which is why High Availability solutions have two primary goals:

  1. Data protection: keeping your data safe, even if a transaction is made during a system failure.
  2. Business continuity: keeping your systems up and running, even during critical failures.

One popular solution is LINBIT’s open source product, DRBD. By connecting two of your existing Linux servers and installing DRBD, your company can replicate its data to protect digital information and keep services up and running. This works for any type of data: customer records, intellectual property, virtual machines, or financials (see Figure 1). If your back-end systems run on Linux and you can write your data to a disk, DRBD can replicate that data so that unexpected downtime and data loss do not cost you business.

Figure 1: Schematic for a typical 2 server DRBD installation

Backups are not enough. Enterprises and small businesses alike need full High Availability and Disaster Recovery solutions!

Without High Availability, operations will halt during a network failure or other interruption. High Availability technologies are what keep names like Google, IBM, and Cisco up and running at all times.

High availability servers are designed as a completely redundant setup. They are built from standard technologies and achieve an availability of 99.999% or more by eliminating single points of failure. Table 1 shows how much downtime per year corresponds to each level of annual uptime.

Table 1: Downtime chart
Annual uptime (%)    Downtime per year
99                   3d 16h
99.9                 8h 46m
99.99                52m
99.999               5m 15s

If an organisation’s systems are up 99% of the time, they experienced more than 3 days and 15 hours of downtime over the past year. That is a huge cost in lost productivity, employee frustration, and customer dissatisfaction.
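The figures in Table 1 follow from simple arithmetic: annual downtime = (100 − uptime %) / 100 × hours in a year. The short Python sketch below recomputes them, assuming a 365-day year (leap years ignored); the function names are our own, chosen for illustration.

```python
from datetime import timedelta

# Assumption: a 365-day year; leap years are ignored.
HOURS_PER_YEAR = 365 * 24


def annual_downtime(uptime_percent: float) -> timedelta:
    """Return the yearly downtime implied by an annual uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    downtime_fraction = (100 - uptime_percent) / 100
    return timedelta(hours=HOURS_PER_YEAR * downtime_fraction)


def format_downtime(delta: timedelta) -> str:
    """Format a duration as days, hours, minutes and seconds (rounded to whole seconds)."""
    total_seconds = round(delta.total_seconds())
    days, rest = divmod(total_seconds, 86_400)
    hours, rest = divmod(rest, 3_600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}d {hours}h {minutes}m {seconds}s"


if __name__ == "__main__":
    # Recompute each uptime level listed in Table 1.
    for uptime in (99, 99.9, 99.99, 99.999):
        print(f"{uptime:>7}%  {format_downtime(annual_downtime(uptime))}")
```

Under these assumptions, 99% uptime works out to 3d 15h 36m of downtime per year and 99.999% to roughly 5m 15s, which you can compare against the rounded values in Table 1.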

A High Availability IT solution has its price. However, that price is usually far lower than the financial loss that follows a system failure: a single hour of server downtime can cost more than an entire High Availability solution. How much does an hour of downtime cost your business? Think not only about lost productivity, but also about lost data, employee frustration, and customer confidence.

DRBD is available for download from www.linbit.com. Uptime guarantees, as well as service and support options, bug fixes, and certified binaries, are available through LINBIT’s Enterprise Support Subscriptions.

DRBD (High Availability)

DRBD is used for High Availability. It is software that replicates data in real time from one server to another, ensuring business continuity even in the event of a hardware failure.

What sets DRBD apart from competitors?

(Shared Everything vs Shared Nothing Replication)

The traditional method of protecting critical data was the “Shared Everything” (shared storage) approach, built on proprietary hardware. Essentially, this means connecting two servers to a central storage box that holds the data both servers write to.

What’s the problem? For one, these boxes are expensive. Worse, if the central box fails, you lose access to your data. In other words, the central storage unit is still a single point of failure.

That’s not all: when a customer upgrades the storage device, migrating the data takes a very long time, and the systems must be down while it happens. The very system meant to prevent downtime causes a “planned outage” whenever it is upgraded to newer hardware, and you are paying a lot to use it.

The newer method is called the “Shared Nothing” approach, and it works on any commodity hardware that runs Linux. Software mirrors the data written on one server to a second server over a dedicated connection between the two. If one box fails, services simply move to the other: no data loss, and virtually no downtime.
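To make the “Shared Nothing” idea concrete, here is a deliberately simplified Python model: each server keeps its own copy of every block, every write is applied to both copies, and failover simply promotes the surviving copy. The classes Node and MirroredVolume are hypothetical illustrations, not part of DRBD, which does this work at the block-device level inside the kernel.

```python
class Node:
    """A toy server with its own local disk (block number -> data); nothing is shared."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.disk: dict[int, bytes] = {}
        self.alive = True


class MirroredVolume:
    """Toy 'Shared Nothing' mirror: each write lands on the local disks of both nodes."""

    def __init__(self, active: Node, standby: Node) -> None:
        self.active = active
        self.standby = standby

    def write(self, block: int, data: bytes) -> None:
        if not self.active.alive:
            raise RuntimeError(f"{self.active.name} is down; fail over first")
        self.active.disk[block] = data
        # Mirror the same block to the peer over the (simulated) dedicated link.
        if self.standby.alive:
            self.standby.disk[block] = data

    def read(self, block: int) -> bytes:
        return self.active.disk[block]

    def fail_over(self) -> None:
        """Promote the standby after the active node has failed."""
        if self.active.alive:
            raise RuntimeError("active node is still healthy")
        if not self.standby.alive:
            raise RuntimeError("no healthy node left")
        self.active, self.standby = self.standby, self.active


if __name__ == "__main__":
    alpha, bravo = Node("alpha"), Node("bravo")
    volume = MirroredVolume(alpha, bravo)
    volume.write(0, b"order #1001")
    alpha.alive = False  # simulate a hardware failure on the active server
    volume.fail_over()
    print(volume.active.name, volume.read(0))  # bravo b'order #1001'
```

Because the standby already holds its own copy of every block, no central box has to survive the failure; that is the property the toy is meant to show.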

This is otherwise known as the “Linux High Availability Cluster Stack” method. One major benefit of this method is that customers can use their existing Linux servers, and even their existing SAN devices. This makes DRBD very affordable, and it avoids the upgrade problem of hardware-based solutions described above, because software can be upgraded one node at a time (a rolling upgrade) without taking services down.

Figure 2: Replication between nodes using DRBD

The 3 layers of the Linux HA Cluster Stack

1. Replication Layer

DRBD: the software that performs the data replication. It is built into the mainline Linux kernel, and replicates data from one server to another in either “Synchronous” mode (a write is complete only once it has reached the disks of both the primary and the secondary server) or “Asynchronous” mode (a write is complete once it is on the primary’s local disk, and reaches the secondary shortly afterwards, depending on network latency).
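The practical difference between the two modes is when a write counts as complete, and therefore what is at risk if the primary server fails. The toy Python model below (the ReplicatedDisk class is hypothetical, not a DRBD interface) acknowledges synchronous writes only after the peer has them, and asynchronous writes as soon as they are local and queued for the peer. In DRBD’s own terminology these correspond roughly to replication protocol C and protocol A.

```python
from collections import deque


class ReplicatedDisk:
    """Toy model of synchronous vs asynchronous block replication.

    mode="sync":  a write is acknowledged only after the peer has stored it.
    mode="async": a write is acknowledged once it is local and queued for the peer.
    """

    def __init__(self, mode: str) -> None:
        if mode not in ("sync", "async"):
            raise ValueError("mode must be 'sync' or 'async'")
        self.mode = mode
        self.local: list[str] = []
        self.peer: list[str] = []
        self.in_flight: deque[str] = deque()  # sent, but not yet stored by the peer

    def write(self, data: str) -> str:
        self.local.append(data)
        if self.mode == "sync":
            self.peer.append(data)  # wait for the peer before acknowledging
        else:
            self.in_flight.append(data)  # acknowledge now, replicate later
        return "ack"

    def deliver(self, count: int = 1) -> None:
        """Simulate the network delivering up to `count` queued writes to the peer."""
        for _ in range(min(count, len(self.in_flight))):
            self.peer.append(self.in_flight.popleft())

    def primary_crashes(self) -> list[str]:
        """Return the acknowledged writes that the peer never received."""
        lost = list(self.in_flight)
        self.in_flight.clear()
        return lost


if __name__ == "__main__":
    for mode in ("sync", "async"):
        disk = ReplicatedDisk(mode)
        for record in ("tx-1", "tx-2", "tx-3"):
            disk.write(record)
        disk.deliver(1)  # the network has carried only one queued write so far
        print(mode, "lost on failover:", disk.primary_crashes())
    # sync lost on failover: []
    # async lost on failover: ['tx-2', 'tx-3']
```

In the asynchronous case, writes that were acknowledged but still in flight when the primary failed never reach the secondary. That is the price of lower write latency, and it is why asynchronous replication is mostly chosen for long-distance links where synchronous round trips would be too slow.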

2. Cluster Communication Layer

Heartbeat/Corosync: handles communication between the connected servers. The nodes need to share information so that each one knows the state of the others: either “Active” (in use) or “Passive” (holding replica data, ready to spring into action if the active server fails).
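The basic idea behind a heartbeat is a timeout: each node periodically announces that it is alive, and a node that stays silent for too long is presumed failed. The Python sketch below shows only that idea. PeerMonitor and role_after_check are hypothetical names, timestamps are passed in explicitly to keep the logic testable, and real implementations such as Corosync (which uses the Totem protocol) handle membership, message ordering, and redundant links far more carefully.

```python
from dataclasses import dataclass, field


@dataclass
class PeerMonitor:
    """Toy failure detector: a peer is presumed dead after `timeout` seconds of silence.

    Times are passed in explicitly (seconds on any monotonic clock) so the logic is easy to test.
    """

    timeout: float
    last_seen: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        """Record a heartbeat message received from `node` at time `now`."""
        self.last_seen[node] = now

    def state(self, node: str, now: float) -> str:
        seen = self.last_seen.get(node)
        if seen is None or now - seen > self.timeout:
            return "dead"
        return "alive"


def role_after_check(my_role: str, monitor: PeerMonitor, peer: str, now: float) -> str:
    """A passive node becomes active only when the active peer has gone silent.

    A real cluster would also fence the silent peer before taking over.
    """
    if my_role == "passive" and monitor.state(peer, now) == "dead":
        return "active"
    return my_role


if __name__ == "__main__":
    monitor = PeerMonitor(timeout=5.0)
    monitor.heartbeat("node-a", now=0.0)
    print(role_after_check("passive", monitor, "node-a", now=3.0))  # passive
    print(role_after_check("passive", monitor, "node-a", now=6.0))  # active
```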

3. Cluster Monitoring Layer

Pacemaker: the cluster resource manager that watches the services on each node. It starts, stops, and monitors those services, and relies on the cluster communication layer (Heartbeat or Corosync) to learn of failures and react to them automatically. Pacemaker works with most Linux distributions.
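Pacemaker’s core job can be pictured as a loop: run a monitor check on each service, restart it in place if it fails, and move it to another node if it keeps failing or its node goes down. The Python sketch below models one step of that loop. Resource and handle_monitor_result are hypothetical; in Pacemaker itself this behaviour is configured declaratively (for example with the migration-threshold resource option) rather than written as code.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """A toy cluster resource (e.g. a database or web server) placed on one node."""

    name: str
    node: str
    failures: int = 0


def handle_monitor_result(res: Resource, healthy: bool, nodes: list[str],
                          node_up: dict[str, bool], max_failures: int = 3) -> str:
    """Decide what a toy cluster manager does after one monitor check.

    Returns "ok", "restart" (restart on the same node), "moved to <node>",
    or "stopped: no healthy node".
    """
    if healthy and node_up[res.node]:
        return "ok"
    if node_up[res.node]:
        res.failures += 1
        if res.failures < max_failures:
            return "restart"
    # The node is down, or the resource failed too often here: relocate it.
    for candidate in nodes:
        if candidate != res.node and node_up[candidate]:
            res.node, res.failures = candidate, 0
            return f"moved to {candidate}"
    return "stopped: no healthy node"


if __name__ == "__main__":
    web = Resource("web", "node-a")
    up = {"node-a": True, "node-b": True}
    for healthy in (True, False, False, False):
        print(handle_monitor_result(web, healthy, ["node-a", "node-b"], up))
    # ok, restart, restart, moved to node-b
```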

Using all three layers, services can easily be moved to a working server when a hardware component fails. Note that DRBD, Heartbeat/Corosync, and Pacemaker are all open source products, and therefore free to download and test. Setup, configuration, and performance tuning, however, are not trivial.

For critical applications, we therefore strongly recommend getting implementation and support assistance. No one wants to be left alone during a critical failure.

Figure 3: Cluster components

True Disaster Recovery with DRBD Proxy

Like DRBD, DRBD Proxy replicates data in real time. The difference is that DRBD Proxy allows data replication over long distances and across the internet. When you replicate to a separate physical site, the term “High Availability” changes to “Disaster Recovery”.

Disaster Recovery is the intended purpose of DRBD Proxy. Disaster Recovery is not simply a back-up of your data at an off-site location.

Just because your data is backed up does not mean that switching over to that data will be quick and painless; in fact, the process can take anywhere from hours to days, or even weeks! Disaster Recovery is the capability of providing business continuity, even during an outage at your primary site.

DRBD Proxy software allows you to replicate all of your data from one location to another in real time, without hurting local write performance. Outages are automatically detected, and transitions are simple.
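One way to replicate over a slow long-distance link without slowing local writes is to absorb write bursts in a buffer and send them on as fast as the link allows. The back-of-the-envelope Python sketch below estimates how large such a buffer must be and how long it takes to drain. It assumes constant rates in MB/s; the function names and example figures are hypothetical and are not DRBD Proxy settings.

```python
def proxy_buffer_needed(write_rate_mb_s: float, wan_rate_mb_s: float,
                        burst_seconds: float) -> float:
    """Estimate the buffer (MB) needed to absorb a write burst faster than the WAN link.

    While local writes arrive faster than the link can carry them, the backlog grows by
    (write_rate - wan_rate) MB every second; if the link keeps up, no backlog builds.
    """
    backlog_growth = max(0.0, write_rate_mb_s - wan_rate_mb_s)
    return backlog_growth * burst_seconds


def drain_time_seconds(backlog_mb: float, wan_rate_mb_s: float,
                       steady_write_rate_mb_s: float) -> float:
    """Time to empty the backlog once writes fall back to a steady rate."""
    spare = wan_rate_mb_s - steady_write_rate_mb_s
    if spare <= 0:
        return float("inf")  # the link never catches up
    return backlog_mb / spare


if __name__ == "__main__":
    backlog = proxy_buffer_needed(write_rate_mb_s=200, wan_rate_mb_s=50, burst_seconds=60)
    print(backlog, drain_time_seconds(backlog, wan_rate_mb_s=50, steady_write_rate_mb_s=10))
```

With these example numbers, a 60-second burst of 200 MB/s over a 50 MB/s link builds a 9,000 MB backlog, which drains in 225 seconds once writes settle at 10 MB/s. Until the backlog drains, the remote copy lags behind the primary site.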

This means that even if your primary site is compromised, your secondary site will be up and running in minutes, or even seconds, all using your existing infrastructure.

DRBD Proxy software is the best way to ensure your business is protected from disasters. DRBD Proxy isn’t just a disaster recovery plan; it’s a solution that works across data centres, over long distances, and even in the cloud. Many enterprises are replacing their expensive, proprietary Disaster Recovery solutions with it (even companies that sell such solutions!).

Benefits

Running DRBD on Linux servers offers an array of benefits, as Table 2 shows, and, as Figure 4 indicates, DRBD is available for a wide range of system solutions.

Benefits of using Linux-based systems:
  • Reduction in hardware costs
  • Use of standard hardware components
  • Reduction in software costs
  • No license costs for proprietary software
  • Avoid dependencies
  • No vendor lock-in
  • Boost to compatibility
  • Use of open standards
  • Minimize downtime
  • High stability
  • Increase control and security

Benefits of using DRBD:
  • Low acquisition costs: 2 servers
  • No license costs
  • Low manufacturer dependency
  • Open, flexible standards
  • Use of commercial off-the-shelf hardware
  • Open source software
  • Flexible infrastructure options
  • High stability and availability
  • Open source code
  • Quick troubleshooting
  • Easy and fast system management
  • Synchronous and asynchronous modes of replication
  • Block-level mirroring / application agnostic
  • Transaction-safe: no data loss

Problems avoided by using DRBD:
  • Data loss
  • Maintenance costs
  • Production failure
  • Revenue loss
  • Staff costs
  • Delivery delays
  • Customer loss
  • Recourse / penalties
  • Loss of reputation / bad press

Table 2: Benefits of the DRBD high availability solution

Figure 4: DRBD is available for a wide range of system solutions

 

Alternative Configurations

DRBD can be used in a number of different server configurations, the simplest (and lowest-cost) being a two-node active/passive server pair. Two other two-node options exist, as well as three- and four-node options, as Table 3 shows:

Table 3: Configuration options
Number of nodes    Description
2                  Active/Passive
2                  Active/Active: 50% of processes on each server. No DR site. Fencing recommended.
2                  Primary/Primary: 100% of processes active on each server. Requires fencing (e.g. STONITH). No DR site.
3                  Active/Passive + DR site (Passive)
4                  Active/Passive + DR site (Passive/Passive)
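Fencing matters because a node that merely stops answering may still be running and writing data. If the surviving node took over anyway, both nodes could modify their copies independently, a condition known as split brain. STONITH (“Shoot The Other Node In The Head”) prevents this by powering off the unresponsive node before takeover. The Python sketch below shows only that ordering rule; FencingDevice and take_over are hypothetical stand-ins, not a real STONITH agent API.

```python
class FencingDevice:
    """Hypothetical stand-in for a STONITH device, such as a remotely switchable power supply."""

    def __init__(self, works: bool = True) -> None:
        self.works = works
        self.powered_off: set[str] = set()

    def power_off(self, node: str) -> bool:
        """Try to cut power to `node`; return True only if that is confirmed."""
        if self.works:
            self.powered_off.add(node)
        return self.works


def take_over(survivor: str, lost_peer: str, fence: FencingDevice) -> str:
    """Take over the peer's services only once the peer is confirmed powered off.

    If fencing fails, the peer might still be writing to its copy of the data, so
    promoting the survivor could cause split brain (two diverging copies).
    """
    if fence.power_off(lost_peer):
        return f"{survivor} takes over {lost_peer}'s services"
    return f"{survivor} waits: could not fence {lost_peer}"


if __name__ == "__main__":
    print(take_over("node-b", "node-a", FencingDevice()))
    print(take_over("node-b", "node-a", FencingDevice(works=False)))
```

Refusing to take over when fencing fails trades a little availability for data integrity, which is why the Primary/Primary configuration in Table 3 requires fencing rather than merely recommending it.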

As you can see, with DRBD, high availability no longer has to be an expensive project. We have customers with different needs who are:

  • Using SUSE, RHEL, Debian, or Ubuntu Server LTS as Xen/KVM hosts running many VMs, with DRBD providing HA and relocation features.
  • Running critical, non-virtualized services that need HA (e.g. web servers, databases, mail, and file servers).
  • Replicating expensive storage units, e.g. from IBM/Dell/HP, without having to buy expensive vendor replication licenses.

Of course, if you are building a cluster, we recommend SUSE Linux or RHEL because of the commercial services and support available for them.

Next time

This was just a short introduction to the power of DRBD. In the next article we will describe the technicalities of installing DRBD with SUSE Linux.

[Greg Eckert works for LINBIT in the USA]

(This article was first published in OHM26, Q2/2014, p9-12)
