Downtime, Data-loss And Cloud Computing

by Greg Eckert and Thomas Kozlowski

The saying goes, “The sign of a good IT team is that they never get noticed.” This means that IT systems stay up and running without disruption, internal and customer-facing applications work flawlessly, and data is always accessible when necessary. For better or worse, the world now runs 24x7x365, meaning any amount of downtime and/or data-loss is unacceptable.

IT failures, however, are inevitable. Although devices are getting faster and less expensive, all IT products eventually fail. The key is to mitigate data-loss and downtime during failures. Computer drives arguably fail more than any other IT component, which is why experts created RAID.

However, disks and drives are only one part of the equation and IT admins know that keeping infrastructure up and running means more than having a few RAID arrays.

The concept of mitigating downtime during server failures and outages is called High Availability (HA). Many consider services to be Highly Available when they have “five nines” of uptime. That is 99.999% uptime, or 5.26 minutes of downtime per year.
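The five nines figure is simple arithmetic, and a minimal sketch of it follows. The function name `downtime_minutes_per_year` is ours, chosen for illustration, and the calculation assumes an average year of 365.25 days.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, leap years included


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an uptime percentage."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    # Each extra nine cuts the downtime budget by a factor of ten.
    for uptime in ("99", "99.9", "99.99", "99.999"):
        minutes = downtime_minutes_per_year(float(uptime))
        print(f"{uptime}% uptime allows about {minutes:.2f} minutes of downtime per year")
```

For 99.999% the result rounds to the 5.26 minutes quoted above.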

Why has High Availability become so important over time? Aberdeen Group surveyed IT managers in May 2013 for “IT Business Preparedness: A combination of Business Continuity and Disaster Recovery”. They found that medium-sized firms lost over $215,000 per year to downtime and data-loss.

Because of these high costs, IT folks have developed impressive HA technologies to keep data safe. Typically, these solutions are proprietary; and since proprietary products generally bring an expensive price tag, someone had to bring High Availability to the masses.

LINBIT was founded in 2001 to fill this need, making the fight against downtime an inexpensive and open source endeavour.

Figure 1: DRBD Schematic

For the last 15 years, LINBIT has been developing DRBD, a Linux kernel module that mirrors data from one server to another in real time. With the ability to replicate block data synchronously, it is one of the few software products that can match hardware-based replication performance and provide enterprise-grade availability, all while keeping cost in mind.

To top it off, its flexibility allows users to run the software underneath any application (or multiple applications). As a testament to its stability, DRBD was accepted into the mainline Linux kernel in 2009 with kernel version 2.6.33.

Nowadays LINBIT’s DRBD is the world’s leading open source High Availability and Disaster Recovery software. Customers include IBM, T-Mobile, and Porsche, to name a few, and LINBIT partners with organisations like SanDisk, Red Hat, SUSE and HP.

DRBD is downloaded tens of thousands of times per month, and LINBIT supports enterprise users from its offices in Vienna, Austria and Portland, Oregon (USA). DRBD is one of the few replication technologies that excel at both synchronous short-distance and asynchronous long-distance replication.

A Changing Market Landscape

Over the years, company policies about data have become more stringent, and the volume of data is increasing exponentially. For example, large telecoms are under new regulation to hold more data for longer periods of time. Other companies face legal requirements that oblige them to hold on to more data than ever before. Luckily, the commoditisation of storage helps companies keep old data for extremely long periods of time. Now that organisations are storing more data and need quick availability, a popular trend is to turn to resilient cloud hosting providers.

Figure 2: Replication options

Utilising cloud infrastructures, companies can safely hide behind 100% uptime guarantees and secure data centres to hold the information that they value so dearly. These massive data centres must make a decision on which storage technologies to use for their infrastructure.

One option is to choose a high performance infrastructure and pay a proprietary vendor large sums of money to keep it protected behind SAN or NAS hardware.

Another option is to use a software defined storage model with technologies like Red Hat’s Ceph, which allows the use of commodity hardware.

Although commodity hardware combined with Ceph offers far greater scalability at lower cost than proprietary storage, the downside is that Ceph is not designed for high performance within a single volume. This means that scalable software-defined storage technologies like Ceph are not a good fit for database workloads.

Figure 3: The DRBD alternative

This leaves a big gap in user needs vs technologies available. Large datacentres need scalable high performance storage for high IO applications. They also need enterprise level data availability.

LINBIT’s newest DRBD9 release addresses these needs by allowing large datacentres to keep up with IT availability needs, while providing maximum speed at scale. LINBIT is closing the gap between hardware and software defined storage technologies, with its Operating System defined storage technology, DRBD9.

High Performance, Resilient, Scalable Storage

The upgrade from DRBD8 to DRBD9 includes a host of new features. In the old paradigm, companies rarely needed more than 1 copy of their data. With datacentres hosting data for all types of organisations, increased IT regulations, and increased data reliance, 1 copy just isn’t enough.

DRBD9 allows organisations to hold over 30 data replicas at one time. This means that the important database that your company uses every day can be on 4 servers at once, making extended periods of downtime very unlikely. The same goes for your other services and even custom applications.
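To see why extra replicas make downtime so unlikely, consider a back-of-the-envelope sketch. It assumes that servers fail independently of one another (real failures are often correlated, so treat the result as an upper bound), and the per-server availability of 0.99 is an illustrative figure, not a measurement. The function name `combined_availability` is ours.

```python
def combined_availability(per_server_availability: float, replicas: int) -> float:
    """Probability that at least one replica is reachable, assuming independent failures."""
    if not 0.0 <= per_server_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("at least one replica is required")
    # The data is unreachable only if every server holding a replica is down at once.
    all_down = (1.0 - per_server_availability) ** replicas
    return 1.0 - all_down


if __name__ == "__main__":
    # 0.99 is an illustrative per-server availability, not a measured value.
    for replicas in (1, 2, 3, 4):
        print(replicas, combined_availability(0.99, replicas))
```

Under these assumptions, each additional replica multiplies the chance of total unavailability by the per-server failure probability, which is why a handful of copies goes a long way.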

In order to manage these larger storage pools, LINBIT has a utility called DRBDManage.

Figure 4 shows an example DRBD9 configuration. The grey boxes represent servers, either physical or virtual, and the coloured boxes represent individual resources/volumes. The user has 5 resources being replicated, labelled A, B, C, D, and E. Each repeat of a letter is a replica of that resource located on another server.

Figure 4: Typical configuration

There is a DRBD Control volume located on each DRBD server in order for the replication to keep track of all the data.

  • Do you want to access information from resource C on node 1? No problem, DRBD will reach over to node 3 and grab the information for you.
  • Want to access the information on resource E from node 4? DRBDManage will automatically promote it to primary in order to get you the information.
  • Need to eliminate the 3rd replica of resource B in order to save space? This is just as simple as typing a few commands.
  • Need to replicate resource C one more time? Just tell DRBDManage and it will go find the space.

So what happens when a server fails? Notify DRBDManage that the node is not simply down for maintenance, and is, indeed, dead.

DRBDManage will find the optimal available space on your other servers and reprovision the replicas.
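The idea of reprovisioning after a node failure can be illustrated with a toy model. This is not DRBDManage’s actual placement algorithm, which the article does not describe. It is a simple greedy stand-in that moves each lost replica to the surviving node with the most free space. The names `reprovision`, `placement` and `free_space` are hypothetical, and space is counted in arbitrary units.

```python
from typing import Dict, Set


def reprovision(
    placement: Dict[str, Set[str]],
    free_space: Dict[str, int],
    dead_node: str,
    replica_size: int = 1,
) -> Dict[str, Set[str]]:
    """Re-create the replicas held by dead_node on the surviving nodes.

    placement maps node name -> set of resource names stored on that node.
    free_space maps node name -> free capacity in arbitrary units.
    Both dictionaries are updated in place; placement is also returned.
    """
    lost = placement.pop(dead_node, set())
    free_space.pop(dead_node, None)
    for resource in sorted(lost):
        # A node holds at most one replica of a resource and needs room for it.
        candidates = [
            node for node in placement
            if resource not in placement[node] and free_space[node] >= replica_size
        ]
        if not candidates:
            continue  # nowhere to put it; the resource stays under-replicated
        # Greedy choice: most free space first, node name breaks ties.
        target = min(candidates, key=lambda node: (-free_space[node], node))
        placement[target].add(resource)
        free_space[target] -= replica_size
    return placement


if __name__ == "__main__":
    placement = {
        "node1": {"A", "B"},
        "node2": {"A", "C"},
        "node3": {"B", "C"},
        "node4": set(),
    }
    free_space = {"node1": 2, "node2": 2, "node3": 2, "node4": 4}
    print(reprovision(placement, free_space, dead_node="node3"))
```

In this example the replicas of B and C that lived on node3 both land on node4, the node with the most free space that does not already hold a copy.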

In addition to the scalable functionality, DRBD9 allows you to replicate data asynchronously into the cloud. With the ability to hold 2 sets of data on-site and 2 sets of data off-site (for example), users can extend their HA services across distances to provide Disaster Recovery functionality.

From a school campus that wants data in multiple buildings to a company which needs one copy of the data in their offices and more sets in the cloud, DRBD9 seems to be able to do it all.

In terms of performance, DRBD8 typically has about a 1-3% overhead when used correctly. With DRBD9 this remains the same, but LINBIT has also enabled the use of Remote Direct Memory Access (RDMA) connections.

Figure 5: DRBD offers RDMA options for server to server communication

RDMA is direct memory access from the memory of one computer into that of another without involving either operating system. This permits high-throughput, low-latency networking. When using DRBD over RDMA connections in place of TCP connections, LINBIT saw a 100% increase in throughput and a 50% decrease in CPU load. This makes DRBD one of the fastest synchronous data replication tools on the market.

The move from hosted cloud to private cloud

As most IT folks know though, the cloud is merely someone else’s servers, at an off-site location that isn’t their own. This is causing some organisations to take a second look at their hosted IT costs to see if they can increase security, resiliency, and availability by bringing things into their own cloud. OpenStack is a seamless way to pull all of these things together.

This new scale-out technology is great for hosting providers, but what about those large-scale users who want to roll their own cloud? How does DRBD fit in with the mid to large organisations who want to manage their own infrastructure?

OpenStack is, essentially, a framework for applications to plug into. It allows organisations to host and manage their own services in a scalable fashion, without relying on outside vendors.

The storage component of OpenStack is called Cinder. LINBIT has merged its DRBDManage Cinder driver upstream in order to make DRBD a compatible solution when using the OpenStack cloud.

Now, users can roll out their applications on a proven platform, used by some of the world’s largest organisations, all without hiring an outside IT team to hold their infrastructure. Thanks to DRBD9’s new functionality, companies can run high performance, high IO, and HA applications on bare metal Linux storage servers.

Conclusion

The world has gone from a place where people open their e-mails, chat with colleagues, and enjoy new cat videos on YouTube… to a world where people save e-mails, chats with colleagues, and cat videos from YouTube.

The data that companies store has become part of their competitive advantage, and keeping it safe is a priority as at no other time in history. Just like self-driving cars and recycling initiatives, the world has come up with simple ways to solve complex problems. The same is true for storage replication.

With the ever-increasing amount and importance of data, organisations will continue to rely on proven technology to keep it all safe. DRBD has become the market-leading software for Highly Available Linux storage clusters, and is now accomplishing the same for scalable, high performance architectures.

LINBIT’s goal is to collaborate with the community and partners to provide next generation storage infrastructures. They are always searching for new partners and collaboration opportunities. Visit LINBIT’s website to learn more about using DRBD as a competitive advantage in your organisation.

www.linbit.com/en/community/drbd-oss-distribution

This article was first published in OH Magazine Issue 31, 4/2015, p9-12.
