Paul J. Holenstein, Dr. Bruce Holenstein, and Dr. Bill Highleyman
Active/active architectures can extend failure intervals from years to centuries. An active/active system comprises multiple independent nodes that cooperate in a common application and share a common distributed database. This book is the third volume of a three-part series on active/active systems. It gives real-life examples of existing active/active systems that are achieving such extraordinary failure intervals.
As our daily lives and corporate well-being become more dependent upon computers, system reliability grows increasingly important. No longer are frequent system outages acceptable. Often, failure intervals must now be measured in centuries.
In addition, the performance of these systems is analyzed, and a set of commercial off-the-shelf products that enable system implementations is described.
There are many additional advantages of these multi-node architectures, such as unlimited scalability, load balancing, disaster tolerance, efficient use of all capacity, and risk-free failover testing.
Also described are several other technologies related to high availability, such as the grid, virtual tape, and the Real Time Enterprise (RTE). Many regulations now being promulgated require the use of high-availability computing, and a number of these are summarized.
Throughout this volume and the preceding two volumes, dozens of rules are formulated to achieve extreme availabilities. They are summarized here for handy reference.
Dr. Bill Highleyman, Paul J. Holenstein, and Dr. Bruce Holenstein have a combined experience of over 90 years in the implementation of fault-tolerant, highly available computing systems. This experience ranges from the early days of custom redundant systems to today’s fault-tolerant offerings from HP (NonStop) and Stratus.
Dr. Bill Highleyman has done extensive work on the effect of failure mode reduction on system availability. He has built fault-tolerant systems for train control, racetrack wagering, securities trading, message communication, and other applications. He is the Managing Editor of the Availability Digest (availabilitydigest.com).
Paul J. Holenstein and Dr. Bruce Holenstein have architected and implemented the various data replication techniques required for the availability enhancements described in this book. Their company, Gravic, provides the Shadowbase line of data replication products to the fault-tolerant community.
Chapter 14: Benefits of Multiple Nodes in Practice
The active/active architecture that we have been studying is based on an enabling technology: data replication. It is data replication that allows us to keep multiple copies of a database, distributed across a network, in synchronism so that independent processing nodes can all participate simultaneously in a common application. As a result, the application network is very resilient, and it recovers rapidly from faults with little or no impact on the users.
High availability and fast recovery are only two of the many advantages that can be achieved by building multinode systems in which each node is independent of the other nodes in the application network yet can interoperate with other nodes to implement a common application.
For instance, in addition to failure recovery and disaster tolerance, these architectures can be used in many other ways, such as to eliminate planned downtime, to expand capacity by using symmetric or asymmetric configurations, and to provide data locality. Also, more efficient use is made of the total system capacity than in classic active/backup configurations; and there are innumerable ways to balance availability, cost, and performance.
In this chapter, we talk about some of the many advantages of active/active systems. In Chapter 15, we present some actual examples.
Extreme Availability
Availability is an ever-increasing requirement of today’s systems. As companies go global, serving their customers 24x365, as applications become ever more critical to users, and as costs of downtime escalate, increasing the availability of enterprise applications becomes a paramount requirement.
We have shown that active/active systems can improve availability by orders of magnitude, from three or four 9s to six or seven 9s or even more. The reason is redundancy: the more redundancy, the greater the availability. Adding one redundant node to a single monolithic system doubles its 9s, because the pair is down only when both nodes are down at once, and the two small probabilities of failure multiply.
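The "doubles its 9s" rule can be checked with a few lines of arithmetic. The sketch below is only an illustration under simplifying assumptions that are ours, not the authors' full model: node failures are independent, any one surviving node can carry the whole load, and failover itself is instantaneous and never fails. The function names are hypothetical.

```python
import math


def system_unavailability(node_availability: float, nodes: int) -> float:
    """Probability that every node is down at the same time.

    Assumes independent node failures and that a single surviving
    node is enough to keep the application running.
    """
    return (1.0 - node_availability) ** nodes


def nines(unavailability: float) -> float:
    """Express an unavailability as a count of 9s (0.001 is three 9s)."""
    return -math.log10(unavailability)


single_node = 0.999  # a three-9s monolithic system (assumed figure)

one_node_nines = nines(system_unavailability(single_node, 1))
two_node_nines = nines(system_unavailability(single_node, 2))

print(f"one node:  {one_node_nines:.1f} nines")
print(f"two nodes: {two_node_nines:.1f} nines")
```

With a three-9s node, the two-node result comes out at about six 9s. Real systems fall short of this ideal because failover takes time and can itself fail, and because some faults affect several nodes at once, so the figure is best read as an upper bound.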