I gave a lot of thought before I decided to blog on this one because I know I am going against the established grain with my method of explaining HA. However, I am used to being spanked in public, so I can take some more spankings if needed. Actually, I think I like being spanked.
High Availability is the combination of well-defined, planned, tested, and implemented processes, software, and fault-tolerant hardware, all focused on supplying and maintaining application availability.
For example, at a high level, consider messaging in an organization.
BAD – A poor implementation of Exchange is usually slapped together by purchasing a server that the administrator feels is about the right size and installing Exchange Server 2003 on it. Messaging clients are installed on network-connected desktops and profiles are created. The Exchange server might even be successfully configured to connect to the Internet. I have seen Exchange environments installed in organizations over a short business week, and even overnight in some cases. It is easy to do it fast and get it done, but lots of important details are missed.
GOOD – In an HA environment, the deployment is well designed. Administrators research organizational messaging requirements. Users are brought into discussions along with admins and managers, and messaging is considered as a possible solution to many company ills. Research may go on for an extended period as consultants are brought in to help build a design and review the designs of others. Vendors are brought in to discuss how their products (antivirus and content management, for example) are going to help keep the messaging environment available and not waste messaging resources processing spam and spreading viruses (or is that virii?). Potential 3rd party software is tested and approved after a large investment of administrator and end user time.

Hardware is sized and evaluated based on performance requirements and expected loads. Hardware is also sized and tested for disaster recovery and to meet service level agreements for both performance and time to recovery in the case of a disaster. The hardware selected will often contain fault-tolerant components such as redundant memory, drives, network connections, cooling fans, power supplies, and so on.

An HA environment will incorporate lots of design, planning, and testing. It will often, but not always, include additional features such as server clustering, which decreases downtime by allowing for rolling upgrades and a preplanned response to failures. A top-notch HA messaging environment will also consider the messaging client and the configurations that lead to increased availability for users. For example, Outlook 2003 offers Cached Exchange Mode, which allows users to create new messages, respond to existing mail in their inbox, and manage their calendars (amongst many other tasks) without having to maintain a constant connection to the Exchange server. Cached Exchange Mode allows users to continue working even though the Exchange server might be down.
It also allows for more efficient use of bandwidth.
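To see why those redundant components matter, here is a minimal sketch of the math behind redundancy. It assumes component failures are independent and that only one copy needs to work (as with dual power supplies); the 99% single-component figure is an assumed number for illustration, not a vendor spec.

```python
# Sketch: why redundant components raise availability.
# Assumes independent failures and that one working copy is enough.

def redundant_availability(component_availability: float, copies: int) -> float:
    """Availability of a set of identical redundant parts where
    only one needs to work (e.g. dual power supplies)."""
    return 1 - (1 - component_availability) ** copies

single = 0.99  # assumed: a single power supply is up 99% of the time
dual = redundant_availability(single, 2)
print(f"single: {single:.4f}, dual: {dual:.4f}")  # dual: 0.9999
```

Two so-so parts in parallel behave like one very good part, which is exactly the bet fault-tolerant hardware makes.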
The Goal – Now this is where many people disagree. I consider the goal of all HA environments to really be continuous availability (CA) of applications and resources for employees. Doesn't everyone want email to always be available, processing messaging traffic and helping the people in the organization collaborate? Of course that is what we want. We want applications and their entire environment to continue running forever.
In my opinion, we strive for CA and we settle for HA.
"In information technology, high availability refers to a system or component that is continuously operational for a desirably long length of time. Availability can be measured relative to "100% operational" or "never failing." A widely-held but difficult-to-achieve standard of availability for a system or product is known as "five 9s" (99.999 percent) availability."
Obviously, "continuously operational" just isn't possible over extremely long periods of time. Hardware will always fail; it is just a matter of when. Software becomes obsolete over time, too. We all need to understand that HA includes not just the hardware and software solution, but also the backup/restore solution and failover processing. Most HA experts will also add that a true HA environment includes a well-documented development, test, and production migration process for any and all changes to be made in production environments. There is much to achieving HA; however, it simply comes down to application availability through well-designed, planned, tested, and implemented processes, software, and hardware.
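It helps to translate those "nines" into actual clock time. This quick sketch converts an availability percentage into allowed downtime per year (using a 365-day year, ignoring leap years):

```python
# Sketch: turning "nines" of availability into allowed downtime per year.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes (ignoring leap years)

def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes per year a system may be down and still hit its target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_minutes_per_year(pct):.1f} minutes/year")
```

Five 9s works out to roughly 5.3 minutes of downtime per year, which is why "continuously operational" is such a difficult standard: a single reboot can blow the whole annual budget.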
Another example: using NLB to provide availability for your web-based application to users over the Internet. NLB helps keep the application available to your users. The same can be said for server clustering; however, you need to take into account the non-availability during the actual failover of your application in the event of a hardware or software failure. Sometimes failover is a matter of seconds; in other cases it can take several minutes. In all cases, a clustering solution will significantly drive down non-availability and increase the uptime of your application. Many experts state that, for any application or system to be highly available, the parts need to be designed around availability and the individual parts need to be tested before being put into production. As an example, if you are using 3rd party products with your Exchange environment that have not been properly tested, you may find that they are a weak link that results in a loss of availability. Implementing a cluster will not necessarily result in HA if there are problems with the software.
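The point about failover taking seconds versus minutes can be made concrete. This sketch adds up the yearly downtime contributed purely by failovers; the failure rate and failover durations are assumed figures for illustration, not measurements from any real cluster:

```python
# Sketch: how failover duration feeds into yearly downtime.
# Failure rate and failover times below are assumed, illustrative numbers.

def yearly_failover_downtime(failures_per_year: float,
                             failover_seconds: float) -> float:
    """Minutes of unavailability per year caused purely by failovers."""
    return failures_per_year * failover_seconds / 60

for failover_s in (10, 120, 300):  # seconds per failover
    mins = yearly_failover_downtime(failures_per_year=4,
                                    failover_seconds=failover_s)
    print(f"{failover_s:>3}s failover x 4 failures/year -> {mins:.1f} min/year")
```

Even at five minutes per failover, four failures a year cost only about 20 minutes of downtime, far better than an unclustered server that stays down until an admin rebuilds it. That is the sense in which clustering drives down non-availability without eliminating it.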
I could and maybe should ramble on some more, but I need to focus on some other things right now. To summarize this entire discussion:
HA is so much more than just slapping a couple of servers together in a cluster. Please keep in mind all of the details behind a top-notch HA environment.