Author: Yogi Schulz

The Internet is open for business around the clock. Unfortunately, some of our applications and computing infrastructures are not. The guiding principle in designing for high availability is the avoidance of single points of failure.
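
A rough calculation shows why single points of failure matter. The sketch below assumes that components fail independently and uses an illustrative availability of 99% per component; both are assumptions made for the example, not measurements from any real system.

```python
from math import prod


def serial_availability(components):
    """Availability of a chain in which every component must be up."""
    return prod(components)


def redundant_availability(availability, copies):
    """Availability of a group in which any one of `copies` identical units suffices."""
    return 1 - (1 - availability) ** copies


# Hypothetical figures: web, application and database tiers at 99% each.
single = serial_availability([0.99, 0.99, 0.99])
paired = serial_availability([redundant_availability(0.99, 2)] * 3)

print(f"one server per tier:  {single:.4%}")
print(f"two servers per tier: {paired:.4%}")
```

With these assumed figures, a chain of three single servers is available about 97% of the time, while doubling each tier raises that to roughly 99.97%.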

Servers

Servers undermine high availability when they crash from overloading or when they require too many outages for hardware or software upgrades.

High availability is achieved on production servers by operating distinct machines for web serving, application serving and database serving. This separation avoids overloading servers and causes fewer outages when server change-outs or upgrades are required. Keeping maintenance-related outages short requires having one or more unallocated servers available for immediate use.

Database

Database software problems do occur despite what advertisements may want us to believe.

Ensuring high availability in the presence of imperfect database software requires the use of a hot standby database or technology like Oracle RAC.
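
The client side of a hot standby arrangement can be sketched as follows. This is a minimal illustration only: the connector functions are hypothetical stand-ins for real database driver calls, and the sketch ignores the replication and standby promotion that a real failover also involves.

```python
def connect_with_failover(connectors):
    """Try each connector in order and return the first connection that succeeds.

    `connectors` is a list of (name, callable) pairs. Both the names and the
    callables are hypothetical placeholders for real driver calls.
    """
    errors = []
    for name, connect in connectors:
        try:
            return name, connect()
        except ConnectionError as exc:
            # Remember why this database was skipped, then try the next one.
            errors.append(f"{name}: {exc}")
    raise ConnectionError("all databases unavailable: " + "; ".join(errors))


def primary_down():
    # Simulates a failed primary database.
    raise ConnectionError("primary not responding")


def standby_up():
    # Simulates a healthy hot standby.
    return "standby-connection"


name, conn = connect_with_failover(
    [("primary", primary_down), ("standby", standby_up)]
)
print(name)
```

Listing the primary first means the standby is used only when the primary cannot be reached.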

Disk

It’s difficult to achieve high availability with standard disk because the manual management headaches become overwhelming with larger databases or file systems.

High-performance disk sub-systems, as offered by Compaq/HP, EMC, Hitachi and IBM, contribute to high availability by allowing a rapid failover to a hot standby database in the event of failure of the primary database. This type of disk also eliminates outages for backups through features like Concurrent Copy and Flash Copy.

Application Testing

More robust applications result in fewer crashes that cause outages. High availability requires the operation of a test environment that duplicates the production environment as much as possible in terms of components and architecture but not in size.

To be really useful, the test environment must be used to test the application. To those who think that’s obvious, I’m disappointed to report that too much code hits production status well before its time.

Application Performance

Applications that have not been performance tuned will stress the computing infrastructure and cause server crashes. We’ll skip over the poor performance that untuned applications impose on the end-user.

Use application profiling tools and SQL*Trace to identify application tuning opportunities and act on them to achieve high availability.
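
As one illustration of application profiling, the sketch below uses Python's standard cProfile and pstats modules. The function slow_report is a made-up workload for the example; any real application code could take its place.

```python
import cProfile
import io
import pstats


def slow_report(n):
    """Hypothetical workload that does wasteful repeated summing."""
    total = 0
    for i in range(n):
        total += sum(range(i % 100))
    return total


# Collect timing data only while the workload runs.
profiler = cProfile.Profile()
profiler.enable()
result = slow_report(20000)
profiler.disable()

# Write the five most expensive entries, by cumulative time, to a string.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The functions at the top of the report are the first candidates for tuning.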

Application Architecture

A common application architectural error is to closely tie the application to the database so that the application must run on the database server. This close tie means that either an application problem or a server problem will cause an outage.

High availability is achieved by separating the application from the database. Then the application can be run on an increasing number of application servers concurrently as the end-user community grows.

Local Area Network

The local area network contributes to high availability if it features redundant paths to isolate failures, subnets to avoid traffic congestion and load-leveling routers that distribute traffic among the application servers or web servers.
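
The idea of distributing traffic among application servers can be sketched as a simple round-robin selector that skips servers marked as down. The class and server names are hypothetical, and real load-leveling equipment is considerably more sophisticated.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in rotation, skipping any that are marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Look at each server at most once per call.
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["app1", "app2", "app3"])
balancer.mark_down("app2")
picks = [balancer.next_server() for _ in range(4)]
print(picks)
```

Because the failed server is skipped rather than removed, it rejoins the rotation as soon as it is marked up again.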

Wide Area Network

The wide area network contributes to high availability if it features redundant paths to the Internet backbone through separate ISPs.

Verify the performance of your network using performance monitoring and reporting tools.

Operations

Too many organizations recognize an outage only when an end-user calls to complain. Such incidents do not enhance reputation. Ad hoc development of a ping script to monitor the computing environment is insufficient to achieve robust operation.

High availability requires continuous monitoring of the computing environment components and the application. This is achieved by activating SNMP and implementing software such as CA Unicenter, HP OpenView or Tivoli NetView.
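
The core logic of continuous monitoring, polling every component and raising an alert after repeated failures, can be sketched as follows. This is not SNMP and not a substitute for the products named above. The probe functions and the threshold are hypothetical, chosen only to illustrate the idea.

```python
class ComponentMonitor:
    """Polls named probes and reports components that fail repeatedly."""

    def __init__(self, probes, failure_threshold=3):
        self.probes = probes  # name -> callable returning True when healthy
        self.failure_threshold = failure_threshold
        self.failures = {name: 0 for name in probes}

    def poll(self):
        """Run every probe once; return names that just reached the threshold."""
        alerts = []
        for name, probe in self.probes.items():
            try:
                healthy = bool(probe())
            except Exception:
                # A probe that raises counts as a failed check.
                healthy = False
            if healthy:
                self.failures[name] = 0
            else:
                self.failures[name] += 1
                if self.failures[name] == self.failure_threshold:
                    alerts.append(name)
        return alerts


# Hypothetical probes: the web tier is healthy, the database is not.
monitor = ComponentMonitor(
    {"web": lambda: True, "database": lambda: False},
    failure_threshold=2,
)
first = monitor.poll()
second = monitor.poll()
third = monitor.poll()
print(first, second, third)
```

Alerting only when the failure count reaches the threshold avoids both false alarms from a single missed poll and repeated alerts for the same outage.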

Conclusions

The tools and techniques needed to achieve high availability are well known. You’ll help your organization by pursuing the ideas above that you have not yet addressed.