So your Web server is down. Here's how to fix it and avoid further issues.
The Web has become a staple of civilization over the past two decades. Typically ranked just below running water and reliable electricity on the hierarchy of societal needs, the Web server has provided more in the way of societal productivity, individual entertainment and overall quality of life than perhaps any other development over the past half century.
Taken for granted by many modern end users, the Web server has allowed for academic research, personal enrichment and pizza orders, all from within the comfort of the end user's home. In order to compete, most reputable businesses have taken the plunge into the world of websites and, by default, a plunge into the depths of Web server development, configuration and maintenance. As businesses become increasingly reliant on their Web servers and as those servers become increasingly complex, organizations should pay more attention to contingency plans should their server or cluster of servers fail.
Redundancy and the Web server
Before proceeding any further, let's highlight the importance of redundancy. In the past, redundancy solutions varied, but they all involved purchasing and deploying additional physical machines. Today, the proliferation of virtualization has had a snowball effect, producing some effective redundancy platforms. Enterprise environments still need multiple machines, just not as many. Whichever virtualization platform you choose, simply having one in place will go a long way toward alleviating the downtime associated with a Web server failure.
Step 1: What happened?
As with any production server, administrators must diagnose the symptoms as soon as possible when the Web server is down. Some of the questions that need to be answered are:
- Have there been any reported power outages, generator tests or other such occurrences that could affect the overall physical environment?
- Is access to your Web server completely cut off or can some IP address ranges still hit the server?
- Is management access to the server still possible?
- Have there been any unusual entries into the logs?
These are just a few of the immediate triage questions that need to be answered prior to delving deeper into the problem.
Step 2: The simplest solution is often the best
There have been many times when I was deep in a troubleshooting scenario where simply stepping back and taking a macro view of the problem would have saved countless hours and considerable cost. For example, does the host machine have power, or did someone unknowingly unplug the power cord? If the machine obviously has power, but there doesn't seem to be any network connectivity, check to see if the Ethernet or fiber cable has been inadvertently unplugged. Yes, these seem like painfully obvious solutions, but any experienced system administrator will tell you that these scenarios pop up more often than one would care to think.
Step 3: If basic troubleshooting doesn't work
Now that you've checked all of your cables and other peripheral devices, try to ping the device from within the LAN. Fortunately, the ping command is universal, so this should be straightforward regardless of the platform in use. If you can ping the server from within the LAN, try pinging the server from outside of the LAN. Doing this will go a long way in determining if the problem is at the routing and switching level, rather than at the server level. Also, if your Web server is virtualized, try pinging the IP address of the physical machine itself. This will help you to further isolate the problem. If you can't ping the server at all, and you've definitely checked the network connection, it's time to go deeper.
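The ping sequence above can be scripted so that the results are recorded the same way every time. What follows is a minimal sketch in Python, not a definitive tool: the function names (ping_command, is_reachable, locate_fault) and the address 192.0.2.10 are hypothetical, and it assumes a ping binary is on the PATH. The count flag differs by platform (-n on Windows, -c on most other systems), which is the one place where the command is not quite universal.

```python
import platform
import subprocess


def ping_command(host, count=2, system=None):
    """Build a ping command; the count flag is -n on Windows and -c elsewhere."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def is_reachable(host, count=2):
    """Return True if ping exits with status 0 (a rough signal, not proof)."""
    try:
        result = subprocess.run(
            ping_command(host, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ping is missing, not permitted, or took too long to answer
        return False
    return result.returncode == 0


def locate_fault(lan_ok, outside_ok, host_ok=None):
    """Map ping results to the area worth investigating first.

    lan_ok:     ping from inside the LAN succeeded
    outside_ok: ping from outside the LAN succeeded
    host_ok:    ping of the physical host succeeded (None if not virtualized)
    """
    if lan_ok and outside_ok:
        return "network path looks fine; check the web server service"
    if lan_ok:
        return "reachable inside the LAN only; suspect routing, switching or firewall"
    if host_ok is None:
        return "unreachable from the LAN; suspect the server or its operating system"
    if host_ok:
        return "physical host answers but the guest does not; suspect the virtual machine"
    return "physical host is unreachable too; suspect the host or its network link"
```

For example, running is_reachable("192.0.2.10") from a machine inside the LAN and again from one outside it, then passing both results to locate_fault, gives a first guess at where to look. Keep in mind that some networks filter ICMP, so a failed ping from outside does not always mean the server is unreachable.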
Step 4: Nope, the Web server is still down
You've checked your cables. You've attempted to ping the box, and you still don't have network access to the server. The good news is that you have essentially isolated the problem to either the physical machine or the operating system. Either way, you don't have to send your minions out on wild goose chases anymore, and everyone can focus on a specific area.
Taking your low-hanging-fruit mentality to the top of the troubleshooting tree, begin by opening whatever interface you feel comfortable with and checking the server's local network configuration. Is DHCP enabled? Is the Web server pointed to the correct DNS server? If so, depending on your platform, check to see if the Web server service is turned on. In a Windows environment, you would ensure that the actual Web server role has been enabled. In Linux, the specifics vary a little more, but look for a service named httpd (or apache2 on Debian-based distributions) and ensure that it is running.
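On a Linux host, two of these checks are easy to script: whether the service is active and whether anything is accepting connections on the Web port. The sketch below is one way to do it under stated assumptions: the host uses systemd (so systemctl is available), the service is named httpd, and the helper names service_is_active and port_is_open are hypothetical.

```python
import socket
import subprocess


def service_is_active(name="httpd"):
    """Return True if systemd reports the unit as active.

    Assumes a systemd-based host; the unit may be called apache2 or nginx
    instead of httpd depending on the distribution and server software.
    """
    try:
        result = subprocess.run(
            ["systemctl", "is-active", "--quiet", name],
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        # systemctl is not installed or did not answer in time
        return False
    return result.returncode == 0


def port_is_open(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If service_is_active("httpd") is true but port_is_open("127.0.0.1", 80) is false when run on the server itself, the service may be bound to a different address or port than expected, which points back to its configuration.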
Step 5: Desperate times call for desperate measures
If none of the above works, check the logs and try to pinpoint the entries recorded around the time of the Web server failure. It may help to delegate this task while the more experienced troubleshooters examine other areas. Also, if network connectivity has been ruled out as the problem, a Wireshark capture can show exactly what is traversing the network and so assist in troubleshooting.
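Narrowing a log down to the entries around the failure is tedious by hand and simple to script. The following is a minimal sketch, assuming each log line begins with an ISO 8601 timestamp followed by a space. Real Web server logs use other timestamp formats, so the parsing step would need adjusting. The function name entries_near and the sample lines are made up for illustration.

```python
from datetime import datetime, timedelta


def entries_near(lines, failure_time, window=timedelta(minutes=10)):
    """Return the log lines whose timestamp falls within window of failure_time.

    Assumes each line starts with an ISO 8601 timestamp, for example
    2024-05-01T10:02:13, followed by a space. Other lines are skipped.
    """
    nearby = []
    for line in lines:
        stamp, _, _ = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # no parsable timestamp at the start of the line
        if abs(when - failure_time) <= window:
            nearby.append(line)
    return nearby


# Made-up sample data, for illustration only
sample = [
    "2024-05-01T09:30:00 [notice] server started",
    "2024-05-01T10:02:13 [error] worker exited unexpectedly",
    "continuation line without a timestamp",
    "2024-05-01T10:20:00 [notice] server restarted",
]
suspects = entries_near(sample, datetime(2024, 5, 1, 10, 0, 0))
```

With the sample above and the default ten-minute window, suspects contains only the 10:02:13 error line. Widening the window is usually the first adjustment to make if nothing turns up.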
Overall, the reasons for Web server failure are varied. Power surges, incorrect configuration, errors in how the firewall is set up and even malicious traffic from the Internet could all be sources of failure and headaches for system administrators. All of these should provide that much more motivation for enterprise decision makers to ensure that sufficient redundancy solutions are in place, as well as detailed troubleshooting procedures tailored to the needs of the local network.
About the author:
Brad Casey is an expert on network security with experience in penetration testing, public key infrastructure, VoIP and network packet analysis. He also covers system administration, Active Directory and Windows Server 2008, with interest in Linux virtualization and Wireshark captures. He spent five years in security assessment testing for the U.S. Air Force.