The Internet is down. Or, is it? (PART 1)

The scenario I use here to test troubleshooting skills is one that everyone has run into at one point or another—you try to load a Web site, perhaps even a reliable site like Google, and it won't come up. Because I covered local network troubleshooting in my last column, I'm assuming you already have gone through those steps and are ready to proceed past the local network. Even though this example deals with testing access to the Internet, you can use the same steps to troubleshoot problems accessing any remote network.
Test Your Gateway

For your computer to communicate with any computer outside your local network, you must have a gateway (router) configured on your local network, and you must be able to reach it. Without getting into heavy-duty network theory, a router connects two or more networks and knows how to route packets between them. Your Linux computer keeps a routing table that lists the routers it knows about for each network it belongs to and when it should use each one. You can use the route command to show your computer's current routing table:

$ route -n
Kernel IP routing table
Destination  Gateway   Genmask         Flags Metric Ref  Use Iface
10.1.1.0     0.0.0.0   255.255.255.0   U     0      0     0 eth0
0.0.0.0      10.1.1.1  0.0.0.0         UG    100    0     0 eth0


In the above example, I have one gateway defined: 10.1.1.1. It is my default gateway, which is the router my machine uses whenever the routing table has no more specific route for a destination. In my case, it's also the only router in my routing table. That means any time my machine wants to communicate with a remote network (in my example, anything that's not within 10.1.1.0/255.255.255.0, that is, hosts 10.1.1.1 through 10.1.1.254), it's going to send the packet to 10.1.1.1 to forward on.
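
If you want to script this lookup, here is a minimal sketch that pulls the default gateway out of a routing table. The function name default_gateway is my own invention, not a standard tool, and it assumes the column layout that route prints (gateway in the second column, flags in the fourth).

```shell
# default_gateway: read a routing table (as printed by "route -n") on stdin
# and print the gateway of the first default route it finds.
# The function name is hypothetical; it is not a standard tool.
default_gateway() {
    # A default route has destination 0.0.0.0 (shown as "default" without -n)
    # and carries the G (gateway) flag in the fourth column.
    awk '($1 == "0.0.0.0" || $1 == "default") && $4 ~ /G/ { print $2; exit }'
}

# Usage on a live system (commented out so the sketch has no side effects):
#   route -n | default_gateway
```

If the function prints nothing, the machine has no default route at all, which would explain why remote networks are unreachable.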

So now that I know my default gateway, I use ping to test whether it's available:

$ ping -c 5 10.1.1.1
PING 10.1.1.1 (10.1.1.1) 56(84) bytes of data.
64 bytes from 10.1.1.1: icmp_seq=1 ttl=64 time=3.13 ms
64 bytes from 10.1.1.1: icmp_seq=2 ttl=64 time=1.43 ms
64 bytes from 10.1.1.1: icmp_seq=3 ttl=64 time=1.79 ms
64 bytes from 10.1.1.1: icmp_seq=5 ttl=64 time=1.50 ms

--- 10.1.1.1 ping statistics ---
5 packets transmitted, 4 received, 20% packet loss, time 4020ms
rtt min/avg/max/mdev = 1.436/1.966/3.132/0.686 ms


In this example, four out of five ping packets were received, so I can be reasonably sure my gateway works. If I couldn't ping the gateway, either my network admin is blocking ICMP packets (I hate when people do that), my switch port is set to the wrong VLAN, or my gateway is truly down. If the gateway is down, fixing the problem might mean rebooting your DSL or wireless router (if that's how you connect to the Internet) or moving your troubleshooting to whatever device is acting as your gateway.
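
The same judgment call can be sketched in a script by reading the loss percentage from ping's summary line. Both function names below are hypothetical, and treating anything under 100% loss as "reachable" is my assumption, not a rule.

```shell
# packet_loss: read ping output on stdin and print the loss percentage
# taken from the "N% packet loss" summary line.
packet_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# gateway_reachable: succeed if at least one of five pings came back.
# The "under 100% loss" threshold is an assumption for this sketch.
gateway_reachable() {
    loss=$(ping -c 5 "$1" 2>/dev/null | packet_loss)
    # Drop any fractional part so the shell can compare integers.
    [ -n "$loss" ] && [ "${loss%.*}" -lt 100 ]
}

# Usage on a live system (commented out so the sketch has no side effects):
#   gateway_reachable 10.1.1.1 && echo "gateway answers pings"
```

A failure here still has the same three possible causes described above, so the script only tells you where to look next.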
Test DNS

In my case, I was able to ping the gateway, so I'm ready to move on to DNS. Because most of us don't browse the Web by IP address, we need DNS to resolve the hostnames we type into IP addresses. If DNS isn't working correctly, even if we technically can reach that remote IP address, we never will know what the IP address is.

A basic way to test DNS is via the nslookup command:

$ nslookup www.linuxjournal.com
Server:     10.2.2.2
Address:    10.2.2.2#53

Non-authoritative answer:
Name:    www.linuxjournal.com
Address: 76.74.252.198


In this example, DNS is functioning correctly as far as I can tell. I say as far as I can tell, because I'm assuming that 76.74.252.198 is the correct IP address for www.linuxjournal.com. If it were the wrong address, that very well could be the cause of the problem! The DNS server in this case is 10.2.2.2, but in some environments, it could be the same IP address as your gateway.
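
If you already know what address a name should resolve to, you can take the guesswork out of "as far as I can tell" with a small check. This is only a sketch: the function names are hypothetical, and the parsing assumes nslookup output laid out like the example above, with the answer's Address lines following the Name line.

```shell
# answer_addresses: read nslookup output on stdin and print only the
# addresses from the answer section, skipping the DNS server's own address.
answer_addresses() {
    awk '/^Name:/ { in_answer = 1; next }
         in_answer && /^Address/ { print $2 }'
}

# resolves_to: succeed if HOST ($1) resolves to the expected address ($2).
resolves_to() {
    nslookup "$1" 2>/dev/null | answer_addresses | grep -qxF "$2"
}

# Usage on a live system (commented out so the sketch has no side effects):
#   resolves_to www.linuxjournal.com 76.74.252.198 || echo "unexpected answer"
```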

Even though the DNS server worked, because I want to show how to troubleshoot DNS, I need some examples of how it can fail. To illustrate this, let me show a few different nslookup commands that have failed:

$ nslookup www.linuxjournal.com
;; connection timed out; no servers could be reached


This error tells me that nslookup couldn't communicate with my DNS server. That could be because either I don't have any name servers configured on my system or I just can't reach them. To see whether I have any name servers configured, I would check my /etc/resolv.conf file. This file keeps track of what name servers I should use. In my case, it would look like this:

search example.net
nameserver 10.2.2.2
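
Checking the file by eye is easy enough, but here is a minimal sketch of the same check as a function. The name list_nameservers is hypothetical; it defaults to /etc/resolv.conf and accepts another path as an optional argument so you can try it on a copy.

```shell
# list_nameservers: print the address from every "nameserver" line in a
# resolv.conf-style file (default: /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Usage on a live system (commented out so the sketch has no side effects):
#   for ns in $(list_nameservers); do ping -c 5 "$ns"; done
```

Empty output means no name servers are configured.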


If your resolv.conf file doesn't have a name server entry, you have found the problem. You need to add the IP address of your name server here. Because I do have a name server defined in resolv.conf, the next step is to attempt to ping the name server's IP with the same ping command that I used for the gateway above. If you can't ping the name server, either a firewall is blocking ICMP (those pesky network administrators!) or there's a routing problem between you and the name server. To rule out the latter, use a tool called traceroute. Traceroute tests the route between you and a remote IP address. To use it, type traceroute followed by the IP address you want to reach. In my case, I would use 10.2.2.2:

$ traceroute 10.2.2.2
traceroute to 10.2.2.2 (10.2.2.2), 30 hops max, 40 byte packets
1  10.1.1.1 (10.1.1.1)  5.432 ms  5.206 ms  5.472 ms
2  10.2.2.2 (10.2.2.2)  8.039 ms  8.348 ms  8.643 ms


In this example, I can route to 10.2.2.2 successfully. To get there, my packets first go to 10.1.1.1 and then move straight to 10.2.2.2. This tells me that 10.1.1.1 is likely the gateway for both networks. If there are more routers between you and your remote server, you will have more hops in between. On the other hand, if you do have a routing problem, your output might look more like the following:

$ traceroute 10.2.2.2
traceroute to 10.2.2.2 (10.2.2.2), 30 hops max, 40 byte packets
1  10.1.1.1 (10.1.1.1)  5.432 ms  5.206 ms  5.472 ms
2  * * *
3  * * *


If you start seeing asterisks in the output, you know the problem likely begins at, or just past, the last router that responded, so you would need to start troubleshooting from that router. Alternatively, you might see output like this:

$ traceroute 10.1.2.5
traceroute to 10.1.2.5 (10.1.2.5), 30 hops max, 40 byte packets
1  10.1.1.1 (10.1.1.1)  5.432 ms  5.206 ms  5.472 ms
2  10.1.1.1 (10.1.1.1)  3006.477 ms !H  3006.779 ms !H  3007.072 ms


The !H flag means the gateway reported the host as unreachable after the probe timed out. The remote host could be down, unplugged or otherwise inaccessible, and you would need to troubleshoot its connection to the network.
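
Both failure patterns can be picked out of traceroute output with a couple of small filters. This is a rough sketch with hypothetical function names; it assumes the hop number is the first field on each line, as in the output shown here.

```shell
# last_responding_hop: read traceroute output on stdin and print the last
# hop that answered at least one probe (the place to start troubleshooting).
last_responding_hop() {
    awk '$1 ~ /^[0-9]+$/ {
             # Take the first field after the hop number that is not "*".
             for (i = 2; i <= NF; i++)
                 if ($i != "*") { last = $i; break }
         }
         END { if (last != "") print last }'
}

# host_unreachable: succeed if any probe came back flagged !H.
host_unreachable() {
    grep -q '!H'
}

# Usage on a live system (commented out so the sketch has no side effects):
#   traceroute 10.2.2.2 | last_responding_hop
```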

Note: traceroute depends on ICMP replies from the routers along the path, so if ICMP is blocked on your network, install a tool called tcptraceroute to perform a similar test over TCP (the syntax is the same; you just type tcptraceroute instead of traceroute).

If you can ping the name server but it isn't responding to you, go back to my previous column and perform all the troubleshooting steps to test whether the remote port is open and accessible on the remote host. Keep in mind though that DNS servers use port 53 on TCP and UDP. Again, if you aren't sure what port a service uses, check the /etc/services file on your system. It lists most of the common services you will use.
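
As a small sketch of that /etc/services lookup, the function below prints every port and protocol listed for a service name. The name service_ports is hypothetical, and the optional second argument (an alternative file path) exists only so you can try it on a sample file.

```shell
# service_ports: print the port/protocol entries for a service name,
# as listed in /etc/services (or another file given as a second argument).
service_ports() {
    awk -v svc="$1" '$1 == svc { print $2 }' "${2:-/etc/services}"
}

# Usage on a live system (commented out so the sketch has no side effects).
# DNS is listed under the service name "domain":
#   service_ports domain
# nslookup also accepts the server to query as a second argument:
#   nslookup www.linuxjournal.com 10.2.2.2
```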
