Belly of the Beast – II
The Internet is a vast computer network, where each computer is connected to every other computer using broadcast and point-to-point networks. The underlying communication is carried via media such as telephone lines, satellite links, optical connections, microwaves and undersea cables. The computers themselves depend on naming and routing schemes in a frantic scurry of transporting data from one point to another.
Last week we discussed the basics of computer networks, protocols, naming and routing. The infrastructure of the Internet is built around routing, along with a healthy dose of protocols. In some sense, the traffic flowing through the Internet has striking similarities to the postal system.
Postal systems have been around since the Roman Empire. Managing a smooth and efficient flow of written communications is the stated purpose of the postal system. The techniques used in letter transport are quite intuitive, yet complex. Suppose Alice (living in Malappuram, Kerala, India) wants to send a postcard to Bob (living in Vijlandi, Estonia). She writes Bob’s address at the appropriate place on the postcard, scribbles a message and drops it into a mailbox.
The letter delivery system uses a set of protocols. Recall that a protocol is a specific language used to communicate commands and data. Some protocols are built using other protocols. For example, the postcard Alice sends to Bob has Bob’s address written in a specific form (name, house number, street, city, country). It also has the message Alice wants to convey. The address is written using the English alphabet, and the alphabet is also a language or protocol. The alphabet is written using lines drawn on paper, and the use of paper and ink is also a protocol. The data part is written using a language comprehensible to Alice and Bob, but it need not be comprehensible to anyone handling the postal system.
The Internet uses such layering of protocols too. At the bottom end, akin to writing the English alphabet on paper, is the IP protocol, which is responsible for managing the flow of data through the Internet. Using the IP protocol we can build other protocols, such as the DNS protocol that transforms names to IP addresses and the TCP protocol that ensures the data transmissions are error free.
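The layering idea can be made concrete with a small sketch. The header formats and the function names tcp_wrap, ip_wrap and unwrap below are made up for readability (real IP and TCP headers are compact binary structures), and the addresses are placeholder values reserved for documentation, not anything from Alice's story.

```python
# Toy illustration of protocol layering: each layer wraps the data handed
# down from the layer above with its own header, much like an address
# written around a postcard message. All formats here are hypothetical.

def tcp_wrap(payload: str, src_port: int, dst_port: int) -> str:
    """Prefix the payload with a simplified TCP-like header."""
    return f"TCP[{src_port}->{dst_port}]|{payload}"

def ip_wrap(segment: str, src_addr: str, dst_addr: str) -> str:
    """Prefix the segment with a simplified IP-like header."""
    return f"IP[{src_addr}->{dst_addr}]|{segment}"

def unwrap(data: str) -> tuple[str, str]:
    """Split off the outermost header and return (header, remainder)."""
    header, _, remainder = data.partition("|")
    return header, remainder

message = "Hello Bob"
segment = tcp_wrap(message, 40000, 80)
packet = ip_wrap(segment, "192.0.2.10", "198.51.100.20")
print(packet)

# A router only needs the outer (IP) header; the inner layers are
# opened by the destination computer.
ip_header, inner = unwrap(packet)
tcp_header, received = unwrap(inner)
print(ip_header, tcp_header, received)
```

Just as the postal workers read the address but not the message, the routers in this sketch would look only at the outermost header.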
The post office in Malappuram picks up the letter from Alice, and they have never heard of Vijlandi or Estonia. However, they realize the country of Estonia is somewhere up north, maybe Europe or maybe North America. So they forward the letter to Calicut, who sends it to Cochin, who sends it to New Delhi. The folks up in New Delhi look up a list and find Estonia is reachable via Frankfurt or Paris. The mailbag to Frankfurt that day was rather full, so the letter gets into the bag bound for Paris. From Paris, the letter goes to Warsaw and then to Riga and then to Tartu and finally reaches Vijlandi.
The Internet is similar. “Routers” do the transport of messages in the Internet. The topology of the Internet, that is, which router is connected to which router, is not widely known. In fact, the topology is largely unknown; each router knows only about its neighbors. Of course, each router is assigned an IP address and hence it knows the addresses of its neighbors. The routers also have large tables called “routing tables”, which list ranges of IP addresses and the direction (or neighbor) each range should point to. So when a message arrives bound for a particular address, the router looks up the tables and decides where to forward it.
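A routing table lookup of this kind might be sketched as follows. The table entries, neighbor names and the function next_hop are hypothetical, and the addresses are placeholder ranges reserved for documentation. One detail goes beyond the description above: when several ranges match, this sketch prefers the most specific one, which is how IP routers commonly resolve overlaps.

```python
import ipaddress

# Hypothetical routing table: (range of addresses, neighbor to forward to).
ROUTING_TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), "upstream"),          # default route
    (ipaddress.ip_network("198.51.100.0/24"), "neighbor-A"),
    (ipaddress.ip_network("198.51.100.128/25"), "neighbor-B"),
    (ipaddress.ip_network("203.0.113.0/24"), "neighbor-C"),
]

def next_hop(destination: str) -> str:
    """Return the neighbor for the most specific range containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTING_TABLE if addr in net]
    # The default route matches everything, so matches is never empty here.
    _, hop = max(matches, key=lambda entry: entry[0].prefixlen)
    return hop

for dest in ["198.51.100.5", "198.51.100.200", "203.0.113.9", "192.0.2.1"]:
    print(dest, "->", next_hop(dest))
```

An address that matches no specific range falls through to the default route, the equivalent of the Malappuram post office passing an unfamiliar letter up the chain.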
To delve deeper into the mysteries of Internet routing, let us watch Alice use the Internet. She first fires up the computer and dials a number. The dialing software causes the modem in Alice’s computer to call a modem located at the premises of her ISP (Internet Service Provider). As the two modems establish a phone connection they scream and hiss at each other in a vile-sounding courtship, called “training”. The two modems during the training phase agree on the protocol they will speak (there are several) and the speed at which they will work (depends on the quality of the phone connection). Once the training is over, the modems continue to hiss at each other, but the speaker is cut off so that Alice does not have to listen to the horrendous noise.
Once the modems are trained up they are able to send data from one to the other. Since Alice’s computer is connected to its modem, the computer can send data via the modem, over the phone line into the modem at the ISP end. Alice’s computer sends some information using the connection that tells the ISP’s computer that Alice is a registered user.
After the connection is established, the ISP computer provides Alice’s computer with a domain name (such as ppp231.pppker.vsnl.net.in) and an IP address (such as 220.127.116.11), both of which are unique to Alice’s computer. The assigning of these names and addresses puts Alice’s computer on the Internet; now it can communicate with any computer on the Internet.
Alice decides to go web surfing and punches up www.google.com. Of course, that is a domain name for the main web server run by Google and is incomprehensible to Alice’s computer. Hence Alice’s computer sends the phrase “www.google.com” to the ISP’s computer, which in turn contacts one of many computers called Domain Name Servers (DNS). A DNS knows many of the translations from domain names to IP addresses. Google being a well-known site, the DNS finds it rather quickly and returns the number 18.104.22.168, which is the address of one of Google’s computers.
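The name lookup can be imitated with a toy resolver. Everything here is illustrative: the class IspResolver, the record table and the addresses (taken from ranges reserved for documentation) are assumptions, and the idea that the ISP's computer remembers earlier answers is an addition not described above. In a real Python program, the standard library call socket.gethostbyname asks the operating system to perform the actual lookup.

```python
# Toy name lookup: the "ISP computer" asks a name server for a translation
# and remembers the answer. Names and addresses are placeholders.
NAME_SERVER_RECORDS = {
    "www.example.com": "203.0.113.80",
    "mail.example.com": "203.0.113.25",
}

class IspResolver:
    def __init__(self, name_server: dict[str, str]) -> None:
        self.name_server = name_server
        self.cache: dict[str, str] = {}
        self.queries_sent = 0  # how many times the name server was asked

    def resolve(self, name: str) -> str:
        """Translate a domain name into an IP address."""
        key = name.lower().rstrip(".")  # domain names are case-insensitive
        if key in self.cache:
            return self.cache[key]
        self.queries_sent += 1
        address = self.name_server.get(key)
        if address is None:
            raise LookupError(f"no such name: {name}")
        self.cache[key] = address
        return address

resolver = IspResolver(NAME_SERVER_RECORDS)
first = resolver.resolve("WWW.Example.COM")
second = resolver.resolve("www.example.com")
print(first, second, resolver.queries_sent)
```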
Now that Alice’s computer knows the IP address of Google, it sends a message with the “To:” address as 18.104.22.168 and the “From:” address as 220.127.116.11. The message first goes to the ISP’s computer. This computer is not directly connected to any major backbone, so it sends the message to a router connected to the main feed line of the ISP. The router now uses a routing table and decides the address is far away and the data needs to go further upstream. The data gets routed to a router connected to a backbone, a wire traversing continents. The little data from Alice is now dumped into a huge rush of data zipping across the backbone connection, and eventually it pops up at one of the major Internet routing centers in the United States. From such a routing center there are many paths leading to the destination. The router handling Alice’s data decides to send it on one of these paths to yet another router, which sends it to another router and so on. This decision is made based on the inter-router traffic congestion statistics.
Finally, the packet arrives at the Internet highway leading into Google’s offices, where yet another router sends it to one of the hundreds of computers used to provide the Google search service. As soon as the destination is reached, the Google machine sends an acknowledgement to Alice’s computer by sending a message to the IP address 220.127.116.11, the “From:” address of the original message.
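The role of the “To” and “From” addresses can be shown in a few lines. The Packet class, the make_reply helper and the addresses below are hypothetical placeholders chosen for illustration; the point is only that a reply is addressed by swapping the two fields of the message that arrived.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    source: str       # the "From" address
    destination: str  # the "To" address
    payload: str

def make_reply(packet: Packet, payload: str) -> Packet:
    """Address a reply by swapping the sender and receiver of the original."""
    return Packet(source=packet.destination,
                  destination=packet.source,
                  payload=payload)

# Placeholder addresses standing in for Alice's computer and the server.
request = Packet(source="192.0.2.10", destination="203.0.113.80",
                 payload="GET /")
reply = make_reply(request, "ACK")
print(reply)
```

The server never needs to know who Alice is or where she lives; the return address travels with every message, just as it can on a postcard.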
The above description of Internet naming and routing may seem daunting, but it is in fact highly oversimplified. We did not consider the Internet topology, we glossed over traffic congestion detection, we did not consider routing decisions driven by political and commercial issues, and we did not even talk about much of the baffling stuff that happens inside Internet routing stations. In reality, there are several complicated protocols (OSPF, RIP) that control the routing. Simplistically, at each routing point, the router looks at several possible paths to send the packet through, computes the congestion on these links, and then decides on the route that is expected to be the fastest. Often such decisions turn out to be wrong and data can get into loops. Suppose the data arrives at A destined for D. A is connected to B and C, both of which are connected to D. A chooses B as the intermediate hop. However, when the data arrives at B, B notices the path from B to D is very congested and the quicker route to D is actually via A and C, so B sends it back to A. Maybe B is using more recent information and A has not yet realized that B to D is congested, so A will send the data back to B. In reality, when we have to consider hundreds of routers performing routing over the backbone networks of the Internet, the problem becomes quite heinous.
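The A, B, C, D loop can be replayed in a short simulation. The tables and the forward function are hypothetical. The hop budget (ttl) is an addition not mentioned above: IP packets do carry such a counter, so a packet caught in a loop is eventually discarded rather than circulating forever.

```python
# Each router's current belief about the best next hop toward D.
STALE_TABLES = {
    "A": "B",  # A has old information and still thinks B -> D is fast
    "B": "A",  # B knows B -> D is congested and prefers A -> C -> D
    "C": "D",
}

def forward(start: str, destination: str, tables: dict[str, str],
            ttl: int = 8) -> tuple[list[str], bool]:
    """Follow next hops; return (path taken, whether the data was delivered)."""
    path = [start]
    current = start
    while current != destination:
        if ttl == 0:
            return path, False  # hop budget exhausted, data is dropped
        current = tables[current]
        path.append(current)
        ttl -= 1
    return path, True

looping_path, delivered = forward("A", "D", STALE_TABLES, ttl=4)
print(looping_path, delivered)

# Once A learns about the congestion, it routes via C and the loop ends.
updated_tables = dict(STALE_TABLES, A="C")
good_path, delivered_after_update = forward("A", "D", updated_tables)
print(good_path, delivered_after_update)
```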
The above discussion makes it sound as if the Internet is a mishmash of wires (or links) layered with naming and routing protocols. However, most Internet users think otherwise. To a user, the Internet is a set of web sites, services, E-Mail, and audio and video sources. This dichotomy is the result of the nature of the beast.
Partha Dasgupta is on the faculty of the Computer Science and Engineering Department at Arizona State University in Tempe. His specializations are in the areas of Operating Systems, Cryptography and Networking. His homepage is at http://cactus.eas.asu.edu/partha.