Open Source, Closed Source
Belly of the Beast - II
The Belly of the Beast
What started during the cold war as a network for waging war became a global information dissemination forum, with applications in peacetime. It is large enough to cover the world and extend into space, yet it is small enough to be squeezed into personal computers and phone wires. It is complex enough to baffle the best minds on earth, yet it is simple enough that ordinary mortals can use it. It is built out of some hardware and some software, yet it is having a profound impact on human society, experience, behavior and life. What is it? Look Ma, it is the Internet.
In some sense, the Internet is just a “computer network” that grew out of a network called the Arpanet. As far as the design of computer networks goes, the Arpanet was not quite the most usable, the most advanced, the most powerful or even the most scalable computer network. Yet a quirk led the Arpanet to gain high acceptance, change its name to the “Internet” and become the most popular and the most populated network.
In some sense, the inner workings of the Internet are not quite understood by its creators. It has a life of its own and manages itself in ways that are slightly mysterious. Consider a colony of hundreds of thousands of ants. Each ant is a simple-minded insect that goes about its life based on limited knowledge of its surroundings. However, working together in surprising harmony, the ants form marching lines, ravaging clumps and elaborate nests, and do some pretty amazing things. Looking at the workings of each ant, it is not clear how the congregation of ants works the way it does. Similarly, we know what each computer on the Internet does, but the aggregation seems to have power beyond our comprehension.
A computer network, in its simplest form, is a wire connecting two computers. This wire can carry data from one computer to the other. Hence, if computer A sends some data to computer B, B can receive the data. This data-wire does not seem to be very useful or usable or even interesting. To make the computer network useful, first the wire between two computers must enable them to collaborate on some tasks, and second, the wire must connect more than two computers.
A very basic function that would make the network useful is the ability to move a file from one computer to the other. This simple task is not so simple after all. Assume computer A wants to send a file called F to computer B. How does computer B know when A starts sending the file, what the name of the file is, and when the file transmission is done? Obviously computer A has to tell computer B, “sending block 1 of file F”, followed by a lot of data, followed by “sending block 2 of file F” and finally “sending over”. Hence the network needs to communicate instructions, separate from the transmission of data.
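The exchange between A and B can be sketched in a few lines of Python. This is only an illustration: the message names BEGIN, BLOCK and END, the tiny block size and the function names are all invented here, and no real file transfer protocol is being quoted.

```python
def send_file(name, data, block_size=4):
    """Yield the messages computer A would put on the wire for one file."""
    yield ("BEGIN", name)                      # "I am about to send file F"
    for number, start in enumerate(range(0, len(data), block_size), start=1):
        yield ("BLOCK", number, data[start:start + block_size])
    yield ("END", name)                        # "sending over"


def receive_file(messages):
    """Rebuild the file on computer B from the stream of messages."""
    name = None
    blocks = []
    for message in messages:
        kind = message[0]
        if kind == "BEGIN":
            name, blocks = message[1], []
        elif kind == "BLOCK":
            blocks.append(message[2])
        elif kind == "END":
            return name, b"".join(blocks)
    raise ValueError("transmission ended without an END message")


name, contents = receive_file(send_file("F", b"hello, computer B"))
print(name, contents)
```

The instructions (BEGIN, BLOCK, END) travel on the same wire as the data, yet B treats them differently from the data itself, which is exactly the separation described above.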
The above situation is closely analogous to the communication between people. Suppose Alice calls Bob on the phone, to tell him the temperature in London. If Alice speaks only Swahili, and Bob understands only Croatian, the conversation would fail. If they both spoke the same language, then Alice would tell Bob what she is going to tell him (temperature in London), then actually tell him (a number) and then ask him to repeat the number, to make sure he heard correctly.
When two computers communicate over a wire they use the same technique as Alice and Bob. First the computers must speak the same language. The language of networking is called a “protocol”. There are hundreds of protocols that have been used for computer networking (every software company invented its own protocol for making its computers communicate). The protocol used on the Internet is called TCP-IP. Ever since the Internet became the de facto standard computer networking system, nearly all protocols other than TCP-IP have been rudely eliminated.
TCP-IP is really two protocols with one name, these two being TCP and IP. TCP (which stands for Transmission Control Protocol) ensures that when one computer sends data to another, both computers work together to make sure they transmit and receive at the same speed and that all bits that are sent are correctly received. IP (which stands for Internet Protocol) ensures that the data sent by computer A, bound for computer B, actually reaches computer B even when B is very far away. TCP-IP is used just for data transmission and routing. There are other functions of the Internet that use other protocols, and they have all kinds of names reminiscent of alphabet soup. Some examples are OSPF (Open Shortest Path First), MIME (Multipurpose Internet Mail Extensions), SNMP (Simple Network Management Protocol), FTP (File Transfer Protocol), RIP (Routing Information Protocol) and ICMP (Internet Control Message Protocol).
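The idea of making sure that all bits are correctly received can be illustrated with a toy model. The sketch below is not TCP: it assumes a hypothetical wire (the FlakyWire class, invented here) that garbles chosen transmissions, and a sender that simply resends each chunk until its checksum survives the trip. Real TCP is considerably more elaborate.

```python
import zlib


class FlakyWire:
    """A hypothetical wire that garbles the transmissions listed in bad_attempts."""

    def __init__(self, bad_attempts):
        self.bad_attempts = set(bad_attempts)
        self.attempts = 0

    def carry(self, packet):
        self.attempts += 1
        seq, payload, checksum = packet
        if self.attempts in self.bad_attempts:
            payload = b"?" * len(payload)  # damage the data in transit
        return seq, payload, checksum


def transfer(chunks, wire, max_tries=5):
    """Send chunks one at a time, resending each until it arrives intact."""
    received = []
    for seq, chunk in enumerate(chunks):
        for _ in range(max_tries):
            packet = (seq, chunk, zlib.crc32(chunk))
            got_seq, got_payload, got_checksum = wire.carry(packet)
            # Receiver side: accept only the expected chunk with a matching checksum.
            if got_seq == seq and zlib.crc32(got_payload) == got_checksum:
                received.append(got_payload)
                break  # acknowledged, so the sender moves on to the next chunk
        else:
            raise RuntimeError(f"chunk {seq} could not be delivered")
    return b"".join(received)


wire = FlakyWire(bad_attempts={2})  # the second transmission gets garbled
message = transfer([b"all bits ", b"correctly ", b"received"], wire)
print(message, "after", wire.attempts, "transmissions")
```

Three chunks take four transmissions here, because the damaged one is sent again. The receiver never sees the garbage as part of the message.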
Now that we have computers A and B talking, we would like them to talk to more computers. There are two main methods for multi-computer communication: the “broadcast network” and the “point-to-point network”. In the first method the same physical wire connects all computers. When A wants to talk to B, it broadcasts the message on the wire. Every computer hears A, but every computer except B ignores the message. This form of networking is used in the Ethernet system and is the most popular local area networking method. In the point-to-point method, for A to talk to B there has to be a wire from A to B, dedicated for use by A and B. To let computer C join the network we could connect yet another wire from B to C. Now B can talk to A as well as C, but A cannot talk to C directly. To enable communication between A and C, we program A to send messages destined for C to B. Then we tell B to relay messages coming from A to C. This is called gatewaying or routing.
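Both methods can be mimicked in a few lines of Python. The sketch assumes the three computers A, B and C from the text (plus a bystander D on the shared wire), and the function names and the NEXT_HOP table are made up for illustration.

```python
def broadcast(sender, recipient, computers):
    """Shared wire: everyone but the sender hears it, only the addressee keeps it."""
    heard_by = [c for c in computers if c != sender]
    kept_by = [c for c in heard_by if c == recipient]
    return heard_by, kept_by


# Point-to-point wires A-B and B-C. Each computer's table says which
# neighbour to hand a message to for a given final destination.
NEXT_HOP = {
    "A": {"B": "B", "C": "B"},  # A reaches C by sending to B
    "B": {"A": "A", "C": "C"},  # B is wired to both, so it relays
    "C": {"B": "B", "A": "B"},  # C reaches A by sending to B
}


def route(source, destination):
    """Follow the tables hop by hop and return the computers visited."""
    path = [source]
    while path[-1] != destination:
        path.append(NEXT_HOP[path[-1]][destination])
    return path


print(broadcast("A", "B", ["A", "B", "C", "D"]))
print(route("A", "C"))
```

On the shared wire B, C and D all hear the message and only B keeps it. On the point-to-point network the message from A to C passes through B, which is the relaying the text calls gatewaying or routing.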
Naming and Routing
Just 10 computers on a network can make the networking scheme quite complicated, unless we use a broadcast network. Of course, broadcast networks can only be used for a small number of machines (typically 10-30) and must be used over short distances (under 1 km). Hence point-to-point networks are used to construct larger networks. But point-to-point networks involve naming and routing, which makes things rather overwhelming.
In any network, if computer A wants to send some data to computer B, A must know B’s name. Then A has to know where B is located. Of course B may be very close (on a broadcast network connected to A) or quite far away (so that the message needs to be routed to C, and then to D and so on).
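Working out such a chain of relays amounts to finding a path through the network. The sketch below uses Dijkstra's shortest-path algorithm, the idea behind the “Shortest Path First” in OSPF's name. It makes generous assumptions: one program knows the complete map of wires, and the computers, wires and costs in LINKS are invented for illustration.

```python
import heapq


def shortest_route(links, source, destination):
    """Dijkstra's algorithm over a map of {computer: {neighbour: cost}}."""
    queue = [(0, source, [source])]  # (total cost so far, computer, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, link_cost in links.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbour, path + [neighbour]))
    return None  # no chain of wires connects the two computers


# Hypothetical wires: the direct wire from B to D is slow (cost 5).
LINKS = {
    "A": {"B": 1},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}

print(shortest_route(LINKS, "A", "D"))
```

With these made-up costs the cheapest way from A to D runs through B and then C, rather than over the slow direct wire from B to D.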
The naming issue is resolved in the Internet using not one, but two forms of naming: the domain-based name and the IP address. The stupid humans use the domain-based names, while the number-savvy computers prefer the IP addresses. The domain name is a set of letters, interspersed with dots, always ending in something like .com. In fact, the ending of the domain name is .com for commercial domains, .org for non-profit organizations, .edu for academic institutions, .gov for government agencies (US only) and .mil for the military (US only). Non-US sites may use some of the US domains or use names ending in two-letter country codes, such as .in for India, .it for Italy and so on.
The full name of a computer may take the form “cabbage.asu.edu”. That name denotes a computer called “cabbage” in the domain “asu.edu”. The domain asu.edu is assigned to Arizona State University. It is the responsibility of the people running the asu.edu domain to ensure that there is no more than one computer called cabbage. In fact, to make such administration easier, larger domains are split into sub-domains. Hence a computer at Arizona State University may be named fruitcake.eas.asu.edu, which means that “fruitcake” is a computer in the “eas.asu.edu” sub-domain, owned by the Engineering and Applied Science organization, which is a part of the asu.edu domain.
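Reading a name this way is mechanical enough to hand to a program. The sketch below is illustrative only: the describe function and the CATEGORIES table are invented here and simply restate the endings listed earlier.

```python
CATEGORIES = {
    "com": "commercial",
    "org": "non-profit organization",
    "edu": "academic institution",
    "gov": "US government agency",
    "mil": "US military",
}


def describe(full_name):
    """Split a full computer name into its host, enclosing domains and category."""
    labels = full_name.lower().split(".")
    host, domain_labels = labels[0], labels[1:]
    # Enclosing domains, from the most specific sub-domain outwards.
    domains = [".".join(domain_labels[i:]) for i in range(len(domain_labels))]
    ending = labels[-1]
    category = CATEGORIES.get(ending, "country code" if len(ending) == 2 else "other")
    return host, domains, category


print(describe("cabbage.asu.edu"))
print(describe("fruitcake.eas.asu.edu"))
```

For fruitcake.eas.asu.edu the host is fruitcake and the enclosing domains are eas.asu.edu, then asu.edu, then edu, which mirrors the chain of administrative responsibility.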
Domain-based names provide a nice way to name every computer on the Internet, but do not provide any way to solve the routing problem. The routing problem is as follows: Suppose we are sitting in a cyber-café in Germany, and want to send some data to a computer called fiasco.member.startup.com. How would the computer in the cyber café know how to send the message—where on earth, really, is the destination?
To enable routing of data, the sending computer needs to know the IP address of the destination computer. The IP address is a number, written in forms such as 18.104.22.168. Every computer on the Internet has a unique IP address (along with a unique name). Given the IP address, rather complicated routing protocols such as OSPF work out a path along which the data can be carried from the source computer to the destination computer, almost correctly, without too much delay, most of the time. In later weeks we will discuss routing and other mysteries of the Internet.
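That the dotted form really is one number can be checked with Python's standard ipaddress module. In the sketch below the DIRECTORY table and its addresses are hypothetical: it stands in for the Internet's real name lookup service, and its addresses come from the 192.0.2.x block, which is reserved for documentation.

```python
import ipaddress

# The dotted form is shorthand for a single 32-bit number.
address = ipaddress.IPv4Address("18.104.22.168")
as_number = int(address)
print(address, "is the number", as_number)
print(as_number, "is the address", ipaddress.IPv4Address(as_number))

# Hypothetical directory from names (for humans) to addresses (for computers).
DIRECTORY = {
    "cabbage.asu.edu": "192.0.2.10",
    "fruitcake.eas.asu.edu": "192.0.2.20",
}


def lookup(name):
    """Return the IP address registered for a name, ignoring letter case."""
    return ipaddress.IPv4Address(DIRECTORY[name.lower()])


print(lookup("Cabbage.ASU.edu"))
```

Each of the four dotted parts contributes eight bits, so the first part counts in units of 256 x 256 x 256, the second in units of 256 x 256, and so on.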
Partha Dasgupta is on the faculty of the Computer Science and Engineering Department at Arizona State University in Tempe. His specializations are in the areas of Operating Systems, Cryptography and Networking. His homepage is at http://cactus.eas.asu.edu/partha.