Cocooning on the Canyon
May 10, 2002
Towers of Babel
Santa Monica Beach is a small but popular stretch of sun and sand on the calm Pacific shore of Los Angeles, just a few miles from LAX (Los Angeles International Airport). In summer it is crowded with balls and bodies. It is not the usual place to hang out with an aviation radio, but it is a great place to do so. If you would like to hear people speaking a strange language, tune the radio to the LA Center frequency of 119.05 MHz.
You would hear aircraft arriving from all over the world, and a conversation would sound like this:
Voice 1: LA Center, Japan Air six-two, with you.
Voice 2: Japan Air six-two, LA Center, descend and maintain ten thousand, heading zero-five-zero, reduce speed to two-five-zero.
Voice 1: Japan Air six-two, ten thousand, zero-five-zero.
Voice 2: Japan Air six-two, contact SoCal Approach one-two-eight-point-two, G’day.
Voice 1: Japan Air six-two, one-two-eight-point-two, G’day.
Voice 1: SoCal Approach, Japan Air six-two, with you.
Voice 3: Japan Air six-two, continue zero-five-zero, descend and maintain eight thousand.
Voice 1: Japan Air six-two, Roger.
While the above sounds like gibberish, it is of course well-trained professionals speaking a precise language. Such specialized derivatives of English exist not only in aviation but in many professional fields, such as law, medicine, science and engineering. Specializations of language serve important functions: they are compact, and they leave little room for misunderstanding.
One such important derivative of English is the huge family of computer languages. Computer programming languages are a mix of English words and mathematical symbols. They form a varied hierarchy of functionality, syntax and semantics, dictated by the needs of programmers and applications and, of course, by the politics, whims and religious beliefs of those who use computers.
Words are sounds that mean things. Words can be put together according to the rules of a language to mean more complex things. How words can be combined so that they sound right is the syntax of the language. However, these strung-together words must also mean something, and that meaning is the semantics of the language. Language is the cornerstone of communication. Human language is imprecise and malleable, and it exists in hundreds of varieties divided by region, country and culture. Derivatives of human languages meant for specific tasks are more precise in syntax and semantics.
Linguists have found remarkable similarities among languages from all over the world. One of the largest language families is the Indo-European family. From a common ancestral tongue we can trace the development of widely different languages such as Latin and Sanskrit. The family has branched into many forms: the Germanic branch led to English and Swedish, while the Italic branch, through Latin, led to Spanish and French. Sanskrit forms the roots of many (but not all) Indian languages. Some languages spoken in Europe, such as Finnish and Hungarian, are not Indo-European, and neither are some languages spoken in India (such as Tamil). Some linguists have even proposed connections between Turkish, Korean and Finnish, though that idea remains controversial.
Do computers really use languages? The answer is not obvious: it is both “yes” and “no”, depending on your viewpoint. Computers are programmable machines that can perform arithmetic (and some logical) tasks. For example, suppose you want to make a computer add two numbers. First you would place the numbers in its memory at some locations. Each location in memory has an address, and the address is a number. Suppose you placed the two numbers at the binary addresses 01100 and 010100. Then you would place a program in the computer’s memory, at some other location. This program would look like 011010, 01100, 010100. Here the number 011010 is the code for “add”, and the other two numbers are the addresses of the two numbers to be added. When the processor executes this code, the numbers get added.
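To make the example concrete, here is a minimal sketch in Python of the toy computer just described. It assumes, as the text does, that the made-up code 011010 means “add”. It also assumes two things the text leaves open: that the program sits at the hypothetical address 100000, and that the sum is simply handed back, as if left in an accumulator. No real processor uses this encoding.

```python
# Hypothetical opcode: the made-up code for "add" from the example above.
ADD = 0b011010

def execute(memory: dict[int, int], pc: int) -> int:
    """Fetch, decode and execute the one instruction stored at address pc."""
    # Fetch: the instruction is three consecutive words in memory.
    opcode, addr_a, addr_b = memory[pc], memory[pc + 1], memory[pc + 2]
    # Decode and execute: "add" sums the numbers at the two operand addresses.
    if opcode == ADD:
        return memory[addr_a] + memory[addr_b]  # result kept in an accumulator
    raise ValueError(f"unknown opcode {opcode:06b}")

memory = {
    0b01100: 7,           # first number
    0b010100: 35,         # second number
    0b100000: 0b011010,   # program (at an assumed address): "add" ...
    0b100001: 0b01100,    # ... the number at address 01100 ...
    0b100010: 0b010100,   # ... and the number at address 010100
}
print(execute(memory, 0b100000))  # 42
```

Everything here is just numbers in memory; only the processor’s fixed interpretation of 011010 gives them any meaning.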
So where is the language? The apparent answer: there isn’t any.
But in some obscure, arcane sense, the bits 011010, 01100, 010100 form a sentence of three words. The sentence has a syntax (an operator followed by two operands) as well as semantics (“add the two numbers”). Hence it is part of a language. In this language the words have to be binary digits burned into the memory cells of a computer, in very strict adherence to rules of structure and position. Since we can hardly call this a language, it is usually referred to as a “low-level language” or “machine language”.
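The split between syntax and semantics can be sketched in a few lines. The hypothetical checker below uses the same assumed one-operator vocabulary (011010 means “add”). It first rejects sentences that break the structural rules, then reports what a well-formed sentence means. A real processor does no such friendly checking; it simply misbehaves on malformed input.

```python
# Hypothetical vocabulary: the only operator in this toy language is "add".
OPERATORS = {"011010": "add"}

def parse(sentence: list[str]) -> str:
    """Check a binary sentence's syntax, then return its meaning."""
    # Syntax rule 1: every word is a non-empty string of binary digits.
    if not all(word and set(word) <= {"0", "1"} for word in sentence):
        raise ValueError("every word must be made of binary digits")
    # Syntax rule 2: one known operator followed by exactly two operands.
    if len(sentence) != 3 or sentence[0] not in OPERATORS:
        raise ValueError("expected an operator followed by two operands")
    operator, first, second = sentence
    # Semantics: what the well-formed sentence means.
    return (f"{OPERATORS[operator]} the numbers at addresses "
            f"{int(first, 2)} and {int(second, 2)}")

print(parse(["011010", "01100", "010100"]))
# add the numbers at addresses 12 and 20
```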
Machine language is the only language a computer understands, period. There is nothing else. There is no such thing as C or Lisp or Java or Pascal or FORTRAN or whatever that is comprehensible to a computer. Yet it is also true that there are hundreds if not thousands of computer languages. (So where is the disconnect?)
At first, computer programmers had to use machine language. They painstakingly wrote programs on large sheets of paper and then hand-entered them into memory by flipping little switches. It was an experience worse than death, and humans soon realized that programming in machine language was not something God intended humans to do. So they set about fixing that problem.
Humans decided to invent a simple new language called Assembly Language. In assembly language, the sentence 011010, 01100, 010100 can be written as ADD X, Y. Much simpler, much easier to write, read and understand. Yet it is totally incomprehensible to the computer.
Then they wrote another program (and this was about the last machine language program written by humans) called the Assembler. The assembler makes the computer read assembly language programs and transform them into machine language programs. An assembler is a simple program and was not too difficult to write. Never again would humans have to write machine language programs.
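How simple is an assembler? At its core it is table lookup. The sketch below assumes a hypothetical opcode table where ADD is 011010 and a hypothetical symbol table that places X at 01100 and Y at 010100, matching the example above. A real assembler also handles labels, many instructions and a proper binary output format.

```python
# Hypothetical opcode table: the made-up code for "add".
OPCODES = {"ADD": "011010"}
# Hypothetical symbol table: where X and Y have been placed in memory.
SYMBOLS = {"X": "01100", "Y": "010100"}

def assemble(statement: str) -> str:
    """Translate one statement such as 'ADD X, Y' into its machine words."""
    mnemonic, operand_text = statement.split(maxsplit=1)
    words = [OPCODES[mnemonic.upper()]]        # look up the operation code
    for name in operand_text.split(","):
        words.append(SYMBOLS[name.strip()])    # look up each operand's address
    return ", ".join(words)

print(assemble("ADD X, Y"))  # 011010, 01100, 010100
```

An unknown mnemonic or symbol simply raises a KeyError here; a friendlier assembler would report the offending line.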
Assembly language became the language of choice for just a few seconds, and soon programmers started whining that assembly was too difficult for humans to use. Unreadable, error-prone, too complex, hard to follow and ugly were a few of the compliments thrown at assembly. “Not a good thing, too low-level,” they said, “can’t we have better languages?” And sure enough they could, and “high-level” languages were born.
Some smart guy invented something called a compiler. A compiler is a cross between black magic and witchcraft. A compiler can take a program written in a more “English-like” language and transform it into an assembly language program, which can then be fed into the wonderful assembler and thus digested into a machine language program. This translation capability of a compiler is the black magic part.
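A real compiler is far too large to show here, but the black magic can be demystified a little. The sketch below compiles one assignment statement, such as TOTAL = PRICE + TAX - DISCOUNT, into assembly for a hypothetical accumulator machine. LOAD, ADD, SUB and STORE are invented for illustration, and this ADD takes one operand, unlike the two-operand ADD X, Y above. It handles only variable names joined by + and -.

```python
import re

# Hypothetical accumulator instructions for the two supported operators.
OPS = {"+": "ADD", "-": "SUB"}

def compile_assignment(source: str) -> list[str]:
    """Compile 'TARGET = A + B - C ...' into hypothetical assembly lines."""
    target, expr = (part.strip() for part in source.split("="))
    # Lexical analysis: names, or any other single non-space character.
    tokens = re.findall(r"[A-Za-z_]\w*|\S", expr)
    operands, operators = tokens[::2], tokens[1::2]
    # Syntax check: names and operators must alternate, starting with a name.
    if (len(operands) != len(operators) + 1
            or not all(name.isidentifier() for name in operands)
            or not all(op in OPS for op in operators)):
        raise ValueError(f"cannot parse expression: {expr!r}")
    # Code generation: load the first name, apply each operator, store.
    code = [f"LOAD {operands[0]}"]
    for op, name in zip(operators, operands[1:]):
        code.append(f"{OPS[op]} {name}")
    code.append(f"STORE {target}")
    return code

for line in compile_assignment("TOTAL = PRICE + TAX - DISCOUNT"):
    print(line)
# LOAD PRICE
# ADD TAX
# SUB DISCOUNT
# STORE TOTAL
```

The three stages (splitting text into tokens, checking syntax, generating code) are the same ones a full compiler goes through, just on a vastly larger scale; the output would then go to an assembler to become machine words.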
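How do we write a compiler? Compilers are often written in the very language they compile (a C compiler is usually written in C). Making such a compiler run on the computer involves making the compiler compile itself: a first version is compiled with some existing tool, such as a simpler compiler written in assembly language, and from then on the compiler can compile its own source. Making the compiler compile itself is the witchcraft part.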
Ever since compilers were created, life has been a bed of roses. Early favorite programming languages included FORTRAN (1957) for scientific applications, Lisp (1959) for symbolic computation and COBOL (1960) for business applications. The advances in computer language design led to an explosion of computer languages, each designed for specific tasks and tastes. Soon, wars broke out among computer language developers along the lines of “my language is better than yours”. The wars have not subsided yet, and they never will.
Computer languages are not only the turf battle of developers and users; they are a constant source of politics in the computer industry. Hundreds of very good languages have been cast aside because a competitor was the baby of some powerful company. Even governments have been embroiled in the mess: the US Government spent unthinkable amounts of money developing a language called Ada. Ada was touted to be the language that would end all language battles. Ada was proposed in 1977, chosen in 1979, redesigned for three years by a committee and then actually implemented on a computer by 1983. It had all the trimmings of such a dubious heritage. Since no one really wanted to use it, the US Government decreed that all software written for the Government must be written in Ada. A handful of people learnt Ada and maybe three actually used it. Soon a revision effort was started, which culminated in more fiasco by 1995, and then the mandate was officially dropped in 1997.
Today there are innumerable languages in use. Hundreds were invented and discarded. Really elegant languages like Pascal, Modula, Eiffel and Scheme are all gone, victims of neglect. Aging C is still loved (it is quite a bad language), and an abysmal fix to C called C++ is still riding high. Java was invented by Sun to counteract the market dominance of Microsoft. Strangely, Java got much more popular than it should have and made Microsoft quite nervous. So Microsoft shot back with ActiveX, which failed, then C# (pronounced C-sharp) and then .NET (pronounced dot-NET). Java seems to have nine lives: in spite of Microsoft clubbing it to death (with fancy names and many a dubious tactic), it still lives.
Is your cell phone Java-enabled? Maybe so, but under the marketing hype, there is really no such thing as a high-level language that a computer understands. Languages are invented by humans, for humans, and compilers make computers reluctantly comprehend them.
Partha Dasgupta is on the faculty of the Computer Science and Engineering Department at Arizona State University in Tempe. His specializations are in the areas of Operating Systems, Cryptography and Networking. His homepage is at