How the ARPANET Protocols Worked (2024)

The ARPANET changed computing forever by proving that computers of wildly different manufacture could be connected using standardized protocols. In my post on the historical significance of the ARPANET, I mentioned a few of those protocols, but didn’t describe them in any detail. So I wanted to take a closer look at them. I also wanted to see how much of the design of those early protocols survives in the protocols we use today.

The ARPANET protocols were, like our modern internet protocols, organized into layers.¹ The protocols in the higher layers ran on top of the protocols in the lower layers. Today the TCP/IP suite is usually described as having five layers (the Physical, Link, Network, Transport, and Application layers), but the ARPANET had only three layers—or possibly four, depending on how you count them.

I’m going to explain how each of these layers worked, but first an aside about who built what in the ARPANET, which you need to know to understand why the layers were divided up as they were.

Some Quick Historical Context

The ARPANET was funded by the US federal government, specifically the Advanced Research Projects Agency within the Department of Defense (hence the name “ARPANET”). The US government did not directly build the network; instead, it contracted the work out to a Boston-based consulting firm called Bolt, Beranek, and Newman, more commonly known as BBN.

BBN, in turn, handled many of the responsibilities for implementing the network, but not all of them. What BBN did was design and maintain a machine known as the Interface Message Processor, or IMP. The IMP was a customized Honeywell minicomputer, one of which was delivered to each site across the country that was to be connected to the ARPANET. The IMP served as a gateway to the ARPANET for up to four hosts at each host site. It was basically a router. BBN controlled the software running on the IMPs that forwarded packets from IMP to IMP, but the firm had no direct control over the machines that would connect to the IMPs and become the actual hosts on the ARPANET.

The host machines were controlled by the computer scientists who were the end users of the network. These computer scientists, at host sites across the country, were responsible for writing the software that would allow the hosts to talk to each other. The IMPs gave hosts the ability to send messages to each other, but that was not much use unless the hosts agreed on a format to use for the messages. To solve that problem, a motley crew consisting in large part of graduate students from the various host sites formed themselves into the Network Working Group, which sought to specify protocols for the host computers to use.

So if you imagine a single successful network interaction over the ARPANET (sending an email, say), some bits of engineering that made the interaction successful were the responsibility of one set of people (BBN), while other bits of engineering were the responsibility of another set of people (the Network Working Group and the engineers at each host site). That organizational and logistical happenstance probably played a big role in motivating the layered approach used for protocols on the ARPANET, which in turn influenced the layered approach used for TCP/IP.

Okay, Back to the Protocols

The ARPANET protocol hierarchy.

The protocol layers were organized into a hierarchy. At the very bottom was “level 0.”² This is the layer that in some sense doesn’t count, because on the ARPANET this layer was controlled entirely by BBN, so there was no need for a standard protocol. Level 0 governed how data passed between the IMPs. Inside of BBN, there were rules governing how IMPs did this; outside of BBN, the IMP sub-network was a black box that just passed on any data that you gave it. So level 0 was a layer without a real protocol, in the sense of a publicly known and agreed-upon set of rules, and its existence could be ignored by software running on the ARPANET hosts. Loosely speaking, it handled everything that falls under the Physical, Link, and Network layers of the TCP/IP suite today, and even quite a lot of the Transport layer, which is something I’ll come back to at the end of this post.

The “level 1” layer established the interface between the ARPANET hosts and the IMPs they were connected to. It was an API, if you like, for the black box of level 0 that BBN had built. It was also referred to at the time as the IMP-Host Protocol. This protocol had to be written and published because, when the ARPANET was first being set up, each host site had to write its own software to interface with the IMP. They wouldn’t have known how to do that unless BBN gave them some guidance.

The IMP-Host Protocol was specified by BBN in a lengthy document called BBN Report 1822. The document was revised many times as the ARPANET evolved; what I’m going to describe here is roughly the way the IMP-Host protocol worked as it was initially designed. According to BBN’s rules, hosts could pass messages no longer than 8095 bits to their IMPs, and each message had a leader that included the destination host number and something called a link number.³ The IMP would examine the destination host number and then dutifully forward the message into the network. When messages were received from a remote host, the receiving IMP would replace the destination host number with the source host number before passing the message on to the local host. Messages were not actually what passed between the IMPs themselves—the IMPs broke the messages down into smaller packets for transfer over the network—but that detail was hidden from the hosts.

The Host-IMP message leader format, as of 1969. Diagram from BBN Report 1763.
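To make the level 1 rules concrete, here is a minimal Python sketch of what a host had to respect when handing a message to its IMP, and of the host-number swap the receiving IMP performed. It models the leader as just two fields, a host number and a link number; the class and function names, and the separate `length_bits` field used for the size check, are hypothetical conveniences and not BBN’s actual bit layout from Report 1822.

```python
from dataclasses import dataclass, replace

# BBN's original rule: a message from a host to its IMP could be at most 8095 bits.
MAX_MESSAGE_BITS = 8095


@dataclass(frozen=True)
class HostImpMessage:
    host: int         # destination host when sent; source host once delivered
    link: int         # link number, 0-255
    length_bits: int  # hypothetical: total message length, tracked for the size check


def send_to_imp(msg: HostImpMessage) -> HostImpMessage:
    """Reject messages that break the level 1 rules described above."""
    if not 0 <= msg.link <= 255:
        raise ValueError("link number must be between 0 and 255")
    if msg.length_bits > MAX_MESSAGE_BITS:
        raise ValueError("message exceeds 8095 bits")
    return msg


def deliver_to_host(msg: HostImpMessage, source_host: int) -> HostImpMessage:
    """The receiving IMP overwrites the destination host number with the source."""
    return replace(msg, host=source_host)


outgoing = send_to_imp(HostImpMessage(host=2, link=7, length_bits=1000))
incoming = deliver_to_host(outgoing, source_host=1)
print(incoming)  # HostImpMessage(host=1, link=7, length_bits=1000)
```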

The link number, which could be any number from 0 to 255, served two purposes. It was used by higher-level protocols to establish more than one channel of communication between any two hosts on the network, since it was conceivable that there might be more than one local user talking to the same destination host at any given time. (In other words, the link numbers allowed communication to be multiplexed between hosts.) But it was also used at the level 1 layer to control the amount of traffic that could be sent between hosts, which was necessary to prevent faster computers from overwhelming slower ones. As initially designed, the IMP-Host Protocol limited each host to sending just one message at a time over each link. Once a given host had sent a message along a link to a remote host, it would have to wait to receive a special kind of message called an RFNM (Request for Next Message) from the remote IMP before sending the next message along the same link. Later revisions to this system, made to improve performance, allowed a host to have up to eight messages in transit to another host at a given time.⁴
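The original one-message-per-link rule amounts to a per-link lock that only an RFNM can release. The Python sketch below models a sending host’s view of that rule; `LinkGate` and its methods are hypothetical names, and the sketch ignores timeouts, lost RFNMs, and the later eight-message relaxation.

```python
class LinkGate:
    """Sending host's bookkeeping under the original IMP-Host rule:
    at most one unacknowledged message per (destination host, link) pair."""

    def __init__(self) -> None:
        self.awaiting_rfnm: set[tuple[int, int]] = set()

    def can_send(self, dest_host: int, link: int) -> bool:
        return (dest_host, link) not in self.awaiting_rfnm

    def send(self, dest_host: int, link: int) -> None:
        if not self.can_send(dest_host, link):
            raise RuntimeError(f"link {link} to host {dest_host} is waiting for an RFNM")
        self.awaiting_rfnm.add((dest_host, link))

    def receive_rfnm(self, dest_host: int, link: int) -> None:
        # An RFNM from the remote IMP frees this link for the next message.
        self.awaiting_rfnm.discard((dest_host, link))


gate = LinkGate()
gate.send(dest_host=5, link=3)
print(gate.can_send(5, 3))  # False: must wait for an RFNM
print(gate.can_send(5, 4))  # True: other links are independent
gate.receive_rfnm(5, 3)
print(gate.can_send(5, 3))  # True
```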

The “level 2” layer is where things really start to get interesting, because it was this layer and the one above it that BBN and the Department of Defense left entirely to the academics and the Network Working Group to invent for themselves. The level 2 layer comprised the Host-Host Protocol, which was first sketched in RFC 9 and first officially specified by RFC 54. A more readable explanation of the Host-Host Protocol is given in the ARPANET Protocol Handbook.

The Host-Host Protocol governed how hosts created and managed connections with each other. A connection was a one-way data pipeline between a write socket on one host and a read socket on another host. The “socket” concept was introduced on top of the limited level-1 link facility (remember that the link number can only be one of 256 values) to give programs a way of addressing a particular process running on a remote host. Read sockets were even-numbered while write sockets were odd-numbered; whether a socket was a read socket or a write socket was referred to as the socket’s gender. There were no “port numbers” like in TCP. Connections could be opened, manipulated, and closed by specially formatted Host-Host control messages sent between hosts using link 0, which was reserved for that purpose. Once control messages were exchanged over link 0 to establish a connection, further data messages could then be sent using another link number picked by the receiver.
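The gender rule and the reserved control link are easy to state in code. This Python sketch checks that a proposed connection runs from a write socket to a read socket, and has the receiver choose a data link other than link 0. The `Socket` type, the helper names, and the lowest-free-link policy are assumptions for illustration; the description above only says that the receiver picked the link, not how.

```python
from typing import NamedTuple

CONTROL_LINK = 0  # reserved for Host-Host control messages


class Socket(NamedTuple):
    host: int
    number: int

    @property
    def gender(self) -> str:
        # Even-numbered sockets read; odd-numbered sockets write.
        return "read" if self.number % 2 == 0 else "write"


def check_connection(sender: Socket, receiver: Socket) -> None:
    """A connection is a one-way pipe from a write socket to a read socket."""
    if sender.gender != "write" or receiver.gender != "read":
        raise ValueError("connections must run from a write socket to a read socket")


def pick_data_link(links_in_use: set[int]) -> int:
    """Receiver chooses a free data link; lowest-first is a hypothetical policy."""
    for link in range(CONTROL_LINK + 1, 256):  # never hand out the control link
        if link not in links_in_use:
            return link
    raise RuntimeError("all 255 data links are in use")


check_connection(Socket(host=1, number=7), Socket(host=2, number=12))  # accepted
print(pick_data_link({1, 2}))  # 3
```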

Host-Host control messages were identified by a three-letter mnemonic. A connection was established when two hosts exchanged a STR (sender-to-receiver) message and a matching RTS (receiver-to-sender) message—these control messages were both known as Request for Connection messages. Connections could be closed by the CLS (close) control message. There were further control messages that changed the rate at which data messages were sent from sender to receiver, which were needed, again, to ensure that faster hosts did not overwhelm slower hosts. The flow control already provided by the level 1 protocol was apparently not sufficient at level 2; I suspect this was because receiving an RFNM from a remote IMP was only a guarantee that the remote IMP had passed the message on to the destination host, not that the host had fully processed the message. There was also an INR (interrupt-by-receiver) control message and an INS (interrupt-by-sender) control message that were primarily for use by higher-level protocols.
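The opening handshake can be modeled as a small table keyed by socket pair: a connection opens only once both a STR and a matching RTS have been seen, in either order. This Python sketch is a simplification under stated assumptions: real control messages carried more fields (for example, the receiver’s RTS had to name the link it picked), closing is reduced here to a single CLS, and the flow-control and interrupt commands are left out. The class name is hypothetical.

```python
class ConnectionTable:
    """Tracks Host-Host connections keyed by (write socket, read socket)."""

    def __init__(self) -> None:
        self.pending: dict[tuple[int, int], set[str]] = {}
        self.open: set[tuple[int, int]] = set()

    def control(self, command: str, write_socket: int, read_socket: int) -> None:
        key = (write_socket, read_socket)
        if command in ("STR", "RTS"):
            seen = self.pending.setdefault(key, set())
            seen.add(command)
            if seen == {"STR", "RTS"}:  # matching pair: connection established
                del self.pending[key]
                self.open.add(key)
        elif command == "CLS":
            self.pending.pop(key, None)
            self.open.discard(key)
        else:
            raise ValueError(f"command not modeled here: {command}")


table = ConnectionTable()
table.control("RTS", 7, 12)   # the receiver asks first
print((7, 12) in table.open)  # False: still waiting for the STR
table.control("STR", 7, 12)
print((7, 12) in table.open)  # True
table.control("CLS", 7, 12)
print((7, 12) in table.open)  # False
```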

The higher-level protocols all lived in “level 3”, which was the Application layer of the ARPANET. The Telnet protocol, which provided a virtual teletype connection to another host, was perhaps the most important of these protocols, but there were many others in this level too, such as FTP for transferring files and various experiments with protocols for sending email.

One protocol in this level was not like the others: the Initial Connection Protocol (ICP). ICP was considered to be a level-3 protocol, but really it was a kind of level-2.5 protocol, since other level-3 protocols depended on it. ICP was needed because the connections provided by the Host-Host Protocol at level 2 were only one-way, but most applications required a two-way (i.e. full-duplex) connection to do anything interesting. ICP specified a two-step process whereby a client running on one host could connect to a long-running server process on another host. The first step involved establishing a one-way connection from the server to the client using the server process’s well-known socket number. The server would then send a new socket number to the client over the established connection. At that point, the existing connection would be discarded and two new connections would be opened: a read connection based on the transmitted socket number and a write connection based on the transmitted socket number plus one. This little dance was a necessary prelude to most things—it was the first step in establishing a Telnet connection, for example.
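The socket arithmetic in ICP can be checked against the gender rule. This Python sketch assumes the transmitted socket number and the two new connections are described from the server’s side. Because the bootstrap connection flows from server to client, the server’s well-known socket must then be odd (a write socket), and for the transmitted number S to be a read socket and S + 1 a write socket, S must be even. The function and field names are hypothetical, and the client’s socket numbers are left out because the description above does not give them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IcpResult:
    bootstrap_socket: int     # server's well-known socket; its connection is discarded
    server_read_socket: int   # new connection: client -> server
    server_write_socket: int  # new connection: server -> client


def initial_connection(well_known_socket: int, transmitted_socket: int) -> IcpResult:
    # Step 1: one-way connection from the server's well-known socket to the client.
    # Data flows server -> client, so that socket needs write (odd) gender.
    if well_known_socket % 2 != 1:
        raise ValueError("well-known socket must be odd (a write socket)")
    # Step 2: the server sends transmitted_socket over that connection, then the
    # bootstrap connection is dropped and two new one-way connections are opened.
    if transmitted_socket % 2 != 0:
        raise ValueError("transmitted socket must be even (a read socket)")
    return IcpResult(
        bootstrap_socket=well_known_socket,
        server_read_socket=transmitted_socket,
        server_write_socket=transmitted_socket + 1,
    )


print(initial_connection(well_known_socket=1, transmitted_socket=200))
# IcpResult(bootstrap_socket=1, server_read_socket=200, server_write_socket=201)
```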

That finishes our ascent of the ARPANET protocol hierarchy. You may have been expecting me to mention a “Network Control Protocol” at some point. Before I sat down to do research for this post and my last one, I definitely thought that the ARPANET ran on a protocol called NCP. The acronym is occasionally used to refer to the ARPANET protocols as a whole, which might be why I had that idea. RFC 801, for example, talks about transitioning the ARPANET from “NCP” to “TCP” in a way that makes it sound like NCP is an ARPANET protocol equivalent to TCP. But there has never been a “Network Control Protocol” for the ARPANET (even if Encyclopedia Britannica thinks so), and I suspect people have mistakenly unpacked “NCP” as “Network Control Protocol” when really it stands for “Network Control Program.” The Network Control Program was the kernel-level program running in each host responsible for handling network communication, equivalent to the TCP/IP stack in an operating system today. “NCP”, as it’s used in RFC 801, is a metonym, not a protocol.

A Comparison with TCP/IP

The ARPANET protocols were all later supplanted by the TCP/IP protocols (with the exception of Telnet and FTP, which were easily adapted to run on top of TCP). Whereas the ARPANET protocols were all based on the assumption that the network was built and administered by a single entity (BBN), the TCP/IP protocol suite was designed for an inter-net, a network of networks where everything would be more fluid and unreliable. That led to some of the more immediately obvious differences between our modern protocol suite and the ARPANET protocols, such as how we now distinguish between a Network layer and a Transport layer. The Transport layer-like functionality that in the ARPANET was partly implemented by the IMPs is now the sole responsibility of the hosts at the network edge.

What I find most interesting about the ARPANET protocols, though, is how so much of the transport-layer functionality now in TCP went through a janky adolescence on the ARPANET. I’m not a networking expert, so I pulled out my college networks textbook (Kurose and Ross, let’s go), and they give a pretty great outline of what a transport layer is responsible for in general. To summarize their explanation, a transport layer protocol must minimally do the following things. Here “segment” is basically equivalent to “message” as the term was used on the ARPANET:

Provide a delivery service between processes and not just host machines (transport layer multiplexing and demultiplexing)
Provide integrity checking on a per-segment basis (i.e. make sure there is no data corruption in transit)
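As an illustration of per-segment integrity checking, here is a Python sketch of the 16-bit ones’-complement checksum that TCP and UDP use (described in RFC 1071). It is a modern example only; nothing above says the ARPANET Host-Host Protocol carried a checksum like this.

```python
def internet_checksum(data: bytes) -> int:
    """Ones'-complement sum of 16-bit big-endian words, then complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


segment = bytes.fromhex("0001f203f4f5f6f7")
print(hex(internet_checksum(segment)))  # 0x220d
corrupted = bytes.fromhex("0001f203f4f5f6f6")  # one bit flipped in transit
print(internet_checksum(corrupted) == internet_checksum(segment))  # False
```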

A transport layer could also, as TCP does, provide reliable data transfer, which means:

Segments are delivered in order
No segments go missing
Segments aren’t delivered so fast that they get dropped by the receiver (flow control)

It seems like there was some confusion on the ARPANET about how to do multiplexing and demultiplexing so that processes could communicate—BBN introduced the link number to do that at the IMP-Host level, but it turned out that socket numbers were necessary at the Host-Host level on top of that anyway. Then the link number was just used for flow control at the IMP-Host level, but BBN seems to have later abandoned that in favor of doing flow control between unique pairs of hosts, meaning that the link number started out as this overloaded thing only to become basically vestigial. TCP now uses port numbers instead, doing flow control over each TCP connection separately. The process-process multiplexing and demultiplexing lives entirely inside TCP and does not leak into a lower layer like on the ARPANET.
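By contrast, TCP demultiplexes on the full connection identifier and keeps flow-control state per connection. The Python sketch below assumes a simplified receiver in which each connection, keyed by (source host, source port, destination host, destination port), has its own fixed-size buffer and advertises the space left in it as its window. The names, the buffer size, and the host strings are hypothetical, and real TCP windows are counted in bytes of sequence space rather than whole segments.

```python
from typing import NamedTuple


class ConnKey(NamedTuple):
    src_host: str
    src_port: int
    dst_host: str
    dst_port: int


class Receiver:
    def __init__(self, buffer_segments: int = 4) -> None:
        self.capacity = buffer_segments
        self.buffers: dict[ConnKey, list[bytes]] = {}

    def window(self, key: ConnKey) -> int:
        # Space left in this connection's buffer, advertised to its sender.
        return self.capacity - len(self.buffers.setdefault(key, []))

    def deliver(self, key: ConnKey, segment: bytes) -> None:
        # Demultiplexing: the port-qualified key alone picks the destination buffer.
        if self.window(key) == 0:
            raise RuntimeError("sender ignored the advertised window")
        self.buffers[key].append(segment)


rx = Receiver()
conn_a = ConnKey("host-a", 5000, "host-b", 23)
conn_b = ConnKey("host-a", 5001, "host-b", 23)  # same two hosts, separate connection
for _ in range(4):
    rx.deliver(conn_a, b"data")
print(rx.window(conn_a), rx.window(conn_b))  # 0 4
```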

It’s also interesting to see, in light of how Kurose and Ross develop the ideas behind TCP, that the ARPANET started out with what Kurose and Ross would call a strict “stop-and-wait” approach to reliable data transfer at the IMP-Host level. The “stop-and-wait” approach is to transmit a segment and then refuse to transmit any more segments until an acknowledgment for the most recently transmitted segment has been received. It’s a simple approach, but it means that only one segment is ever in flight across the network, making for a very slow protocol—which is why Kurose and Ross present “stop-and-wait” as merely a stepping stone on the way to a fully featured transport layer protocol. On the ARPANET, “stop-and-wait” was how things worked for a while, since, at the IMP-Host level, a Request for Next Message had to be received in response to every outgoing message before any further messages could be sent. To be fair to BBN, they at first thought this would be necessary to provide flow control between hosts, so the slowdown was intentional. As I’ve already mentioned, the RFNM requirement was later relaxed for the sake of better performance, and the IMPs started attaching sequence numbers to messages and keeping track of a “window” of messages in flight in more or less the same way that TCP implementations do today.⁵
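The cost of stop-and-wait is easy to quantify with an idealized model. This Python sketch assumes no losses, a fixed round-trip time, and negligible transmission time, so every message in flight is acknowledged one round trip after it is sent. Under those assumptions a window of 1 reproduces the original RFNM rule and a window of 8 corresponds to the later eight-messages-in-transit limit. The function name is hypothetical, and the model ignores everything else the real IMPs did.

```python
def round_trips_needed(n_messages: int, window: int) -> int:
    """Round trips to deliver n_messages when at most `window` may be unacknowledged."""
    if window < 1:
        raise ValueError("window must be at least 1")
    base = 0      # sequence number of the oldest unacknowledged message
    next_seq = 0  # sequence number of the next message to send
    trips = 0
    while base < n_messages:
        # Send until the window is full or there is nothing left to send.
        while next_seq < n_messages and next_seq - base < window:
            next_seq += 1
        # No losses assumed: one round trip later, everything in flight is acknowledged.
        base = next_seq
        trips += 1
    return trips


print(round_trips_needed(16, window=1))  # 16: stop-and-wait
print(round_trips_needed(16, window=8))  # 2
```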

So the ARPANET showed that communication between heterogeneous computing systems is possible if you get everyone to agree on some baseline rules. That is, as I’ve previously argued, the ARPANET’s most important legacy. But what I hope this closer look at those baseline rules has revealed is just how much the ARPANET protocols also influenced the protocols we use today. There was certainly a lot of awkwardness in the way that transport-layer responsibilities were shared between the hosts and the IMPs, sometimes redundantly. And it’s really almost funny in retrospect that hosts could at first only send each other a single message at a time over any given link. But the ARPANET experiment was a unique opportunity to learn those lessons by actually building and operating a network, and it seems those lessons were put to good use when it came time to upgrade to the internet as we know it today.

If you enjoyed this post, more like it come out every four weeks! Follow @TwoBitHistory on Twitter or subscribe to the RSS feed to make sure you know when a new post is out.


¹ The protocol layering thing was invented by the Network Working Group. This argument is made in RFC 871. The layering thing was also a natural extension of how BBN divided responsibilities between hosts and IMPs, so BBN deserves some credit too.
² The “level” terminology was used by the Network Working Group. See e.g. RFC 100.
³ In later revisions of the IMP-Host protocol, the leader was expanded and the link number was upgraded to a message ID. But the Host-Host protocol continued to make use of only the high-order eight bits of the message ID field, treating it as a link number. See the “Host-to-Host” protocol section of the ARPANET Protocol Handbook.
⁴ John M. McQuillan and David C. Walden. “The ARPA Network Design Decisions,” p. 284, https://www.walden-family.com/public/whole-paper.pdf. Accessed 8 March 2021.
⁵ Ibid.