[ih] History of 127/8 as localhost/loopback addresses?

Sat Jan 2 15:56:07 PST 2021

Here's what I remember from the period when TCP2.5 was being split into
TCP and IP V4.

The short answer is that 127.0.0.1 was not part of the initial IP V4
work.  The 127 value was "reserved for future use", and someone
(Berkeley?) later adopted 127.0.0.1 as "my address".  More
detail, HTH:

- Loopback was a well-known tool from a decade of ARPANET operations; we
tried to adopt many of the ARPANET tools and techniques into the IP level.

- It was a common practice to reserve certain values for protocol
fields.  Zero was often reserved because it tended to show up as a value
as the result of a bug somewhere; it was best not to assign it for any
real use.

- "all ones" was another but less frequent error.  Still a good idea not
to assign it, but reserve it as a last-chance choice for some
unpredicted future need.  E.g., 127.x.x.x (the "last" class A network)
might have been used for some IP V5 addressing scheme where a variable
length address appeared somewhere later in the V5 header, and the x.x.x
might say where to find it.

- 127.0.0.1 was not considered as having any special meaning, it was
simply "reserved" for some possible future need; I suspect the 127.0.0.1
convention came about later as some host-side implementor decided to use
that "reserved" but unused and "safe" address to indicate local loopback
within a computer.

- A lot of tweaking around that time was motivated by the impending
explosion of LANs, especially Ethernets - where it was impossible to
encode the "local net address" of Ethernet in the remaining 24 bits even
of a "Class A" network.   For example, that led to creation of ARP, and
later DHCP.  In the ARPANET days, every host pretty much knew its IMP
address, so it could "loop back" by simply connecting to itself.  With
TCP/IP V2, a host could still do that, i.e., connect to itself, by using
its own IP address for both source and destination.   Exactly what that
did, i.e., what path such a packet might take, depended on how that
host's TCP/IP implementation worked.  It might loop back in software
within the O/S; it might send the IP packet to its local gateway which
would send it right back; etc.   IIRC, 127.0.0.1 came into use when
workstations appeared and the number of hosts on a LAN exploded.  Even
if you didn't know your own IP address, 127.0.0.1 would work to talk to
yourself, an important step in bringing a machine up on the net.  

- At the time (defining IPV4), there weren't a lot of nets and class A
would be sufficient.  But the future was clear that many more computers
would be appearing with workstations.  So class B was defined, and even
class C.   We limited the choices of net versus host to fall on byte
boundaries in order to avoid computational load on hosts and gateways
(yes, we even counted instructions needed in tight loops like
checksumming or table lookups).

- There was also a recognition that it was architecturally possible to
define a new type of "network" which was just a wire, e.g., a telephone
circuit.   A Wire network is a network with only two IP addresses --
"this end" and "that end".  That triggered creating definitions of
additional classes of networks - class D, E, and F.   F networks used 31
bits of network number, and 1 bit of address -- very suitable for Wire
networks.   That made possible the interconnection of gateways by simply
using a wire instead of a traditional network (like the ARPANET); it was
no longer necessary to have a gateway and an IMP at an Internet site.

- As TCP/IP V4 was deploying, there was a lot of "technology transfer"
from the ARPANET experience, aided by the co-location of the "ARPANET"
and "Internet" groups at BBN; both were located just "down the hallway"
from the NOC which had been operating the ARPANET for a decade and was
now tasked to do the same for the fledgling Internet.  So there was a
lot of collaboration and sharing of experience.

- The ARPANET had developed a lot of tools and techniques, and we tried
very hard to replicate that functionality within the Internet.  For
example, with "Wire" networks interconnecting gateways, each interface
on a gateway now had a unique IP address (e.g., a "Class F" one).  So it
was straightforward to perform troubleshooting tasks like "looping"
through each interface on some remote gateway to isolate problems.  
Similarly, metrics taken inside a gateway (queue lengths, checksum
errors, etc.) could be reported back to the NOC just as IMPs had been
doing.   That led to SNMP et al, analogous to the ARPANET's "traps"
mechanisms.

- The Internet at that time was still very much an experiment; we didn't
know how it would behave.  The IP address structure and especially tools
like the various flavors of Source Routing all were intended to help
probe the network, even when it was somehow "broken", to find and fix
problems.

- There was an unknown amount of unplanned and uncoordinated
"experimentation" in that early Internet.  Probably largely undocumented
too.  Sometimes the curtains fell down and you got to see what was going
on.  A good example is the time when the gateways started reporting
massive amounts of checksum errors.  It was traced down to a new OS
release (BSD IIRC), where they had decided to change the ordering of the
LAN/TCP/IP headers traversing the Ethernet in order to avoid the need to
do a memory-memory copy operation in the O/S.   Worked fine between
"consenting workstations", but when those packets leaked out to the
other gateways it caused quite a ruckus.
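
As an illustration only (not code from the period), the classful split
on byte boundaries described above can be sketched in a few lines of
Python.  The function names classful_split and is_loopback are
hypothetical, and the sketch assumes the classic class A/B/C rules
(leading bits 0, 10, 110); it does not model classes D and up.  It
shows why 127 is the "all ones" value of the 7-bit class A network
field, and why reserving it sets aside 2^24 host addresses.

```python
import ipaddress


def classful_split(addr):
    """Split a dotted-decimal IPv4 address under the classic classful rules.

    Returns (class_letter, network_prefix, host_part).  network_prefix is
    the leading 1, 2, or 3 bytes as an integer (class bits included).
    Addresses above class C are reported as ("other", None, None).
    """
    value = int(ipaddress.IPv4Address(addr))
    first = value >> 24                      # first byte decides the class
    if first < 128:                          # leading bit 0: class A
        return "A", first, value & 0x00FFFFFF
    if first < 192:                          # leading bits 10: class B
        return "B", value >> 16, value & 0x0000FFFF
    if first < 224:                          # leading bits 110: class C
        return "C", value >> 8, value & 0x000000FF
    return "other", None, None


def is_loopback(addr):
    """True for any address on class A network 127, not just 127.0.0.1."""
    cls, net, _host = classful_split(addr)
    return cls == "A" and net == 127


# Network 127 is the all-ones value of the 7-bit class A network field.
ALL_ONES_CLASS_A = 0b1111111
# A class A network leaves 24 bits for the host part.
ADDRESSES_PER_CLASS_A = 2 ** 24
```

For example, classful_split("127.0.0.1") gives ("A", 127, 1), and
is_loopback accepts every address of the form 127.x.x.x.  Because each
boundary falls on a byte, finding the network number needs only a
comparison on the first byte plus a shift, the kind of cheap operation
the byte-boundary choice was meant to allow.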

/Jack Haverty

On 1/2/21 1:38 AM, John Gilmore via Internet-history wrote:
> I am in the process of sorting out various ways that the IPv4 unicast
> address space was historically constrained to allow fewer than the 2^32
> available IP addresses.  One question that came up was how we ended up
> with 16,777,216 loopback addresses in IPv4.
>
> History questions:
>
> Was there a host-software-accessible loopback or localhost function in
> the ARPANET or in NCP?  How was it invoked?
>
> When TCP/IP was being designed, where did the concept of a loopback
> function come from?  How did it merge with the "connect to a port
> on the local host, without having to figure out its IP address" function
> that 127/8 eventually got used for?
>
> Did Jon Postel or other IP designers have the localhost function in mind
> for 127 when he first reserved it back in 1981?  Was 127 used this way
> prior to 1986?  Did Jon or others discuss this use prior to then?
>
> Who, if anyone, argued for having more than a single loopback address?
> Was there discussion of whether a full Class-A network was needed for
> the loopback function?  Why was a Class-C network not used?  Is there an
> explanation for why so many addresses were ultimately assigned to that
> function?
>
> And, fast-forwarding into the 1990s:  When IPv6 was designed, why was
> this design decision revised?  Who made the decision to allocate a
> single IPv6 localhost address?  Was that controversial?
>
> Thanks for the memories!
>
> Researching in the first thousand RFC's reveals:
>
> The first mention of any kind of loopback in the RFC series seems to be
> in June 1984 in RFC 900.  In that Assigned Numbers RFC, loopback appears
> as an Ethernet frame type 0x9000, assigned for Larry Garlick of Xerox.
> This refers to a specific kind of packet sent on 10-megabit Ethernet
> v2.0 networks to test connectivity among hosts.
>
> In RFC 907 of July 1984, the SATNET Host Access Protocol has a specific
> bit assigned as the "Loopback Bit", and also defines a remote loopback
> request/response message and function.  (This is for setting a mode
> in which ALL traffic is looped from transmit to receive side of an
> interface -- not for looping an individual packet or TCP connection.)
>
> In the evolution of IP Multicast from RFC 966 in December 1985 to RFC
> 988 in July 1986, a new parameter specified whether multicast packets
> would or would not be "looped-back" to their sending host.
>
> In September 1981, in RFC 790, Jon Postel first indicated that IP
> network number 127 was "reserved", without explicitly stating for what.
> This was repeated in all the Assigned Numbers RFCs through RFC 960
> (December 1985); then in RFC 990 (November 1986), Jon and Joyce Reynolds
> assigned it for loopback, stating that:
>
>   The class A network number 127 is assigned the "loopback"
>   function, that is, a datagram sent by a higher level protocol
>   to a network 127 address should loop back inside the host.  No
>   datagram "sent" to a network 127 address should ever appear on
>   any network anywhere.
>
> By the Host Requirements RFC 1122 in October 1989, the spec was
> restated to:
>
>   { 127, <any> }
>
>   Internal host loopback address.  Addresses of this form
>   MUST NOT appear outside a host.
>
> The first mention of the specific address "127.0.0.1" in RFCs is in May
> 1993 in RFC 1459 (as an example dotted decimal address that one might
> use in the IRC protocol).  The RFCs contain no explanation of how the
> whole specified range of 16 million loopback addresses was narrowed in
> many people's minds to the single "localhost" address 127.0.0.1.
> ("Localhost" does not appear in the first thousand RFC's.)
>
> 	John