[ih] nice story about dave mills and NTP

Jack Haverty jack at 3kitty.org
Sun Oct 2 10:55:05 PDT 2022


The short answer is "Yes".  The Time-To-Live field was intended to count 
down actual transit time as a datagram proceeded through the Internet.   
A datagram was to be discarded as soon as some algorithm determined it 
wasn't going to get to its destination before its TTL ran to zero.   But 
we didn't have the means to measure time, so hop counts were the 
placeholder.

I wasn't involved in the IPv6 work, but I suspect the change of the 
field to "hop count" reflected the reality of what the field actually 
was.   But it would have been better to have actually made Time work.

Many of these "original ideas" were probably never written down in 
persistent media.  Most discussions in the 1980 time frame happened 
either in person or, more extensively, in email.   Disk space was scarce 
and expensive, so much of that email was probably never archived - 
especially email not on the more "formal" mailing lists of the day.

As I recall, Time was considered very important, for a number of 
reasons.  So here's what I remember...
-----

Like every project using computers, the Internet was constrained by too 
little memory, too slow processors, and too limited bandwidth. A 
typical, and expensive, system might have a few dozen kilobytes of 
memory, a processor running at perhaps 1 MHz, and "high speed" 
communications circuits carrying 56 kilobits per second.   So there was 
strong incentive not to waste resources.

At the time, the ARPANET had been running for about ten years, and quite 
a lot of experience had been gained through its operation and crises.  
Over that time, a lot of mechanisms had been put in place, internally in 
the IMP algorithms and hardware, to "protect" the network and keep it 
running despite what the user computers tried to do.  So, for example, 
an IMP could regulate the flow of traffic from any of its "host" 
computers, and even shut it off completely if needed.  (Google "ARPANET 
RFNM counting" if curious).

In the Internet, the gateways had no such mechanisms available.  We were 
especially concerned about the "impedance mismatch" that would occur at 
a gateway connecting a LAN to a much slower and "skinnier" long-haul 
network.  All of the "flow control" mechanisms that were implemented 
inside an ARPANET IMP would instead be implemented inside TCP software 
in users' host computers.

We didn't know how that would work.   But something had to be in the 
code....  So the principle was that IP datagrams could be simply 
discarded when necessary, wherever necessary, and TCP would retransmit 
them so they would eventually get delivered.

We envisioned that this approach could easily lead to "runaway" 
scenarios, with the Internet full of duplicate datagrams being dropped 
at any "impedance mismatch" point along the way.   In fact, we saw 
exactly that at a gateway between ARPANET and SATNET - IIRC in one of 
Dave's transatlantic experiments ("Don't do that!!!").

So Source Quench was invented as a way of telling a host to "slow 
down"; a gateway sent an SQ back to the source of any datagram it had 
to drop.  Many of us didn't think that would work very well (e.g., a 
host might send one datagram and get back an SQ - what should it do to 
"slow down"...?).   I recall that Dave knew exactly what to do.  Since 
his machine's datagram had been dropped, it meant he should immediately 
retransmit it.   Another "Don't do that!" moment....

But SQ was a placeholder too -- to be replaced by some "real" flow 
control mechanism as soon as the experimentation revealed what that 
should be.
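
To make that concrete: a minimal sketch, in modern Python and purely 
illustrative (nothing like what the gateways actually ran), of the 
drop-and-signal behavior described above.  All names here are 
hypothetical.

    from collections import deque

    QUEUE_LIMIT = 8   # assumed tiny 1980-era buffer budget, in datagrams

    class Gateway:
        def __init__(self):
            self.queue = deque()

        def on_datagram(self, datagram, send_source_quench):
            if len(self.queue) >= QUEUE_LIMIT:
                # No room: discard, and tell the source to slow down.
                # The SQ is itself just a datagram - it may be lost too.
                send_source_quench(datagram.source)
                return
            self.queue.append(datagram)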

-----

TCP retransmissions were based on Time.  If a TCP didn't receive a 
timely acknowledgement that data had been received, it could assume that 
someone along the way had dropped the datagram and it should retransmit 
it.  SQ datagrams were also of course not guaranteed to get to their 
destination, so you couldn't count on them as a signal to retransmit.  
So Time was the only answer.

But how to set the Timer in your TCP - that was subject to 
experimentation, with lots of ideas.  If you sent a copy of your data 
too soon, it would just overload everything along the path through the 
Internet with superfluous data consuming those scarce resources.  If you 
waited too long, your end-users would complain that the Internet was too 
slow.   So the answer was to have each TCP estimate how long it was 
taking for a datagram to get to its destination, and set its own 
"retransmission timer" to slightly longer than that value.

Of course, such a technique requires instrumentation and data.  Also, 
since the delays might depend on the direction of a datagram's travel, 
you needed synchronized clocks at the two endpoints of a TCP connection, 
so they could accurately measure one-way transit times.

Meanwhile, inside the gateways, there were ideas about how to do even 
better by using Time.  For example, if the routing protocols were 
actually based on Time (shortest transit time) rather than Hops (number 
of gateways between here and destination), the Internet would provide 
better user performance and be more efficient.  Even better - if a 
gateway could "know" that a particular datagram wouldn't get to its 
destination before its TTL ran out, it could discard that datagram 
immediately, even though it still had time to live.  No point in wasting 
network resources carrying a datagram already sentenced to death.

We couldn't do all that.   Didn't have the hardware, didn't have the 
algorithms, didn't have the protocols.  So in the meantime, any computer 
handling an IP datagram should simply decrement the TTL value, and if it 
reached zero the datagram should be discarded. TTL effectively became a 
"hop count".

When Dave got NTP running, and enough Time Servers were online and 
reliable, and the gateways and hosts had the needed hardware, Time could 
be measured, TTL could be set based on Time, and the Internet would be 
better.

In the meanwhile, all of us TCP implementers just picked some value for 
our retransmission timers.  I think I set mine to 3 seconds. No 
exhaustive analysis or sophisticated mathematics involved.  It just felt 
right.....there was a lot of that going on in the early Internet.

-----

While all the TCP work was going on, other uses were emerging.  We knew 
that there was more to networking than just logging in to distant 
computers or transferring files between them - uses that had been common 
for years in the ARPANET.   But the next "killer app" hadn't appeared 
yet, although there were lots of people trying to create one.

In particular, "Packet Voice" was popular, with a contingent of 
researchers figuring out how to do that on the fledgling Internet. There 
were visions that someday it might even be possible to do Video.  In 
particular, *interactive* voice was the goal, i.e., the ability to have 
a conversation by voice over the Internet (I don't recall when the term 
VOIP emerged, probably much later).

In a resource-constrained network, you don't want to waste resources on 
datagrams that aren't useful.  In conversational voice, a datagram that 
arrives too late isn't useful.  A fragment of audio that should have 
gone to the speaker 500 milliseconds ago can only be discarded.  It 
would be better that it hadn't been sent at all, but at least discarding 
it along the way, as soon as it's known to be too late to arrive, would 
be appropriate.

Of course, that needs Time.  UDP was created as an adjunct to TCP, 
providing a different kind of network service.   Where TCP got all of 
the data to its destination, no matter how long it took, UDP would get 
as much data as possible to the destination, as long as it got there in 
time to be useful.   Time was important.

UDP implementations, in host computers, didn't have to worry about 
retransmissions.  But they did still have to worry about how long it 
would take for a datagram to get to its destination.  With that 
knowledge, they could set their datagrams' TTL values to something 
appropriate for the network conditions at the time.  Perhaps they might 
even tell their human users "Sorry, conversational use not available 
right now." -- an Internet equivalent of the "busy signal" - if the 
current network transit times were too high to provide a good user 
experience.
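
Both ideas - discarding audio that is already too late, and the "busy 
signal" refusal - fit in a few lines.  A sketch, using the 
500-millisecond figure from above as an assumed budget, with 
hypothetical names for everything else:

    import time

    PLAYOUT_BUDGET = 0.5   # seconds; the 500 ms figure from above

    def still_useful(sent_at: float, est_transit_remaining: float) -> bool:
        # Deadline test: audio that can't reach the speaker within the
        # budget is pure waste, so drop it as soon as lateness is certain.
        # Comparing a remote timestamp against a local clock is exactly
        # what needs synchronized clocks - i.e., NTP.
        age = time.time() - sent_at
        return age + est_transit_remaining <= PLAYOUT_BUDGET

    def busy_signal(current_transit_estimate: float) -> bool:
        # Refuse conversational use when current transit times can't
        # give a good user experience.
        return current_transit_estimate > PLAYOUT_BUDGET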

Within the world of gateways, the differing needs of TCP and UDP 
called for different behaviors.  That motivated the inclusion of the TOS 
- Type Of Service - field in the IP datagram header.  Perhaps UDP
packets would receive higher priority, being placed at the head of 
queues so they got transmitted sooner.  Perhaps they would be discarded 
immediately if the gateway knew, based on its routing mechanisms, that 
the datagram would never get delivered in time. Perhaps UDP would be 
routed differently, using a terrestrial but low-bandwidth network, while 
TCP traffic was directed over a high-bandwidth but long-delay satellite 
path.   A gateway mesh might have two or more independent routing 
mechanisms, each using a "shortest path" approach, but with different 
metrics for determining "short" - e.g., UDP using the shortest time 
route, while some TCP traffic travelled a route with least ("shortest") 
usage at the time.
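
A hedged sketch of that per-TOS route selection: two routing tables, 
each the output of a shortest-path computation under its own metric, 
chosen by a TOS bit.  The bit value is RFC 791's later-defined 
"minimize delay" flag; the tables and names are made up for 
illustration.

    LOW_DELAY = 0x10   # the "minimize delay" bit of RFC 791's TOS octet

    # Hypothetical precomputed tables: destination -> next hop.
    shortest_time_routes = {"host-a": "terrestrial-gw"}   # minimize delay
    least_loaded_routes  = {"host-a": "satellite-gw"}     # minimize usage

    def pick_next_hop(destination: str, tos: int) -> str:
        # Low-delay traffic (e.g., voice over UDP) takes the shortest-
        # time route; bulk TCP traffic takes the least-loaded one.
        table = shortest_time_routes if tos & LOW_DELAY else least_loaded_routes
        return table[destination]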

We couldn't do all that either.  We needed Time, hardware, algorithms, 
protocols, etc.  But the placeholders were there, in the TCP, IP, and 
UDP formats, ready for experimentation to figure all that stuff out.

-----

When Time was implemented, there could be much-needed experimentation to 
figure out the right answers.  Meanwhile, we had to keep the Internet 
working.  By the early 1980s, the ARPANET had been in operation for more 
than a decade, and lots of operational experience had accrued.  We knew, 
for example, that things could "go wrong" and generate a crisis for the 
network operators to quickly fix.    TTL, even as just a hop count, was 
one mechanism to suppress problems.  We knew that "routing loops" could 
occur.   TTL would at least prevent situations where datagrams 
circulated forever, orbiting inside the Internet until someone 
discovered and fixed whatever was causing a routing loop to keep those 
datagrams speeding around.

Since the Internet was an Experiment, there were mechanisms put in place 
to help run experiments.  IIRC, in general things were put in the IP 
headers when we thought they were important and would be needed long 
after the experimental phase was over - things like TTL, SQ, TOS.

Essentially every field in the IP header, and every type of datagram, 
was there for some good reason, even though its initial implementation 
was known to be inadequate.   The Internet was built on Placeholders....

Other mechanisms were put into the "Options" area of the IP format.   
A lot of that was targeted towards supporting experiments, or 
as occasional tools to be used to debug problems in crises during 
Internet operations.

E.g., all of the "Source Routing" mechanisms might be used to route 
traffic along particular paths that the gateways wouldn't otherwise 
choose - for example, sending voice traffic over specific paths that 
the normal routing would avoid.   The Voice experimenters could use 
those mechanisms to try out their ideas in a controlled experiment.

Similarly, Source Routing might be used to debug network problems. A 
network analyst might use Source Routing to probe a particular remote 
computer interface, where the regular gateway mechanisms would avoid 
that path.

So a general rule was that IP headers contained important mechanisms, 
often just as placeholders, while Options contained things useful only 
in particular circumstances.

But all of these "original ideas" needed Time.   We knew Dave was "on 
it"....

-----

Hopefully this helps...  I (and many others) probably should have 
written these "original ideas" down 40 years ago.   We did, but I 
suspect all in the form of emails which have now been lost.   Sorry 
about that.   There was always so much code to write.  And we didn't 
have the answers yet to motivate creating RFCs which were viewed as more 
permanent repositories of the solved problems.

Sorry about that.....

Jack Haverty



On 10/2/22 07:45, Alejandro Acosta via Internet-history wrote:
> Hello Jack,
>
>   Thanks a lot for sharing this, as usual, I enjoy this kind of 
> stories :-)
>
>   Jack/group, just a question regarding this topic. When you mentioned:
>
> "This caused a lot of concern about protocol elements such as 
> Time-To-Live, which were temporarily to be implemented purely as "hop 
> counts"
>
>
> Do you mean, the original idea was to really drop the packet at a 
> certain time, a *real* Time-To-Live concept?
>
>
> Thanks,
>
> P.S. That's why it was important to change the field's name to hop 
> count in v6 :-)
>
>
>
> On 2/10/22 12:35 AM, Jack Haverty via Internet-history wrote:
>> On 10/1/22 16:30, vinton cerf via Internet-history wrote:
>>> in the New Yorker
>>>
>>> https://www.newyorker.com/tech/annals-of-technology/the-thorny-problem-of-keeping-the-internets-time 
>>>
>>>
>>> v
>>
>> Agree, nice story.   Dave did a *lot* of good work.  Reading the 
>> article reminded me of the genesis of NTP.
>>
>> IIRC....
>>
>> Back in the early days circa 1980, Dave was the unabashed tinkerer, 
>> experimenter, and scientist.  Like all good scientists, he wanted to 
>> run experiments to explore what the newfangled Internet was doing and 
>> test his theories.   To do that required measurements and data.
>>
>> At the time, BBN was responsible for the "core gateways" that 
>> provided most of the long-haul Internet connectivity, e.g., between 
>> US west and east coasts and Europe.  There were lots of ideas about 
>> how to do things - e.g., strategies for TCP retransmissions, 
>> techniques for maintaining dynamic tables of routing information, 
>> algorithms for dealing with limited bandwidth and memory, and other 
>> such stuff that was all intentionally very loosely defined within the 
>> protocols.   The Internet was an Experiment.
>>
>> I remember talking with Dave back at the early Internet meetings, and 
>> his fervor to try things out, and his disappointment at the lack of 
>> the core gateway's ability to measure much of anything. In 
>> particular, it was difficult to measure how long things took in the 
>> Internet, since the gateways didn't even have real-time clocks. This 
>> caused a lot of concern about protocol elements such as Time-To-Live, 
>> which were temporarily to be implemented purely as "hop counts", 
>> pending the introduction of some mechanism for measuring Time into 
>> the gateways.  (AFAIK, we're still waiting....)
>>
>> Curiously, in the pre-Internet days of the ARPANET, the ARPANET IMPs 
>> did have a pretty good mechanism for measuring time, at least between 
>> pairs of IMPs at either end of a communications circuit, because such 
>> circuits ran at specific speeds.   So one IMP could tell how long it 
>> was taking to communicate with one of its neighbors, and used such 
>> data to drive the ARPANET internal routing mechanisms.
>>
>> In the Internet, gateways couldn't tell how long it took to send a 
>> datagram over one of its attached networks.   The networks of the day 
>> simply didn't make such information available to its "users" (e.g., a 
>> gateway).
>>
>> But experiments require data, and labs require instruments to collect 
>> that data, and Dave wanted to test out lots of ideas, and we (BBN) 
>> couldn't offer any hope of such instrumentation in the core gateways 
>> any time soon.
>>
>> So Dave built it.
>>
>> And that's how NTP got started.  IIRC, the rest of us were all just 
>> trying to get the Internet to work at all.   Dave was interested in 
>> understanding how and why it worked.  So while he built NTP, that 
>> didn't really affect any other projects.  Plus most (at least me) 
>> didn't understand how it was possible to get such accurate 
>> synchronization when the delays through the Internet mesh were so 
>> large and variable.   (I still don't). But Dave thought it was 
>> possible, and that's why your computer, phone, laptop, or whatever 
>> know what time it is today.
>>
>> Dave was responsible for another long-lived element of the 
>> Internet.   Dave's experiments were sometimes disruptive to the 
>> "core" Internet that we were tasked to make a reliable 24x7 service.  
>> Where Dave The Scientist would say "I wonder what happens when I do 
>> this..." We The Engineers would say "Don't do that!"
>>
>> That was the original motivation for creating the notion of 
>> "Autonomous Systems" and EGP - a way to insulate the "core" of the 
>> Internet from the antics of the Fuzzballs.  I corralled Eric Rosen 
>> after one such Fuzzball-triggered incident and we sat down and 
>> created ASes, so that we could keep "our" AS running reliably.  It 
>> was intended as an interim mechanism until all the experimentation 
>> revealed what should be the best algorithms and protocol features to 
>> put in the next generation, and the Internet Experiment advanced into 
>> a production network service.   We defined ASes and EGP to protect 
>> the Internet from Dave's Fuzzball mania.
>>
>> AFAIK, that hasn't happened yet ... and from that article, Dave is 
>> still Experimenting..... and The Internet is still an Experiment.
>>
>> Fun times,
>> Jack Haverty
>>



