[ih] nice story about dave mills and NTP

Sun Oct 2 12:50:34 PDT 2022

Jack,
On 03-Oct-22 06:55, Jack Haverty via Internet-history wrote:
> The short answer is "Yes".  The Time-To-Live field was intended to count
> down actual transit time as a datagram proceeded through the Internet.
> A datagram was to be discarded as soon as some algorithm determined it
> wasn't going to get to its destination before its TTL ran to zero.   But
> we didn't have the means to measure time, so hop-counts were the
> placeholder.
> 
> I wasn't involved in the IPv6 work, but I suspect the change of the
> field to "hop count" reflected the reality of what the field actually
> was.   But it would have been better to have actually made Time work.

To be blunt, why?

There was no promise of guaranteed latency in those days, was there?
As soon as queueing theory entered the game, that wasn't an option.
So it wasn't just the absence of precise time, it was the presence of
random delays that made a hop count the right answer, not just the
convenient answer.

I think that's why IPv6 never even considered anything but a hop count.
The same reasoning lies behind the original TOS bits and their rebranding as
the Differentiated Services Code Point many years later. My motto
during the diffserv debates was "You can't beat queueing theory."

There are people in the IETF working hard on DetNet ("deterministic
networking") today. Maybe they have worked out how to beat queueing
theory, but I doubt it. What I learned from working on real-time
control systems is that you can't guarantee timing outside a very
limited and tightly managed set of resources, where unbounded
queues cannot occur.

    Brian

> 
> Many of these "original ideas" probably were never written down in
> persistent media.  Most discussions in the 1980 time frame took place
> either in person or, more extensively, by email.   Disk space was scarce
> and expensive, so much of that email was probably never archived,
> especially email not on the more "formal" mailing lists of the day.
> 
> As I recall, Time was considered very important, for a number of
> reasons.  So here's what I remember...
> -----
> 
> Like every project using computers, the Internet was constrained by too
> little memory, too slow processors, and too limited bandwidth. A
> typical, and expensive, system might have a few dozen kilobytes of
> memory, a processor running at perhaps 1 MHz, and "high speed"
> communications circuits carrying 56 kilobits per second.   So there was
> strong incentive not to waste resources.
> 
> At the time, the ARPANET had been running for about ten years, and quite
> a lot of experience had been gained through its operation and crises.
> Over that time, a lot of mechanisms had been put in place, internally in
> the IMP algorithms and hardware, to "protect" the network and keep it
> running despite what the user computers tried to do.  So, for example,
> an IMP could regulate the flow of traffic from any of its "host"
> computers, and even shut it off completely if needed.  (Google "ARPANET
> RFNM counting" if curious).
> 
> In the Internet, the gateways had no such mechanisms available.  We were
> especially concerned about the "impedance mismatch" that would occur at
> a gateway connecting a LAN to a much slower and "skinnier" long-haul
> network.  All of the "flow control" mechanisms that were implemented
> inside an ARPANET IMP would instead be implemented inside TCP software
> in users' host computers.
> 
> We didn't know how that would work.   But something had to be in the
> code....  So the principle was that IP datagrams could be simply
> discarded when necessary, wherever necessary, and TCP would retransmit
> them so they would eventually get delivered.
> 
> We foresaw that this approach could easily lead to "runaway" scenarios,
> with the Internet full of duplicate datagrams being dropped at any
> "impedance mismatch" point along the way.   In fact, we saw exactly that
> at a gateway between ARPANET and SATNET - IIRC in one of Dave's
> transatlantic experiments ("Don't do that!!!")
> 
> So, Source Quench was invented, as a way of telling some host to "slow
> down", and the gateways sent an SQ back to the source of any datagram they
> had to drop.  Many of us didn't think that would work very well (e.g., a
> host might send one datagram and get back an SQ - what should it do to
> "slow down"...?).   I recall that Dave knew exactly what to do.  Since
> his machine's datagram had been dropped, it meant he should immediately
> retransmit it.   Another "Don't do that!" moment....
> 
> But SQ was a placeholder too -- to be replaced by some "real" flow
> control mechanism as soon as the experimentation revealed what that
> should be.
> 
> -----
> 
> TCP retransmissions were based on Time.  If a TCP didn't receive a
> timely acknowledgement that data had been received, it could assume that
> someone along the way had dropped the datagram and it should retransmit
> it.  SQ datagrams were also of course not guaranteed to get to their
> destination, so you couldn't count on them as a signal to retransmit.
> So Time was the only answer.
> 
> But how to set the Timer in your TCP - that was subject to
> experimentation, with lots of ideas.  If you sent a copy of your data
> too soon, it would just overload everything along the path through the
> Internet with superfluous data consuming those scarce resources.  If you
> waited too long, your end-users would complain that the Internet was too
> slow.   So the answer was to have each TCP estimate how long it was
> taking for a datagram to get to its destination, and set its own
> "retransmission timer" to slightly longer than that value.
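One later, concrete form of this "estimate, then wait a bit longer" rule is the example procedure in RFC 793: keep a smoothed round-trip time (SRTT) and set the retransmission timeout (RTO) to a multiple of it, clamped between a lower and an upper bound. The Python below is a minimal sketch of that procedure, not any particular historical TCP's code; the class name, the initial estimate, and the specific constants (picked from the ranges RFC 793 suggests) are assumptions.

```python
class RetransmitTimer:
    """Smoothed round-trip estimator in the style of RFC 793's example procedure."""

    def __init__(self, alpha=0.9, beta=2.0, lbound=1.0, ubound=60.0,
                 initial_srtt=1.0):
        self.alpha = alpha            # smoothing factor; RFC 793 suggests .8 to .9
        self.beta = beta              # delay variance factor; RFC 793 suggests 1.3 to 2.0
        self.lbound = lbound          # lower bound on the timeout, seconds
        self.ubound = ubound          # upper bound on the timeout, seconds
        self.srtt = initial_srtt      # assumed starting estimate, seconds

    def observe(self, rtt):
        """Fold one measured round-trip time (seconds) into the smoothed estimate."""
        self.srtt = self.alpha * self.srtt + (1 - self.alpha) * rtt

    def rto(self):
        """Retransmission timeout: beta times the smoothed RTT, clamped to the bounds."""
        return min(self.ubound, max(self.lbound, self.beta * self.srtt))


timer = RetransmitTimer()
for sample in (2.0, 2.0, 2.0):   # three hypothetical 2-second round trips
    timer.observe(sample)
print(f"RTO is now {timer.rto():.3f} s")
```

Because it times a full round trip on the sender's own clock, this estimator needs no clock synchronization; the price is that it cannot tell how the delay splits between the two directions.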
> 
> Of course, such a technique requires instrumentation and data. Also,
> since the delays might depend on the direction of a datagram's travel,
> you needed synchronized clocks at the two endpoints of a TCP connection,
> so they could accurately measure one-way transit times.
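To see why direction matters, here is a minimal Python sketch with hypothetical numbers (a 0.30 s outbound path and a 0.05 s return path, both assumed). Halving a round trip misestimates both directions, while a timestamp-based one-way measurement is only as good as the agreement between the two clocks. The function and constant names are illustrative.

```python
# Hypothetical asymmetric path (assumed figures, in seconds).
OUTBOUND = 0.30   # e.g. a long-delay path from A to B
INBOUND = 0.05    # e.g. a short-delay path from B back to A


def round_trip_estimate(outbound, inbound):
    """A sender timing a round trip on its own clock sees only the sum of both
    directions; halving it is correct only if the path is symmetric."""
    return (outbound + inbound) / 2


def one_way_measurement(true_delay, sender_offset, receiver_offset):
    """Receiver's clock reading on arrival minus the sender's timestamp carried
    in the datagram. Each offset is that clock's error from true time, in
    seconds; the true send time is taken as 0."""
    send_stamp = sender_offset
    arrival_reading = true_delay + receiver_offset
    return arrival_reading - send_stamp


print(f"{round_trip_estimate(OUTBOUND, INBOUND):.3f}")   # same guess for both directions
print(f"{one_way_measurement(OUTBOUND, 0.0, 0.0):.3f}")  # clocks agree: true outbound delay
print(f"{one_way_measurement(OUTBOUND, 0.0, 0.1):.3f}")  # receiver 0.1 s fast: biased by 0.1 s
```

With the assumed figures, the halved round trip gives 0.175 s for each direction, the one-way measurement gives 0.30 s when the clocks agree, and it reports 0.40 s if the receiver's clock runs 0.1 s ahead.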
> 
> Meanwhile, inside the gateways, there were ideas about how to do even
> better by using Time.  For example, if the routing protocols were
> actually based on Time (shortest transit time) rather than Hops (number
> of gateways between here and destination), the Internet would provide
> better user performance and be more efficient.  Even better - if a
> gateway could "know" that a particular datagram wouldn't get to its
> destination before its TTL ran out, it could discard that datagram
> immediately, even though it still had time to live.  No point in wasting
> network resources carrying a datagram already sentenced to death.
> 
> We couldn't do all that.   Didn't have the hardware, didn't have the
> algorithms, didn't have the protocols.  So in the meantime, any computer
> handling an IP datagram should simply decrement the TTL value, and if it
> reached zero the datagram should be discarded. TTL effectively became a
> "hop count".
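The contrast between the hop-count rule that was implemented and the time-based discard described earlier can be sketched in a few lines of Python. The hop-count part follows the decrement-and-discard rule above; the time-based function is hypothetical and assumes the gateway has an estimate of remaining transit time, which is exactly what the gateways of the day lacked. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Datagram:
    ttl: int  # the IPv4 Time-To-Live field (8 bits in the header)


def forward_hop_count(dgram: Datagram) -> bool:
    """The rule gateways actually applied: decrement TTL at each hop and
    discard the datagram when it reaches zero. True means 'forward it'."""
    dgram.ttl -= 1
    return dgram.ttl > 0


def forwards_before_discard(initial_ttl: int) -> int:
    """Simulate a routing loop: gateways keep forwarding the same datagram
    until one of them discards it. Returns how many times it was forwarded."""
    dgram = Datagram(initial_ttl)
    forwards = 0
    while forward_hop_count(dgram):
        forwards += 1
    return forwards


def forward_time_based(remaining_life_s: float, est_time_to_dest_s: float) -> bool:
    """Hypothetical time-based rule: discard early if the datagram cannot reach
    its destination before its lifetime runs out. Assumes the gateway has an
    estimate of the remaining transit time, which 1980s gateways could not measure."""
    return est_time_to_dest_s < remaining_life_s


print(forwards_before_discard(5))
print(forward_time_based(remaining_life_s=0.5, est_time_to_dest_s=0.8))
```

Here a datagram sent with TTL 5 into a loop is forwarded four times and dropped by the fifth gateway, which is how even a hop-count TTL keeps datagrams from circulating forever.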
> 
> When Dave got NTP running, and enough Time Servers were online and
> reliable, and the gateways and hosts had the needed hardware, Time could
> be measured, TTL could be set based on Time, and the Internet would be
> better.
> 
> In the meantime, all of us TCP implementers just picked some value for
> our retransmission timers.  I think I set mine to 3 seconds. No
> exhaustive analysis or sophisticated mathematics involved.  It just felt
> right.....there was a lot of that going on in the early Internet.
> 
> -----
> 
> While all the TCP work was going on, other uses were emerging.  We knew
> that there was more to networking than just logging in to distant
> computers or transferring files between them - uses that had been common
> for years in the ARPANET.   But the next "killer app" hadn't appeared
> yet, although there were lots of people trying to create one.
> 
> In particular, "Packet Voice" was popular, with a contingent of
> researchers figuring out how to do that on the fledgling Internet. There
> were visions that someday it might even be possible to do Video.  The
> goal was specifically *interactive* voice, i.e., the ability to have
> a conversation by voice over the Internet (I don't recall when the term
> VoIP emerged, probably much later).
> 
> In a resource-constrained network, you don't want to waste resources on
> datagrams that aren't useful.  In conversational voice, a datagram that
> arrives too late isn't useful.  A fragment of audio that should have
> gone to the speaker 500 milliseconds ago can only be discarded.  It
> would be better that it hadn't been sent at all, but at least discarding
> it along the way, as soon as it's known to be too late to arrive, would
> be appropriate.
> 
> Of course, that needs Time.  UDP was created as an adjunct to TCP,
> providing a different kind of network service.   Where TCP got all of
> the data to its destination, no matter how long it took, UDP would get
> as much data as possible to the destination, as long as it got there in
> time to be useful.   Time was important.
> 
> UDP implementations, in host computers, didn't have to worry about
> retransmissions.  But they did still have to worry about how long it
> would take for a datagram to get to its destination.  With that
> knowledge, they could set their datagrams' TTL values to something
> appropriate for the network conditions at the time.  Perhaps they might
> even tell their human users "Sorry, conversational use not available
> right now." -- an Internet equivalent of the "busy signal" - if the
> current network transit times were too high to provide a good user
> experience.
> 
> Within the world of gateways, the differing needs of TCP and UDP
> called for different behaviors.  That motivated the inclusion of the TOS
> (Type Of Service) field in the IP datagram header.  Perhaps UDP
> packets would receive higher priority, being placed at the head of
> queues so they got transmitted sooner.  Perhaps they would be discarded
> immediately if the gateway knew, based on its routing mechanisms, that
> the datagram would never get delivered in time. Perhaps UDP would be
> routed differently, using a terrestrial but low-bandwidth network, while
> TCP traffic was directed over a high-bandwidth but long-delay satellite
> path.   A gateway mesh might have two or more independent routing
> mechanisms, each using a "shortest path" approach, but with different
> metrics for determining "short" - e.g., UDP using the shortest time
> route, while some TCP traffic travelled a route with least ("shortest")
> usage at the time.
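The idea of two routing computations over one mesh can be sketched as the same shortest-path algorithm (Dijkstra's, here) run with two different link costs. In this minimal Python sketch the topology, the node names, and the per-link "delay" and "load" figures are all hypothetical: a long-delay but lightly used satellite hop versus a short-delay but busy terrestrial hop.

```python
import heapq

# Hypothetical mesh: graph[node] = [(neighbor, link attributes), ...]
GRAPH = {
    "A": [("SAT", {"delay": 0.27, "load": 0.2}),
          ("TERR", {"delay": 0.02, "load": 0.9})],
    "SAT": [("B", {"delay": 0.27, "load": 0.2})],
    "TERR": [("B", {"delay": 0.02, "load": 0.9})],
}


def shortest_path(graph, src, dst, metric):
    """Dijkstra's algorithm using link_attrs[metric] as the link cost.
    Returns the list of nodes from src to dst, or None if dst is unreachable."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter route was already found
        for nbr, attrs in graph.get(node, []):
            nd = d + attrs[metric]
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


print("delay-sensitive (e.g. voice over UDP):", shortest_path(GRAPH, "A", "B", "delay"))
print("least-used (e.g. bulk TCP):", shortest_path(GRAPH, "A", "B", "load"))
```

With these numbers, routing by delay picks the terrestrial path and routing by load picks the satellite path, mirroring the UDP/TCP split imagined above.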
> 
> We couldn't do all that either.  We needed Time, hardware, algorithms,
> protocols, etc.  But the placeholders were there, in the TCP, IP, and
> UDP formats, ready for experimentation to figure all that stuff out.
> 
> -----
> 
> Once Time was implemented, there could be much-needed experimentation to
> figure out the right answers.  Meanwhile, we had to keep the Internet
> working.  By the early 1980s, the ARPANET had been in operation for more
> than a decade, and lots of operational experience had accrued.  We knew,
> for example, that things could "go wrong" and generate a crisis for the
> network operators to quickly fix.    TTL, even as just a hop count, was
> one mechanism to suppress problems.  We knew that "routing loops" could
> occur.   TTL would at least prevent situations where datagrams
> circulated forever, orbiting inside the Internet until someone
> discovered and fixed whatever was causing a routing loop to keep those
> datagrams speeding around.
> 
> Since the Internet was an Experiment, there were mechanisms put in place
> to help run experiments.  IIRC, in general things were put in the IP
> headers when we thought they were important and would be needed long
> after the experimental phase was over - things like TTL, SQ, TOS.
> 
> Essentially every field in the IP header, and every type of datagram,
> was there for some good reason, even though its initial implementation
> was known to be inadequate.   The Internet was built on Placeholders....
> 
> Other mechanisms were put into the "Options" mechanism of the IP
> format.   A lot of that was targeted towards supporting experiments, or
> as occasional tools to be used to debug problems in crises during
> Internet operations.
> 
> E.g., all of the "Source Routing" mechanisms might be used to route
> traffic along particular paths that the current gateways wouldn't otherwise
> use.  An example would be routing voice traffic over specific paths,
> which the normal gateway routing wouldn't use.   The Voice experimenters
> could use those mechanisms to try out their ideas in a controlled
> experiment.
> 
> Similarly, Source Routing might be used to debug network problems. A
> network analyst might use Source Routing to probe a particular remote
> computer interface, where the regular gateway mechanisms would avoid
> that path.
> 
> So a general rule was that IP headers contained important mechanisms,
> often just as placeholders, while Options contained things useful only
> in particular circumstances.
> 
> But all of these "original ideas" needed Time.   We knew Dave was "on
> it"....
> 
> -----
> 
> Hopefully this helps...  I (and many others) probably should have
> written these "original ideas" down 40 years ago.   We did, but I
> suspect only in emails which have now been lost.   Sorry
> about that.   There was always so much code to write.  And we didn't
> yet have the answers that would have justified writing RFCs, which were
> viewed as more permanent repositories for solved problems.
> 
> Sorry about that.....
> 
> Jack Haverty
> 
> 
> 
> On 10/2/22 07:45, Alejandro Acosta via Internet-history wrote:
>> Hello Jack,
>>
>>    Thanks a lot for sharing this. As usual, I enjoy this kind of
>> story :-)
>>
>>    Jack/group, just a question regarding this topic. When you mentioned:
>>
>> "This caused a lot of concern about protocol elements such as
>> Time-To-Live, which were temporarily to be implemented purely as "hop
>> counts"
>>
>>
>>    Do you mean the original idea was to really drop the packet at a
>> certain time, a *real* Time-To-Live concept?
>>
>>
>> Thanks,
>>
>> P.S. That's why it was important to change the field's name to hop
>> count in v6 :-)
>>
>>
>>
>> On 2/10/22 12:35 AM, Jack Haverty via Internet-history wrote:
>>> On 10/1/22 16:30, vinton cerf via Internet-history wrote:
>>>> in the New Yorker
>>>>
>>>> https://www.newyorker.com/tech/annals-of-technology/the-thorny-problem-of-keeping-the-internets-time
>>>>
>>>>
>>>> v
>>>
>>> Agree, nice story.   Dave did a *lot* of good work.  Reading the
>>> article reminded me of the genesis of NTP.
>>>
>>> IIRC....
>>>
>>> Back in the early days circa 1980, Dave was the unabashed tinkerer,
>>> experimenter, and scientist.  Like all good scientists, he wanted to
>>> run experiments to explore what the newfangled Internet was doing and
>>> test his theories.   To do that required measurements and data.
>>>
>>> At the time, BBN was responsible for the "core gateways" that
>>> provided most of the long-haul Internet connectivity, e.g., between
>>> US west and east coasts and Europe.  There were lots of ideas about
>>> how to do things - e.g., strategies for TCP retransmissions,
>>> techniques for maintaining dynamic tables of routing information,
>>> algorithms for dealing with limited bandwidth and memory, and other
>>> such stuff that was all intentionally very loosely defined within the
>>> protocols.   The Internet was an Experiment.
>>>
>>> I remember talking with Dave back at the early Internet meetings, and
>>> his fervor to try things out, and his disappointment that the core
>>> gateways couldn't measure much of anything. In
>>> particular, it was difficult to measure how long things took in the
>>> Internet, since the gateways didn't even have real-time clocks. This
>>> caused a lot of concern about protocol elements such as Time-To-Live,
>>> which were temporarily to be implemented purely as "hop counts",
>>> pending the introduction of some mechanism for measuring Time into
>>> the gateways.  (AFAIK, we're still waiting....)
>>>
>>> Curiously, in the pre-Internet days of the ARPANET, the ARPANET IMPs
>>> did have a pretty good mechanism for measuring time, at least between
>>> pairs of IMPs at either end of a communications circuit, because such
>>> circuits ran at specific speeds.   So one IMP could tell how long it
>>> was taking to communicate with each of its neighbors, and the IMPs used
>>> such data to drive the ARPANET internal routing mechanisms.
>>>
>>> In the Internet, gateways couldn't tell how long it took to send a
>>> datagram over one of its attached networks.   The networks of the day
>>> simply didn't make such information available to their "users" (e.g., a
>>> gateway).
>>>
>>> But experiments require data, and labs require instruments to collect
>>> that data, and Dave wanted to test out lots of ideas, and we (BBN)
>>> couldn't offer any hope of such instrumentation in the core gateways
>>> any time soon.
>>>
>>> So Dave built it.
>>>
>>> And that's how NTP got started.  IIRC, the rest of us were all just
>>> trying to get the Internet to work at all.   Dave was interested in
>>> understanding how and why it worked.  So while he built NTP, that
>>> didn't really affect any other projects.  Plus most of us (me, at least)
>>> didn't understand how it was possible to get such accurate
>>> synchronization when the delays through the Internet mesh were so
>>> large and variable.   (I still don't). But Dave thought it was
>>> possible, and that's why your computer, phone, laptop, or whatever
>>> knows what time it is today.
>>>
>>> Dave was responsible for another long-lived element of the
>>> Internet.   Dave's experiments were sometimes disruptive to the
>>> "core" Internet that we were tasked to make a reliable 24x7 service.
>>> Where Dave The Scientist would say "I wonder what happens when I do
>>> this..." We The Engineers would say "Don't do that!"
>>>
>>> That was the original motivation for creating the notion of
>>> "Autonomous Systems" and EGP - a way to insulate the "core" of the
>>> Internet from the antics of the Fuzzballs.  I corralled Eric Rosen
>>> after one such Fuzzball-triggered incident and we sat down and
>>> created ASes, so that we could keep "our" AS running reliably.  It
>>> was intended as an interim mechanism until all the experimentation
>>> revealed what should be the best algorithms and protocol features to
>>> put in the next generation, and the Internet Experiment advanced into
>>> a production network service.   We defined ASes and EGP to protect
>>> the Internet from Dave's Fuzzball mania.
>>>
>>> AFAIK, that hasn't happened yet ... and from that article, Dave is
>>> still Experimenting..... and The Internet is still an Experiment.
>>>
>>> Fun times,
>>> Jack Haverty
>>>
>