[ih] nice story about dave mills and NTP

Tue Oct 4 06:33:00 PDT 2022

Thanks!

On 2/10/22 1:55 PM, Jack Haverty via Internet-history wrote:
> The short answer is "Yes".  The Time-To-Live field was intended to 
> count down actual transit time as a datagram proceeded through the 
> Internet.   A datagram was to be discarded as soon as some algorithm 
> determined it wasn't going to get to its destination before its TTL 
> ran to zero.   But we didn't have the means to measure time, so 
> hop-counts were the placeholder.
>
> I wasn't involved in the IPv6 work, but I suspect the change of the 
> field to "hop count" reflected the reality of what the field actually 
> was.   But it would have been better to have actually made Time work.
>
> Many of these "original ideas" probably weren't ever written down in 
> persistent media.  Most discussions in the 1980 time frame were done 
> either in person or more extensively in email.   Disk space was scarce 
> and expensive, so much of such email was probably never archived - 
> especially email not on the more "formal" mailing lists of the day.
>
> As I recall, Time was considered very important, for a number of 
> reasons.  So here's what I remember...
> -----
>
> Like every project using computers, the Internet was constrained by 
> too little memory, too slow processors, and too limited bandwidth. A 
> typical, and expensive, system might have a few dozen kilobytes of 
> memory, a processor running at perhaps 1 MHz, and "high speed" 
> communications circuits carrying 56 kilobits per second.   So there 
> was strong incentive not to waste resources.
>
> At the time, the ARPANET had been running for about ten years, and 
> quite a lot of experience had been gained through its operation and 
> crises.  Over that time, a lot of mechanisms had been put in place, 
> internally in the IMP algorithms and hardware, to "protect" the 
> network and keep it running despite what the user computers tried to 
> do.  So, for example, an IMP could regulate the flow of traffic from 
> any of its "host" computers, and even shut it off completely if 
> needed.  (Google "ARPANET RFNM counting" if curious).
>
> In the Internet, the gateways had no such mechanisms available. We 
> were especially concerned about the "impedance mismatch" that would 
> occur at a gateway connecting a LAN to a much slower and "skinnier" 
> long-haul network.  All of the "flow control" mechanisms that were 
> implemented inside an ARPANET IMP would be instead implemented inside 
> TCP software in users' host computers.
>
> We didn't know how that would work.   But something had to be in the 
> code....  So the principle was that IP datagrams could be simply 
> discarded when necessary, wherever necessary, and TCP would retransmit 
> them so they would eventually get delivered.
>
> We envisioned that approach could easily lead to "runaway" scenarios, 
> with the Internet full of duplicate datagrams being dropped at any 
> "impedance mismatch" point along the way.   In fact, we saw exactly 
> that at a gateway between ARPANET and SATNET - IIRC in one of Dave's 
> transatlantic experiments ("Don't do that!!!")
>
> So, Source Quench was invented, as a way of telling some host to "slow 
> down", and a gateway sent an SQ back to the source of any datagram 
> it had to drop.  Many of us didn't think that would work very well 
> (e.g., a host might send one datagram and get back an SQ - what should 
> it do to "slow down"...?).   I recall that Dave knew exactly what to 
> do.  Since his machine's datagram had been dropped, it meant he should 
> immediately retransmit it.   Another "Don't do that!" moment....
>
> But SQ was a placeholder too -- to be replaced by some "real" flow 
> control mechanism as soon as the experimentation revealed what that 
> should be.
>
> -----
>
> TCP retransmissions were based on Time.  If a TCP didn't receive a 
> timely acknowledgement that data had been received, it could assume 
> that someone along the way had dropped the datagram and it should 
> retransmit it.  SQ datagrams were also of course not guaranteed to get 
> to their destination, so you couldn't count on them as a signal to 
> retransmit.  So Time was the only answer.
>
> But how to set the Timer in your TCP - that was subject to 
> experimentation, with lots of ideas.  If you sent a copy of your data 
> too soon, it would just overload everything along the path through the 
> Internet with superfluous data consuming those scarce resources.  If 
> you waited too long, your end-users would complain that the Internet 
> was too slow.   So the answer was to have each TCP estimate how long 
> it was taking for a datagram to get to its destination, and set its 
> own "retransmission timer" to slightly longer than that value.
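
A minimal sketch of the "estimate the delay, then wait slightly
longer" idea in the paragraph above. It is an illustration only, not
the code any of the early TCPs ran. The class name, the smoothing
weight alpha, the padding factor beta, the bounds, and the 3-second
starting value are all assumptions. The samples here are round-trip
times seen by the sender, which avoids the synchronized clocks that
one-way measurements would need.

```python
class RetransmitTimer:
    """Hypothetical helper: smooth delay samples, then pad the estimate."""

    def __init__(self, initial_rto=3.0, alpha=0.875, beta=1.5,
                 min_rto=1.0, max_rto=60.0):
        self.srtt = None          # smoothed round-trip time, in seconds
        self.rto = initial_rto    # timeout used before any sample arrives
        self.alpha = alpha        # weight kept on the old estimate (assumed)
        self.beta = beta          # "slightly longer" padding factor (assumed)
        self.min_rto = min_rto
        self.max_rto = max_rto

    def observe(self, sample):
        """Fold one measured round-trip time into the estimate."""
        if self.srtt is None:
            self.srtt = sample
        else:
            self.srtt = self.alpha * self.srtt + (1 - self.alpha) * sample
        # Wait a bit longer than the estimate, clamped to sane bounds.
        self.rto = min(self.max_rto, max(self.min_rto, self.beta * self.srtt))
        return self.rto


timer = RetransmitTimer()
for rtt in (2.0, 4.0):
    timer.observe(rtt)
print(timer.rto)
```

A larger beta means fewer needless retransmissions but slower recovery
from real loss, which is the trade-off the paragraph describes.
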
>
> Of course, such a technique requires instrumentation and data. Also, 
> since the delays might depend on the direction of a datagram's travel, 
> you needed synchronized clocks at the two endpoints of a TCP 
> connection, so they could accurately measure one-way transit times.
>
> Meanwhile, inside the gateways, there were ideas about how to do even 
> better by using Time.  For example, if the routing protocols were 
> actually based on Time (shortest transit time) rather than Hops 
> (number of gateways between here and destination), the Internet would 
> provide better user performance and be more efficient.  Even better - 
> if a gateway could "know" that a particular datagram wouldn't get to 
> its destination before its TTL ran out, it could discard that 
> datagram immediately, even though it still had time to live.  No point 
> in wasting network resources carrying a datagram already sentenced to 
> death.
>
> We couldn't do all that.   Didn't have the hardware, didn't have the 
> algorithms, didn't have the protocols.  So in the meantime, any 
> computer handling an IP datagram should simply decrement the TTL 
> value, and if it reached zero the datagram should be discarded. TTL 
> effectively became a "hop count".
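
The placeholder rule in the paragraph above (decrement, discard at
zero) can be sketched in a few lines. This is an illustration under
assumptions: the Datagram type and the forward() function are
hypothetical names, and a real gateway works on packed header bytes
rather than Python objects.

```python
from dataclasses import dataclass


@dataclass
class Datagram:
    ttl: int              # remaining "time" to live, used as a hop count
    payload: bytes = b""


def forward(datagram):
    """Return the datagram to pass on, or None if it must be discarded."""
    datagram.ttl -= 1     # every handler charges one unit, not real time
    if datagram.ttl <= 0:
        return None       # out of hops: drop it
    return datagram


# A datagram caught in a routing loop is dropped after a bounded number
# of hops instead of circulating forever.
d = Datagram(ttl=3)
hops = 0
while d is not None:
    d = forward(d)
    hops += 1
print(hops)
```
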
>
> When Dave got NTP running, and enough Time Servers were online and 
> reliable, and the gateways and hosts had the needed hardware, Time 
> could be measured, TTL could be set based on Time, and the Internet 
> would be better.
>
> In the meantime, all of us TCP implementers just picked some value 
> for our retransmission timers.  I think I set mine to 3 seconds. No 
> exhaustive analysis or sophisticated mathematics involved.  It just 
> felt right.....there was a lot of that going on in the early Internet.
>
> -----
>
> While all the TCP work was going on, other uses were emerging.  We 
> knew that there was more to networking than just logging in to distant 
> computers or transferring files between them - uses that had been 
> common for years in the ARPANET.   But the next "killer app" hadn't 
> appeared yet, although there were lots of people trying to create one.
>
> In particular, "Packet Voice" was popular, with a contingent of 
> researchers figuring out how to do that on the fledgling Internet. 
> There were visions that someday it might even be possible to do 
> Video.  In particular, *interactive* voice was the goal, i.e., the 
> ability to have a conversation by voice over the Internet (I don't 
> recall when the term VOIP emerged, probably much later).
>
> In a resource-constrained network, you don't want to waste resources 
> on datagrams that aren't useful.  In conversational voice, a datagram 
> that arrives too late isn't useful.  A fragment of audio that should 
> have gone to the speaker 500 milliseconds ago can only be discarded.  
> It would be better that it hadn't been sent at all, but at least 
> discarding it along the way, as soon as it's known to be too late to 
> arrive, would be appropriate.
>
> Of course, that needs Time.  UDP was created as an adjunct to TCP, 
> providing a different kind of network service.   Where TCP got all of 
> the data to its destination, no matter how long it took, UDP would get 
> as much data as possible to the destination, as long as it got there 
> in time to be useful.   Time was important.
>
> UDP implementations, in host computers, didn't have to worry about 
> retransmissions.  But they did still have to worry about how long it 
> would take for a datagram to get to its destination.  With that 
> knowledge, they could set their datagrams' TTL values to something 
> appropriate for the network conditions at the time.  Perhaps they 
> might even tell their human users "Sorry, conversational use not 
> available right now." -- an Internet equivalent of the "busy signal" - 
> if the current network transit times were too high to provide a good 
> user experience.
>
> Within the world of gateways, the differing needs of TCP and UDP 
> motivated different behaviors.  That motivated the inclusion of the 
> TOS - Type Of Service - field in the IP datagram header. Perhaps UDP 
> packets would receive higher priority, being placed at the head of 
> queues so they got transmitted sooner.  Perhaps they would be 
> discarded immediately if the gateway knew, based on its routing 
> mechanisms, that the datagram would never get delivered in time. 
> Perhaps UDP would be routed differently, using a terrestrial but 
> low-bandwidth network, while TCP traffic was directed over a 
> high-bandwidth but long-delay satellite path.   A gateway mesh might 
> have two or more independent routing mechanisms, each using a 
> "shortest path" approach, but with different metrics for determining 
> "short" - e.g., UDP using the shortest time route, while some TCP 
> traffic travelled a route with least ("shortest") usage at the time.
>
> We couldn't do all that either.  We needed Time, hardware, algorithms, 
> protocols, etc.  But the placeholders were there, in the TCP, IP, and 
> UDP formats, ready for experimentation to figure all that stuff out.
>
> -----
>
> When Time was implemented, there could be much-needed experimentation 
> to figure out the right answers.  Meanwhile, we had to keep the 
> Internet working.  By the early 1980s, the ARPANET had been in 
> operation for more than a decade, and lots of operational experience 
> had accrued.  We knew, for example, that things could "go wrong" and 
> generate a crisis for the network operators to quickly fix.    TTL, 
> even as just a hop count, was one mechanism to suppress problems.  We 
> knew that "routing loops" could occur.   TTL would at least prevent 
> situations where datagrams circulated forever, orbiting inside the 
> Internet until someone discovered and fixed whatever was causing a 
> routing loop to keep those datagrams speeding around.
>
> Since the Internet was an Experiment, there were mechanisms put in 
> place to help run experiments.  IIRC, in general things were put in 
> the IP headers when we thought they were important and would be needed 
> long after the experimental phase was over - things like TTL, SQ, TOS.
>
> Essentially every field in the IP header, and every type of datagram, 
> was there for some good reason, even though its initial implementation 
> was known to be inadequate.   The Internet was built on Placeholders....
>
> Other mechanisms were put into the "Options" mechanism of the IP 
> format.   A lot of that was targeted towards supporting experiments, 
> or as occasional tools to be used to debug problems in crises during 
> Internet operations.
>
> E.g., all of the "Source Routing" mechanisms might be used to route 
> traffic in particular paths that the current gateways wouldn't 
> otherwise use.  An example would be routing voice traffic over 
> specific paths, which the normal gateway routing wouldn't use.   The 
> Voice experimenters could use those mechanisms to try out their ideas 
> in a controlled experiment.
>
> Similarly, Source Routing might be used to debug network problems. A 
> network analyst might use Source Routing to probe a particular remote 
> computer interface, where the regular gateway mechanisms would avoid 
> that path.
>
> So a general rule was that IP headers contained important mechanisms, 
> often just as placeholders, while Options contained things useful only 
> in particular circumstances.
>
> But all of these "original ideas" needed Time.   We knew Dave was "on 
> it"....
>
> -----
>
> Hopefully this helps...  I (and many others) probably should have 
> written these "original ideas" down 40 years ago.   We did, but I 
> suspect all in the form of emails which have now been lost. Sorry 
> about that.   There was always so much code to write.  And we didn't 
> have the answers yet to motivate creating RFCs which were viewed as 
> more permanent repositories of the solved problems.
>
> Sorry about that.....
>
> Jack Haverty
>
>
>
> On 10/2/22 07:45, Alejandro Acosta via Internet-history wrote:
>> Hello Jack,
>>
>>   Thanks a lot for sharing this. As usual, I enjoy these kinds of 
>> stories :-)
>>
>>   Jack/group, just a question regarding this topic. When you mentioned:
>>
>> "This caused a lot of concern about protocol elements such as 
>> Time-To-Live, which were temporarily to be implemented purely as "hop 
>> counts"
>>
>>
>>   Do you mean the original idea was to really drop the packet at a 
>> certain time, a *real* Time-To-Live concept?
>>
>>
>> Thanks,
>>
>> P.S. That's why it was important to change the field's name to hop 
>> count in v6 :-)
>>
>>
>>
>> On 2/10/22 12:35 AM, Jack Haverty via Internet-history wrote:
>>> On 10/1/22 16:30, vinton cerf via Internet-history wrote:
>>>> in the New Yorker
>>>>
>>>> https://www.newyorker.com/tech/annals-of-technology/the-thorny-problem-of-keeping-the-internets-time 
>>>>
>>>>
>>>> v
>>>
>>> Agree, nice story.   Dave did a *lot* of good work.  Reading the 
>>> article reminded me of the genesis of NTP.
>>>
>>> IIRC....
>>>
>>> Back in the early days circa 1980, Dave was the unabashed tinkerer, 
>>> experimenter, and scientist.  Like all good scientists, he wanted to 
>>> run experiments to explore what the newfangled Internet was doing 
>>> and test his theories.   To do that required measurements and data.
>>>
>>> At the time, BBN was responsible for the "core gateways" that 
>>> provided most of the long-haul Internet connectivity, e.g., between 
>>> US west and east coasts and Europe.  There were lots of ideas about 
>>> how to do things - e.g., strategies for TCP retransmissions, 
>>> techniques for maintaining dynamic tables of routing information, 
>>> algorithms for dealing with limited bandwidth and memory, and other 
>>> such stuff that was all intentionally very loosely defined within 
>>> the protocols.   The Internet was an Experiment.
>>>
>>> I remember talking with Dave back at the early Internet meetings, 
>>> and his fervor to try things out, and his disappointment at the 
>>> core gateways' inability to measure much of anything. In 
>>> particular, it was difficult to measure how long things took in the 
>>> Internet, since the gateways didn't even have real-time clocks. This 
>>> caused a lot of concern about protocol elements such as 
>>> Time-To-Live, which were temporarily to be implemented purely as 
>>> "hop counts", pending the introduction of some mechanism for 
>>> measuring Time into the gateways.  (AFAIK, we're still waiting....)
>>>
>>> Curiously, in the pre-Internet days of the ARPANET, the ARPANET IMPs 
>>> did have a pretty good mechanism for measuring time, at least 
>>> between pairs of IMPs at either end of a communications circuit, 
>>> because such circuits ran at specific speeds.   So one IMP could 
>>> tell how long it was taking to communicate with one of its 
>>> neighbors, and used such data to drive the ARPANET internal routing 
>>> mechanisms.
>>>
>>> In the Internet, a gateway couldn't tell how long it took to send a 
>>> datagram over one of its attached networks.   The networks of the 
>>> day simply didn't make such information available to their "users" 
>>> (e.g., a gateway).
>>>
>>> But experiments require data, and labs require instruments to 
>>> collect that data, and Dave wanted to test out lots of ideas, and we 
>>> (BBN) couldn't offer any hope of such instrumentation in the core 
>>> gateways any time soon.
>>>
>>> So Dave built it.
>>>
>>> And that's how NTP got started.  IIRC, the rest of us were all just 
>>> trying to get the Internet to work at all.   Dave was interested in 
>>> understanding how and why it worked.  So while he built NTP, that 
>>> didn't really affect any other projects. Plus most of us (at least me) 
>>> didn't understand how it was possible to get such accurate 
>>> synchronization when the delays through the Internet mesh were so 
>>> large and variable.   (I still don't). But Dave thought it was 
>>> possible, and that's why your computer, phone, laptop, or whatever 
>>> knows what time it is today.
>>>
>>> Dave was responsible for another long-lived element of the 
>>> Internet.   Dave's experiments were sometimes disruptive to the 
>>> "core" Internet that we were tasked to make a reliable 24x7 
>>> service.  Where Dave The Scientist would say "I wonder what happens 
>>> when I do this..." We The Engineers would say "Don't do that!"
>>>
>>> That was the original motivation for creating the notion of 
>>> "Autonomous Systems" and EGP - a way to insulate the "core" of the 
>>> Internet from the antics of the Fuzzballs.  I corralled Eric Rosen 
>>> after one such Fuzzball-triggered incident and we sat down and 
>>> created ASes, so that we could keep "our" AS running reliably.  It 
>>> was intended as an interim mechanism until all the experimentation 
>>> revealed what should be the best algorithms and protocol features to 
>>> put in the next generation, and the Internet Experiment advanced 
>>> into a production network service.   We defined ASes and EGP to 
>>> protect the Internet from Dave's Fuzzball mania.
>>>
>>> AFAIK, that hasn't happened yet ... and from that article, Dave is 
>>> still Experimenting..... and The Internet is still an Experiment.
>>>
>>> Fun times,
>>> Jack Haverty
>>>
>