[ih] nice story about dave mills and NTP
Alejandro Acosta
alejandroacostaalamo at gmail.com
Tue Oct 4 06:33:00 PDT 2022
Thanks!
On 2/10/22 1:55 PM, Jack Haverty via Internet-history wrote:
> The short answer is "Yes". The Time-To-Live field was intended to
> count down actual transit time as a datagram proceeded through the
> Internet. A datagram was to be discarded as soon as some algorithm
> determined it wasn't going to get to its destination before its TTL
> ran to zero. But we didn't have the means to measure time, so
> hop-counts were the placeholder.
>
> I wasn't involved in the IPV6 work, but I suspect the change of the
> field to "hop count" reflected the reality of what the field actually
> was. But it would have been better to have actually made Time work.
>
> Many of these "original ideas" probably weren't ever written down in
> persistent media. Most discussions in the 1980 time frame were done
> either in person or more extensively in email. Disk space was scarce
> and expensive, so much of such email was probably never archived -
> especially email not on the more "formal" mailing lists of the day.
>
> As I recall, Time was considered very important, for a number of
> reasons. So here's what I remember...
> -----
>
> Like every project using computers, the Internet was constrained by
> too little memory, too slow processors, and too limited bandwidth. A
> typical, and expensive, system might have a few dozen kilobytes of
> memory, a processor running at perhaps 1 MHz, and "high speed"
> communications circuits carrying 56 kilobits per second. So there
> was strong incentive not to waste resources.
>
> At the time, the ARPANET had been running for about ten years, and
> quite a lot of experience had been gained through its operation and
> crises. Over that time, a lot of mechanisms had been put in place,
> internally in the IMP algorithms and hardware, to "protect" the
> network and keep it running despite what the user computers tried to
> do. So, for example, an IMP could regulate the flow of traffic from
> any of its "host" computers, and even shut it off completely if
> needed. (Google "ARPANET RFNM counting" if curious).
>
> In the Internet, the gateways had no such mechanisms available. We
> were especially concerned about the "impedance mismatch" that would
> occur at a gateway connecting a LAN to a much slower and "skinnier"
> long-haul network. All of the "flow control" mechanisms that were
> implemented inside an ARPANET IMP would be instead implemented inside
> TCP software in users' host computers.
>
> We didn't know how that would work. But something had to be in the
> code.... So the principle was that IP datagrams could be simply
> discarded when necessary, wherever necessary, and TCP would retransmit
> them so they would eventually get delivered.
>
> We envisioned that approach could easily lead to "runaway" scenarios,
> with the Internet full of duplicate datagrams being dropped at any
> "impedance mismatch" point along the way. In fact, we saw exactly
> that at a gateway between ARPANET and SATNET - IIRC in one of Dave's
> transatlantic experiments ("Don't do that!!!")
>
> So, Source Quench was invented, as a way of telling some host to "slow
> down", and the gateways sent an SQ back to the source of any datagram
> it had to drop. Many of us didn't think that would work very well
> (e.g., a host might send one datagram and get back an SQ - what should
> it do to "slow down"...?). I recall that Dave knew exactly what to
> do. Since his machine's datagram had been dropped, it meant he should
> immediately retransmit it. Another "Don't do that!" moment....
>
> But SQ was a placeholder too -- to be replaced by some "real" flow
> control mechanism as soon as the experimentation revealed what that
> should be.
>
> -----
>
> TCP retransmissions were based on Time. If a TCP didn't receive a
> timely acknowledgement that data had been received, it could assume
> that someone along the way had dropped the datagram and it should
> retransmit it. SQ datagrams were also of course not guaranteed to get
> to their destination, so you couldn't count on them as a signal to
> retransmit. So Time was the only answer.
>
> But how to set the Timer in your TCP - that was subject to
> experimentation, with lots of ideas. If you sent a copy of your data
> too soon, it would just overload everything along the path through the
> Internet with superfluous data consuming those scarce resources. If
> you waited too long, your end-users would complain that the Internet
> was too slow. So the answer was to have each TCP estimate how long
> it was taking for a datagram to get to its destination, and set its
> own "retransmission timer" to slightly longer than that value.
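[The adaptive timer described above can be sketched in a few lines. This is an editorial illustration, not code from the era: the exponentially weighted smoothing and the constants follow the scheme RFC 793 later specified, and all names are hypothetical.]

```python
ALPHA = 0.9   # smoothing factor: weight kept from the old estimate
BETA = 2.0    # safety margin: retransmit "slightly longer" than the estimate

def update_rto(srtt, measured_rtt, alpha=ALPHA, beta=BETA):
    """Fold one new round-trip-time sample into the smoothed estimate.

    Returns (new_srtt, retransmission_timeout)."""
    srtt = alpha * srtt + (1 - alpha) * measured_rtt
    return srtt, beta * srtt

# Example: start from a flat 3-second guess (the value mentioned below)
# and let a few measured samples pull the estimate toward reality.
srtt = 3.0
for sample in (1.2, 1.0, 1.4, 0.9):
    srtt, rto = update_rto(srtt, sample)
```

[Note that measuring one-way rather than round-trip delay, as the next paragraph explains, additionally requires synchronized clocks at both ends.]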
>
> Of course, such a technique requires instrumentation and data. Also,
> since the delays might depend on the direction of a datagram's travel,
> you needed synchronized clocks at the two endpoints of a TCP
> connection, so they could accurately measure one-way transit times.
>
> Meanwhile, inside the gateways, there were ideas about how to do even
> better by using Time. For example, if the routing protocols were
> actually based on Time (shortest transit time) rather than Hops
> (number of gateways between here and destination), the Internet would
> provide better user performance and be more efficient. Even better -
> if a gateway could "know" that a particular datagram wouldn't get to
> its destination before its TTL ran out, it could discard that
> datagram immediately, even though it still had time to live. No point
> in wasting network resources carrying a datagram already sentenced to
> death.
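[The early-discard idea above can be sketched as follows. This is an editorial illustration under stated assumptions: no such mechanism existed, and the routing table of per-destination transit-time estimates is hypothetical.]

```python
# Hypothetical routing data: destination -> estimated remaining transit
# time in seconds, as a time-based routing protocol might maintain it.
ESTIMATED_TRANSIT = {"host-a": 0.8, "host-b": 2.5}

def doomed(datagram, transit_estimates=ESTIMATED_TRANSIT):
    """True if the datagram cannot reach its destination before its
    (time-based) TTL expires, so carrying it further wastes resources."""
    remaining = transit_estimates.get(datagram["dst"])
    if remaining is None:
        return False          # no estimate: give it the benefit of the doubt
    return datagram["ttl"] < remaining
```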
>
> We couldn't do all that. Didn't have the hardware, didn't have the
> algorithms, didn't have the protocols. So in the meantime, any
> computer handling an IP datagram should simply decrement the TTL
> value, and if it reached zero the datagram should be discarded. TTL
> effectively became a "hop count".
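[The interim rule just described reduces to a few lines; this minimal sketch (names hypothetical) shows why decrementing TTL at every hop makes it a hop count in all but name.]

```python
def forward(datagram):
    """Apply the interim TTL rule at one gateway: decrement, and
    return the datagram to forward, or None if it must be discarded."""
    datagram["ttl"] -= 1
    if datagram["ttl"] <= 0:
        return None           # TTL exhausted: discard, don't forward
    return datagram

# A datagram entering with TTL 2 survives one gateway but not a second.
d = forward({"ttl": 2, "payload": b"hello"})
```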
>
> When Dave got NTP running, and enough Time Servers were online and
> reliable, and the gateways and hosts had the needed hardware, Time
> could be measured, TTL could be set based on Time, and the Internet
> would be better.
>
> In the meanwhile, all of us TCP implementers just picked some value
> for our retransmission timers. I think I set mine to 3 seconds. No
> exhaustive analysis or sophisticated mathematics involved. It just
> felt right.....there was a lot of that going on in the early Internet.
>
> -----
>
> While all the TCP work was going on, other uses were emerging. We
> knew that there was more to networking than just logging in to distant
> computers or transferring files between them - uses that had been
> common for years in the ARPANET. But the next "killer app" hadn't
> appeared yet, although there were lots of people trying to create one.
>
> In particular, "Packet Voice" was popular, with a contingent of
> researchers figuring out how to do that on the fledgling Internet.
> There were visions that someday it might even be possible to do
> Video. In particular, *interactive* voice was the goal, i.e., the
> ability to have a conversation by voice over the Internet (I don't
> recall when the term VOIP emerged, probably much later).
>
> In a resource-constrained network, you don't want to waste resources
> on datagrams that aren't useful. In conversational voice, a datagram
> that arrives too late isn't useful. A fragment of audio that should
> have gone to the speaker 500 milliseconds ago can only be discarded.
> It would be better that it hadn't been sent at all, but at least
> discarding it along the way, as soon as it's known to be too late to
> arrive, would be appropriate.
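[The "too late to be useful" rule for conversational audio amounts to a playout-deadline check. A hedged sketch, with names invented here and the 500 ms budget taken from the example above; note that comparing a sender's timestamp against a receiver's clock is exactly what requires the synchronized Time the paragraph below returns to.]

```python
PLAYOUT_BUDGET_MS = 500   # audio older than this is useless to the listener

def still_useful(sent_at_ms, now_ms, budget_ms=PLAYOUT_BUDGET_MS):
    """True if an audio fragment can still reach the speaker in time;
    False means it should be discarded wherever it currently is."""
    return (now_ms - sent_at_ms) < budget_ms
```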
>
> Of course, that needs Time. UDP was created as an adjunct to TCP,
> providing a different kind of network service. Where TCP got all of
> the data to its destination, no matter how long it took, UDP would get
> as much data as possible to the destination, as long as it got there
> in time to be useful. Time was important.
>
> UDP implementations, in host computers, didn't have to worry about
> retransmissions. But they did still have to worry about how long it
> would take for a datagram to get to its destination. With that
> knowledge, they could set their datagrams' TTL values to something
> appropriate for the network conditions at the time. Perhaps they
> might even tell their human users "Sorry, conversational use not
> available right now." -- an Internet equivalent of the "busy signal" -
> if the current network transit times were too high to provide a good
> user experience.
>
> Within the world of gateways, the differing needs of TCP and UDP
> called for different behaviors. That motivated the inclusion of the
> TOS - Type Of Service - field in the IP datagram header. Perhaps UDP
> packets would receive higher priority, being placed at the head of
> queues so they got transmitted sooner. Perhaps they would be
> discarded immediately if the gateway knew, based on its routing
> mechanisms, that the datagram would never get delivered in time.
> Perhaps UDP would be routed differently, using a terrestrial but
> low-bandwidth network, while TCP traffic was directed over a
> high-bandwidth but long-delay satellite path. A gateway mesh might
> have two or more independent routing mechanisms, each using a
> "shortest path" approach, but with different metrics for determining
> "short" - e.g., UDP using the shortest time route, while some TCP
> traffic travelled a route with least ("shortest") usage at the time.
>
> We couldn't do all that either. We needed Time, hardware, algorithms,
> protocols, etc. But the placeholders were there, in the TCP, IP, and
> UDP formats, ready for experimentation to figure all that stuff out.
>
> -----
>
> When Time was implemented, there could be much needed experimentation
> to figure out the right answers. Meanwhile, we had to keep the
> Internet working. By the early 1980s, the ARPANET had been in
> operation for more than a decade, and lots of operational experience
> had accrued. We knew, for example, that things could "go wrong" and
> generate a crisis for the network operators to quickly fix. TTL,
> even as just a hop count, was one mechanism to suppress problems. We
> knew that "routing loops" could occur. TTL would at least prevent
> situations where datagrams circulated forever, orbiting inside the
> Internet until someone discovered and fixed whatever was causing a
> routing loop to keep those datagrams speeding around.
>
> Since the Internet was an Experiment, there were mechanisms put in
> place to help run experiments. IIRC, in general things were put in
> the IP headers when we thought they were important and would be needed
> long after the experimental phase was over - things like TTL, SQ, TOS.
>
> Essentially every field in the IP header, and every type of datagram,
> was there for some good reason, even though its initial implementation
> was known to be inadequate. The Internet was built on Placeholders....
>
> Other mechanisms were put into the "Options" mechanism of the IP
> format. A lot of that was targeted towards supporting experiments,
> or as occasional tools to be used to debug problems in crises during
> Internet operations.
>
> E.g., all of the "Source Routing" mechanisms might be used to route
> traffic in particular paths that the current gateways wouldn't
> otherwise use. An example would be routing voice traffic over
> specific paths, which the normal gateway routing wouldn't use. The
> Voice experimenters could use those mechanisms to try out their ideas
> in a controlled experiment.
>
> Similarly, Source Routing might be used to debug network problems. A
> network analyst might use Source Routing to probe a particular remote
> computer interface, where the regular gateway mechanisms would avoid
> that path.
>
> So a general rule was that IP headers contained important mechanisms,
> often just as placeholders, while Options contained things useful only
> in particular circumstances.
>
> But all of these "original ideas" needed Time. We knew Dave was "on
> it"....
>
> -----
>
> Hopefully this helps... I (and many others) probably should have
> written these "original ideas" down 40 years ago. We did, but I
> suspect it was all in the form of emails which have now been lost. Sorry
> about that. There was always so much code to write. And we didn't
> have the answers yet to motivate creating RFCs which were viewed as
> more permanent repositories of the solved problems.
>
> Sorry about that.....
>
> Jack Haverty
>
>
>
> On 10/2/22 07:45, Alejandro Acosta via Internet-history wrote:
>> Hello Jack,
>>
>> Thanks a lot for sharing this, as usual, I enjoy this kind of
>> stories :-)
>>
>> Jack/group, just a question regarding this topic. When you mentioned:
>>
>> "This caused a lot of concern about protocol elements such as
>> Time-To-Live, which were temporarily to be implemented purely as "hop
>> counts"
>>
>>
>> Do you mean, the original idea was to really drop the packet at a
>> certain time, a *real* Time-To-Live concept?
>>
>>
>> Thanks,
>>
>> P.S. That's why it was important to change the field's name to hop
>> count in v6 :-)
>>
>>
>>
>> On 2/10/22 12:35 AM, Jack Haverty via Internet-history wrote:
>>> On 10/1/22 16:30, vinton cerf via Internet-history wrote:
>>>> in the New Yorker
>>>>
>>>> https://www.newyorker.com/tech/annals-of-technology/the-thorny-problem-of-keeping-the-internets-time
>>>>
>>>>
>>>> v
>>>
>>> Agree, nice story. Dave did a *lot* of good work. Reading the
>>> article reminded me of the genesis of NTP.
>>>
>>> IIRC....
>>>
>>> Back in the early days circa 1980, Dave was the unabashed tinkerer,
>>> experimenter, and scientist. Like all good scientists, he wanted to
>>> run experiments to explore what the newfangled Internet was doing
>>> and test his theories. To do that required measurements and data.
>>>
>>> At the time, BBN was responsible for the "core gateways" that
>>> provided most of the long-haul Internet connectivity, e.g., between
>>> US west and east coasts and Europe. There were lots of ideas about
>>> how to do things - e.g., strategies for TCP retransmissions,
>>> techniques for maintaining dynamic tables of routing information,
>>> algorithms for dealing with limited bandwidth and memory, and other
>>> such stuff that was all intentionally very loosely defined within
>>> the protocols. The Internet was an Experiment.
>>>
>>> I remember talking with Dave back at the early Internet meetings,
>>> and his fervor to try things out, and his disappointment at the lack
>>> of the core gateway's ability to measure much of anything. In
>>> particular, it was difficult to measure how long things took in the
>>> Internet, since the gateways didn't even have real-time clocks. This
>>> caused a lot of concern about protocol elements such as
>>> Time-To-Live, which were temporarily to be implemented purely as
>>> "hop counts", pending the introduction of some mechanism for
>>> measuring Time into the gateways. (AFAIK, we're still waiting....)
>>>
>>> Curiously, in the pre-Internet days of the ARPANET, the ARPANET IMPs
>>> did have a pretty good mechanism for measuring time, at least
>>> between pairs of IMPs at either end of a communications circuit,
>>> because such circuits ran at specific speeds. So one IMP could
>>> tell how long it was taking to communicate with one of its
>>> neighbors, and used such data to drive the ARPANET internal routing
>>> mechanisms.
>>>
>>> In the Internet, gateways couldn't tell how long it took to send a
>>> datagram over one of its attached networks. The networks of the
>>> day simply didn't make such information available to their "users"
>>> (e.g., a gateway).
>>>
>>> But experiments require data, and labs require instruments to
>>> collect that data, and Dave wanted to test out lots of ideas, and we
>>> (BBN) couldn't offer any hope of such instrumentation in the core
>>> gateways any time soon.
>>>
>>> So Dave built it.
>>>
>>> And that's how NTP got started. IIRC, the rest of us were all just
>>> trying to get the Internet to work at all. Dave was interested in
>>> understanding how and why it worked. So while he built NTP, that
>>> didn't really affect any other projects. Plus most (at least me)
>>> didn't understand how it was possible to get such accurate
>>> synchronization when the delays through the Internet mesh were so
>>> large and variable. (I still don't). But Dave thought it was
>>> possible, and that's why your computer, phone, laptop, or whatever
>>> know what time it is today.
>>>
>>> Dave was responsible for another long-lived element of the
>>> Internet. Dave's experiments were sometimes disruptive to the
>>> "core" Internet that we were tasked to make a reliable 24x7
>>> service. Where Dave The Scientist would say "I wonder what happens
>>> when I do this..." We The Engineers would say "Don't do that!"
>>>
>>> That was the original motivation for creating the notion of
>>> "Autonomous Systems" and EGP - a way to insulate the "core" of the
>>> Internet from the antics of the Fuzzballs. I corralled Eric Rosen
>>> after one such Fuzzball-triggered incident and we sat down and
>>> created ASes, so that we could keep "our" AS running reliably. It
>>> was intended as an interim mechanism until all the experimentation
>>> revealed what should be the best algorithms and protocol features to
>>> put in the next generation, and the Internet Experiment advanced
>>> into a production network service. We defined ASes and EGP to
>>> protect the Internet from Dave's Fuzzball mania.
>>>
>>> AFAIK, that hasn't happened yet ... and from that article, Dave is
>>> still Experimenting..... and The Internet is still an Experiment.
>>>
>>> Fun times,
>>> Jack Haverty
>>>
>