[ih] Evolution of Internet audio and video

the keyboard of geoff goodfellow geoff at iconia.com
Mon Sep 29 19:37:48 PDT 2025


after yours truly returned to the SF Bay Area in '04 -- from living in
prague -- paul vixie -- then the head of ISC.ORG -- brought me on for a
project known as the Multicasting Deployment Effort (MDE) to, well, get
multicasting widely deployed and used on the internet.

four things stymied that effort:

#1.) layer 2 switching "imbalances" when you have a fire hose of a feed
going into a garden hose of a spigot at routing junctions/distribution
points

#2.) the effect multicasting would have on "peering arrangements" and
traffic-flow accords between providers

the next two came out of a conversation yours truly had with the
then-CEO of LIVE365.COM, which touts itself as "The world's audio. Every
station is made by a real human."

#3.) the cost of the streaming bandwidth for each listener was not their
biggest expense/hassle -- rather it was the music royalties owed to the
RIAA/SoundExchange/ASCAP/BMI/SESAC/et al, plus the "onerous" reporting
and accounting requirements that necessitated keeping track of how many
listeners each song was streamed to/heard by (unlike with radio
broadcasting and multicasting).

#4.) by serving listeners with individual "monocasting" streams, LIVE365
was able to collect demographic information about each listener --
knowing who they were -- and thus could tailor and individually target
specific kinds of ads to them.

when all of the above was taken into consideration and "realized" -- the
MDE effort was summarily abandoned/closed (and yours truly subsequently
became a music director + system and network janitor burping and
diapering the streaming infrastructure, as well as an on-air DJ at KZSU
90.1 FM).

geoff

On Mon, Sep 29, 2025 at 6:51 PM Karl Auerbach via Internet-history <
internet-history at elists.isoc.org> wrote:

> Today's Internet multimedia (which mostly means videos and Zoom-like
> conferences) does not work all that well.
>
> Original IP multicast did not work well in a multi-administrative
> environment.  There were a lot of problems when the distribution tree
> crossed administrative boundaries, and there was pressure for carriers to
> hot-potato the traffic onto another provider.
>
> (One day at Precept I was installing a new Cisco router - a small one, a
> 2514 - and it was not yet configured with its addresses.  But the MBone
> DVMRP routing found it and started to send the entire MBone traffic over
> our T-1 link while our poor router tried to scream "prune", "prune!",
> "PRUNE!!!" but could not be heard - those prune messages could not be
> sent because the unicast routing had not yet been configured.  It was
> like when one of Dave Mills' PDP-11/03 boxes became the destination for
> all destinations on the net.)
>
> That problem was significantly reduced when one of Dave Cheriton's
> students came up with the idea to do single-source multicast.  This
> changed the original multiple-source IP multicast into something far
> more manageable and stable.  But I have not seen it used much in
> commercial products, nor do I know how well supported it is in routers
> and edge devices.
>
> Single source works well for presentation-style audio/video or for
> systems in which every participant feeds into a mixing engine that
> resolves things like data formats (Zoom does this, but it uses direct
> TCP connections to deliver the mixed content to the users.)
>
> One of the difficulties is when there is a mix of well provisioned
> user/clients and some poorly provisioned ones.  The question becomes
> "who waits?"  (And there is that problem of one stream, e.g. voice,
> being out of sync with visual pointers such as someone pointing to a map
> and saying "we meet here at dawn".)
>
> Present day conferencing works because most conferences have relatively
> few users.  When Steve Casner and I did the Precept RTP stack we went
> full bore and did the support for large client populations - that
> required a lot of code to back down on the client feedback in order to
> avoid packet implosions crushing the sources.  (That was hard code to
> test!)
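
Backing down on client feedback is essentially what RTCP ended up
standardizing in RFC 3550: every receiver stretches its report interval
in proportion to the group size, so the aggregate feedback stays within
a small, fixed share of the session bandwidth and never implodes on the
source.  A rough Python sketch of that calculation -- the defaults
follow the RFC's published constants, but this is an illustration, not
the Precept code:

    import random

    def rtcp_report_interval(members, session_bw_bps,
                             avg_rtcp_pkt_bytes=100, we_sent=False,
                             initial=False):
        # RTCP gets ~5% of the session bandwidth; receivers share 75% of it.
        rtcp_bw = 0.05 * session_bw_bps / 8.0      # bytes per second
        if not we_sent:
            rtcp_bw *= 0.75
        t_min = 2.5 if initial else 5.0            # seconds (RFC 3550 defaults)
        # The interval grows linearly with the number of members, so the
        # group's aggregate feedback stays roughly constant as it grows.
        t = max(t_min, members * avg_rtcp_pkt_bytes / rtcp_bw)
        # Randomize over [0.5*t, 1.5*t] so receivers don't report in lockstep.
        return t * (0.5 + random.random())

    # e.g. 10,000 listeners on a 128 kbit/s audio session:
    print(round(rtcp_report_interval(10_000, 128_000), 1),
          "seconds between reports from each receiver")
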
>
> And things like Netflix and other streaming work basically because we
> are willing to dedicate a lot of bandwidth to those streams and to
> distribute the origination of the traffic across many, often widely
> dispersed, servers.
>
> (I notice that the net on this very day seems to be having stuttering
> problems - I've observed it from many points of view, including
> point-of-sale devices.  So it does seem that our assumption of plenty of
> bandwidth may at times be optimistic.)
>
> Some of the interesting variations on IP multicast, variations that never
> took off, were things like multicast file transfers or reliable,
> open-ended multicast streams.  Multicast allowed expanding-ring (TTL
> based) searches of "nearby" other recipients in order to obtain copies
> of lost data.  It was kinda cool (but did create security issues
> regarding malicious introduction of modified data.)
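
The expanding-ring search works by scoping a multicast repair request
with the IP TTL: ask the nearest ring of peers first, and only widen
the scope if nobody answers.  A minimal Python sketch, with a made-up
group address, port, and request format:

    import socket, struct

    GROUP, PORT = "239.1.2.3", 5004      # hypothetical repair group and port

    def request_repair(missing_seq, max_ttl=32, timeout=1.0):
        # Ask progressively wider "rings" of peers for a copy of lost data.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(timeout)
        ttl = 1
        while ttl <= max_ttl:
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                            struct.pack("b", ttl))
            sock.sendto(b"REPAIR %d" % missing_seq, (GROUP, PORT))
            try:
                # Any nearby recipient holding the data replies unicast.
                return sock.recvfrom(2048)
            except socket.timeout:
                ttl *= 2             # nobody close enough heard us; widen the ring
        return None
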
>
> Given that broadcast TV and radio are slowly dying (and their RF bands
> snapped up for other purposes) we may need to revisit how we do Internet
> multimedia.
>
>          --karl--
>
>
> On 9/29/25 6:21 PM, Jack Haverty via Internet-history wrote:
> > FYI, multimedia was on the Internet radar very early.  It was probably
> > the most important driving force for the evolution of TCP2 to become
> > TCP/IP4 in the late 1970s and early 1980s.
> >
> > Shortly after I got the assignment to implement the first TCP for Unix
> > in 1977, I started attending Internet meetings.  At one of the early
> > ones, I remember Vint describing a bunch of "scenarios" that the
> > Internet was expected to handle.  One was especially memorable. It
> > involved a teleconference with a group of military officers, located
> > over a broad geographic area including some perhaps in the Pentagon or
> > regional command centers, and others far away in jeeps or tanks, or
> > even helicopters, in action on a battlefield.
> >
> > The gist of the teleconference was to collect information about what
> > was happening, make decisions, and issue orders to the field units. At
> > the time, video was not even a dream, but it was deemed feasible even
> > in the near term to use the multimedia then available.  For example,
> > everyone might have some kind of display device, enabling them all to
> > see the same map or graphic.  A pointer device would allow anyone
> > while speaking to point to the graphic and everyone else would see the
> > same motions on their displays.  The teleconference would be conducted
> > by voice, which of course had to be interactive.  It also had to be
> > synchronized with the graphics, so that orders like "Move your
> > battalion here; we're going to bomb over here." didn't cause serious
> > problems if transmission delays were happening in the Internet and the
> > voice and graphics became unsynchronized.
> >
> > Such scenarios drove the thinking about what the Internet technology
> > had to be able to do.  It led to a consensus that the virtual
> > connection service of TCP2 was insufficient, due to its likelihood of
> > delays that would disrupt interactive voice.  In addition, the
> > consensus was that multiple types of service should be provided by the
> > Internet.  One type might be appropriate for interactive voice, where
> > getting as much data delivered as possible was more important than
> > getting all the data delivered eventually.  Similarly, large data
> > transfers, such as high-resolution graphics, could be delivered
> > intact, but it was less important that they arrive within milliseconds.
> >
> > That led to the split of TCP into TCP and IP, and the introduction of
> > UDP as a possible vehicle for carrying interactive content with a need
> > for low latency.  In addition, it might be useful for different types
> > of traffic to follow different routes through the Internet.
> > Interactive traffic might use a terrestrial route, while bulk traffic
> > such as graphics might travel through long-delay, but high bandwidth,
> > geosynchronous satellite networks.  The TOS field was added to the IP
> > header so that a teleconferencing program could tell the Internet how
> > to handle its traffic.
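
For what it's worth, the TOS byte is still in the IPv4 header (nowadays
read as DSCP), and an application can still set it per-socket to ask
the network for low-delay handling -- whether any given path honors the
marking is another question.  A minimal sketch on a platform that
exposes IP_TOS (e.g. Linux); the socket names are only illustrative:

    import socket

    # DSCP "Expedited Forwarding" (46) shifted into the old TOS byte:
    # 46 << 2 = 0xB8.  This is the usual marking for interactive voice;
    # bulk transfers would be left at best-effort (0) or a low-priority class.
    EF_TOS = 0xB8

    voice_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    voice_sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, EF_TOS)

    # From here on, datagrams sent on voice_sock carry the marking, e.g.:
    # voice_sock.sendto(audio_frame, (remote_host, rtp_port))
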
> >
> > TCP/IP4 created an experimental environment where such approaches
> > could be tried.  Various researchers used to come to the Internet
> > meetings to report on their experiments and lobby for new mechanisms.
> > (I recall Steve Casner and Jim Forgie as being frequent attendees with
> > those interests).   Experimentation later produced the MBONE, with
> > multicast which helped reduce the traffic loads through the
> > Internet.   MBONE seems to have faded away over the years, and various
> > "silos" of proprietary teleconferencing mechanisms have popped up to
> > provide such functionality, but unfortunately seem to have done so in
> > a non-interoperable way.
> >
> > Today, I use teleconferencing with Zoom, Facetime, and several
> > others.   There seem to be a lot of choices.   It seems to work
> > pretty well, at least for my personal scenarios.  But a few years ago
> > I was asked to give a presentation over the Internet to a conference
> > halfway around the planet, and we decided that it was too risky to
> > count on that Internet path being good enough at the scheduled time.
> > So we prerecorded the presentation and transferred it via FTP well
> > ahead of time.   Perhaps it would have worked, but we couldn't be
> > confident.
> >
> > Recently I heard anecdotal reports that the Internet on cruise ships
> > works well - but is reliable only when the ship is far out to sea.
> > When it's in port, or even just approaching port, teleconferencing is
> > unreliable.   My speculation is that traffic loads when near a port
> > include all the land-based users and the network may be overwhelmed.
> > But that's just speculation, I have no data.
> >
> > So I wonder - is the multimedia on the Internet problem now solved?
> > As near as I can tell, the Internet today only provides one type of
> > service, with all datagrams following the same route. Did the
> > introduction of fiber make the concerns of the 1980s moot? Does
> > teleconferencing now work well throughout the Internet?  Do users
> > simply abandon the idea of using the Internet for teleconferencing
> > when they discover it doesn't work for them (as I did for my
> > presentation)?   Does the military now do what the 1970s scenarios
> > envisioned over the Internet?
> >
> > How did multimedia on the Internet evolve over the last 45+ years?
> >
> > Jack Haverty
> >
> >
> > On 9/29/25 14:59, Karl Auerbach via Internet-history wrote:
> >> On 9/29/25 2:13 PM, Craig Partridge wrote:
> >>
> >>>   * How to persuade video to deal with occasional loss. Dave Clark did
> >>>     early outreach to codec experts and said that, in response to the
> >>>     question "What do we do if some of your data has to be dropped",
> >>>     they were told "Don't.  We're good at compression and if the data could
> >>>     be dropped, we'd have removed it."  As I recall, it was Facebook
> >>>     that led to codecs that could deal with loss?
> >>>
> >> Steve Casner and I worked really hard on these issues.  And because
> >> we often moved audio and video via different packet streams,
> >> loss/delay/duplication/re-sequencing on one of the streams had an
> >> impact on the other stream.
> >>
> >> Many codecs are not friendly to loss or underrunning their input
> >> buffers.  And with cipher chained (aka block-chained) streams it can
> >> get harder to pick up the sticks when a packet is lost.
> >>
> >> We were working with UDP so we did not have TCP trying to do
> >> reliability and sequencing.
> >>
> >> Some of the issues we faced were "what do we do when we don't have a
> >> video or audio packet at the time we need to feed it to the rendering
> >> hardware?"  For audio there was "redundant audio transport", aka
> >> "RAT" in which the data in packet N was carried in lower quality in
> >> packet N+1 (or N+2).
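
That scheme -- carrying a cheap copy of the previous frame inside the
next packet -- was later standardized for RTP as RFC 2198 (redundant
audio data).  A toy Python sketch of the idea; the "encoders" here are
stand-ins, not real codecs:

    def low_quality(frame: bytes) -> bytes:
        # Stand-in for a low-bit-rate re-encoding (e.g. LPC instead of PCM).
        return frame[::4]

    def build_packets(frames):
        # Pack frame N at full quality plus a cheap copy of frame N-1.
        packets, prev = [], b""
        for seq, frame in enumerate(frames):
            packets.append({"seq": seq, "primary": frame,
                            "redundant": low_quality(prev)})
            prev = frame
        return packets

    def play_out(packets_received):
        # If packet N was lost, recover frame N (degraded) from packet N+1.
        by_seq = {p["seq"]: p for p in packets_received}
        last = max(by_seq) if by_seq else -1
        for seq in range(last + 1):
            if seq in by_seq:
                yield by_seq[seq]["primary"]
            elif seq + 1 in by_seq and by_seq[seq + 1]["redundant"]:
                yield by_seq[seq + 1]["redundant"]   # degraded, but no gap
            else:
                yield b""                            # two losses in a row: a real gap
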
> >>
> >> For video we had to deal with 30 per second freight trains of closely
> >> spaced large packets.
> >>
> >> There were demarcations in the streams about where sound spurts began
> >> and where video frames ended.  Loss of those packets forced us to
> >> develop heuristics about how to infer where those packets were and
> >> what to do about it.
> >>
> >> Out of order packets were a bane.
> >>
> >> Patching voice/video data is hard because it can create artifacts,
> >> sometimes unexpected ones, such as synthetic tones when audio was
> >> being patched (and patched with what - we experimented with silence
> >> [doesn't work well] or averaging the prior/next [worked better], etc.)
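
The two patching strategies are easy to compare on raw samples:
zero-fill the gap (audible as a hard dropout), or interpolate from the
last good sample before the gap to the first good one after it, which
is roughly the prior/next averaging that worked better.  A small sketch
over a buffer of PCM samples, purely for illustration:

    def patch_silence(samples, gap_start, gap_len):
        # samples: decoded buffer; gap_start .. gap_start+gap_len-1 were lost.
        out = list(samples)
        out[gap_start:gap_start + gap_len] = [0] * gap_len
        return out

    def patch_interpolate(samples, gap_start, gap_len):
        # Ramp linearly from the last good sample to the next good one.
        before = samples[gap_start - 1] if gap_start > 0 else 0
        after_i = gap_start + gap_len
        after = samples[after_i] if after_i < len(samples) else 0
        out = list(samples)
        out[gap_start:after_i] = [before + (after - before) * (i + 1) // (gap_len + 1)
                                  for i in range(gap_len)]
        return out
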
> >>
> >> Things are worse these days because of the games that "smart"
> >> Ethernet NICs play with Ethernet frames - such as combining several
> >> small Ethernet frames and delivering them to the receiving operating
> >> system as one large (up to 64 Kbyte!) Ethernet frame.  One's software
> >> has to approach a modern Ethernet NIC with a software sledge hammer
> >> to turn off all of the "offloads".
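
In practice the sledge hammer is usually ethtool, turning off the
coalescing/segmentation offloads so the software sees frames as they
actually arrived on the wire.  A sketch of the sort of thing one ends
up scripting -- the interface name, and which features a given
NIC/driver actually supports, will vary:

    import subprocess

    def disable_offloads(iface="eth0"):
        # Common offloads that merge or split frames before the OS sees them:
        # generic/large receive offload and TCP/generic segmentation offload.
        for feature in ("gro", "lro", "tso", "gso"):
            subprocess.run(["ethtool", "-K", iface, feature, "off"], check=False)

    disable_offloads("eth0")
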
> >>
> >> All in all the cure for many things was to add delay before rendering
> >> content.  But that affected conversational uses where, according to
> >> the ITU we have a round trip budget of only about 140 milliseconds
> >> before people go into half-duplex/walkie-talkie mode.  I really
> >> wanted to get my physicist friends to consider increasing the speed
> >> of light, but they were resistant to the idea.
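
The "add delay before rendering" cure is the familiar playout (jitter)
buffer: hold each packet until a fixed offset past its capture time so
that late arrivals within that window still play on schedule -- at the
cost of spending part of the conversational round-trip budget on the
buffer itself.  A bare-bones sketch (timestamps in seconds; the 60 ms
figure is only an example):

    import heapq

    class PlayoutBuffer:
        # Hold packets until (capture_time + playout_delay) so that network
        # jitter smaller than the delay never causes an underrun.
        def __init__(self, playout_delay=0.060):       # 60 ms, for example
            self.delay = playout_delay
            self.heap = []                             # (due_time, payload)

        def insert(self, capture_time, payload):
            heapq.heappush(self.heap, (capture_time + self.delay, payload))

        def pop_due(self, now):
            # Return everything whose playout time has arrived.
            out = []
            while self.heap and self.heap[0][0] <= now:
                out.append(heapq.heappop(self.heap)[1])
            return out
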
> >>
> >> I began work on a meta stream to carry information about objects in
> >> the video stream (in order to do fast, set-top product placements and
> >> such) and with scripted morphing to react to events in the
> >> viewer's space.  (E.g. morph Alan Arkin's eyes onto the source of a
> >> viewer gasp, such as when he sneaks up on Audrey Hepburn in the film
> >> Wait Until Dark.)  This was part of my notion about breaking down the
> >> 4th wall.  I hypothesized a video conferencing system in which each
> >> person posted a series of photos in a set of patterned poses - then
> >> the conference would proceed by sending small morphing instructions
> >> rather than full images.  One could turn a knob to change from "staid
> >> English" to "hand waving Italian" modes of presentation. (This came
> >> out of my work with communications with submarines in which voice was
> >> converted into tokenized words rather than conveyed as voice itself -
> >> that saved a lot of bandwidth on our 300 bits/second path and the
> >> resulting voice was much clearer and comprehensible, even if the
> >> speaker was synthetic - and it was something we suggested to the FCC
> >> for air traffic control.  I had pieces of these things running, but
> >> only small pieces.  It is an area that is waiting for further work.)
> >>
> >> Tools to test and exercise this stuff were hard to come by.  Jon had
> >> proposed his "flakeway" and a few years later I built one (operating
> >> as a malicious Ethernet switch rather than as a router.)  I now sell
> >> that, or a distant successor, as a product.
> >>
> >>         --karl--
> >>
> >
> >
> --
> Internet-history mailing list
> Internet-history at elists.isoc.org
> https://elists.isoc.org/mailman/listinfo/internet-history
> -
> Unsubscribe:
> https://app.smartsheet.com/b/form/9b6ef0621638436ab0a9b23cb0668b0b?The%20list%20to%20be%20unsubscribed%20from=Internet-history
>
>

-- 
Geoff.Goodfellow at iconia.com
living as The Truth is True

