[ih] DARTnet: Evolution of Internet audio and video
Karl Auerbach
karl at iwl.com
Sat Oct 11 14:07:06 PDT 2025
Judy Estrin gathered several of us together in the mid '90s to form
Precept Software. (The "us" consisted of Steve Casner, Chia-Chee Kuan,
Scott Firestone, myself, and others.)
We set forth to do (and we actually did) RTP/RTCP based entertainment
grade (2K/DVD quality) media distribution over IP multicast.
Apart from fighting the troubles of classic multi-source IP multicast
(single-source had not been invented yet), our main difficulty was
getting video and sound synchronized. (We did not try to synchronize
across multiple receiving platforms - so walking into our test area one
would hear a cacophony: each machine synced with itself, but the
machines unsynchronized with one another. I find modern speaker
systems, such as the Amazon Echo, to be quite amazing in their
synchronization of multiple receivers.)
When doing music or another form of entertainment it is necessary to
distinguish two broad cases: interactive (where human participants
interact with one another, enduring the cross-net latency) and
presentation (where there is no back-and-forth interaction, at least
not in a conversational manner). The latter form is much easier; the
former can quickly become impossible because our human vision and
hearing are trained by a billion years of evolutionary pressures in a
non-electronic world.
Our great enemy, both for good conversational use and for synchronized
playback, was time - accurate, precise time. And not just accurate and
precise, but also a stable ticking of the clock - as the cost of
network-based parts (such as an ESP-32) drops, the quality of the clock
in them tends to erode. Back when Casner and I were doing this stuff we
measured clock drifts of as much as +/- 5% - that's three minutes over
an hour for each machine, or six minutes per hour if sender and
receiver drift 5% in opposing directions.
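The drift arithmetic above can be sketched out; the numbers here are just the illustrative ones from the text:

```python
# Worked example of the clock-drift arithmetic above (illustrative numbers).

def drift_per_hour(drift_fraction: float) -> float:
    """Seconds of accumulated clock error per hour for a given fractional drift."""
    return drift_fraction * 3600.0

one_machine = drift_per_hour(0.05)       # 5% drift: 180 s, i.e. 3 minutes per hour
opposing = 2 * drift_per_hour(0.05)      # sender and receiver drifting apart: 360 s

print(one_machine, opposing)
```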
(Even my semi-pro video and sound gear, without SMPTE sync, can drift by
a video frame time [about 33.4ms] over the course of 30 to 60 minutes.
[I remain impressed at how useful that simplest of tools - the clapper
board - is.])
For voice one can often stretch or squeeze moments of silence, but for
music that has seriously bad effects.
Grace Hopper used a piece of rope roughly one foot long to illustrate
one nanosecond travel for light in a vacuum.
That same (roughly) one-foot piece of rope illustrates one millisecond
of travel for sound through air at sea level. (Theatrical sound system
designers know this quite well - our main performance space has about
75ms of sound latency from the stage to the back of the house.)
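The rope analogy is easy to check with rough constants (343 m/s for sound in air near sea level, which is an approximation that varies with temperature):

```python
# Rough check of the rope analogy above (approximate constants).
C_LIGHT = 299_792_458.0   # m/s, light in a vacuum
C_SOUND = 343.0           # m/s, sound in air at ~20 C near sea level (approximate)
FOOT = 0.3048             # meters per foot

light_ns_feet = C_LIGHT * 1e-9 / FOOT   # distance light covers in 1 ns, in feet
sound_ms_feet = C_SOUND * 1e-3 / FOOT   # distance sound covers in 1 ms, in feet
house_depth_m = C_SOUND * 0.075         # depth implied by ~75 ms stage-to-back latency

print(round(light_ns_feet, 2), round(sound_ms_feet, 2), round(house_depth_m, 1))
# -> 0.98 1.13 25.7
```

Both distances come out near one foot, and 75 ms of sound latency implies a house roughly 26 m (about 84 feet) deep.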
I have long wanted to talk to some legislature or court about repealing,
or at least increasing, the speed of light.
I have been impressed by the quality of some distributed music work -
such as that done by the Playing For Change folks - but they are
playing against a recorded, pre-distributed metronome, with the tracks
all combined later.
I have been experimenting with cheap GPS time sync (mostly on Raspberry
Pis) - each machine runs within a few hundred microseconds of "correct"
time. I am doing this to create a kind of network panopticon that can
be used both as an early warning system for troubles (I am hoping to
adopt some techniques from astronomy) and as a tool to isolate where
those troubles lie. (Yes, I am aware of RIPE's ATLAS system.)
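Once probe machines share a good clock, measuring network trouble reduces to comparing timestamps. A minimal sketch of the classic NTP-style four-timestamp offset/delay calculation (not Karl's actual tooling, just the standard technique such a GPS-disciplined fleet enables):

```python
# Sketch of the classic NTP four-timestamp offset/delay calculation.
# t1: client send time, t2: server receive time,
# t3: server send time,  t4: client receive time (all in seconds).

def ntp_offset_delay(t1: float, t2: float, t3: float, t4: float):
    offset = ((t2 - t1) + (t3 - t4)) / 2.0   # estimated clock offset of server vs client
    delay = (t4 - t1) - (t3 - t2)            # round-trip network delay
    return offset, delay

# Example: server clock 0.5 ms ahead, 5 ms one-way delay, 1 ms server processing.
off, rtt = ntp_offset_delay(0.0, 0.0055, 0.0065, 0.011)
print(off, rtt)   # offset ~0.0005 s, round trip ~0.010 s
```

With sub-millisecond GPS sync on each end, a probe can also attribute one-way delay and spot asymmetric path trouble that round-trip tools miss.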
--karl--
On 10/11/25 9:42 AM, Greg Skinner via Internet-history wrote:
> There were some applications of multicast to distributed music performances in the 1990s, such as the following:
>
> https://www.postel.org/pipermail/end2end-interest/2001-August/001314.html
>
> The paper is available from ResearchGate.
>
> https://www.researchgate.net/publication/339055474_Distributed_Music_A_Foray_into_Networked_Performance
>
> Greg