[ih] DARTnet: Evolution of Internet audio and video

Karl Auerbach karl at iwl.com
Sat Oct 11 14:07:06 PDT 2025


Judy Estrin gathered several of us together in the mid '90s to form 
Precept Software.  (The "us" consisted of Steve Casner, Chia-Chee Kuan, 
Scott Firestone, myself, and others.)

We set forth to do (and we actually did) RTP/RTCP based entertainment 
grade (2K/DVD quality) media distribution over IP multicast.

Apart from fighting the troubles of classic multi-source IP multicast 
(single-source had not been invented yet), our main difficulty was 
getting video and sound synchronized.  (We did not try to synchronize 
across multiple receiving platforms - so walking into our test area, one 
would see and hear a cacophony: each machine synced with itself, but all 
of the machines unsynchronized with one another.  I find modern speaker 
systems, such as the Amazon Echo, quite amazing in their 
synchronization of multiple receivers.)
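(For the curious: the per-machine lip sync that RTP gives you works by 
mapping each stream's RTP timestamps onto a common wallclock, using the 
NTP/RTP timestamp pairs carried in RTCP sender reports.  A minimal 
sketch of that mapping, with made-up timestamp values:)

```python
def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_ntp_seconds, clock_rate):
    """Map an RTP timestamp to sender wallclock time using the most
    recent RTCP Sender Report (SR), which pairs an RTP timestamp with
    an NTP-format wallclock time.  Handles 32-bit RTP wraparound."""
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF   # modulo-2^32 difference
    if delta >= 0x80000000:                     # sample earlier than the SR
        delta -= 0x100000000
    return sr_ntp_seconds + delta / clock_rate

# Audio at 8 kHz and video at 90 kHz can be played out against the
# same wallclock axis:
audio_t = rtp_to_wallclock(160_320, 160_000, 1000.0, 8_000)
video_t = rtp_to_wallclock(3_603_600, 3_600_000, 1000.0, 90_000)
print(audio_t, video_t)   # both map to 1000.04 - these are in sync
```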

When doing music or another form of entertainment it is necessary to 
distinguish two broad cases: interactive (where human participants 
interact with one another, enduring the cross-net latency) and 
presentation (where there is no back-and-forth interaction, at least 
not in a conversational manner).  The latter form is much easier; the 
former can quickly become impossible because our human vision and 
hearing are trained by a billion years of evolutionary pressures in a 
non-electronic world.

Our great enemy, both for good conversational use and for synchronized 
playback, was time: accurate, precise time.  And not just accurate and 
precise, but also stable in its ticking - as the cost of networked 
parts (such as an ESP-32) drops, the quality of the clocks in them 
tends to erode.  Back when Casner and I were doing this stuff we 
measured clock drifts of as much as +/- 5% - that's three minutes over 
an hour for each machine, or six minutes per hour if sender and 
receiver each drift 5% in opposing directions.
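(To make that arithmetic concrete - illustrative numbers only:)

```python
def drift_divergence(drift_a, drift_b, interval_s):
    """Seconds by which two clocks diverge over an interval, given
    each clock's fractional frequency error (e.g. 0.05 for +5%)."""
    return abs(drift_a - drift_b) * interval_s

HOUR = 3600
# One machine 5% fast relative to true time: 3 minutes per hour.
print(drift_divergence(0.05, 0.0, HOUR) / 60)    # -> 3.0
# Sender 5% fast, receiver 5% slow: 6 minutes per hour between them.
print(drift_divergence(0.05, -0.05, HOUR) / 60)  # -> 6.0
```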

(Even my semi-pro video and sound gear, without SMPTE sync, can drift by 
a video frame time [about 33.4 ms] over the course of 30 to 60 minutes. 
[I remain impressed at how useful that simplest of tools, the 
clapperboard, is.])

For voice one can often stretch or squeeze moments of silence, but for 
music that has seriously bad effects.
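(The silence trick for voice amounts to a playout adjuster that only 
repeats or drops frames whose energy is below some threshold.  A toy 
illustration of the idea - not anything from the Precept code, and the 
frame size and threshold are arbitrary:)

```python
def adjust_playout(samples, surplus, frame=160, threshold=100.0):
    """Drop (surplus > 0) or repeat (surplus < 0) whole frames of
    audio, but only frames whose mean absolute amplitude is below
    `threshold`, i.e. silence.  Returns the adjusted sample list."""
    out = []
    i = 0
    while i < len(samples):
        fr = samples[i:i + frame]
        quiet = sum(abs(s) for s in fr) / len(fr) < threshold
        if quiet and surplus > 0:
            surplus -= 1                      # drop this silent frame
        elif quiet and surplus < 0:
            out.extend(fr)                    # repeat this silent frame
            out.extend(fr)
            surplus += 1
        else:
            out.extend(fr)
        i += frame
    return out

# Two loud frames around two silent ones; dropping one silent frame
# shortens playout by exactly one frame without touching the speech.
loud, quiet = [1000] * 160, [0] * 160
stream = loud + quiet + quiet + loud
assert len(adjust_playout(stream, surplus=1)) == 3 * 160
```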

Grace Hopper used a piece of rope roughly one foot long to illustrate 
how far light travels through a vacuum in one nanosecond.

That same (roughly) one-foot piece of rope also illustrates how far 
sound travels through air at sea level in one millisecond.  (Theatrical 
sound system designers know this quite well - our main performance 
space has about 75 ms of sound latency from the stage to the back of 
the house.)
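(Both rope facts are easy to check with round numbers - 343 m/s for 
sound is an assumption for air at roughly 20 C:)

```python
C_LIGHT = 299_792_458.0   # m/s, light in a vacuum
V_SOUND = 343.0           # m/s, sound in air near sea level, ~20 C
FOOT = 0.3048             # meters per foot

# Light travels about one foot per nanosecond...
print(C_LIGHT * 1e-9 / FOOT)   # ~0.98 feet
# ...and sound about one foot per millisecond.
print(V_SOUND * 1e-3 / FOOT)   # ~1.13 feet
# 75 ms of stage-to-back latency implies a throw of roughly:
print(V_SOUND * 0.075)         # ~25.7 meters
```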

I have long wanted to talk to some legislature or court about repealing, 
or at least increasing, the speed of light.

I have been impressed by the quality of some distributed music work - 
such as that done by the Playing For Change folks - but they are 
playing against a recorded, pre-distributed metronome, and the tracks 
are all combined later.

I have been experimenting with cheap GPS time sync (mostly on Raspberry 
Pis) - each machine runs within a few hundred microseconds of "correct" 
time.  I am doing this to create a kind of network panopticon that can 
be used both as an early warning system for troubles (I am hoping to 
adopt some techniques from astronomy) and as a tool to isolate where 
those troubles lie.  (Yes, I am aware of RIPE's ATLAS system.)
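(For anyone curious about the plumbing: on a Raspberry Pi this kind of 
sub-millisecond sync is commonly done with a GPS receiver feeding gpsd, 
plus a PPS pulse-per-second line into chrony.  The device path, refids, 
and offset below are installation-specific assumptions, not a recipe - 
the serial-delay offset in particular has to be measured per setup:)

```
# /etc/chrony/chrony.conf fragment (assumed paths and offsets)

# Coarse time-of-day from the GPS NMEA sentences via gpsd's shared
# memory segment; the offset compensates for serial-line delay.
refclock SHM 0 refid NMEA offset 0.2

# Precise second edge from the PPS line; locked to the NMEA source
# so each pulse is labeled with the correct second.
refclock PPS /dev/pps0 refid PPS lock NMEA
```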

         --karl--

On 10/11/25 9:42 AM, Greg Skinner via Internet-history wrote:
> There were some applications of multicast to distributed music performances in the 1990s, such as the following:
>
> https://www.postel.org/pipermail/end2end-interest/2001-August/001314.html
>
> The paper is available from ResearchGate.
>
> https://www.researchgate.net/publication/339055474_Distributed_Music_A_Foray_into_Networked_Performance
>
> Greg
