[ih] Evolution of Internet audio and video

Mon Sep 29 14:59:43 PDT 2025

On 9/29/25 2:13 PM, Craig Partridge wrote:

>   * How to persuade video to deal with occasional loss. Dave Clark did
>     early outreach to codec experts and said that in response to the
>     question "What do we do if some of your data has to be dropped"
>     were told "Don't.  We're good at compression and if the data could
>     be dropped, we'd have removed it."  As I recall, it was Facebook
>     that led to codecs that could deal with loss?
>
Steve Casner and I worked really hard on these issues.  And because we 
often moved audio and video in separate packet streams, loss, delay, 
duplication, or re-sequencing on one stream had an impact on the other.

Many codecs do not tolerate loss or underruns of their input buffers.  
And with cipher-chained (aka block-chained) streams, where decrypting 
each block depends on the block before it, it can be even harder to 
pick up the sticks when a packet is lost.
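A toy sketch of why chaining across packets hurts, assuming the chaining value runs continuously from the last ciphertext block of one packet into the next (an assumption for illustration; real systems differ). The XOR "cipher" is a hypothetical stand-in for a real block cipher and is not secure; only the CBC structure matters here.

```python
# TOY ONLY: XOR with a fixed key stands in for a real block cipher (not secure).
# What this illustrates is the CBC chaining structure.
BLOCK = 4
KEY = bytes([0x5A, 0xC3, 0x17, 0x88])  # hypothetical key


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt(blocks, chain):
    """c_i = E(p_i XOR c_{i-1}); returns ciphertext blocks and the new chain value."""
    out = []
    for p in blocks:
        chain = xor(xor(p, chain), KEY)
        out.append(chain)
    return out, chain


def cbc_decrypt(blocks, chain):
    """p_i = D(c_i) XOR c_{i-1}; returns plaintext blocks and the new chain value."""
    out = []
    for c in blocks:
        out.append(xor(xor(c, KEY), chain))
        chain = c
    return out, chain


# Sender: the chain value carries over from packet to packet.
iv = bytes(BLOCK)
chain = iv
sent = []
for pkt in ([b"aaaa", b"bbbb"], [b"cccc", b"dddd"], [b"eeee", b"ffff"]):
    ct, chain = cbc_encrypt(pkt, chain)
    sent.append(ct)

# Receiver: packet 1 is lost, so packet 2 is decrypted with a stale chain value.
rx_chain = iv
p0, rx_chain = cbc_decrypt(sent[0], rx_chain)
p2, rx_chain = cbc_decrypt(sent[2], rx_chain)
print(p0)                                  # first packet decrypts cleanly
print(p2[0] == b"eeee", p2[1] == b"ffff")  # first block garbled, the rest recovers
```

CBC itself resynchronizes after one block, because each block's decryption needs only the previous ciphertext block. A common remedy for the cross-packet problem is an explicit per-packet IV, so that every packet decrypts on its own.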

We were working with UDP, so we did not have TCP trying to provide 
reliability and sequencing.

One of the issues we faced was "what do we do when we don't have a 
video or audio packet at the time we need to feed it to the rendering 
hardware?"  For audio there was "redundant audio transport", aka "RAT", 
in which the data in packet N was also carried, at lower quality, in 
packet N+1 (or N+2).
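A minimal sketch of that redundancy idea, assuming each packet carries the full-quality frame N plus a low-quality copy of frame N-1. The `low_quality` function is a hypothetical placeholder for a lower-bit-rate codec; real payload formats (for example RFC 2198, the RTP payload for redundant audio data) define the actual framing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    seq: int
    primary: bytes              # full-quality encoding of frame `seq`
    redundant: Optional[bytes]  # low-quality copy of frame `seq - 1`


def low_quality(frame: bytes) -> bytes:
    """Hypothetical stand-in for a low-bit-rate codec: keep every 4th byte."""
    return frame[::4]


def packetize(frames: List[bytes]) -> List[Packet]:
    return [
        Packet(n, f, low_quality(frames[n - 1]) if n > 0 else None)
        for n, f in enumerate(frames)
    ]


def reconstruct(received: List[Packet], count: int) -> List[Optional[bytes]]:
    """Per frame: the primary copy if its packet arrived, else the redundant
    copy from packet N+1, else None (a gap that still needs concealment)."""
    by_seq = {p.seq: p for p in received}
    out: List[Optional[bytes]] = []
    for n in range(count):
        if n in by_seq:
            out.append(by_seq[n].primary)
        elif n + 1 in by_seq and by_seq[n + 1].redundant is not None:
            out.append(by_seq[n + 1].redundant)
        else:
            out.append(None)
    return out


frames = [bytes([i]) * 8 for i in range(5)]
lost = {1, 2}  # two consecutive losses
arrived = [p for p in packetize(frames) if p.seq not in lost]
result = reconstruct(arrived, len(frames))
print(result)  # frame 1 stays missing; frame 2 comes back at low quality
```

Frame 1 stays missing because its backup rode in packet 2, which was also lost. Carrying a second copy in packet N+2 would cover two consecutive losses, at the cost of more bandwidth.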

For video we had to deal with freight trains of closely spaced large 
packets arriving 30 times per second, one train per frame.
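A rough sketch of where those freight trains come from, assuming RTP-style packetization: each encoded frame is split into MTU-sized packets that share one timestamp, with the marker flag set on the last one (the convention RTP uses for most video formats). The 1400-byte payload limit and the 30 KB frame are illustrative assumptions, not measurements.

```python
from typing import List, Tuple

# (sequence number, timestamp, marker, payload)
Pkt = Tuple[int, int, bool, bytes]


def packetize_frame(frame: bytes, timestamp: int, first_seq: int,
                    max_payload: int = 1400) -> List[Pkt]:
    """Split one encoded video frame into a back-to-back burst of packets.
    Every packet carries the frame's timestamp; only the last sets the marker."""
    chunks = [frame[i:i + max_payload] for i in range(0, len(frame), max_payload)]
    return [
        ((first_seq + k) & 0xFFFF, timestamp, k == len(chunks) - 1, chunk)
        for k, chunk in enumerate(chunks)
    ]


FPS = 30
frame = bytes(30_000)  # hypothetical 30 KB encoded frame
burst = packetize_frame(frame, timestamp=0, first_seq=0)
print(len(burst), "packets per frame,", len(burst) * FPS, "packets per second")
# 22 packets per frame, 660 packets per second
```

Because each burst leaves the sender all at once, queues along the path see a spike roughly every 33 ms rather than a smooth flow.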

There were markers in the streams indicating where sound spurts began 
and where video frames ended.  Losing those marker packets forced us to 
develop heuristics to infer where the boundaries were and what to do 
about them.
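One possible heuristic for video, sketched under the assumption of RTP-like packets (sequence number, timestamp, marker on a frame's last packet) that are already in sequence order: if the marker packet is lost, a change of timestamp is taken as the sign that the previous frame has ended. This is an illustration only, not a reconstruction of the heuristics described above.

```python
from typing import List, Tuple

# (sequence number, timestamp, marker, payload), already in sequence order
Pkt = Tuple[int, int, bool, bytes]
Frame = Tuple[int, bool, List[bytes]]  # (timestamp, looks_complete, payloads)


def group_frames(packets: List[Pkt]) -> List[Frame]:
    frames: List[Frame] = []
    cur_ts, cur, ok = None, [], True
    expected = None
    for seq, ts, marker, payload in packets:
        gap = expected is not None and seq != expected
        if cur_ts is not None and ts != cur_ts:
            # New timestamp but no marker seen: the end of the previous
            # frame was lost, so close that frame and flag it as damaged.
            frames.append((cur_ts, False, cur))
            cur_ts, cur, ok = None, [], True
        if gap:
            # A missing sequence number could belong to either frame;
            # conservatively flag the frame now being built as damaged.
            ok = False
        if cur_ts is None:
            cur_ts = ts
        cur.append(payload)
        expected = (seq + 1) & 0xFFFF
        if marker:
            frames.append((cur_ts, ok, cur))
            cur_ts, cur, ok = None, [], True
    if cur_ts is not None:
        frames.append((cur_ts, False, cur))  # stream ended mid-frame
    return frames


pkts = [
    (0, 0, False, b"a"), (1, 0, False, b"b"), (2, 0, True, b"c"),  # intact frame
    (3, 3000, False, b"d"),                  # seq 4, this frame's marker, is lost
    (5, 6000, False, b"e"), (6, 6000, True, b"f"),
]
for ts, ok, payloads in group_frames(pkts):
    print(ts, ok, payloads)
```

The frame at timestamp 6000 is flagged even though it is actually whole: the receiver cannot tell whether sequence number 4 ended the previous frame or began this one. Payload-specific headers can sometimes settle that question.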

Out-of-order packets were a bane.
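A minimal reorder buffer, assuming 16-bit RTP-style sequence numbers that wrap around. The `depth` parameter (how many early packets to hold before declaring a gap lost) is a hypothetical knob: larger values tolerate more reordering but add delay.

```python
from typing import Dict, List, Optional, Tuple


def seq_diff(a: int, b: int) -> int:
    """Signed distance a - b between 16-bit sequence numbers, across wraparound."""
    d = (a - b) & 0xFFFF
    return d - 0x10000 if d >= 0x8000 else d


class ReorderBuffer:
    """Holds early packets and releases them in sequence order. `depth`
    (hypothetical) is how many held packets we tolerate before declaring the
    oldest missing sequence number lost."""

    def __init__(self, first_seq: int, depth: int = 4):
        self.next_seq = first_seq & 0xFFFF
        self.depth = depth
        self.pending: Dict[int, bytes] = {}

    def push(self, seq: int, payload: bytes) -> List[Tuple[int, Optional[bytes]]]:
        """Returns (seq, payload) pairs ready to play; payload None means lost."""
        if seq_diff(seq, self.next_seq) < 0 or seq in self.pending:
            return []  # too late, or a duplicate: drop it
        self.pending[seq] = payload
        ready: List[Tuple[int, Optional[bytes]]] = []
        while self.pending:
            if self.next_seq in self.pending:
                ready.append((self.next_seq, self.pending.pop(self.next_seq)))
            elif len(self.pending) >= self.depth:
                ready.append((self.next_seq, None))  # stop waiting for the gap
            else:
                break
            self.next_seq = (self.next_seq + 1) & 0xFFFF
        return ready


buf = ReorderBuffer(first_seq=65534, depth=2)
played = []
for seq, data in [(65535, b"B"), (65534, b"A"), (1, b"D"), (2, b"E"), (0, b"C")]:
    played += buf.push(seq, data)
print(played)
```

Here sequence number 0 is given up on once two later packets are waiting, and the straggler that finally shows up is discarded as too late.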

Patching over gaps in voice or video data is hard because it can create 
artifacts, sometimes unexpected ones, such as synthetic tones in patched 
audio.  And patched with what?  We experimented with silence (which 
didn't work well), with averaging the prior and next packets (which 
worked better), and with other approaches.
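A sketch of the two patching strategies mentioned above, assuming audio arrives as equal-length frames of integer samples and a lost frame is marked `None`. "Average" here means the sample-by-sample mean of the nearest received frames on either side, which is one reading of averaging the prior and next packets.

```python
from typing import List, Optional

Frame = List[int]  # one packet's worth of integer audio samples


def conceal(frames: List[Optional[Frame]], method: str = "average") -> List[Frame]:
    """Fill lost frames (None); assumes at least one frame arrived.
    'silence' substitutes zeros; 'average' uses the sample-by-sample mean
    of the nearest received frames before and after the gap."""
    n = len(next(f for f in frames if f is not None))
    out: List[Frame] = []
    for i, f in enumerate(frames):
        if f is not None:
            out.append(list(f))
        elif method == "silence":
            out.append([0] * n)
        else:
            prev = next((frames[j] for j in range(i - 1, -1, -1)
                         if frames[j] is not None), None)
            nxt = next((frames[j] for j in range(i + 1, len(frames))
                        if frames[j] is not None), None)
            if prev is None or nxt is None:
                # Only one neighbour received: repeat it.
                out.append(list(prev if prev is not None else nxt))
            else:
                out.append([(a + b) // 2 for a, b in zip(prev, nxt)])
    return out


stream = [[100, -100], None, [300, 100]]
print(conceal(stream, "silence"))  # [[100, -100], [0, 0], [300, 100]]
print(conceal(stream))             # [[100, -100], [200, 0], [300, 100]]
```

Averaging needs the next frame to have arrived, so it only works if playout is already delayed by at least one frame.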

Things are worse these days because of the games that "smart" Ethernet 
NICs play with Ethernet frames, such as combining several small frames 
and delivering them to the receiving operating system as one large (up 
to 64 KB!) frame.  One's software has to approach a modern Ethernet NIC 
with a software sledgehammer to turn off all of the "offloads".
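On Linux, one such sledgehammer is `ethtool -K`. A small Python wrapper is sketched below; the choice of features (GRO, LRO, GSO, TSO) is an assumption about which offloads matter for a given application, and which ones a particular driver lets you change varies. `eth0` is a placeholder interface name.

```python
import subprocess
from typing import List

# Coalescing/segmentation offloads to turn off. These are feature names that
# `ethtool -K` accepts on Linux; which ones a given driver can change varies.
OFFLOADS = ("gro", "lro", "gso", "tso")


def offload_off_command(iface: str) -> List[str]:
    """Build the ethtool argv that switches each listed offload off."""
    cmd = ["ethtool", "-K", iface]
    for feature in OFFLOADS:
        cmd += [feature, "off"]
    return cmd


def disable_offloads(iface: str) -> None:
    """Run the command (needs root). check=True raises CalledProcessError
    if ethtool exits with a non-zero status."""
    subprocess.run(offload_off_command(iface), check=True)


if __name__ == "__main__":
    print(" ".join(offload_off_command("eth0")))  # eth0 is a placeholder name
```

Running `ethtool -k eth0` (lowercase `-k`) shows the current settings, which is worth checking before and after.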

All in all, the cure for many things was to add delay before rendering 
content.  But that hurt conversational uses where, according to the 
ITU, we have a round-trip budget of only about 140 milliseconds before 
people fall into half-duplex, walkie-talkie mode.  I really wanted to 
get my physicist friends to consider increasing the speed of light, but 
they were resistant to the idea.
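A sketch of how a receiver might choose that added delay adaptively, using a standard exponentially weighted estimator of the kind found in networking textbooks (smoothed delay plus four times its smoothed deviation). The 70 ms cap is a hypothetical one-way share of the conversational budget, and the delay measurements assume synchronized sender and receiver clocks.

```python
class PlayoutEstimator:
    """Smoothed one-way delay d and delay variation v; playout delay = d + 4v,
    capped at a conversational budget. A textbook-style sketch."""

    def __init__(self, alpha: float = 0.998, budget_ms: float = 70.0):
        self.alpha = alpha          # weight on history (closer to 1 = smoother)
        self.budget_ms = budget_ms  # hypothetical one-way playout budget
        self.d = None               # smoothed delay estimate, ms
        self.v = 0.0                # smoothed deviation, ms

    def observe(self, delay_ms: float) -> None:
        """Feed one packet's measured delay (arrival time minus send timestamp)."""
        if self.d is None:
            self.d = delay_ms
            return
        a = self.alpha
        self.d = a * self.d + (1 - a) * delay_ms
        self.v = a * self.v + (1 - a) * abs(self.d - delay_ms)

    def playout_delay(self) -> float:
        if self.d is None:
            raise ValueError("no delay samples yet")
        # Packets arriving later than this are effectively lost; capping at the
        # budget trades more late packets for keeping the conversation interactive.
        return min(self.d + 4 * self.v, self.budget_ms)


est = PlayoutEstimator(alpha=0.5, budget_ms=100.0)
est.observe(40.0)
est.observe(60.0)
print(est.playout_delay())  # 70.0 (d = 50, v = 5)
est.observe(100.0)
print(est.playout_delay())  # 100.0 (d + 4v = 135, capped at the budget)
```

In practice a new playout delay is usually applied only at the start of a sound spurt, so that the change is absorbed by the silence rather than heard as a glitch.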

I began work on a meta stream to carry information about objects in the 
video stream (in order to do fast set-top product placement and such) 
and to carry scripted morphing instructions that react to events in the 
viewer's space.  (E.g., morph Alan Arkin's eyes onto the source of a 
viewer's gasp, such as when he sneaks up on Audrey Hepburn in the film 
Wait Until Dark.)  This was part of my notion of breaking down the 
fourth wall.

I hypothesized a video conferencing system in which each person posted 
a series of photos in a set of patterned poses; the conference would 
then proceed by sending small morphing instructions rather than full 
images.  One could turn a knob to change from "staid English" to 
"hand-waving Italian" modes of presentation.  (This came out of my work 
on communications with submarines, in which voice was converted into 
tokenized words rather than conveyed as voice itself.  That saved a lot 
of bandwidth on our 300 bits/second path, and the resulting voice was 
much clearer and more comprehensible, even if the speaker was 
synthetic.  It was something we suggested to the FCC for air traffic 
control.)  I had pieces of these things running, but only small pieces.  
It is an area that is waiting for further work.
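A back-of-the-envelope sketch of why tokenized words fit a 300 bits/second path, assuming a small fixed vocabulary shared by both ends (the vocabulary below is made up for illustration). Speech recognition at the sender and speech synthesis at the receiver are outside the sketch; only the token encoding is shown.

```python
from typing import List, Tuple

# Hypothetical 16-word vocabulary shared by sender and receiver.
VOCAB = ["climb", "descend", "and", "maintain", "flight", "level", "heading",
         "turn", "left", "right", "zero", "one", "two", "three", "four", "five"]


def bits_per_token(vocab_size: int) -> int:
    """Smallest fixed width that can index every word."""
    return max(1, (vocab_size - 1).bit_length())


def encode(words: List[str], vocab: List[str]) -> Tuple[int, int]:
    """Pack word indices into one integer; return (packed value, bit count)."""
    width = bits_per_token(len(vocab))
    index = {w: i for i, w in enumerate(vocab)}
    packed = 0
    for w in words:
        packed = (packed << width) | index[w]
    return packed, width * len(words)


def decode(packed: int, nbits: int, vocab: List[str]) -> List[str]:
    """Unpack fixed-width indices, first word in the most significant bits."""
    width = bits_per_token(len(vocab))
    mask = (1 << width) - 1
    return [vocab[(packed >> shift) & mask]
            for shift in range(nbits - width, -1, -width)]


LINK_BPS = 300
phrase = "climb and maintain flight level three two zero".split()
packed, nbits = encode(phrase, VOCAB)
print(nbits, "bits,", round(nbits / LINK_BPS, 3), "seconds at 300 bit/s")
# 32 bits, 0.107 seconds at 300 bit/s
```

By comparison, assuming a 2,400 bit/s vocoder, a single second of coded speech is 2,400 bits, or eight seconds of link time at 300 bit/s.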

Tools to test and exercise this stuff were hard to come by.  Jon had 
proposed his "flakeway", and a few years later I built one (operating as 
a malicious Ethernet switch rather than as a router).  I now sell that, 
or a distant successor, as a product.
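A toy version of the kind of mischief a flakeway inflicts, useful for exercising a receiver in unit tests. This hypothetical sketch operates on an in-memory list of packets and is not a description of Jon's proposal or of any product. The probabilities and the adjacent-swap model of reordering are assumptions, and added delay, which a real flakeway would also inflict, is left out.

```python
import random
from typing import List, Optional, TypeVar

T = TypeVar("T")


def impair(packets: List[T], loss: float = 0.0, dup: float = 0.0,
           reorder: float = 0.0, seed: Optional[int] = None) -> List[T]:
    """Randomly drop, duplicate, and reorder (by swapping neighbours) packets.
    A fixed seed makes a run reproducible, which helps when chasing a bug."""
    rng = random.Random(seed)
    out: List[T] = []
    for p in packets:
        if rng.random() < loss:
            continue  # dropped
        out.append(p)
        if rng.random() < dup:
            out.append(p)  # duplicated
    i = 0
    while i < len(out) - 1:
        if rng.random() < reorder:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip ahead so the pair is not swapped straight back
        else:
            i += 1
    return out


print(impair(list(range(10)), loss=0.1, dup=0.1, reorder=0.2, seed=7))
```

Feeding the output of `impair` into a receiver's reordering and concealment code is a cheap way to see how it behaves before trying it on a real network.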

         --karl--