[ih] Evolution of Internet audio and video
Karl Auerbach
karl at iwl.com
Mon Sep 29 14:59:43 PDT 2025
On 9/29/25 2:13 PM, Craig Partridge wrote:
> * How to persuade video to deal with occasional loss. Dave Clark did
> early outreach to codec experts and said that in response to the
> question "What do we do if some of your data has to be dropped"
> were told "Don't. We're good at compression and if the data could
> be dropped, we'd have removed it." As I recall, it was Facebook
> that led to codecs that could deal with loss?
>
Steve Casner and I worked really hard on these issues. Because we often
moved audio and video in separate packet streams, loss, delay,
duplication, or re-sequencing on one stream had an impact on the other
stream.
Many codecs are not friendly to loss or to underrunning their input
buffers. And with cipher-chained (aka block-chained) streams it can get
harder to pick up the sticks when a packet is lost.
We were working with UDP so we did not have TCP trying to do reliability
and sequencing.
Some of the issues we faced were "what do we do when we don't have a
video or audio packet at the time we need to feed it to the rendering
hardware?" For audio there was "redundant audio transport", aka "RAT",
in which the data in packet N was carried again, at lower quality, in
packet N+1 (or N+2).
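The redundancy scheme described above can be sketched roughly as
follows. This is only an illustration, not the code we ran: the Packet
layout, the function names, and the "keep every other byte" stand-in
for a lower-quality codec are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    seq: int
    primary: bytes              # full-quality encoding of frame `seq`
    redundant: Optional[bytes]  # lower-quality copy of frame `seq - 1`


def low_quality(frame: bytes) -> bytes:
    """Hypothetical stand-in for a cheaper codec: keep every other byte."""
    return frame[::2]


def packetize(frames: List[bytes]) -> List[Packet]:
    """Packet N carries frame N plus a degraded copy of frame N-1."""
    packets = []
    for seq, frame in enumerate(frames):
        prev = low_quality(frames[seq - 1]) if seq > 0 else None
        packets.append(Packet(seq, frame, prev))
    return packets


def recover(received: List[Packet], n_frames: int) -> List[Optional[bytes]]:
    """Prefer the primary copy; fall back to the redundant copy; else None."""
    primary = {p.seq: p.primary for p in received}
    backup = {p.seq - 1: p.redundant for p in received
              if p.redundant is not None}
    return [primary.get(i, backup.get(i)) for i in range(n_frames)]
```

If packet 1 is lost but packet 2 arrives, frame 1 is still playable at
reduced quality. The price is that the receiver must wait one extra
packet time before it can decide that frame 1 is really gone.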
For video we had to deal with 30-per-second freight trains of closely
spaced large packets.
There were demarcations in the streams marking where sound spurts began
and where video frames ended. Loss of those marker packets forced us to
develop heuristics to infer where the boundaries were and what to do
about it.
Out of order packets were a bane.
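One common way to cope with out-of-order arrival is a small reorder
buffer in front of the renderer. The sketch below is an assumption
about how such a buffer might look, not a description of what we
built: the class name, the `depth` parameter, and the policy of
skipping a missing packet once the buffer overflows are all
illustrative. Sequence number wraparound is ignored.

```python
import heapq
from typing import List, Tuple


class ReorderBuffer:
    """Hold up to `depth` packets while waiting for a missing one."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.heap: List[Tuple[int, bytes]] = []
        self.next_seq = 0  # next sequence number owed to the renderer

    def push(self, seq: int, payload: bytes) -> List[Tuple[int, bytes]]:
        """Accept one packet; return packets now ready, in sequence order."""
        # Drop packets that arrive too late and duplicates already queued.
        if seq < self.next_seq or any(s == seq for s, _ in self.heap):
            return []
        heapq.heappush(self.heap, (seq, payload))
        ready = []
        # Release while the head is the expected packet, or while the
        # buffer is over its depth (give up on the gap and skip ahead).
        while self.heap and (self.heap[0][0] == self.next_seq
                             or len(self.heap) > self.depth):
            s, p = heapq.heappop(self.heap)
            ready.append((s, p))
            self.next_seq = s + 1
        return ready
```

A larger `depth` tolerates more re-sequencing but adds delay before
rendering, which is exactly the trade-off that hurts conversational
use.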
Patching voice/video data is hard because it can create artifacts,
sometimes unexpected ones, such as synthetic tones when audio was being
patched. (And patched with what? We experimented with silence [doesn't
work well], averaging the prior/next [worked better], etc.)
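The two patching strategies mentioned above can be compared with a
small sketch. Treating "averaging the prior/next" as a sample-by-sample
mean of the neighboring frames is an assumption on my part, as are the
function name and the frame representation (a list of integer PCM
samples).

```python
from typing import List, Optional

Frame = List[int]  # one frame of PCM samples


def conceal(frames: List[Optional[Frame]], frame_len: int,
            mode: str = "average") -> List[Frame]:
    """Fill lost frames (None) with silence or a neighbor average."""
    out: List[Frame] = []
    for i, frame in enumerate(frames):
        if frame is not None:
            out.append(frame)
            continue
        prev = frames[i - 1] if i > 0 else None
        nxt = frames[i + 1] if i + 1 < len(frames) else None
        if mode == "silence" or (prev is None and nxt is None):
            out.append([0] * frame_len)   # nothing usable nearby
        elif prev is None:
            out.append(list(nxt))         # only the next frame survived
        elif nxt is None:
            out.append(list(prev))        # only the prior frame survived
        else:
            out.append([(a + b) // 2 for a, b in zip(prev, nxt)])
    return out
```

Note that averaging needs the frame after the gap, so the receiver has
to hold playback for at least one more frame time than the silence
approach requires.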
Things are worse these days because of the games that "smart" Ethernet
NICs play with Ethernet frames, such as combining several small
Ethernet frames and delivering them to the receiving operating system
as one large (up to 64 Kbyte!) Ethernet frame. One's software has to
approach a modern Ethernet NIC with a software sledgehammer to turn off
all of the "offloads".
All in all the cure for many things was to add delay before rendering
content. But that affected conversational uses where, according to the
ITU, we have a round-trip budget of only about 140 milliseconds before
people go into half-duplex/walkie-talkie mode. I really wanted to get
my physicist friends to consider increasing the speed of light, but they
were resistant to the idea.
I began work on a meta stream to carry information about objects in the
video stream (in order to do fast set-top product placements and such)
and with scripted morphing to react to events in the viewer's space.
(E.g. morph Alan Arkin's eyes onto the source of a viewer gasp, such as
when he sneaks up on Audrey Hepburn in the film Wait Until Dark.) This
was part of my notion about breaking down the 4th wall. I hypothesized
a video conferencing system in which each person posted a series of
photos in a set of patterned poses - then the conference would proceed
by sending small morphing instructions rather than full images. One
could turn a knob to change from "staid English" to "hand waving
Italian" modes of presentation. (This came out of my work on
communications with submarines in which voice was converted into
tokenized words rather than conveyed as voice itself. That saved a lot
of bandwidth on our 300 bits/second path, and the resulting voice was
much clearer and more comprehensible, even if the speaker was synthetic,
and it was something we suggested to the FCC for air traffic control. I
had pieces of these things running, but only small pieces. It is an
area that is waiting for further work.)
Tools to test and exercise this stuff were hard to come by. Jon had
proposed his "flakeway" and a few years later I built one (operating as
a malicious Ethernet switch rather than as a router). I now sell that,
or a distant successor, as a product.
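For readers who have not met the idea, a flakeway-style impairment
stage can be sketched in a few lines. This is a toy illustration only
and says nothing about how the original proposal or any product works:
the function name, the probability parameters, and the "hold a packet
until after the next forwarded one" reordering rule are all
assumptions.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def impair(packets: Iterable[T], drop: float = 0.0, duplicate: float = 0.0,
           reorder: float = 0.0, seed: int = 0) -> List[T]:
    """Return the packet sequence after random drop/duplicate/reorder."""
    rng = random.Random(seed)   # seeded so a test run is repeatable
    out: List[T] = []
    held: List[T] = []          # packets delayed behind a later one
    for pkt in packets:
        if rng.random() < drop:
            continue            # packet lost
        burst = [pkt, pkt] if rng.random() < duplicate else [pkt]
        if rng.random() < reorder:
            held.extend(burst)  # deliver after the next forwarded packet
            continue
        out.extend(burst)
        out.extend(held)
        held.clear()
    out.extend(held)            # flush anything still held at the end
    return out
```

Feeding a receiver from `impair(stream, drop=0.02, reorder=0.05)` is a
cheap way to see how concealment and reorder logic behave before
trying them on a real network.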
--karl--