[ih] firmware for innovation

Dave Taht via Internet-history internet-history at elists.isoc.org
Wed Dec 18 20:06:53 PST 2019

This is a bit OT for internet history, but perhaps 10 years from now,
however the dust settles, this will be a part of it.

"Why the internet failed to become a universal transport for voice,
videoconferencing and gaming traffic and became a slow, unreliable form
of television" - Old Grumpy Guy - ACM-Queue...

"Brought to you by GOOGFACE, your source of all the TruNews NOW",
2029. Payment for this article has been deducted from your account and
your social credit score adjusted downward automatically. Have a very
very nice day, and remember, to keep your consumption of ads up and
cigarettes down!"

Toerless Eckert <tte at cs.fau.de> writes:

> On Wed, Dec 18, 2019 at 12:25:43PM -0800, Dave Taht wrote:
>> > Hmm. Not sure if its really infeasible to get access to and modify 
>> > WiFi firmware. I think to remember at least to have been told that
>> If you know anyone....
> Alas, only hearsay. And if you go in to the smallest vendor
> of such chip as a startup pitching them how they could double
> their chip sales through alternative firmware in a non-wifi radio
> market, that's a whole different ballpark than coming in as a
> researcher. 

Wasn't just researchers.

At one point I had the resources of google fiber. No dice on
getting source licenses to the core fiber, moca, and wifi blobs.

No luck in getting core features in there either... underneath dozens of
obfusticating mgmt folk is some poor engineer living under a bridge that
wrote the code in the first place...

I've also worked with various startups and larger firms (on the wifi
front), not even possible to start a negotiation or a price with QCA,
or broadcom.

It makes me worry of course, that part of the reluctance, must be
fear that security holes will be found - or that they are already,
intentionally, there.

At one point comcast was retiring their cisco cmtses - and would rather
crush them than dumpster them because of CALEA requirements. I wanted
one of those, bad (they were better than the ARRIS CMTSes in a lot of

>> > wrote modified or their own firmware for better / non-wifi radio
>> > solution. But of course i don't know what the licensing conditions
>> > were for the API to the chip itself.
>> I have been trying for 8 years to acquire source licenses for firmware
>> for dsl and wifi chips, with no success. I intend to try a lot harder
>> in the coming year, pursuing multiple companies, and perhaps, even legal
>> measures and the FCC. The first generation ax chips and drivers are
>> awful - and 5G, worse.
> Have lost track of DSL chips. In the past they were awfully feature
> constrained, so no good way to make them do better buffering.

They don't need anything more than a few lines of code added to keep
their onboard FIFO buffering down to sanity. They end up using less
internal memory, as well, for buffering, that can be used more smartly -
or fit in more code!

There's a pretty interesting arc vliw cpu in more than a few dsl chips,
it's totally easy to do this... I'd actually made some progress in
reverse engineering one chip at one point but rewriting code that
consists of asm turned into C is something I gave up on after working on
the game of empire, back in the 80s.. (worse, it's a very weird VLIW)

At its most primitive it's merely sizing the internal ringbuffer
by the rate....

numbufs = configured_line_rate_per_sec/possible_interrupts_sec;
buffer = malloc(numbufs*2048); // padding needed

that's it. Tracking bytes on the internal ringbuffer instead of just
packets is vastly better but slightly more code than I care to write in
an email, and managing high and low watermarks a snap once you do that.
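
A minimal sketch of the byte-tracking variant described above, assuming
a transmit ring that counts queued bytes and applies high and low
watermarks. Every name here (struct txring, txring_init and so on) is
hypothetical, as is the choice of two interrupt intervals' worth of
data for the high watermark; none of it is taken from any real dsl or
wifi firmware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transmit ring that accounts for bytes, not packets. */
struct txring {
    size_t bytes_queued; /* bytes currently sitting in the ring */
    size_t high_water;   /* stop accepting packets at or above this */
    size_t low_water;    /* resume accepting packets at or below this */
    bool   stopped;      /* true while the host is being back-pressured */
};

/* Size the limits from the configured line rate: about one interrupt
 * interval of data for the low mark, two for the high mark (assumed). */
static void txring_init(struct txring *r, uint64_t line_rate_bits_per_sec,
                        uint32_t interrupts_per_sec)
{
    assert(interrupts_per_sec > 0);
    size_t per_irq = (size_t)(line_rate_bits_per_sec / 8 / interrupts_per_sec);
    if (per_irq < 1514)
        per_irq = 1514;  /* always leave room for one full ethernet frame */
    r->bytes_queued = 0;
    r->low_water = per_irq;
    r->high_water = 2 * per_irq;
    r->stopped = false;
}

/* Returns true if the packet was accepted, false if the caller must wait. */
static bool txring_enqueue(struct txring *r, size_t pkt_bytes)
{
    if (r->stopped)
        return false;
    r->bytes_queued += pkt_bytes;
    if (r->bytes_queued >= r->high_water)
        r->stopped = true;   /* push back on the host */
    return true;
}

/* Called when the hardware has finished sending a packet. */
static void txring_complete(struct txring *r, size_t pkt_bytes)
{
    assert(pkt_bytes <= r->bytes_queued);
    r->bytes_queued -= pkt_bytes;
    if (r->stopped && r->bytes_queued <= r->low_water)
        r->stopped = false;  /* drained enough, accept packets again */
}
```

With an 80 Mbit line and 1000 interrupts per second this gives a
10000 byte low mark and a 20000 byte high mark, so the ring holds a
couple of milliseconds of data whatever the packet sizes are.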

(BQL is a bit more complicated than what is needed in firmware
because it sits in the middle of things, but it's only 50 lines
of code:

https://lwn.net/Articles/454390/ and it was the recognition that the
dynamic range of "packet" limits was far worse than "byte" limits that
drove its innovation, particularly in the TSO age (64-64k "superpackets").)

Instead, in every current dsl or wifi chip there's always a fixed
allocation for onboard buffering designed to overbuffer at the highest
rate supported by the hardware at the smallest packet size.
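
To put rough numbers on that, here is a small sketch that computes how
long a full, fixed-size packet ring takes to drain. The function name
and the 256-slot figure used below are assumptions for illustration,
not values from any particular chip.

```c
#include <assert.h>
#include <stdint.h>

/* Milliseconds needed to drain `slots` queued packets of `pkt_bytes`
 * each at `rate_bits_per_sec`. */
static double drain_time_ms(uint32_t slots, uint32_t pkt_bytes,
                            uint64_t rate_bits_per_sec)
{
    assert(rate_bits_per_sec > 0);
    return (double)slots * pkt_bytes * 8.0 * 1000.0
           / (double)rate_bits_per_sec;
}
```

A hypothetical 256-slot ring full of 64 byte packets at 100Mbit drains
in about 1.3ms. The same ring full of 1500 byte packets at 1Mbit takes
about 3 seconds, which is the dynamic range problem with limits counted
in packets rather than bytes.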


With valid statistics coming off the chip (and driver), it's possible to
throttle overbuffered firmware, but BOY, is that hard. On the wifi
front, we finally finished AQL "airtime queue limits" last month (it has
been shipping commercially for several years out of tree).

The difference was 500+ms  of onboard buffering reduced to 10ms (at
this tested rate, 2+ seconds at lower rates), no loss in throughput.
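
The idea behind an airtime limit can be sketched as below. This is
not the actual AQL code in the Linux wifi stack: the names are
hypothetical, and the airtime estimate is a deliberately crude
bytes-over-rate figure that ignores preambles, aggregation and
retries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-station accounting of airtime handed to the firmware. */
struct sta_airtime {
    uint32_t pending_us; /* estimated airtime queued in firmware, not yet sent */
    uint32_t limit_us;   /* budget; stop feeding the firmware once reached */
};

/* Crude estimate: microseconds to send `bytes` at `rate_kbps`, rounded up. */
static uint32_t est_airtime_us(uint32_t bytes, uint32_t rate_kbps)
{
    assert(rate_kbps > 0);
    return (uint32_t)(((uint64_t)bytes * 8 * 1000 + rate_kbps - 1) / rate_kbps);
}

/* Returns true and charges the budget if the packet may go to firmware. */
static bool aql_may_send(struct sta_airtime *s, uint32_t bytes,
                         uint32_t rate_kbps)
{
    if (s->pending_us >= s->limit_us)
        return false;   /* enough airtime is already queued below us */
    s->pending_us += est_airtime_us(bytes, rate_kbps);
    return true;
}

/* Refund the estimate once the firmware reports the packet as sent. */
static void aql_tx_done(struct sta_airtime *s, uint32_t bytes,
                        uint32_t rate_kbps)
{
    uint32_t t = est_airtime_us(bytes, rate_kbps);
    s->pending_us = (t > s->pending_us) ? 0 : s->pending_us - t;
}
```

Because the budget is in time rather than packets, a 10ms limit admits
only five 1500 byte packets to a station at 6Mbit but 250 of them at
300Mbit, so the queue below stays short at low rates without starving
fast stations.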


I just wish I could find whatever dude, now living under a bridge
somewhere, that wrote the dsl drivers for everything, and buy him a
beer. A 6pack later, dsl'd be fixed, at least.

>> I have wasted more time trying to come up with shapers that defeat the
>> buffering in cable/dsl/wifi etc - rather than stuff that runs at line
>> rate, merely because I couldn't find anyone at any company that could
>> patch out the buffer allocation scheme for something sane.
> Well, took a long time for folks to understand how to even do ingress
> shaping and why you need it. Not sure that many products already
> do it well.
> Lots of stupid/complex remote BRAS shaping though.

Policing, also.

>> The only sane dsl driver in the world was custom written by free.fr, and
>> they won't let it go. DSLAMs, fuggetabout it.
> Well, they thought it to be critical to their business differentiation,
> hard to blame them for not publishing it.

Free has been very free about "what to do", just not about their code.

They have also tried, and failed, to get source licenses to their fiber
ONT, dslams, etc and other overbuffered gear and are deeply unhappy
about that. 

One reason why fiber "feels" so much better than other systems is that
those devices have reasonably small buffers. 

Sonic fiber's network has about 60ms worth of buffering at 100mbit.
Improvements on this we have made *can* be felt by users - not having to
ride the sawtooth is *nice* but compared to the outrages on other edge
technologies... plain ole fifo'd fiber is enough better.

Google fiber's "shaped" 5mbit service had 400+ms of FIFO buffering
because, well, see further above....

>> In order to finally finish the bufferbloat work - and get rid of most of
>> the need for expensive cpu shapers in favor of adequate backpressure -
>> We only need two algorithms in play - less than 1k lines of code -
>> to solve the bufferbloat problem thoroughly - and the simplest, and the
>> only one needed near the hardware - is the bql algorithm, invented in
>> 2012, and universal across linux based ethernet devices today.
> I thought PIE was fairly decent too, but of course i was surrounded by
> that team back in the days it was done, so i do not have a good
> overview of alternatives.

I think pie is decent also! Despite my fondness for fq_codel (which does
win the internets) the relative simplicity of pie made it more suitable
for a direct hardware implementation in switches and hardware routers. 

DOCSIS 3.1 got pie in the uplink direction, and it helps (compared to
250+ms of default buffering, getting 16ms is almost magical). 
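
For readers who have not looked at it, the heart of pie is a small
periodic update of a drop probability, driven by how far the queueing
delay is from a target and by whether it is rising or falling. The
sketch below shows only that basic update as published in RFC 8033. It
leaves out the RFC's burst allowance and its scaling of the gains at
low probabilities, the struct and function names are hypothetical, and
it makes no claim about what any DOCSIS modem actually runs.

```c
#include <assert.h>

/* Hypothetical controller state for a simplified PIE update. */
struct pie_state {
    double drop_prob;    /* current drop probability, kept in [0, 1] */
    double qdelay_old_s; /* queueing delay seen at the previous update */
};

/* One periodic update. alpha weights the distance from the target
 * delay, beta weights the trend since the last update. Returns the
 * new drop probability. */
static double pie_update(struct pie_state *p, double qdelay_s,
                         double target_s, double alpha, double beta)
{
    assert(target_s > 0.0);
    p->drop_prob += alpha * (qdelay_s - target_s)
                  + beta * (qdelay_s - p->qdelay_old_s);
    if (p->drop_prob < 0.0)
        p->drop_prob = 0.0;
    if (p->drop_prob > 1.0)
        p->drop_prob = 1.0;
    p->qdelay_old_s = qdelay_s;
    return p->drop_prob;
}
```

With illustrative gains of alpha = 0.125 and beta = 1.25 and a 15ms
target, a queue sitting at 250ms pushes the probability up on every
update, and it falls back once the delay drops under the target.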

Please note we're talking about two things - I am ranting first about
overbuffered unfixable binary blob firmware, embedded in systems that
are otherwise mostly done in software. Once you have got those blobs
to "1 interrupt's worth of buffering", you can do much fancier things
like bql and pie/fq_codel/complicated classification on top with even
the cheapest cpus available today.

Everything below 1Gbit is fixable if we could kill those blobs.

> Ultimately i think the best solution will
> likely be a combination of per-hop AQM and end-to-end mechanisms, and
> so far i have seen proposals mostly focus on only one of these two
> control points.

I think highly of the BBR + fq_codel inbound shaping configuration.
BBRv1 had a few misfeatures that fq_codel works beautifully around.

(there may be a paper on this soon)

I am sadly certain at this point that inbound shaping by edge devices is
going to remain a necessity, but I keep hoping some "headend" maker will
get the memo.

These folk are doing a nice job with an inline transparent bridge.

I figure we'll see more of that.

>> and in the firmware... all you need is one interrupt's worth of buffering.
> If you have an algorithm only necessary because you assume to have
> a limited timer resolution timer reaction entity ("interrupt level"), then
> i am a bit worried about the long-term necessity of it.

? We need firmware on dsl/wifi/fiber/etc running on a (usually pretty
weak) separate cpu because a central cpu cannot respond to interrupts fast
enough, so it's a realtime requirement that I don't see going away.

A modern 802.11ac chip has over 400 dsps running at nsec resolution on
it, also.

But offloaded firmware doesn't need to do more work than the central cpu
can handle per interrupt it can take. Perhaps this could be an internet
maxim said another way by someone else?

"offloading too much work to a weaker cpu elsewhere is a
lose, the overhead of interfacing with it is rapidly outstripped over
time by progress in the main cpu"

(hey! this is now an internet history question! :) Mainframes, for
 example DO and did have a lot of co-processors (how many ms of work did
 they typically offload?), but so far as I know, most early ethernet
 cards had very little onboard for cpu - they basically had a lot of
 expensive rf logic and a few filters, but that was it. Yes? Did sun's
 10Mbit ethernet have anything other than that? The fuzzball?)

>> > I believe the problem you seem to be referring to is more fundamental
>> > for router/switch forwarding plane (research). Whereas WiFi chips
>> > really are AFAIK mostly general purpose CPU/DSP based, the hardware
>> > of switches / routers often has a much more convoluted model (e.g.: multi-stage).
>> dsl/cell/wifi are all binary blobs, almost universally. the only
>> exception in the wifi world is the much heralded and aging ath9k chip.
> Yes, and regulation to make it harder to violate regional frequency
> requirements does not help innovation.

fought that with vint and a few hundred other folk.


in filing with the fcc:


kind of won, kind of lost. If we hadn't "won" we'd have not been able to
finish the fq_codel research on wifi at all, and wouldn't have got to
the 10s of millions of commercial shipments several years later. The
bufferbloat project would have died, instead of limping along as it does today.

but we lost in not managing to open up the sources more fully on
wifi... or in making clear our mechanisms for updates were better than
locking things down.... for here, for iot, etc.

In retrospect I wish we'd fought harder then to open up the wifi
firmware; it was my first, and I'd hoped, my last, political battle.

Fighting that fight cost me my job. I kind of despaired after that.

There's a fight ongoing in the eu over "the radio directive". I keep
hoping the various arguments we made during the fcc fight will win this time.
If not there, then maybe brexit?

I think the fight with huawei over "security" and trustable code is
deeply ironic - at one point they offered to make their sources
available - only to have folk in the us (that want to keep their lagging
efforts proprietary)

A good way to restore trust in every stack is to open up the code as far
down as it can go. This, certainly, is one of the things driving the
risc-v effort.

>> Most of the designs on these technologies are very insecure and have
>> full access to shared memory.
>> I agree that building switches is harder. Only netfpga has made the
>> tiniest amount of progress there.
> And feasibility in FPGA is not even a good proof for an
> algorithm/approach to be feasible or ideal for custom asics because they
> are sufficiently different from each other. Or so i was told by HW engineers.

I agree. Fulcrum (long since absorbed by intel) did amazing things with
async logic that have not been duplicated since and that FPGAs cannot do.

I'm pretty sure async logic is at the core(s) of most AI chips. There's
just no way anything can run as cool as these do without it...

... but nobody talks about it.

>> > Abstractions like P4 try to hide so much of those HW programming models
>> > that several researchers i talked to have a very tainted opinion about what
>> > router/switch forwarding hardware could do, and that of course influences also
>> > how research and the industry at large seems to perceive what better
>> > protocol functions would be feasible to support in next-generation 
>> > high end router/switches.
>> Merely trying to get an invsqrt (needed for codel) into P4 has been a
>> failure. P4 is mind-bendingly different and crude compared to the
>> APIs on smarter ethernet cards... and 
> Well, i never meant to imply that P4 would be useful as it stands today
> for the core of any qos mechanisms. Indeed i think it is not, and
> the long-term vision as i think i have heard from Nick is adding a 
> concept of scheduled processing, and that's very difficult to put into
> arbitrary chips.
>> The smarter ethernet card vendors have all been swallowed up of late. 
>> netronome just essentially went under. mellanox was purchased by
>> nvidia. Intel just bought barefoot. There are three switch chips "out
>> there", nobody working on gigE and the switches now appearing on
>> integrated low end silicon are "copy and paste" affairs.
> My impression was that SmartNICs are having a good business niche
> in servers for stack and acceleration, especially for HW adjacent
> stacks like RoCE and diagnostic/monitoring/clock-synchronization.

yep, that's why they all got bought up. Now innovation will die.

> Then of course it would be up to third-parties to add firmware to
> do more than host-stack code on such smart-nics, for forwarders.
> Which brings us back to the issue we discussed.

I've lost track. :) It's been just pouring rain all day, and my
internet was down... only this backlogged email account works.

>> Another thing I have hope for, though (trying to end on a more
>> encouraging note) is I really like the work on packet pacing going on,
>> where you can offload a packet and a time to send it, to the hw.
> Yes. Well, there are some research papers in the last years pitching
> PIFO/PIEO as a good underlying HW abstraction to enable flexible qos
> in combination of calculation of rank by e.g: P4 or other programmable
> forwarding plane.

Off of here:


Is linked van's mind bending AFAP talk

> Cheers
>     Toerless
