[ih] firmware for innovation

Thu Dec 19 08:33:36 PST 2019

On Wed, Dec 18, 2019 at 08:06:53PM -0800, Dave Taht wrote:
> 
> This is a bit OT for internet history, but perhaps, 10 years from now,
> however the dust settles, perhaps this will be a part of it.

Well, if only there were an "internet-future" mailing list ;-))

[...]
agreed.

> At one point comcast was retiring their cisco cmtses - and would rather
> crush them than dumpster them because of CALEA requirements. I wanted
> one of those, bad (they were better than the ARRIS CMTSes in a lot of
> ways). 

Well, if the customer had not done it, Cisco would have done it for
them, to avoid used units appearing on eBay and impacting new product
sales.

Good companies actually provide customers with documentation on which
chip carries which information, so customers can narrow the crushing
down to those components (in the absence of crushers large enough for
whole boxes/boards).

> > Have lost track of DSL chips. In the past they were awfully feature
> > constrained, so no good way to make them do better buffering.
> 
> They don't need anything more than a few lines of code added to keep
> their onboard FIFO buffering down to sanity. They end up using less
> internal memory, as well, for buffering, that can be used more smartly -
> or fit in more code!
> 
> There's a pretty interesting arc vliw cpu in more than a few dsl chips,
> it's totally easy to do this... I'd actually made some progress in
> reverse engineering one chip at one point but rewriting code that
> consists of asm turned into C is something I gave up on after working on
> the game of empire, back in the 80s.. (worse, it's a very weird VLIW)
> 
> At its most primitive it's merely sizing the internal ringbuffer
> by the rate....
> 
> numbufs = configured_line_rate_per_sec/possible_interrupts_sec;
> buffer = malloc(numbufs*2048); // padding needed
> 
> that's it. Tracking bytes on the internal ringbuffer instead of just
> packets is vastly better but slightly more code than I care to write in
> an email, and managing high and low watermarks a snap once you do that.
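
A minimal sketch of the byte-tracking variant described above, assuming
a firmware transmit queue that is notified on every transmit-complete
interrupt. The struct, the function names, and the choice of a high
watermark at twice the low one are illustrative assumptions, not taken
from any real DSL firmware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-accounted transmit queue; all names are illustrative. */
struct txq {
    size_t bytes_queued;  /* bytes currently waiting in the ring */
    size_t high_water;    /* stop accepting packets at or above this */
    size_t low_water;     /* resume accepting packets at or below this */
    bool   stopped;       /* true while the upper layer must hold packets */
};

/* Size the watermarks from the line rate: the low watermark is roughly
 * one interrupt interval of data, the high watermark (an arbitrary
 * choice for this sketch) is twice that. */
void txq_init(struct txq *q, uint64_t line_rate_bps, uint32_t interrupts_per_sec)
{
    assert(interrupts_per_sec > 0);
    size_t bytes_per_interrupt =
        (size_t)(line_rate_bps / 8 / interrupts_per_sec);
    q->bytes_queued = 0;
    q->low_water = bytes_per_interrupt;
    q->high_water = 2 * bytes_per_interrupt;
    q->stopped = false;
}

/* Returns true if the packet was accepted, false if the caller must hold it. */
bool txq_enqueue(struct txq *q, size_t pkt_len)
{
    if (q->stopped)
        return false;
    q->bytes_queued += pkt_len;
    if (q->bytes_queued >= q->high_water)
        q->stopped = true;
    return true;
}

/* Called when the hardware reports that pkt_len bytes have been sent. */
void txq_complete(struct txq *q, size_t pkt_len)
{
    assert(pkt_len <= q->bytes_queued);
    q->bytes_queued -= pkt_len;
    if (q->stopped && q->bytes_queued <= q->low_water)
        q->stopped = false;
}
```

Because the limit is in bytes, the amount of queued time stays roughly
constant whether the traffic is small ACKs or full-size frames.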

Been there, done that. The way to avoid having to fix DSL modems, or
the DSL-modem chips inside routers, is to do all the right buffering in
the router ("home gateway"). The missing element is signaling of the
training rates (up/down) from modem to router, since they do change.

Never managed to get this fixed, even when working for the vendor
and submitting it as a bug/feature request, and even when routers had
internal DSL modems where the router had access to the training rates
through the management interface. Oh well, there were Tcl scripts to
adjust the router's QoS policy bit-rate parameters as a workaround.
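
The workaround amounts to recomputing the router's shaper rate whenever
the training rate changes. A rough sketch, assuming a hypothetical
helper that is handed the training rate in bit/s. The safety margin is
an assumption, and link-layer encapsulation overhead is ignored here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: derive the router's shaper rate from the DSL
 * training (sync) rate. Shaping slightly below the modem's rate makes
 * the queue build in the router, where it can be managed, and not in
 * the modem. */
uint64_t shaper_rate_bps(uint64_t training_rate_bps, unsigned margin_percent)
{
    assert(margin_percent < 100);
    return training_rate_bps * (100 - margin_percent) / 100;
}
```

With a 5% margin, an upstream training rate of 1,000,000 bit/s gives a
shaper rate of 950,000 bit/s. The same function would be applied to the
downstream rate.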

> (BQL is a bit more complicated than what is needed in firmware
> because it sits in the middle of things, but it's only 50 lines
> of code:
> 
> https://lwn.net/Articles/454390/ and it was the recognition that the
> dynamic range of "packet" limits was far worse than "byte" limits that
> drove its innovation. Particularly in the TSO age (64-64k "superpackets")
> 
> Instead, in every current dsl or wifi chip there's always a fixed
> allocation for onboard buffering designed to overbuffer at the highest
> rate supported by the hardware at the smallest packet size.

Indeed. The decade-long proliferation of packet counting instead of
byte counting at that level certainly ranks high in silliness.
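
To put numbers on the dynamic-range point, here is a small worked
calculation of how long a packet-count limit takes to drain. The
function name and the example limit of 100 packets are assumptions
chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case time in microseconds to drain a queue holding
 * limit_packets packets of pkt_bytes bytes each at rate_bps. */
uint64_t drain_time_us(uint64_t limit_packets, uint64_t pkt_bytes,
                       uint64_t rate_bps)
{
    assert(rate_bps > 0);
    return limit_packets * pkt_bytes * 8 * 1000000 / rate_bps;
}
```

At 10 Mbit/s, a 100-packet limit is about 5 ms of queue with 64-byte
packets (5120 us) but about 5.2 s with 64 KiB superpackets
(5242880 us), a factor of 1024. A byte limit gives the same drain time
in both cases.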

> Alternatively...
> 
> With valid statistics coming off the chip (and driver), it's possible to
> throttle overbuffered firmware, but BOY, is that hard. On the wifi
> front, we finally finished AQL "airtime queue limits" last month (it has
> been shipping commercially for several years out of tree).

As said above, it's IMHO not difficult to do the right buffering on the
router instead of the DSL modem, because DSL is fairly fixed in bitrate,
so predicting the in-modem buffering is fairly easy: use the training
rates and ignore the loss of bitrate from FEC corrections (low enough
unless your copper is so bad that you have bigger problems than
bufferbloat). WiFi of course has such a high degree of effective
bitrate variability through retransmissions that I wouldn't count on
that scheme to work well there.

> The difference was 500+ms  of onboard buffering reduced to 10ms (at
> this tested rate, 2+ seconds at lower rates), no loss in throughput.
> 
> https://drive.google.com/drive/folders/14OIuQEHOUiIoNrVnKprj6rBYFNZ0Coif
> 
> I just wish I could find whatever dude, now living under a bridge
> somewhere, that wrote the dsl drivers for everything, and buy him a
> beer. A 6pack later, dsl'd be fixed, at least.

Given how DSL is evolving into another market-monopolization tool
through vectoring, I am not that interested in it anymore.

Besides, having just gotten Gbps FTTH, with new flexible ducts
through the city where fibers can easily be added/pulled/replaced,
maybe we do want ongoing pain with copper so that more investment
into fiber happens. Alas, most people only think about fiber as
"unnecessary" extra speed, and nobody explains its benefits as a
more flexible, longer-term reusable, and more easily open-accessed
physical infrastructure.

> One reason why fiber "feels" so much better than other systems is that
> those devices have reasonably small buffers. 
> 
> Sonic fiber's network has about 60ms worth of buffering at 100mbit.

Interesting. Need to wait for a friend at my place to also get FTTH to run useful
tests on that buffering.

> Improvements on this we have made *can* be felt by users - not having to
> ride the sawtooth
> 
> http://www.taht.net/~d/sonic_106_cake_vs_default.png
> 
> is *nice* but compared to the outrages on other edge technologies... plain
> ole fifo'd fiber is enough better.

I am not a big fan of the sawtooth burst creation by lame old end-to-end
CC. No reason for senders not to do rate-estimation-based shaping to be
friendly to the network. And instead we get even larger bursts at ABR
segment sizes.
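
As a rough illustration of rate-estimation-based shaping, a sender can
space its packets by the serialization time at the estimated rate
instead of emitting them back to back. The helper below is
hypothetical, and how the rate estimate is obtained is outside the
sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Nanoseconds to wait between packet starts so that the sender's output
 * matches est_rate_bps. Hypothetical helper for illustration only. */
uint64_t pacing_gap_ns(uint32_t pkt_bytes, uint64_t est_rate_bps)
{
    assert(est_rate_bps > 0);
    return (uint64_t)pkt_bytes * 8 * 1000000000ULL / est_rate_bps;
}
```

For 1500-byte packets and an estimate of 100 Mbit/s this yields a gap
of 120000 ns (120 us) per packet.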

> Please note we're talking about two things - I am ranting first about
> overbuffered unfixable binary blob firmware, embedded in systems that
> are otherwise mostly done in software. Once you have got those blobs
> to "1 interrupt's worth of buffering", you can do much fancier things
> like bql and pie/fq_codel/complicated classification on top with even
> the cheapest cpus available today.
> 
> Everything below 1Gbit is fixable if we could kill those blobs.

Well, I do of course like the idea of open source reference solutions
for anything of relevance. I am a bit unclear why standards bodies
failed to simply mandate appropriate per-hop behaviors for the
buffering. Broadband Forum? They were/are doing DSL, right?

> ? We need firmware on dsl/wifi/fiber/etc running on a (usually pretty
> weak) separate cpu because a central cpu cannot respond to interrupts fast
> enough, so it's a realtime requirement that I don't see going away.

I do like the idea of SmartNICs, but for speeds up to 1 Gbps,
it would be great to avoid the binary blobs they create. Instead,
ARMs are starting to come with all types of additional cores of
different speeds. Better to allocate one low-speed ARM core with open
source PMD code to service all interface queues (interleaved).
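
A sketch of what such an interleaved service loop could look like,
reading PMD as "poll-mode driver" in the DPDK/fd.io sense. The struct,
the function names, and the toy counter backend are assumptions for
illustration, not a real driver API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-interface queue: poll processes up to budget
 * packets and returns how many it handled. */
struct ifq {
    int (*poll)(void *ctx, int budget);
    void *ctx;
};

/* One pass of an interleaved poll loop: visit every queue once with a
 * bounded budget so that no single interface can starve the others.
 * Returns the total number of packets processed in this pass. */
int poll_all_once(struct ifq *queues, size_t nqueues, int budget)
{
    assert(budget > 0);
    int total = 0;
    for (size_t i = 0; i < nqueues; i++)
        total += queues[i].poll(queues[i].ctx, budget);
    return total;
}

/* Toy backend for demonstration: ctx points at a count of pending
 * packets, and polling simply consumes up to budget of them. */
int counter_poll(void *ctx, int budget)
{
    int *pending = ctx;
    int n = *pending < budget ? *pending : budget;
    *pending -= n;
    return n;
}
```

A dedicated core would call poll_all_once() in an endless loop. The
per-queue budget bounds how long any one interface can delay the
others.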

> A modern 802.11ac chip has over 400 dsps running at nsec resolution on
> it, also.
> 
> But offloaded firmware doesn't need to do more work than the central cpu
> can handle per interrupt it can take. Perhaps this could be an internet
> maxim said another way by someone else?

Well, there is O(packet-rate), but depending on more advanced
QoS requirements, there may be additional O() terms. But yes, I have
little concern about resource requirements below 1 Gbps. It gets
interesting for the algorithms we care about at 100 Gbps or more, and
for the feasibility of certain base algorithms, such as PIFO, in ASICs.
But that's of course way beyond bufferbloat reduction.
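
For readers unfamiliar with PIFO (push-in first-out): packets are
inserted at a position given by a rank, but only ever removed from the
head. The software sketch below only shows these semantics. The
capacity, names, and linear insertion are assumptions, and an ASIC
design would need to do the rank comparison very differently, which is
exactly the feasibility question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PIFO_CAP 64  /* arbitrary capacity for this sketch */

struct pifo_entry { uint32_t rank; uint32_t pkt_id; };

struct pifo {
    struct pifo_entry e[PIFO_CAP];  /* kept sorted by rank, head at e[0] */
    size_t len;
};

/* Push in: insert by rank (lower rank leaves first). Entries with
 * equal rank keep their arrival order. Returns false if full. */
bool pifo_push(struct pifo *p, uint32_t rank, uint32_t pkt_id)
{
    assert(p != NULL);
    if (p->len == PIFO_CAP)
        return false;
    size_t i = p->len;
    while (i > 0 && p->e[i - 1].rank > rank) {
        p->e[i] = p->e[i - 1];  /* shift higher-ranked entries back */
        i--;
    }
    p->e[i].rank = rank;
    p->e[i].pkt_id = pkt_id;
    p->len++;
    return true;
}

/* First out: only the head can be removed. Returns false if empty. */
bool pifo_pop(struct pifo *p, uint32_t *pkt_id)
{
    assert(p != NULL && pkt_id != NULL);
    if (p->len == 0)
        return false;
    *pkt_id = p->e[0].pkt_id;
    for (size_t i = 1; i < p->len; i++)
        p->e[i - 1] = p->e[i];
    p->len--;
    return true;
}
```

Different scheduling policies then reduce to different ways of
computing the rank at enqueue time.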

> "offloading too much work to a weaker cpu elsewhere is a
> lose, the overhead of interfacing with it is rapidly outstripped over
> time by progress in the main cpu"

Looking at how you design high-speed SW routers via infras like fd.io
gives a good idea of what is feasible and what is not.

> (hey! this is now an internet history question! :) Mainframes, for
>  example DO and did have a lot of co-processors (how many ms of work did
>  they typically offload?, but so far as I know, most early ethernet
>  cards had very little onboard for cpu - they basically had a lot of
>  expensive rf logic and a few filters, but that was it. Yes? Did sun's
>  10Mbit ethernet have anything other than that? The fuzzball?)

Back in the day (80s/90s), there was a lot of pressure for gigantic
packet sizes, especially because Cray did not manage to build high
packet-rate I/O, or they just did not care. [ Maybe they were busy
enough trying to figure out how to deal with the inspirational
but misguided approach of putting Unix natively onto their calculator CPU
and killing it with that. ]

Classical 90s MIC chips had a number of buffers that typically
could be configured. I forgot the name of the config command in Cisco
IOS, but it's probably still there (in/out). I may be wrong, but I seem
to remember it defaulted to 7 at some time but could be set as low
as 2 and still get line rate on later CPU routers at all packet sizes.
I would be surprised if this design does not go back to the 80s NICs
from Sun.

[...]
need to spend more time reading up on your URLs.

> There's a fight ongoing in the eu over "the radio directive". I keep
> hoping the various arguments we made during the fcc fight will win this time.
> If not there, then maybe brexit?
> 
> I think the fight with huawei over "security" and trustable code is
> deeply ironic - at one point they offered to make their sources
> available - only to have folk in the us (that want to keep their lagging
> efforts proprietary)

They did give source code to the UK government and had it examined.

> A good way to restore trust in every stack is to open up the code as far
> down as it can go. This, certainly, is one of the things driving the
> risc-v effort.

The core element of risc-v is not to trust anyone else, but to build
everything yourself. That works for those who can and know they must:
China/USA. The EU can, but is too silly to know it must. The rest of the
world is back to the base problem you're trying to solve: they can't,
so they need to trust someone else and would love to be able to verify.
But most likely they don't want to admit the situation to themselves.
Too complex to solve to get votes.

> I agree. Fulcrum (long since absorbed by intel) did amazing things with
> async logic that have not been duplicated since that FPGAS cannot do.
> 
> I'm pretty sure async logic is at the core(s) of most AI chips. There's
> just no way anything can run as cool as these do without it...

Yes. Not enough insight here to know if async is the only major
difference. I think there is overall a lot more flexibility in design on
an ASIC. Not being reduced to combining fixed building blocks.

I always like to look at the analog computers at CHM down the street ;-)

> > My impression was that SmartNICs are having a good business niche
> > in servers for stack and acceleration, especially for HW adjacent
> > stacks like RoCE and diagnostic/monitoring/clock-synchronization.
> 
> yep, that's why they all got bought up. Now innovation will die.

I thought Intel is doubling down with their new line of SmartNICs,
maybe I am not well informed, just been looking at it from the side.

Cheers
    Toerless
-- 
Internet-history mailing list
Internet-history at elists.isoc.org
https://elists.isoc.org/mailman/listinfo/internet-history