[ih] The story of BGP?

Fri Feb 8 21:03:14 PST 2013

On Feb 7, 2013, at 10:23 AM, Justine Sherry <justine at eecs.berkeley.edu> wrote:

> Does anyone have any pointers to a summary of this history or
> interesting experiences to share?

Hi Justine,

I'm coming into this a bit late in the conversation, but being a first hand participant wanted to offer my $.02.  Rather than pester the list with a whole lot of individual replies, I'm going to aggregate my replies to all of the comments that have been made to the list so far.  Please see the original messages for correct attributions.

> I believe you'll find a lot of what you want to know in Yakov Rekhter's
> talk "BGP at 18" (http://www.youtube.com/watch?v=_Mn4kKVBdaM).

Seconded.  This is just a primer for everything else, of course.

My part of the story begins in 1991, when I joined cisco and took over maintenance of EGP and BGP.

> This was in addition to EGP-2 suffering under the ever increasing size of the route announcements.  If I recall correctly, there was a lack of incremental updates and EGP-2 relied on IP reassembly of very large fragmented IP datagrams.  A single dropped fragment in practice rendered the entire announcement useless, and I think there were some concerns about how large a packet some operating systems were going to be willing to reassemble.  The NSFNET and scores of networks would only add to the pressure of the ever-growing size of the EGP announcements.

This is exactly correct.  
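The fragility described in the quoted paragraph is easy to quantify.  A minimal sketch, assuming each fragment is dropped independently with the same probability p (a simplification, since real losses are often correlated): an update split into n fragments is usable only if all n arrive, which happens with probability (1 - p)^n.  The function name and the loss rate below are hypothetical illustrations, not measurements from the period.

```python
def update_survives(num_fragments: int, loss_rate: float) -> float:
    """Chance that every fragment of one update arrives: (1 - p) ** n.

    Assumes independent, identically distributed fragment loss.
    """
    if num_fragments < 1:
        raise ValueError("an update has at least one fragment")
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be in [0, 1]")
    return (1.0 - loss_rate) ** num_fragments


if __name__ == "__main__":
    p = 0.01  # hypothetical 1% per-fragment loss rate
    for n in (1, 10, 50, 100):
        print(f"{n:4d} fragments -> {update_survives(n, p):.3f} chance of success")
```

With a 1% loss rate, a 100-fragment update arrives intact only about 37% of the time, and without incremental updates every full announcement faces the same odds as the table grows.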

> Was the shutdown of the ARPAnet a big factor?

Absolutely.  The creation of the NSFnet regionals added thousands of prefixes to the routing tables very quickly, causing EGP updates to grow rapidly.  In cisco's implementation, IP reassembly had never been truly stressed, and EGP uncovered several bugs, including internal buffers that were simply too small to hold the reassembled packets.  These buffer sizes had to be increased several times to keep up with the table growth.

This pain became obvious to everyone, and it was compounded by the significant pain of the route filtering needed to prevent the looping that has already been discussed.  The operator community was very vocal about needing a replacement, and BGP at that point was the only real alternative.  As of 1991, cisco's implementation was somewhat immature.  While it largely complied with the letter of the specification, it had numerous structural issues that became apparent with even moderate usage.

The operational community (much credit to smd, asp, roll, vaf, et al.) began testing BGP-3 by running it in parallel with EGP, in some cases by route redistribution (aka route leaking) and in many cases by tunneling.  This produced bug reports, usually daily, which in turn drove software changes, usually daily as well.  Many of the structural issues within the implementation were addressed in this cycle.  The hub of this activity was the isp-geeks mailing list, shared between cisco and the customers running these test images.

> I'm not sure where 1994 comes from (that's the date on BGP-4, is that it?),
> but it's wrong.

The transition was more in the '91-'93 window.  The urgency to publish the RFC was far lower than the need to have working code and a working network.

As things stabilized, carriers started to phase BGP into production, usually on a peer-by-peer basis as a replacement for EGP, with redistribution still being used to interconnect with the remainder of routing.  As this process continued, it rapidly encompassed the full set of Tier 1 ISPs.

> Other random thought: CIDR arrived in BGP-4. I remember the transition from BGP-3 to BGP-4 and while strictly speaking not a flag-day, the coexistence of both was intended to be limited because of the difficulty in understanding how classful and classless announcements would coexist.

CIDR was an outcome of the ROAD discussions.  It became obvious that we needed to be classless and carry prefixes.  Yakov had already worked out the mechanisms and issues with doing this within IDRP, so he dropped that into the BGP spec and we massaged that into BGP-4.  Paul Traina took over the cisco implementation and did the enhancements for BGP-4.  The primary issue there was how to deal with aggregation.  Integration with classful announcements was obviously an issue, so our approach was to deploy BGP-4 carrying only classful prefixes at first.  Once that stabilized and was pervasive, we began announcing classless prefixes.
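To make the classful/classless distinction concrete, here is a minimal sketch using Python's standard `ipaddress` module.  The helper names `classful_prefixlen` and `aggregate` are hypothetical, and the addresses are illustrative private space; this is not how cisco's implementation worked, only an illustration of the prefix arithmetic involved.

```python
import ipaddress


def classful_prefixlen(addr: ipaddress.IPv4Address) -> int:
    """Prefix length implied by pre-CIDR address classes (A, B, C)."""
    first_octet = int(addr) >> 24
    if first_octet < 128:
        return 8   # class A
    if first_octet < 192:
        return 16  # class B
    if first_octet < 224:
        return 24  # class C
    raise ValueError("class D/E addresses have no classful network prefix")


def aggregate(prefixes: list[str]) -> list[str]:
    """Collapse contiguous prefixes into the fewest covering CIDR blocks."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


if __name__ == "__main__":
    # Four contiguous class C networks (illustrative private addresses).
    routes = [f"192.168.{i}.0/24" for i in range(4)]
    for r in routes:
        net = ipaddress.ip_network(r)
        print(r, "classful length:", classful_prefixlen(net.network_address))
    # A classless speaker can carry the four routes as one aggregate.
    print(aggregate(routes))  # ['192.168.0.0/22']
```

The aggregate 192.168.0.0/22 is shorter than the /24 that class C rules imply for that address, so a classful speaker has no way to express it.  That is consistent with the staged rollout described above: classful prefixes first, classless aggregates only once BGP-4 was everywhere.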

> (2) How is the BGP we switched to in 1994 different from the BGP we
> use today, and who drove those changes?

Bottom line, it's tough to say that we 'switched' to BGP.  We have been changing the tires on a moving car the whole time, without a single flag day, so it has been more a careful process of incremental evolution.

As Yakov describes, there have been extensions to BGP, primarily for RFC 2547 (BGP/MPLS VPNs).  However, it's hard to say that it is very different.  For better or worse, we've bolted a bag onto the side, but at its heart, BGP is fundamentally unchanged.  While this may look like a lack of forward progress, I'm actually mostly thankful that we haven't broken things.  The network has become MUCH more conservative in its deployment policies since the early days, and with the increased scrutiny, the returns on major changes will be limited.

> Finally, I do remember in the late 90s and early 2000s a bunch of
> research into CPU and network effects of BGP, specifically:
> 
> * The behaviour of BGP with non-instantaneous route updates, causing
> repetitive route additions/withdrawals
> * .. and the BGP dampening stuff, for both announcements and CPU churn

It's probably worth noting that not a lot of that work has made it into production and that it's very likely that we could tweak further to improve convergence and stability.  As noted, this is an implementation and best practices issue and not a protocol issue per se.
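For readers unfamiliar with the "dampening stuff" in the quoted question, here is a minimal sketch of route flap damping in the general style of RFC 2439: each flap adds a penalty, the penalty decays exponentially, and the route is suppressed above one threshold and reused below a lower one.  The class name is hypothetical, and the parameter values (15-minute half-life, 1000 per flap, suppress at 2000, reuse at 750) are commonly used vendor defaults taken here as illustrative assumptions, not anything this thread specifies.

```python
import math


class FlapDamper:
    """Simplified per-prefix route flap damping (RFC 2439 style).

    Parameter defaults are illustrative assumptions; a real implementation
    also enforces a maximum suppression time, which is omitted here.
    """

    def __init__(self, half_life=900.0, penalty_per_flap=1000.0,
                 suppress_limit=2000.0, reuse_limit=750.0):
        self.decay_rate = math.log(2) / half_life  # per second
        self.penalty_per_flap = penalty_per_flap
        self.suppress_limit = suppress_limit
        self.reuse_limit = reuse_limit
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay_to(self, now: float) -> None:
        """Apply exponential decay for the time elapsed since the last update."""
        elapsed = now - self.last_update
        self.penalty *= math.exp(-self.decay_rate * elapsed)
        self.last_update = now

    def flap(self, now: float) -> bool:
        """Record a flap at time `now` (seconds); return True if suppressed."""
        self._decay_to(now)
        self.penalty += self.penalty_per_flap
        if self.penalty >= self.suppress_limit:
            self.suppressed = True
        return self.suppressed

    def usable(self, now: float) -> bool:
        """Return True if the route may be used at time `now`."""
        self._decay_to(now)
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False
        return not self.suppressed


if __name__ == "__main__":
    damper = FlapDamper()
    for t in (0, 10, 20):  # three flaps within 20 seconds
        print(f"t={t}s suppressed={damper.flap(t)}")
    print("usable at t=1020s:", damper.usable(1020))  # still suppressed
    print("usable at t=1820s:", damper.usable(1820))  # decayed below reuse
```

With these values the third quick flap pushes the penalty to roughly 2977, and it takes about two half-lives (around 30 minutes) to fall below the reuse threshold.  The choice of thresholds is exactly the kind of implementation and best-practice tuning the reply refers to.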

> Also, nowadays we have a much better understanding of how economics is key in
> protocol deployment, of how the short-term cost/benefit ratio is really
> crucial. 

Rather than an economic viewpoint, I view it as a psychology of disaster avoidance.  We (all of humanity) seem to be unwilling to make architectural changes to working systems until they are on the brink of collapse.  (Ref: Mark Handley, "Why the Internet Only Just Works".)  Unfortunately, that's not very good engineering, and without the DARPA mandate or similar leadership, it seems impossible to do better.

> And so the Internet is now stuck with this obsolete 1960s-grade routing
> architecture (in architectural terms, the whole BGP-4/IGP system is really
> not that much advanced over the routing in Baran's original design)…

That's hardly fair.  Where we are is far past what Baran originally described, though it's true that we are still far short of where we can and should be.

Regards,
Tony

p.s. All errors above are defects straight from my unrefreshed DRAM, which I take full responsibility for.  If you have further questions, ask now, because tomorrow it might be gone.