RouterBaba

// PRIMER · 12 MIN READ

BGP, from zero.

If you finish this page you will know what BGP is, why it exists, what every line of Lab 01 is doing, and enough vocabulary to read real router configs without panic. About twelve minutes. Drink a coffee, do not skim.

01 Why BGP exists

The internet is not one network. It is roughly a hundred thousand separate networks owned by separate organizations, and they have to route packets to each other every microsecond of every day. Your bank, your phone carrier, Cloudflare, an ISP in rural Saskatchewan: each runs its own network with its own equipment, its own engineers, and its own policies about who it will carry traffic for and on what terms.

Inside any one of those networks, the routers cooperate. They trust each other. A protocol like OSPF or IS-IS floods link information across the whole domain and every router builds the same map and computes the same shortest paths. That works because it is one team.

The moment you cross between two networks, that trust evaporates. The neighbouring network has different goals, different paranoia, different commercial agreements. You cannot flood your link state to them: you would expose your internal topology, and they would expose theirs to you. Worst of all, a link-state algorithm assumes the shortest path is the desirable one, when in reality you may want to avoid certain paths because they cost more or go through a competitor.

BGP exists because of this. It is the protocol that connects networks across organizational boundaries. It does not flood. It does not compute shortest paths. It announces reachability one prefix at a time, with a list of who has carried that announcement, and lets each router pick a winner using policy. Every advertisement can be filtered. Every speaker is identified. Every route can be rejected. RFC 4271 (BGP-4), published in 2006, is the current spec. It is 104 pages of careful paranoia.

// Nerd tip

BGP-3 (RFC 1267) shipped in 1991. BGP-4 added CIDR support, first as RFC 1654 in 1994 and then as RFC 1771 in 1995, and is the version still in production. The 4-byte ASN extension came later in RFC 4893 (2007), since replaced by RFC 6793 (2012). BGP runs over TCP port 179, a well-known port assigned by IANA.

02 What an Autonomous System is

An Autonomous System (AS) is one administrative domain. One organization, one policy, one operator. From the outside it looks like a single black box that announces some IP prefixes and accepts traffic for them. From the inside it can be one router or ten thousand. It doesn't matter: the world only sees the boundary.

Every AS is identified by an ASN, an Autonomous System Number. Originally 16-bit (so 0 to 65535). That space was running out, and in 2007 we got 32-bit ASNs (RFC 4893, later replaced by RFC 6793), giving us roughly 4.3 billion of them. New assignments today largely come from the 32-bit range.

The ranges you should know:
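One way to keep them straight is a small lookup. The sketch below encodes the special-purpose ranges from the IANA AS number registry (RFC 5398 for documentation, RFC 6996 for private use). The function name `classify_asn` and the labels are made up for illustration, and the table is not exhaustive.

```python
# Special-purpose ASN ranges as (first, last, label).
# Labels are illustrative; the ranges follow the IANA AS number registry.
SPECIAL_ASN_RANGES = [
    (0, 0, "reserved"),
    (23456, 23456, "AS_TRANS"),
    (64496, 64511, "documentation"),               # RFC 5398
    (64512, 65534, "private (16-bit)"),            # RFC 6996
    (65535, 65535, "reserved"),
    (65536, 65551, "documentation"),               # RFC 5398
    (65552, 131071, "reserved"),
    (4200000000, 4294967294, "private (32-bit)"),  # RFC 6996
    (4294967295, 4294967295, "reserved"),
]


def classify_asn(asn: int) -> str:
    """Return a label for the range an ASN falls in (hypothetical helper)."""
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError(f"ASN out of 32-bit range: {asn}")
    for first, last, label in SPECIAL_ASN_RANGES:
        if first <= asn <= last:
            return label
    # Anything else is treated as publicly assignable in this sketch.
    return "public (16-bit)" if asn <= 0xFFFF else "public (32-bit)"


print(classify_asn(65001))  # a RouterBaba simulator ASN
print(classify_asn(13335))  # Cloudflare
```

The simulator's 65001 through 65012 all land in the 16-bit private range, which is why they are safe to use in a lab.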

Some real ones to fix in your head: AS15169 is Google. AS32934 is Meta/Facebook. AS13335 is Cloudflare. AS16509 is Amazon. AS7018 is AT&T. The RouterBaba simulator uses private ASNs (65001 through 65012) because we are pretending to be a single Canadian backbone, not 12 different organizations.

The two-layer thing matters. Inside an AS, an IGP (OSPF, IS-IS, EIGRP) figures out paths between routers. Between ASes, BGP figures out reachability across the boundary. They cooperate: BGP often points at next-hops that live inside the AS, and the IGP knows how to actually get there.

// Nerd tip

ASN 23456 is reserved as AS_TRANS. When a 32-bit ASN announcement crosses a router that only speaks 16-bit, the real ASN goes into a separate path attribute (AS4_PATH) and 23456 sits in the regular AS_PATH as a placeholder. This is why you may see 23456 in the wild. It is not a real network. It is the protocol's "I have no idea what to put here" sentinel.
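A rough sketch of that substitution, assuming a flat AS_PATH with no sets or confederations. The function name `downgrade_for_old_speaker` is invented for this example; real implementations follow RFC 6793 in more detail.

```python
AS_TRANS = 23456


def downgrade_for_old_speaker(as_path):
    """Return (AS_PATH, AS4_PATH) as sent to a 16-bit-only peer.

    Any ASN that does not fit in 16 bits is replaced by AS_TRANS in the
    regular AS_PATH. The full path travels in AS4_PATH, which is only
    needed when at least one substitution happened.
    """
    needs_as4 = any(asn > 0xFFFF for asn in as_path)
    two_byte_path = [asn if asn <= 0xFFFF else AS_TRANS for asn in as_path]
    as4_path = list(as_path) if needs_as4 else None
    return two_byte_path, as4_path


print(downgrade_for_old_speaker([65001, 4200000001, 13335]))
```

Note that the path length is preserved, so best-path selection on the old router still sees the right number of hops.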

03 eBGP vs iBGP

BGP comes in two flavours that look nearly identical on the wire but behave differently. The flavour is decided by one thing only: whether the two BGP speakers are in the same AS or different ASes.

eBGP, between ASes

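External BGP. The session crosses an AS boundary, typically over a single physical link between two organizations. eBGP is built on the assumption you do not trust the peer, so each speaker prepends its own ASN to AS_PATH when it advertises a route, rejects any route whose AS_PATH already contains its own ASN, and on most platforms expects the peer to be directly connected unless you configure multihop.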

iBGP, within an AS

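Internal BGP. Both speakers are in the same AS. Different rules, because the threat model is different: AS_PATH is left unchanged, LOCAL_PREF is carried along, and a route learned from one iBGP peer is not re-advertised to another iBGP peer. Since AS_PATH cannot detect loops inside a single AS, that last rule is the loop prevention, and it forces every iBGP speaker to peer with every other one: a full mesh.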

The full-mesh problem is solved by Route Reflectors (RFC 4456). A Route Reflector is allowed to re-advertise iBGP-learned routes to its clients, breaking the no-re-advertisement rule in a controlled way. It adds two attributes (ORIGINATOR_ID and CLUSTER_LIST) so the cluster can detect its own loops.
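The baseline rules that Route Reflectors bend can be sketched in a few lines. This is a simplification of RFC 4271 behaviour (AS_PATH prepend on eBGP, AS_PATH loop check, no iBGP-to-iBGP re-advertisement); the `Route` class and function names are invented for illustration, and reflection itself is deliberately left out.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple     # ASNs, nearest AS first
    learned_via: str   # "local", "ebgp", or "ibgp"


def session_kind(local_as: int, peer_as: int) -> str:
    """Same AS on both ends means iBGP, anything else is eBGP."""
    return "ibgp" if local_as == peer_as else "ebgp"


def accept(route: Route, local_as: int) -> bool:
    """Loop detection: drop a route that already carries our own ASN."""
    return local_as not in route.as_path


def advertise(route: Route, local_as: int, peer_as: int) -> Optional[Route]:
    """Return the route as the peer would receive it, or None if suppressed."""
    if session_kind(local_as, peer_as) == "ebgp":
        # Crossing the AS boundary: prepend our ASN.
        return replace(route, as_path=(local_as,) + tuple(route.as_path))
    if route.learned_via == "ibgp":
        # iBGP-learned routes are not passed on to other iBGP peers.
        return None
    return route  # iBGP: AS_PATH untouched


learned = Route("10.0.0.0/24", (65010,), "ebgp")
print(advertise(learned, 65001, 65002))  # to an eBGP peer
print(advertise(learned, 65001, 65001))  # to an iBGP peer
```

The `None` branch is the one a Route Reflector relaxes for its clients.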

// Nerd tip

In RouterBaba's Canadian backbone, YYC and YYZ are the two route reflectors. The other five sites are clients (YVR, YXY, YZF cluster under YYC; YUL and YFB cluster under YYZ). With 7 sites a true iBGP full mesh would need 21 sessions; the RR design cuts it to 6. The point of the design is not the savings at this size, it is that the same pattern keeps working when there are 700 routers, where a full mesh is unbuildable. The Tier 1 carriers run RR clusters internally for exactly this reason.
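The session arithmetic is easy to check. The sketch below assumes the design described above: route reflectors peer with each other in a full mesh, and each client peers only with its own reflector. The function names are illustrative.

```python
def full_mesh_sessions(n: int) -> int:
    """Every router peers with every other: n choose 2."""
    return n * (n - 1) // 2


def rr_sessions(clusters: dict) -> int:
    """Sessions when RRs are meshed together and clients peer with one RR.

    `clusters` maps each route reflector to the list of its clients.
    """
    reflectors = len(clusters)
    clients = sum(len(members) for members in clusters.values())
    return full_mesh_sessions(reflectors) + clients


backbone = {
    "YYC": ["YVR", "YXY", "YZF"],
    "YYZ": ["YUL", "YFB"],
}

print(full_mesh_sessions(7))    # 21
print(rr_sessions(backbone))    # 6
print(full_mesh_sessions(700))  # 244650
```

At 700 routers the full mesh needs 244,650 sessions, which is the "unbuildable" part.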

04 The Finite State Machine

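A BGP session does not just exist. It has to be brought up, negotiated, and kept alive. RFC 4271 §8.2.2 defines a finite state machine with six states (we ignore one in practice). Every time you run show ip bgp summary, the right-most column shows one of these states while the session is coming up. On Cisco-style output that column switches to a received-prefix count once the session is ESTABLISHED.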

What each state means
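As a rough sketch, here are the six states from RFC 4271 with a one-line meaning each, plus the happy path between them. The event names are simplified stand-ins for the RFC's numbered events, and treating every unexpected event as a drop to IDLE is a simplification.

```python
from enum import Enum


class State(Enum):
    IDLE = "refusing connections, waiting for a start event"
    CONNECT = "waiting for the outbound TCP connection to complete"
    ACTIVE = "TCP attempt failed, listening and waiting to retry"
    OPENSENT = "TCP is up, OPEN sent, waiting for the peer's OPEN"
    OPENCONFIRM = "peer's OPEN accepted, waiting for its KEEPALIVE"
    ESTABLISHED = "session is up, UPDATEs can flow"


# (current state, event) -> next state. Event names are made up here.
TRANSITIONS = {
    (State.IDLE, "start"): State.CONNECT,
    (State.CONNECT, "tcp_connected"): State.OPENSENT,
    (State.CONNECT, "tcp_failed"): State.ACTIVE,
    (State.ACTIVE, "tcp_connected"): State.OPENSENT,
    (State.OPENSENT, "open_received"): State.OPENCONFIRM,
    (State.OPENCONFIRM, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "keepalive_received"): State.ESTABLISHED,
}


def step(state: State, event: str) -> State:
    """Advance the FSM; anything unexpected falls back to IDLE."""
    return TRANSITIONS.get((state, event), State.IDLE)


state = State.IDLE
for event in ["start", "tcp_connected", "open_received", "keepalive_received"]:
    state = step(state, event)
    print(f"{event:>20} -> {state.name}: {state.value}")
```

A session stuck in ACTIVE or CONNECT usually means TCP never came up; one stuck in OPENSENT means TCP is fine but the OPEN exchange is not.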

// Nerd tip

The transition from IDLE happens on the BGP_Start event, which on most platforms fires when you run no shutdown on a neighbour or when ConnectRetry expires after a brief reset. There is also a BGP_Stop event that immediately terminates a session and forbids retries. This is the difference between a session that flaps (cycles IDLE → CONNECT → up → IDLE on its own) and one that stays down until you fix it.

05 The four message types

Every byte that flows on a BGP session is one of four message types. They share a common 19-byte header (16 bytes of marker, 2 bytes of length, 1 byte of type) and a body whose layout depends on the type.
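The header layout maps directly onto a `struct` format string. This is a minimal sketch, assuming the all-ones marker and the 4096-byte maximum message size from RFC 4271; the helper names are illustrative.

```python
import struct

# 16-byte marker, 2-byte length, 1-byte type, network byte order.
HEADER = struct.Struct("!16sHB")
MARKER = b"\xff" * 16
TYPE_NAMES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def pack_message(msg_type: int, body: bytes = b"") -> bytes:
    """Prefix a body with the common header; length covers header + body."""
    return HEADER.pack(MARKER, HEADER.size + len(body), msg_type) + body


def unpack_header(data: bytes):
    """Return (total_length, type_name) after basic sanity checks."""
    marker, length, msg_type = HEADER.unpack_from(data)
    if marker != MARKER:
        raise ValueError("marker is not all ones")
    if not HEADER.size <= length <= 4096:
        raise ValueError(f"bad length {length}")
    return length, TYPE_NAMES.get(msg_type, f"unknown({msg_type})")


keepalive = pack_message(4)
print(len(keepalive), unpack_header(keepalive))
```

A KEEPALIVE is exactly this header and nothing else, which is where its 19-byte size comes from.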

OPEN · type 1

The handshake. Sent once when entering OPENSENT.

Carries: BGP version (always 4), my ASN, hold time proposal (commonly 90s or 180s, depending on the platform default), BGP identifier (router-id, an IPv4), and a list of capabilities offered for this session (4-byte ASN support, multi-protocol families, route refresh, etc.).
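The fixed part of the OPEN body is ten bytes. The sketch below builds it, assuming capabilities are passed in already encoded as optional parameters. The function name and the example values (ASN 65001, router-id 10.0.2.1) are picked for illustration.

```python
import ipaddress
import struct

AS_TRANS = 23456


def build_open_body(my_as: int, hold_time: int, router_id: str,
                    opt_params: bytes = b"") -> bytes:
    """Build an OPEN body: version, My AS, hold time, BGP ID, opt params.

    The My AS field is only 16 bits wide, so a 32-bit ASN is replaced by
    AS_TRANS here; the real value travels in the 4-byte ASN capability.
    """
    my_as_field = my_as if my_as <= 0xFFFF else AS_TRANS
    bgp_id = ipaddress.IPv4Address(router_id).packed
    fixed = struct.pack("!BHH4sB", 4, my_as_field, hold_time,
                        bgp_id, len(opt_params))
    return fixed + opt_params


body = build_open_body(65001, 90, "10.0.2.1")
print(len(body), body.hex())
```

With the 19-byte header in front, a capability-free OPEN is 29 bytes on the wire.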

UPDATE · type 2

The reason BGP exists. Sent whenever there is a route to advertise or withdraw.

Carries: withdrawn routes (prefixes the sender no longer reaches), path attributes (ORIGIN, AS_PATH, NEXT_HOP, MED, LOCAL_PREF, COMMUNITIES, and dozens more), and NLRI (the prefixes being announced and inheriting those attributes).
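The NLRI field is the simplest part to decode by hand: each prefix is one length byte followed by only as many address bytes as that length needs. A minimal IPv4-only sketch, with illustrative function names:

```python
import ipaddress


def encode_nlri(prefix: str) -> bytes:
    """Encode one IPv4 prefix as <length in bits><minimal address bytes>."""
    net = ipaddress.IPv4Network(prefix)
    nbytes = (net.prefixlen + 7) // 8
    return bytes([net.prefixlen]) + net.network_address.packed[:nbytes]


def decode_nlri(data: bytes) -> list:
    """Decode a run of back-to-back IPv4 NLRI entries."""
    prefixes = []
    i = 0
    while i < len(data):
        plen = data[i]
        nbytes = (plen + 7) // 8
        # Pad the truncated address back out to four bytes.
        addr = data[i + 1:i + 1 + nbytes].ljust(4, b"\x00")
        prefixes.append(f"{ipaddress.IPv4Address(addr)}/{plen}")
        i += 1 + nbytes
    return prefixes


wire = encode_nlri("10.0.0.0/24") + encode_nlri("192.168.0.0/16")
print(wire.hex())
print(decode_nlri(wire))
```

Withdrawn routes use the same encoding, which is why an UPDATE can carry both in one message.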

KEEPALIVE · type 4

"Still here." 19 bytes total, header only, no body.

Sent every hold_time / 3 seconds (so 30s when hold=90). Receiver resets its hold timer on every message, including UPDATEs, but on a quiet session, KEEPALIVEs are what keep it alive. Drop too many and the peer declares you dead.
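The timer arithmetic, as a small sketch. It assumes the RFC 4271 rules that the session uses the smaller of the two proposed hold times and that a hold time must be zero or at least three seconds. The function names are illustrative.

```python
def negotiate_hold_time(mine: int, theirs: int) -> int:
    """The session uses the smaller of the two proposed hold times."""
    hold = min(mine, theirs)
    if hold != 0 and hold < 3:
        raise ValueError("hold time must be 0 or at least 3 seconds")
    return hold


def keepalive_interval(hold_time: int) -> int:
    """One third of the hold time; 0 means keepalives are disabled."""
    return hold_time // 3


hold = negotiate_hold_time(90, 180)
print(hold, keepalive_interval(hold))  # 90 30
```

So a peer proposing 180s against your 90s still ends up on a 90s hold timer with 30s keepalives.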

NOTIFICATION · type 3

Something is wrong, the session is being torn down. Sent before the close.

Carries an error code and subcode. Common ones: code 2 subcode 2 = "Bad Peer AS" (your ASN didn't match my remote-as), code 4 subcode 0 = "Hold Timer Expired", code 6 subcode 2 = "Cease, Administrative Shutdown" (the Cease subcodes are defined in RFC 4486).
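Decoding the first two bytes of a NOTIFICATION body is a table lookup. The sketch below covers the six error codes from RFC 4271 and a handful of subcodes (OPEN errors from RFC 4271, Cease from RFC 4486). The subcode table is deliberately partial and the function name is illustrative.

```python
import struct

ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

# Partial (code, subcode) table; unknown subcodes are shown as numbers.
SUBCODES = {
    (2, 1): "Unsupported Version Number",
    (2, 2): "Bad Peer AS",
    (2, 3): "Bad BGP Identifier",
    (2, 6): "Unacceptable Hold Time",
    (6, 2): "Administrative Shutdown",
    (6, 3): "Peer De-configured",
    (6, 4): "Administrative Reset",
}


def decode_notification(body: bytes) -> str:
    """Turn the code/subcode bytes into a human-readable string."""
    code, subcode = struct.unpack_from("!BB", body)
    name = ERROR_CODES.get(code, f"Unknown code {code}")
    if subcode == 0:
        return name
    detail = SUBCODES.get((code, subcode), f"subcode {subcode}")
    return f"{name}: {detail}"


print(decode_notification(bytes([2, 2])))
print(decode_notification(bytes([4, 0])))
print(decode_notification(bytes([6, 2])))
```

Anything after those two bytes is diagnostic data whose meaning depends on the code.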

// Nerd tip

In Slice 1 of RouterBaba's Protocol Microscope, every packet that flows on a link is clickable. Click a pulse on the map, the sim pauses, and you see the layered decode of that exact message: Ethernet, IPv4, TCP, BGP header, BGP body. The bytes are synthesized from sim state, not from a wire capture, but the field structure follows the RFC. Worth a try after you finish Lab 01.

06 Best-path selection

A router can learn the same prefix from multiple neighbours. 10.0.0.0/24 might come from your eBGP peer in Calgary, your iBGP peer in Toronto, and your customer in Vancouver, all at the same time. BGP picks exactly one as best, installs that one in the routing table, and uses it for forwarding. The losers stay in the Adj-RIB-In as backup but do not affect traffic.

The selection rules run in strict order. The first rule that finds a difference between two routes decides the winner; later rules are skipped. Memorize the order or get fooled in production.

  1. Locally originated wins. If you typed network 10.0.0.0/24 here, your own route always beats one learned from anyone.
  2. Highest LOCAL_PREF wins. An iBGP-internal attribute set by inbound policy. Higher = preferred. Used to pick a primary upstream.
  3. Shortest AS_PATH wins. The most natural BGP signal. Fewer ASes in the path = more direct.
  4. Lowest ORIGIN type wins. IGP < EGP < Incomplete. Routes from network beat redistributed routes.
  5. Lowest MED wins, but only when both routes are from the same neighbour AS. The peer is suggesting a preferred entry point. RFC explicitly forbids comparing MEDs across different neighbour ASes.
  6. eBGP wins over iBGP. External info beats internal echo of external info.
  7. Lowest router-id of the advertising peer wins. Pure tiebreaker. Determinism over fairness.

Worked example

Three candidate routes for 10.0.0.0/24 arrive at a router. Walk the rules.

Source                      AS_PATH      LOCAL_PREF  ORIGIN  MED  Type  Router-ID
A · neighbour 10.255.1.2    65002 65010  100         IGP     50   eBGP  10.0.2.1
B · neighbour 10.255.7.2    65003 65010  200         IGP     0    eBGP  10.0.3.1
C · neighbour 10.255.4.2    65007 65010  200         IGP     10   eBGP  10.0.4.1

// Rule 1 · locally originated

None of these are local. Move on.

// Rule 2 · highest LOCAL_PREF

A has 100. B and C both have 200. A is eliminated. Move on with B and C.

// Rule 3 · shortest AS_PATH

Both B and C have a path length of 2. Move on.

// Rule 4 · lowest ORIGIN type

Both are IGP. Move on.

// Rule 5 · lowest MED, same neighbour AS

B comes from AS 65003, C from AS 65007. Different neighbour ASes. The MED comparison is skipped. Move on.

// Rule 6 · eBGP over iBGP

Both eBGP. Move on.

// Rule 7 · lowest advertising router-id

B's router-id is 10.0.3.1. C's is 10.0.4.1. B wins.

Result

Best path                         Decided by
10.0.0.0/24 via 10.255.7.2 (B)    Rule 7: lowest router-id

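The same walk can be written as a pairwise comparison. This is a sketch of the seven simplified rules on this page, not a vendor's full decision process. The `Route` class, `better`, and `best_path` are names invented here, and the neighbour AS is taken to be the first ASN in AS_PATH.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address

ORIGIN_RANK = {"IGP": 0, "EGP": 1, "Incomplete": 2}


@dataclass(frozen=True)
class Route:
    name: str
    as_path: tuple
    local_pref: int
    origin: str
    med: int
    ebgp: bool
    router_id: str
    local: bool = False


def better(a: Route, b: Route):
    """Return (winner, number of the rule that decided it)."""
    if a.local != b.local:
        return (a if a.local else b), 1
    if a.local_pref != b.local_pref:
        return max(a, b, key=lambda r: r.local_pref), 2
    if len(a.as_path) != len(b.as_path):
        return min(a, b, key=lambda r: len(r.as_path)), 3
    if ORIGIN_RANK[a.origin] != ORIGIN_RANK[b.origin]:
        return min(a, b, key=lambda r: ORIGIN_RANK[r.origin]), 4
    # MED only counts when both routes came from the same neighbour AS.
    if a.as_path[:1] == b.as_path[:1] and a.med != b.med:
        return min(a, b, key=lambda r: r.med), 5
    if a.ebgp != b.ebgp:
        return (a if a.ebgp else b), 6
    return min(a, b, key=lambda r: IPv4Address(r.router_id)), 7


def best_path(candidates):
    """Fold the candidates pairwise; the rule is from the last comparison."""
    best, rule = candidates[0], None
    for challenger in candidates[1:]:
        best, rule = better(best, challenger)
    return best, rule


routes = [
    Route("A", (65002, 65010), 100, "IGP", 50, True, "10.0.2.1"),
    Route("B", (65003, 65010), 200, "IGP", 0, True, "10.0.3.1"),
    Route("C", (65007, 65010), 200, "IGP", 10, True, "10.0.4.1"),
]
winner, rule = best_path(routes)
print(f"{winner.name} wins on rule {rule}")  # B wins on rule 7
```

Change C's AS_PATH to start with 65003 and rule 5 kicks in instead: B still wins, but on MED 0 versus 10 rather than on router-id.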
// Nerd tip

The trick that bites real engineers: rule 5 (MED) silently skipping when neighbour ASes differ. Operators routinely set MED to influence inbound traffic from a single peer's two links and then wonder why MED is ignored when the comparison spans two different upstreams. It is not a bug. The RFC says MED is only meaningful within an adjacent AS's view. Cross-AS MED comparison was added later as a vendor knob (bgp always-compare-med) and turning it on can cause oscillation. Use with intent.

// READY

Now go fix one.

You know what BGP is, what an AS is, why eBGP and iBGP behave differently, what the FSM state in show ip bgp summary is telling you, what each message type carries, and how a router picks one path out of many. That is the ground floor.

Lab 01 puts you on a Canadian backbone with one broken session. Same protocol, same commands, same FSM. Seven cities, six watching. Open the lab and fix it.

Open Lab 01: First Hello →