X-Git-Url: https://git.sommitrealweird.co.uk/quagga-debian.git/blobdiff_plain/191fe7a34582876de01d3e62c2a6587baf59a283..064d9c633233495319bcaa66335ea3d24c0bd7a1:/doc/next-hop-tracking.txt diff --git a/doc/next-hop-tracking.txt b/doc/next-hop-tracking.txt deleted file mode 100644 index d157866..0000000 --- a/doc/next-hop-tracking.txt +++ /dev/null @@ -1,326 +0,0 @@ -0. Introduction - -This is the design specification for next hop tracking feature in -Quagga. - -1. Background - -Recursive routes are of the form: - - p/m --> n - [Ex: 1.1.0.0/16 --> 2.2.2.2] - -where 'n' itself is resolved through another route as follows: - - p2/m --> h, interface - [Ex: 2.2.2.0/24 --> 3.3.3.3, eth0] - -Usually, BGP routes are recursive in nature and BGP nexthops get -resolved through an IGP route. IGP usually adds its routes pointing to -an interface (these are called non-recursive routes). - -When BGP receives a recursive route from a peer, it needs to validate -the nexthop. The path is marked valid or invalid based on the -reachability status of the nexthop. Nexthop validation is also -important for BGP decision process as the metric to reach the nexthop -is a parameter to best path selection process. - -As it goes with routing, this is a dynamic process. Route to the -nexthop can change. The nexthop can become unreachable or -reachable. In the current BGP implementation, the nexthop validation -is done periodically in the scanner run. The default scanner run -interval is one minute. Every minute, the scanner task walks the -entire BGP table. It checks the validity of each nexthop with Zebra -(the routing table manager) through a request and response message -exchange between BGP and Zebra process. BGP process is blocked for -that duration. The mechanism has two major drawbacks: - -(1) The scanner task runs to completion. That can potentially starve - the other tasks for long periods of time, based on the BGP table - size and number of nexthops. - -(2) Convergence around routing changes that affect the nexthops can be - long (around a minute with the default intervals). The interval - can be shortened to achieve faster reaction time, but it makes the - first problem worse, with the scanner task consuming most of the - CPU resources. - -"Next hop tracking" feature makes this process event-driven. It -eliminates periodic nexthop validation and introduces an asynchronous -communication path between BGP and Zebra for route change notifications -that can then be acted upon. - -2. Goal - -Stating the obvious, the main goal is to remove the two limitations we -discussed in the previous section. The goals, in a constructive tone, -are the following: - -- fairness: the scanner run should not consume an unjustly high amount - of CPU time. This should give an overall good performance and - response time to other events (route changes, session events, - IO/user interface). - -- convergence: BGP must react to nexthop changes instantly and provide - sub-second convergence. This may involve diverting the routes from - one nexthop to another. - -3. Overview of the changes - -The changes are in both BGP and Zebra modules. The short summary is -the following: - -- Zebra implements a registration mechanism by which clients can - register for next hop notification. Consequently, it maintains a - separate table, per (VRF, AF) pair, of next hops and interested - client-list per next hop. - -- When the main routing table changes in Zebra, it evaluates the next - hop table: for each next hop, it checks if the route table - modifications have changed its state. If so, it notifies the - interested clients. - -- BGP is one such client. It registers the next hops corresponding to - all of its received routes/paths. It also threads the paths against - each nexthop structure. - -- When BGP receives a next hop notification from Zebra, it walks the - corresponding path list. It makes them valid or invalid depending - on the next hop notification. It then re-computes best path for the - corresponding destination. This may result in re-announcing those - destinations to peers. - -4. Design - -4.1. Modules - -The core design introduces an "nht" (next hop tracking) module in BGP -and "rnh" (recursive nexthop) module in Zebra. The "nht" module -provides the following APIs: - -bgp_find_or_add_nexthop() : find or add a nexthop in BGP nexthop table -bgp_find_nexthop() : find a nexthop in BGP nexthop table -bgp_parse_nexthop_update() : parse a nexthop update message coming - from zebra - -The "rnh" module provides the following APIs: - -zebra_add_rnh() : add a recursive nexthop -zebra_delete_rnh() : delete a recursive nexthop -zebra_lookup_rnh() : lookup a recursive nexthop - -zebra_add_rnh_client() : register a client for nexthop notifications - against a recursive nexthop - -zebra_remove_rnh_client(): remove the client registration for a - recursive nexthop - -zebra_evaluate_rnh_table(): (re)evaluate the recursive nexthop table - (most probably because the main routing - table has changed). - -zebra_cleanup_rnh_client(): Cleanup a client from the "rnh" module - data structures (most probably because the - client is going away). - -4.2. Control flow - -The next hop registration control flow is the following: - -<==== BGP Process ====>|<==== Zebra Process ====> - | -receive module nht module | zserv module rnh module ----------------------------------------------------------------------- - | | | -bgp_update_ | | | - main() | bgp_find_or_add_ | | - | nexthop() | | - | | | - | | zserv_nexthop_ | - | | register() | - | | | zebra_add_rnh() - | | | - - -The next hop notification control flow is the following: - -<==== Zebra Process ====>|<==== BGP Process ====> - | -rib module rnh module | zebra module nht module ----------------------------------------------------------------------- - | | | -meta_queue_ | | | - process() | zebra_evaluate_ | | - | rnh_table() | | - | | | - | | bgp_read_nexthop_ | - | | update() | - | | | bgp_parse_ - | | | nexthop_update() - | | | - - -4.3. zclient message format - -ZEBRA_NEXTHOP_REGISTER and ZEBRA_NEXTHOP_UNREGISTER messages are -encoded in the following way: - -/* - * 0 1 2 3 - * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * | AF | prefix len | - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * . Nexthop prefix . - * . . - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * . . - * . . - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * | AF | prefix len | - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * . Nexthop prefix . - * . . - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - */ - -ZEBRA_NEXTHOP_UPDATE message is encoded as follows: - -/* - * 0 1 2 3 - * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * | AF | prefix len | - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * . Nexthop prefix getting resolved . - * . . - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * | metric | - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * | #nexthops | - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * | nexthop type | - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * . resolving Nexthop details . - * . . - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * . . - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * | nexthop type | - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * . resolving Nexthop details . - * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - */ - -4.4. BGP data structure - -Legend: - -/\ struct bgp_node: a BGP destination/route/prefix -\/ - -[ ] struct bgp_info: a BGP path (e.g. route received from a peer) - - _ -(_) struct bgp_nexthop_cache: a BGP nexthop - - - - /\ NULL - \/--+ ^ - | : - +--[ ]--[ ]--[ ]--> NULL - /\ : - \/--+ : - | : - +--[ ]--[ ]--> NULL - : - _ : - (_)............. - - -4.5. Zebra data structure - -rnh table: - - O - / \ - O O - / \ - O O - - struct rnh - { - u_char flags; - struct rib *state; - struct list *client_list; - struct route_node *node; - }; - -5. User interface changes - -quagga# show ip nht -3.3.3.3 - resolved via kernel - via 11.0.0.6, swp1 - Client list: bgp(fd 12) -11.0.0.10 - resolved via connected - is directly connected, swp2 - Client list: bgp(fd 12) -11.0.0.18 - resolved via connected - is directly connected, swp4 - Client list: bgp(fd 12) -11.11.11.11 - resolved via kernel - via 10.0.1.2, eth0 - Client list: bgp(fd 12) - -quagga# show ip bgp nexthop -Current BGP nexthop cache: - 3.3.3.3 valid [IGP metric 0], #paths 3 - Last update: Wed Oct 16 04:43:49 2013 - - 11.0.0.10 valid [IGP metric 1], #paths 1 - Last update: Wed Oct 16 04:43:51 2013 - - 11.0.0.18 valid [IGP metric 1], #paths 2 - Last update: Wed Oct 16 04:43:47 2013 - - 11.11.11.11 valid [IGP metric 0], #paths 1 - Last update: Wed Oct 16 04:43:47 2013 - -quagga# show ipv6 nht -quagga# show ip bgp nexthop detail - -quagga# debug bgp nht -quagga# debug zebra nht - -6. Sample test cases - - r2----r3 - / \ / - r1----r4 - -- Verify that a change in IGP cost triggers NHT - + shutdown the r1-r4 and r2-r4 links - + no shut the r1-r4 and r2-r4 links and wait for OSPF to come back - up - + We should be back to the original nexthop via r4 now -- Verify that a NH becoming unreachable triggers NHT - + Shutdown all links to r4 -- Verify that a NH becoming reachable triggers NHT - + no shut all links to r4 - -7. Future work - -- route-policy for next hop validation (e.g. ignore default route) -- damping for rapid next hop changes -- prioritized handling of nexthop changes ((un)reachability vs. metric - changes) -- handling recursion loop, e.g. - 11.11.11.11/32 -> 12.12.12.12 - 12.12.12.12/32 -> 11.11.11.11 - 11.0.0.0/8 -> -- better statistics