Have you ever wondered what would happen to your tier-0 dynamic routing in case of a BGP failure? Should you use BFD or Graceful Restart to protect the control plane? Maybe both together? Let's examine these protocols and see how they fit with tier-0 gateways in NSX-T.
Graceful Restart for BGP
Graceful Restart (GR) is a mechanism designed to allow a control protocol to restart without interrupting forwarding. It relies on a separate forwarding plane that does not necessarily share fate with the control plane. The GR implementation ensures the forwarding plane can continue to function while the routing protocol restarts.
As RFC 4724 puts it, the Graceful Restart Capability is a BGP capability that a BGP speaker can use to indicate its ability to preserve its forwarding state during BGP restart.
GR is useful when the BGP peer has dual supervisors: the GR capability preserves the existing routes and the forwarding state during a supervisor failover.
A BGP control plane restart can happen because of a supervisor switchover, planned maintenance, or a crash of the active routing engine. As soon as a GR-enabled router restarts, or its control plane fails, its peer marks the routes learned from it as stale in the forwarding table. The peer does not differentiate between stale and other routing information when forwarding; it keeps forwarding to both. Routes are marked as stale so that they can be deleted if the graceful restart timer expires, or replaced by fresh routing updates once the control plane session is re-established.
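The stale-route handling described above can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the HelperRib class, its method names and the sample prefixes are hypothetical, lookups are exact-match rather than longest-prefix, and none of it reflects how NSX-T or any real BGP daemon is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Route:
    prefix: str
    next_hop: str
    stale: bool = False


@dataclass
class HelperRib:
    """Routes learned from one GR-capable peer (hypothetical toy model)."""
    routes: Dict[str, Route] = field(default_factory=dict)

    def learn(self, prefix: str, next_hop: str) -> None:
        # A fresh update replaces the entry, which also clears the stale flag.
        self.routes[prefix] = Route(prefix, next_hop)

    def peer_restarted(self) -> None:
        # Session lost: keep every route, but flag it as stale.
        for route in self.routes.values():
            route.stale = True

    def lookup(self, prefix: str) -> Optional[str]:
        # Forwarding ignores the stale flag (exact match only, for brevity).
        route = self.routes.get(prefix)
        return route.next_hop if route else None

    def restart_timer_expired(self) -> None:
        # Routes that are still stale were never refreshed, so drop them.
        self.routes = {p: r for p, r in self.routes.items() if not r.stale}


rib = HelperRib()
rib.learn("10.0.0.0/24", "192.0.2.1")
rib.learn("10.0.1.0/24", "192.0.2.1")

rib.peer_restarted()
forwarded_while_stale = rib.lookup("10.0.1.0/24")  # still forwards

rib.learn("10.0.0.0/24", "192.0.2.1")  # refreshed once the session is back
rib.restart_timer_expired()            # the unrefreshed route is removed

print(forwarded_while_stale, sorted(rib.routes))
# 192.0.2.1 ['10.0.0.0/24']
```

The point of the model is that the stale flag never influences the lookup; it only decides what is removed when the timer fires.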
However, in my recent experience I haven't seen a single customer peer their tier-0 gateways with a dual-supervisor switch. I am not saying nobody uses the big fat boys anymore, but nowadays leaf/spine networks have taken over the datacentres. They provide a scaled-up control plane and smaller fault domains, unlike dual-supervisor switches, which have the potential to bring down half the network in case of an outage.
BFD for BGP
According to RFC 5880:
BFD is a simple Hello protocol that, in many respects, is similar to the detection components of well-known routing protocols. A pair of systems transmit BFD packets periodically over each path between the two systems, and if a system stops receiving BFD packets for long enough, some component in that particular bidirectional path to the neighboring system is assumed to have failed.
BFD is especially helpful in network scenarios where an alternative path is available. There it helps detect a failure in the end-to-end forwarding path more quickly, so forwarding can switch to the next best path. The BFD implementation in NSX-T allows for sub-second convergence when the tier-0 runs on a Bare Metal Edge Node (300 ms x 3) and convergence in a second and a half when it runs on a VM Edge Node.
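The numbers above are just the transmit interval multiplied by the detection multiplier. A minimal sketch of that arithmetic follows; the function name is mine, and the 500 ms interval for the VM Edge Node is an assumption inferred from the second and a half quoted above with the same multiplier of 3. It also ignores the per-direction timer negotiation that RFC 5880 describes.

```python
def bfd_detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Time without received BFD packets before the session is declared down."""
    return interval_ms * multiplier


bare_metal_edge = bfd_detection_time_ms(300, 3)
vm_edge = bfd_detection_time_ms(500, 3)  # 500 ms is an assumed interval

print(bare_metal_edge, vm_edge)
# 900 1500
```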
BFD for BGP with Graceful Restart
In the NSX-T Reference Design Guide it is written that:
It is recommended to enable GR If the Edge node is connected to a dual supervisor system that supports forwarding traffic when the control plane is restarting. This will ensure that forwarding table data is preserved and forwarding will continue through the restarting supervisor or control plane. Enabling BFD with such a system would depend on the device-specific BFD implementation. If the BFD session goes down during supervisor failover, then BFD should not be enabled with this system. If the BFD implementation is distributed such that the BFD session would not go down in case of supervisor or control plane failure, then enable BFD as well as GR.
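That is fine, but there is a catch in that recommendation: you cannot configure the Control Plane Independent (C) bit on a tier-0 gateway. That bit is how a BFD speaker tells its peer that its BFD implementation does not share fate with its control plane. The tier-0 gateway simply lacks that functionality, even though the NSX-T routing engine runs FRR under the hood, which supports that capability.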
On the left is my pfSense with the FRR package installed, and on the right is my tier-0 gateway:
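If you want to check the bit yourself in a capture, it sits in the second byte of the BFD control packet defined in RFC 5880 (mask 0x08). Here is a rough sketch of a parser for the 24-byte mandatory section. The function name and the two sample packets are made up for illustration; they are not taken from my capture.

```python
import struct

# Mandatory section of a BFD control packet (RFC 5880, section 4.1):
# vers/diag, flags, detect mult, length, then five 32-bit fields.
BFD_HEADER = struct.Struct("!BBBBIIIII")


def parse_bfd_flags(packet: bytes) -> dict:
    (vers_diag, flags, detect_mult, _length,
     _my_disc, _your_disc, min_tx_us, _min_rx_us, _min_echo_rx_us) = (
        BFD_HEADER.unpack_from(packet))
    return {
        "version": vers_diag >> 5,        # top 3 bits of the first byte
        "state": flags >> 6,              # top 2 bits: 3 means Up
        "control_plane_independent": bool(flags & 0x08),  # the C bit
        "detect_mult": detect_mult,
        "desired_min_tx_us": min_tx_us,
    }


# Two hypothetical packets: version 1, state Up, multiplier 3, 300 ms timers.
with_c_bit = BFD_HEADER.pack(0x20, 0xC8, 3, 24, 1, 2, 300_000, 300_000, 0)
without_c_bit = BFD_HEADER.pack(0x20, 0xC0, 3, 24, 1, 2, 300_000, 300_000, 0)

print(parse_bfd_flags(with_c_bit)["control_plane_independent"],
      parse_bfd_flags(without_c_bit)["control_plane_independent"])
# True False
```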
Therefore, it is safe to conclude that there is no value in configuring BFD and Graceful Restart together on a tier-0 gateway. Even if you do configure them both, losing all the BGP sessions is a failure condition for the tier-0 SR and triggers a failover to another Edge Node, so preserving the forwarding state on the failed node makes no sense.
Graceful Restart Helper
Graceful Restart Helper mode is the ability to assist a neighbouring router that is attempting a graceful restart. VMware recommends enabling GR Helper; however, I fail to see the benefit, considering that we do not want to mix GR and BFD on the ToR switches for the reasons explained above. No GR on the ToR means no need for GR Helper on the tier-0 gateway.
Any comments are welcome.
Thanks for reading!