On the Packet Pushers YouTube channel, Jorge asks in response to Using VXLAN To Span One Data Center Across Two Locations…
if stretching the layer 2 is not recommended, then what is the recommendation if you need to fault over to a different physical location and still got to keep the same IP addresses for mission critical applications?
That video is a couple of years old at this point, and I don’t recall the entire discussion. Here’s my answer at this moment in time. If DCI is required (and I argue that it shouldn’t be in most cases), look at VXLAN/EVPN. EVPN is supported by several vendors. If you are a multi-vendor shop, watch for EVPN inter-vendor compatibility problems. Also look for vendor EVPN guides discussing the use case of data center interconnect (DCI).
Also be aware (and beware) of vendor-proprietary DCI technologies like Cisco’s OTV. I recommend against investing in OTV and similar tech unless you already have hardware that can do it and can turn the feature on for free. Otherwise, my opinion, for what it’s worth, is to stick with an EVPN solution. EVPN is a standard that’s been running in production environments for years.
EVPN is complex. There are tradeoffs. You could talk me out of it depending on the scenario. But at the moment, it’s a design I favor because it’s broadly supported across the industry and scalable.
More High-Level Detail
Jorge is describing a situation where the network needs to support a badly designed application that’s tightly coupled to an IP address. No application should be tightly coupled to an IP address. This common issue should really be solved by application architects rebuilding the app properly instead of continuing like it’s 1999 while screaming YOLO.
I know there are really old legacy apps that need that L2 adjacency for redundancy or can’t re-IP. I know there are apps that can’t be redesigned because reasons. Fair enough. But what I’d like to ask the business stakeholders is…do you really want to have a critical business function rely on an application that can break when something as ephemeral as an IP address changes? You really don’t, and so I see this as more of a business problem than a technology problem. Your business is tied to an inflexible app. That’s bad for business.
But back to the reality Jorge and many of us face. The business stakeholders are more likely to say, “Make it work,” sticking engineers with a horrifying network design requirement: stretching L2 between physical locations.
Avoid Fate Sharing
The big idea is to support the same IP address in multiple locations, but to NOT have fate-sharing, where a problem like a bridging loop and resulting broadcast storm at one site would take down the other site. That means we can’t just throw up a tagged VLAN link (trunk) between the DCs. Instead, we have to divide the L2 broadcast domain (the VLAN) into different L2 domains separated by a routed segment. This way we’ve created two failure domains that will not share fate.
But this introduces a problem, because now hosts in the separate data centers think they are in the same L2 broadcast domain…but aren’t. Therefore, hosts can’t discover each other to send Ethernet frames back and forth, because ARP broadcasts don’t go any further than the router at the edge of each location.
Tunnels All The Way Down
That means we need a layer on top that connects the two separate L2 domains together, while maintaining that sweet L3 separation. What’s that layer? A tunnel. A tunnel that can encapsulate an entire Ethernet frame and carry it from the L2 domain in one data center to the L2 domain in the other data center. An encapsulation format designed to do that and commonly used in data centers is VXLAN.
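To make the encapsulation concrete, here’s a minimal Python sketch of what a VTEP does to each frame, based on the 8-byte VXLAN header defined in RFC 7348. The `vxlan_encap` helper and the example values are mine, invented for illustration; a real VTEP does this in hardware or in a hypervisor’s forwarding plane, not in Python.

```python
import struct

VXLAN_PORT = 4789  # IANA-assigned UDP destination port for VXLAN

def vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
    """Prepend an 8-byte VXLAN header (RFC 7348) to an Ethernet frame.

    The result then rides inside a UDP datagram (dst port 4789)
    between the outer IP addresses of the two VTEPs.
    """
    if not 0 <= vni < 2**24:
        raise ValueError("VNI is a 24-bit value")
    flags = 0x08  # 'I' bit set: the VNI field is valid
    # Header layout: flags (1 byte), reserved (3), VNI (3), reserved (1)
    header = struct.pack("!B3x", flags) + struct.pack("!I", vni << 8)
    return header + inner_frame

frame = bytes(64)                        # placeholder Ethernet frame
packet = vxlan_encap(frame, vni=10100)   # 8-byte header + original frame
```

The inner Ethernet frame, MAC addresses and all, survives the trip intact, which is exactly why the hosts on each end never need to know a tunnel was involved.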
Note that there are other encapsulation formats that can also do this, such as NVGRE and Geneve, with Geneve seeing increased use lately.
Okay…but now we have another problem. How do we know where a VXLAN tunnel should begin and end? That is, where are the VXLAN tunnel endpoints (VTEPs)? And what if we have a bunch of VLANs we want to stretch between data centers like this, because of course we do? How do we track which VXLAN tunnel is carrying traffic for which VLAN and where we should be dropping off these Ethernet frames as they pop in and out of tunnels so that they make it to their destination?
Well…you could code all that by hand. (Ha ha ha ha…NO.) Or…you could rely on multicast for VXLAN advertisement and discovery, aka flood-and-learn. (Um, multicast…arguably, that’s a big ask if you’re not already running multicast for other reasons.) Or…you could build out forwarding tables using EVPN. EVPN uses BGP to advertise which MAC addresses are reachable via which VXLAN tunnels.
Note that EVPN isn’t limited to using VXLAN tunnels for transport. MPLS is another data plane commonly used. VXLAN just happens to be our context here.
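Conceptually, the state EVPN builds on each VTEP is just a table mapping (VNI, MAC) to a remote VTEP address. Here’s a toy Python sketch of that idea. The function names and addresses are invented for illustration; real EVPN is BGP route machinery (Type-2 MAC/IP advertisement routes), not a dict.

```python
from typing import Optional

# Toy model: EVPN Type-2 routes effectively advertise
# "(VNI, MAC) is reachable via VTEP x.y.z.w".
# Each VTEP builds a forwarding table from those advertisements.
evpn_table: dict = {}

def receive_type2_route(vni: int, mac: str, vtep_ip: str) -> None:
    """Learn a MAC-to-VTEP binding from a (simulated) BGP EVPN update."""
    evpn_table[(vni, mac.lower())] = vtep_ip

def lookup_vtep(vni: int, mac: str) -> Optional[str]:
    """Where should a frame for this MAC in this VNI be tunneled?"""
    return evpn_table.get((vni, mac.lower()))

# A host in DC-A wants to reach 00:11:22:33:44:55, which lives in DC-B.
receive_type2_route(10100, "00:11:22:33:44:55", "192.0.2.2")
lookup_vtep(10100, "00:11:22:33:44:55")  # tunnel the frame to 192.0.2.2
```

The win over flood-and-learn is that this table is populated by control-plane advertisements before any traffic flows, instead of by flooding unknown-destination frames everywhere, including across your DCI.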
Another piece of the puzzle is recognizing that the L2 hosts in each data center have no knowledge of VXLAN tunnels or EVPN advertisements. Hosts still expect to put an ARP on the wire and get back a response so that they know the MAC to put in the destination address field of the Ethernet frame they’re building.
The short answer is that the VXLAN/EVPN solution is going to handle that, too. There are a few different ARP-related behaviors possible here that are beyond our scope today, but the point is that…ARP is handled. The solution fakes it so that a host is none the wiser that the other host it’s trying to communicate with is in some other data center many miles away.
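For the curious, one of those ARP behaviors is often called ARP suppression: because EVPN Type-2 routes can carry an IP address alongside the MAC, the local VTEP frequently already knows the answer and can reply to an ARP request itself instead of flooding it across the DCI. A toy Python sketch of the idea (names and addresses invented; a real VTEP does this in the forwarding plane):

```python
# IP -> MAC bindings, populated from EVPN Type-2 MAC/IP routes
arp_cache: dict = {}

def learn_from_evpn(ip: str, mac: str) -> None:
    """Populate the local ARP cache from an EVPN advertisement."""
    arp_cache[ip] = mac

def handle_arp_request(target_ip: str) -> str:
    """Answer locally if we can; otherwise fall back to flooding."""
    mac = arp_cache.get(target_ip)
    if mac is not None:
        return f"proxy ARP reply: {target_ip} is-at {mac}"
    return "flood request into the VXLAN tunnel(s)"

learn_from_evpn("10.0.0.20", "00:11:22:33:44:55")
handle_arp_request("10.0.0.20")  # answered locally; never crosses the DCI
```

The requesting host just sees a normal ARP reply and starts building Ethernet frames, exactly as if the destination were down the hall.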
Complexity Breeds Fragility
There are traffic optimization problems that crop up when stretching L2. Search for “DCI traffic trombone” to do some reading on that issue and possible solutions.
And of course, with any technology as complex as EVPN, you’re introducing to the network something else that can break. This is why I’m inclined to push the problem of stretching L2 back to application architects. The problem isn’t the network. It’s the app. Fix the app.
It’s worth pointing out that public cloud vendors don’t let you stretch L2 around the cloud. Or if they do, what’s going on under the hood isn’t simply stretching L2. For example, Ivan Pepelnjak has a 2019 post on this for the Azure cloud. Why do you suppose the public cloud vendors don’t support this? Because they are hosting many customers on a shared infrastructure. Fate sharing is not an option. Therefore, you’ll host your app on their infrastructure their way, and they won’t let you do dumb things. They can’t afford the fragility.
Keep An Open Mind
As a parting thought, it’s worth pointing out that there are other ways to make IPs appear where you need them to. For instance, you can forget about stretching L2 domains at all, and look instead at IP mobility options. Simple host routing is one way to tackle this, albeit a hardware-intensive one at scale.
The larger point is that it’s important to keep an open mind about how you can solve the problem you’re trying to solve. Stretching L2 is different from host routing which is different from anycast which is different from application delivery controllers which is different from CDNs. All of these (and other) networking technologies (hi there, DNS) might come into play to make a service (far more important than an IP) appear where you need it to appear.
While there are pros and cons to each approach, of course, know you have many design options to make an application highly available. Don’t get locked into thinking that you have to have a DCI solution because someone who doesn’t know any better told you to stretch a VLAN.
If you’d like to learn more about EVPN, the Packet Pushers offer lots of podcasts and blog posts on the topic. For free. No sign-ups, data harvesting, select-all-the-traffic-lights, or other nonsense. Just go listen or read.