As I continue to study software defined WAN (SD-WAN) technologies, I have discovered many commonalities in the various vendor products. One that sticks out to me is the centralization of functions — specifically, which network functions get centralized, and which do not.
In the world of idealistic fantasy, a software-defined network of whatever kind would centralize all functions. Pesky reality gets in the way of idealism, and so we find full centralization to be an impractical idea.
The impracticality of full centralization.
There are at least two significant reasons why software defined networking can’t rely exclusively on a centralized controller for control plane functions.
1. Latency. Sending traffic across the wire to a controller, allowing the application(s) running on the controller to process that traffic, and waiting for a response all takes time. The latency penalty might be tolerable in a LAN where topology changes are relatively few, baseline network performance is relatively constant, and the controller is very close by. In such an environment, relatively little should get punted to the remote controller, making the occasional pause not so bad. (Maybe. We could argue the opposite point rather easily.)
However, in a WAN scenario, latency is substantially higher. Forwarders and controllers are likely to be separated by tens of milliseconds or more. In addition, I could argue that there are more events for the control plane to consider in a WAN vs. a LAN.
More control plane events. Higher latency. Yuck — full centralization of all network functions would do application performance over an SD-WAN no favors.
2. Separation. How does a software-defined network device lacking an independent control plane cope with forwarding when separated from its controller? In LAN scenarios, this condition is anticipated and handled in a variety of ways, depending on the vendor. In most scenarios, the network will operate headless, forwarding as it had been the last time the controller offered input. The compromise is an inability to change network forwarding behavior until the controller is back online.
In SD-WAN scenarios, the situation is much the same. SD-WAN forwarders that have lost contact with the central controller will forward traffic in accordance with the last policy received. This is a necessary function, as network separation of remote forwarders from the central controller is highly likely in geographically distributed WANs.
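To make the headless behavior concrete, here is a minimal Python sketch of a forwarder that keeps using the last policy it received when the controller cannot be reached. It is not any vendor's implementation. The `Forwarder` and `FlakyController` classes, the policy format (application name mapped to a path name), and the path names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


class FlakyController:
    """Hypothetical central controller that can be switched off to simulate a partition."""

    def __init__(self) -> None:
        self.up = True

    def get_policy(self) -> Dict[str, str]:
        if not self.up:
            raise ConnectionError("controller unreachable")
        # Assumed policy format: application name -> preferred path name.
        return {"voice": "mpls", "default": "internet"}


@dataclass
class Forwarder:
    """Hypothetical SD-WAN edge device that caches the last policy it received."""

    fetch_policy: Callable[[], Dict[str, str]]
    policy: Dict[str, str] = field(default_factory=dict)
    headless: bool = False

    def sync(self) -> None:
        """Try to refresh policy; on failure keep the cached copy and go headless."""
        try:
            self.policy = self.fetch_policy()
            self.headless = False
        except ConnectionError:
            # No policy change is possible until the controller is back,
            # but forwarding continues with what we already have.
            self.headless = True

    def next_hop(self, app: str) -> Optional[str]:
        """Pick a path from the cached policy, falling back to the default entry."""
        return self.policy.get(app, self.policy.get("default"))
```

If a `Forwarder` built with `fetch_policy=controller.get_policy` calls `sync()` once while the controller is up and again after `controller.up` is set to `False`, it flags itself as headless but `next_hop("voice")` still answers from the cached policy. The trade-off is the one described above: forwarding survives the partition, but behavior cannot change until the controller returns.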
Again, full centralization of all network functions does the network no favors when the control plane is removed due to a controller outage or network partition.
Which network functions should be centralized? Which distributed?
Let’s expand this logic a bit further. In the context of latency and separation from the controller, what sorts of network functions make sense to centralize, and what sorts do not? I believe Cisco with their still-developing Intelligent WAN (IWAN) product has some good answers for the SD-WAN space.
In a presentation I heard in August 2015, Cisco described IWAN’s centralize-or-not philosophy in a common-sense way. First, the IWAN team starts with a question: what is being optimized for? To put context around that question, consider that Cisco’s IWAN has a goal. That goal is to optimize the performance of individual applications across the WAN, taking into account real-life WAN conditions that are changing frequently. To achieve that goal…
- There must be a policy describing application SLAs.
- There must also be real-time measurement of WAN behavior to determine the ability of a path to meet that SLA.
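These two requirements can be sketched in a few lines of Python: a policy object holding an application's SLA, and a check of measured path behavior against it. This is only an illustration of the idea, not how IWAN or PfRv3 works internally. The `Sla` and `PathStats` types, the choice of latency, loss, and jitter as metrics, and every number in the usage note are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Sla:
    """Hypothetical per-application SLA thresholds (the policy)."""
    max_latency_ms: float
    max_loss_pct: float
    max_jitter_ms: float


@dataclass(frozen=True)
class PathStats:
    """Hypothetical latest measurement of one WAN path."""
    latency_ms: float
    loss_pct: float
    jitter_ms: float


def meets_sla(stats: PathStats, sla: Sla) -> bool:
    """A path qualifies only if every measured metric is within its threshold."""
    return (
        stats.latency_ms <= sla.max_latency_ms
        and stats.loss_pct <= sla.max_loss_pct
        and stats.jitter_ms <= sla.max_jitter_ms
    )


def eligible_paths(sla: Sla, measurements: Dict[str, PathStats]) -> List[str]:
    """Return the names of paths currently able to meet the SLA, sorted by name."""
    return sorted(name for name, stats in measurements.items() if meets_sla(stats, sla))
```

For example, with a made-up voice SLA of `Sla(150, 1.0, 30)` and measurements of `PathStats(40, 0.1, 5)` for an "mpls" path and `PathStats(180, 2.0, 40)` for an "internet" path, `eligible_paths` returns only the "mpls" path. The policy is small and changes rarely, while the measurements change constantly, which is part of why the two might reasonably live in different places.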
Where should these functions be placed? Cisco uses the following principles to guide where IWAN functions are placed.
- Don’t centralize for the sake of an ideal or a philosophy of “centralize all the things.”
- Centralize functions when that centralization will make WAN operations simpler (I don’t have to touch a bunch of individual devices to effect change) or more agile (I can complete a task at WAN scale quickly).
- Distribute functions when that distribution makes the WAN perform better (react to changes without waiting for a central controller) or scale larger (add more routers to the infrastructure while not weighing the controller down with more input).
With this in mind, Cisco IWAN centralizes functions such as zero touch provisioning, policy automation, system verification, and troubleshooting. For most of these functions, a network operator will interact with the still-in-beta Cisco APIC-EM SDN controller. To wit, APIC-EM is where policy is built, the functionality of the system is validated, and troubleshooting from a system-wide point of view occurs.
Distributed functions include fundamental routing, overlay system management (DMVPN), and path performance measurements (PfRv3).
Does this make sense?
It does to me, and it does to all SD-WAN vendors thus far if you study their architectures. Centralized vs. distributed functions is a concern common to all of these products. As a result, most of the SD-WAN architectures look similar. Placing the SD-WAN controller in the middle of every monitoring function and decision would be a bad idea. Therefore, we tend to see management functions centralized and forwarding decisions distributed.
For more information.
To see the same Cisco presentation that I witnessed at Networking Field Day 10, view below. The discussion on central vs. distributed control-plane functions begins at 10:58.