With an introduction to QoS behind us, let’s start talking through some of the design concerns that drive QoS policy creation.
What QoS problems do enterprises typically have?
Network convergence is a trend that has stayed steady in enterprise (and service provider) networks for well over a decade now. The challenge with collaboration applications – voice & video especially – is that the network traffic they generate needs special treatment so that the service level is guaranteed for the organization using them. This is perhaps not obvious. After all, many of us have used voice over IP (VoIP) across the Internet, and we know that the Internet is a best effort transport. While you might be able to prioritize your Skype or VoIP phone traffic as it leaves your router, once that traffic hits the public Internet, all bets are off. That voice traffic very probably won’t be treated any better than any other traffic. And yet, we know that Skype is quite usable for voice & video, and our VoIP phone works just fine most of the time.
I tipped my hand in that last sentence. “Most of the time” is the issue. While using the Internet for voice & video traffic transport is fine as long as you only need it to work “most of the time,” that’s not an acceptable service level for serious-minded enterprise business customers who need their fancy VoIP phone to work all the time. Every time. Without issue or exception. After all, their cell phone works rather consistently well, and their old-school land line worked as far back as their grandparents and great-grandparents. Pick up the handset, hear the dial-tone, punch in the numbers, connect the call, talk to the other party, and then hang up. The expectation from many decades of ubiquitous user experience is that phones just work. For them to not work, work sporadically, or have voice quality issues is therefore not an acceptable user experience, despite the fact that the underlying transport has changed dramatically.
That said, the fact of the matter is that your voice & video applications (assuming enough bandwidth has been provisioned to carry expected traffic loads) will sound and look good without a QoS policy most of the time. You create and apply a QoS policy to handle the rest of the time.
The question might have crossed your mind, “Why do my web services, FTP, SQL transactions, e-mail, instant messaging, and so on seem to work fine without QoS? Why do voice and video need this kid glove treatment?” Or put another way, why do most apps seem to work it out no matter what’s happening on the network, while voice & video traffic are impacted at times? Those are good questions, and the answer is twofold.
- The first issue is that the voice & video traffic is real-time traffic. And by real-time, I mean that they can’t tolerate excessive buffering, packet drops, or changing network characteristics. The packets carrying voice & video traffic have to get to the recipient on time, or the conversation is impacted – an audible problem is heard on the other end of the call, or artifacts are seen on the remote screen. By contrast, if a web page loads a bit slow, or if an e-mail arrives a bit late, or if an FTP download takes a little long, there is very little degradation of the user experience. Most of the time, these issues aren’t even noticed with those sorts of applications.
- The second issue is that live voice & video protocols tend to run over UDP, as opposed to TCP. UDP is an unacknowledged protocol. In other words, UDP packets are sent over IP with the hope that the packet makes it, but the sender has no knowledge about whether or not the recipient actually received the traffic. If you think about it for a moment, this makes sense in a live setting. Traffic sent live must be received in the time frame expected, or it just doesn’t matter anymore. The moment has passed. So why burden live traffic with the additional overhead TCP acknowledgement introduces? Now, there are arguments in favor of TCP for live data streams, but at this writing, UDP is normal. Putting this all together, the majority of data flows like FTP, HTTP, SMTP, etc. use TCP as their transport layer protocol, and as we’ve already established, they are usually not used to carry real-time traffic. In the face of network congestion and packet loss, non real-time TCP applications do okay. TCP error recovery mechanisms carry the application through most circumstances. But if UDP traffic flows through a lousy network path, there is no way for it to recover.
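To make the "moment has passed" idea from the two points above concrete, here is a minimal Python sketch of a receiver with a fixed playout delay. It is a simulation only, not how any particular VoIP product works: the `Packet` class, the `playable_packets` function, the 20 ms packet interval, the 60 ms playout delay, and the delay values are all assumptions I picked for illustration.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int          # sequence number of the voice packet
    sent_ms: float    # time the sender emitted the packet
    delay_ms: float   # one-way network delay it experienced (made-up values)


def playable_packets(packets, playout_delay_ms):
    """Return the sequence numbers that arrive in time to be played.

    Each packet is scheduled for playback at sent_ms + playout_delay_ms.
    A packet that arrives after that instant is useless to a live stream,
    so the receiver discards it rather than waiting or asking for a resend.
    """
    on_time = []
    for p in packets:
        arrival = p.sent_ms + p.delay_ms
        deadline = p.sent_ms + playout_delay_ms
        if arrival <= deadline:
            on_time.append(p.seq)
    return on_time


# One packet every 20 ms; packets 2 and 3 sit in a congested queue.
delays = [30, 35, 120, 90, 40]
stream = [Packet(seq=i, sent_ms=i * 20, delay_ms=d) for i, d in enumerate(delays)]

on_time = playable_packets(stream, playout_delay_ms=60)
late = [p.seq for p in stream if p.seq not in on_time]

print("played:", on_time)
print("discarded as late:", late)
```

In this toy run, the two packets that were delayed past the playout deadline are simply thrown away, which is what the listener hears as a gap. A TCP-style retransmission would not help, because the retransmitted copy would arrive later still.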
All of that was a long explanation of why QoS is a key part of a collaboration application deployment. You need to guarantee packet delivery of these real-time UDP applications across your network, because all the traffic needs to get there on time, every time a user picks up a phone or fires up a video conference.
While I’ve focused on collaboration applications as a key driver for enterprise QoS, I can think of many other reasons to develop and deploy a QoS policy. I’ll go so far as to say that if you aren’t running QoS on your network at all, then you’re inviting problems. I’m not suggesting that every enterprise has to have a perfect QoS policy deployed right now or the network is at risk of sudden doom. But I am suggesting that having a QoS policy in place that’s at least functional provides a networking team with an existing production framework they can tweak if a problem arises. Having that production framework in place before a problem happens means a shorter time to resolution when a problem actually arises.
Before you wave me off as a grumpy old network engineer tilting at windmills, allow me to share some scenarios where I think QoS should be applied in any enterprise network, along with a few pertinent facts.
- In Cisco-speak, there’s a special kind of QoS called control plane policing (CoPP). The goal of CoPP is to make sure that your routers and switches remain manageable in the face of a denial of service (DoS) attack. While purposefully malicious DoS attacks against network device control planes are plausible, the reality is that most enterprises will DoS their own equipment by introducing a topology loop into the environment. As the loop spins never-dying Ethernet frames into a network tornado, network devices will see more and more frames carrying traffic they must respond to, for example generic broadcasts, ARP requests, routing protocol updates, etc. A network device’s CPU must handle control plane traffic. When a topology loop floods the network with traffic, network device CPUs spike to 100%, preventing network operators from managing the equipment, as the CPU is too busy trying to keep up with the traffic flood to respond to SSH or even keep routing adjacencies up (!). CoPP is a tool to discard control-plane traffic that exceeds a rate that you set, thus preventing excess traffic from causing a control plane DoS. CoPP is indeed a QoS tool, but it’s a bit of a corner case. I definitely wanted to mention it, but I’m not planning to discuss it further in this series.
- All enterprises have applications that are mission critical, or at least easily identifiable as more important than other applications. Some applications I can think of that fall into this category include financial packages, sales engines, and portal sites. Essentially, any application that drives an organization’s cash flow or operations should be given priority across a network infrastructure.
- Logically, no network has infinite amounts of bandwidth. (Okay, almost no network. I acknowledge that there are lossless fabrics that exist, usually built at great cost and for special purposes such as high-performance computing.) While WAN links are more likely than LAN links to become congested, it’s not unreasonable to view LAN links as possible points of congestion and deal with them as such if they are carrying mixed traffic types. On the other hand, it makes little sense to apply a traffic prioritization scheme to a host-facing Ethernet port that only sees one application running across it. My point is that enterprises shouldn’t overlook their LANs when designing their QoS scheme, despite the temptation to do exactly that.
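Going back to the control plane policing scenario in the list above, the idea of discarding traffic that exceeds a rate you set can be sketched as a simple token bucket. This is a conceptual Python model only. It is not Cisco's implementation or configuration syntax, and the `TokenBucketPolicer` class, the `simulate` helper, the 100 packets-per-second rate, and the burst of 20 are all hypothetical values chosen for illustration.

```python
class TokenBucketPolicer:
    """Single-rate policer: packets beyond the configured rate are dropped."""

    def __init__(self, rate_pps, burst):
        self.rate_pps = rate_pps      # tokens (packets) replenished per second
        self.burst = burst            # bucket depth: largest tolerated burst
        self.tokens = float(burst)    # start with a full bucket
        self.last = 0.0               # timestamp of the previous packet

    def allow(self, now):
        """Return True if a packet arriving at time `now` (seconds) conforms."""
        elapsed = now - self.last
        self.last = now
        # Refill for the time that has passed, never beyond the bucket depth.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_pps)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                  # out of tokens: police (drop) the packet


def simulate(policer, arrival_times):
    """Return (passed, dropped) counts for a list of arrival timestamps."""
    passed = sum(1 for t in arrival_times if policer.allow(t))
    return passed, len(arrival_times) - passed


# Normal day: 50 control-plane packets spread evenly over one second.
normal_passed, normal_dropped = simulate(
    TokenBucketPolicer(rate_pps=100, burst=20),
    [i / 50 for i in range(50)],
)

# Topology loop: 10,000 packets hit the CPU-bound queue in the same second.
storm_passed, storm_dropped = simulate(
    TokenBucketPolicer(rate_pps=100, burst=20),
    [i / 10_000 for i in range(10_000)],
)

print("normal:", normal_passed, "passed,", normal_dropped, "dropped")
print("storm: ", storm_passed, "passed,", storm_dropped, "dropped")
```

Under normal load nothing is dropped, while during the simulated storm only roughly the configured rate plus the initial burst gets through to the CPU and the rest is discarded. That is the property that keeps a device responsive enough to manage while the loop is being hunted down.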