What is QoS?
Quality of service is a group of tools that form a traffic delivery policy. This policy ensures some traffic arrives on time, allows some traffic to arrive late, and some to not arrive at all, depending on how the traffic is classified and what importance is given to it. The idea is that traffic is split up into classes, and each class is treated in a particular way as it travels through interfaces across your network infrastructure. That “particular way” is defined by your QoS policies.
On Cisco equipment, QoS policies are usually defined with the Modular Quality of Service Command-Line Interface (MQC), a policy description language built from class maps, policy maps, and service policies. A class map identifies particular traffic, creating a “class.” A policy map describes the behaviors to be followed for the defined traffic classes. The policy is then attached to a router or switch interface with a service policy, in either the ingress (traffic flowing into the interface) or egress (traffic flowing out of the interface) direction. Once applied to an interface, the QoS policy can affect traffic flowing through that interface.
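To make the relationship between classes, policy, and interface concrete, here is a minimal Python sketch. It is a toy model, not MQC syntax and not how any device implements QoS. The class names (VOICE, VIDEO), the interface name (Serial0/0), and the behavior strings are all made up for illustration. The DSCP values 46 (EF) and 34 (AF41) are the standard code points commonly used for voice and video.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Packet:
    dscp: int          # DSCP value carried in the IP header
    size_bytes: int


@dataclass
class ClassMap:
    name: str
    match: Callable[[Packet], bool]   # predicate that identifies the traffic


def classify(packet: Packet, class_maps: List[ClassMap]) -> str:
    """Return the name of the first class map that matches, else the default class."""
    for class_map in class_maps:
        if class_map.match(packet):
            return class_map.name
    return "class-default"


# Hypothetical classes: DSCP 46 (EF) for voice, DSCP 34 (AF41) for video.
CLASS_MAPS = [
    ClassMap("VOICE", lambda p: p.dscp == 46),
    ClassMap("VIDEO", lambda p: p.dscp == 34),
]

# Hypothetical policy: one behavior per class, described here as plain text.
POLICY: Dict[str, str] = {
    "VOICE": "low-latency queue",
    "VIDEO": "minimum bandwidth guarantee",
    "class-default": "fair queue",
}

# The policy only takes effect once it is attached to an interface and a direction.
INTERFACE_POLICIES = {("Serial0/0", "egress"): POLICY}


def behavior_for(packet: Packet, interface: str, direction: str) -> str:
    """Look up the behavior a packet receives on a given interface and direction."""
    policy = INTERFACE_POLICIES.get((interface, direction))
    if policy is None:
        return "no policy applied"
    return policy[classify(packet, CLASS_MAPS)]
```

In this model, a packet marked DSCP 46 leaving Serial0/0 gets the low-latency queue, while the same packet arriving on that interface gets no special treatment, because nothing is attached in the ingress direction.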
The purpose of the applied QoS policy is to help a business meet its objectives. One such objective might be to reduce telecommunications costs by converging voice, video, and data traffic onto the same network. QoS comes into play in a converged network by making sure the common network infrastructure delivers all traffic types as required by their applications (for example, preventing a huge FTP transfer traversing a WAN link from disrupting the quality of an IP phone call traversing the same link at the same time).
Why do engineers hate QoS?
In the network engineering world, QoS as a networking technology is second only to IP multicast on the list of loathsome knowledge domains. There are a number of reasons for this.
- One issue is that of confusing terms. QoS is loaded with acronyms and a unique vocabulary that doesn’t carry over especially well from any other networking knowledge domain. An engineer is left with a rather large group of new concepts to make sense of. At first exposure to QoS theory and concepts, an engineer might wonder things like, “What the heck is a DSCP mutation map? What is a per-hop behavior? What is a three-color, two-bucket policer? Random early detection?? What’s being detected, why is it random, and in what sense is it early?”
- Another issue is that of confusing use-cases. By this, I mean that the QoS neophyte is likely to be overwhelmed by the number of QoS tools available and knobs that can be turned when implementing those tools. At first, it is difficult to know how best to apply QoS tools to resolve a particular problem. For example, an engineer might ponder, “Should we apply congestion management in this situation? Or congestion avoidance? And what’s the difference anyway?”
- Another major QoS frustration is the fact that QoS policy syntax, at least in the Cisco realm, is inconsistent across platforms. While Cisco’s MQC alleviates this concern to some degree, the Catalyst product line is still notorious for having a variety of silicon, resulting in a variety of layer 2 QoS behaviors that a designer must account for. It seems that nearly every Ethernet chipset has a different queuing structure, varying even among different line cards designed for a specific product, such as the Catalyst 6500 chassis. This means that an engineer will probably find it impossible to apply the same QoS policy to every device on the network. Instead, each device will have to be evaluated individually and the configuration modified accordingly to effect the same change.
What problems can QoS solve?
QoS helps traffic classes share network infrastructure in a way that allows applications to function as expected by their designers. In other words, voice calls can be set up, and the conversation sounds natural with no gaps or stutters. Video streaming looks clear and consistent, with no frozen screens or garbled pictures. Business-critical applications such as financial services are not rendered inaccessible or unusable by network congestion.
In more detail, QoS tools exist to accomplish specific tasks such as the following:
- Ensure that voice packets are guaranteed not only delivery, but timely delivery using bandwidth reservation and low latency queuing.
- Ensure a minimum available amount of link bandwidth for video traffic using a bandwidth guarantee.
- Ensure that large, noisy flows (like an FTP download on an Internet pipe) don’t starve quieter flows of bandwidth, using combinations of shaping, policing, fair queuing, and random early detection.
- Using policing, ensure that known undesirable traffic such as file-sharing applications does not take up significant bandwidth. (This is a tricky one, since some protocols tend to hide themselves quite well, with no easy way to overcome this. But it’s still a topic worth talking about.)
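Policing, mentioned in the list above, is usually explained in terms of a token bucket. The following Python sketch shows a single-rate, single-bucket policer under simplified assumptions: time is passed in explicitly, and a packet either conforms (forward it) or exceeds (drop it). The class and parameter names are made up for illustration and do not correspond to any vendor’s implementation.

```python
class TokenBucketPolicer:
    """Single-rate token bucket: tokens are bytes, refilled at the committed rate."""

    def __init__(self, rate_bps: float, burst_bytes: float) -> None:
        self.rate_bytes_per_s = rate_bps / 8   # committed rate, in bytes per second
        self.burst_bytes = burst_bytes         # bucket depth (maximum burst)
        self.tokens = burst_bytes              # the bucket starts full
        self.last_time = 0.0                   # time of the previous update, in seconds

    def conforms(self, now: float, size_bytes: int) -> bool:
        """Return True if the packet fits in the bucket, False if it exceeds the rate."""
        # Refill for the time elapsed since the last packet, capped at the bucket depth.
        elapsed = now - self.last_time
        self.last_time = now
        self.tokens = min(self.burst_bytes, self.tokens + elapsed * self.rate_bytes_per_s)

        if size_bytes <= self.tokens:
            self.tokens -= size_bytes          # conforming packet spends tokens
            return True
        return False                           # exceeding packet: a policer drops it


# Police to 8000 bps (1000 bytes per second) with a 1500-byte burst.
policer = TokenBucketPolicer(rate_bps=8000, burst_bytes=1500)
results = [
    policer.conforms(now=0.0, size_bytes=1500),  # bucket is full, so the packet fits
    policer.conforms(now=0.0, size_bytes=1500),  # bucket is now empty
    policer.conforms(now=1.0, size_bytes=1000),  # one second refilled 1000 bytes
]
print(results)  # [True, False, True]
```

A shaper uses the same bucket but delays an exceeding packet until enough tokens have accumulated instead of dropping it. That is the practical difference between shaping and policing.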
What problems can QoS *not* solve?
In my experience, there’s a perception that if you apply QoS to just about any network throughput problem, the problem will go away. QoS is the sparkly magic dust that you blow towards a router’s fan intake to make astonishing performance improvements. Sadly, that’s just not the case. While QoS clearly has an important role to play in end-to-end application delivery, it can only work within the constraints of the network in which it is applied. There’s no magic here – only science & math.
One common problem I’ve seen people try to fix with QoS is an undersized pipe. While it is possible to reserve bandwidth for some applications and force the others to share the remaining bandwidth fairly, the fact remains that you can’t drain the ocean with a straw. No amount of QoS will make a fractional T1 perform like 100 Mbps Ethernet.
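The arithmetic behind that point can be sketched in a few lines of Python. This is a deliberately simplified model. Each class first receives its reserved bandwidth (up to what it actually demands), and whatever is left is split evenly among the classes that still want more. Real schedulers divide excess bandwidth in their own ways, and the link speed, demands, and guarantees below are hypothetical numbers chosen for illustration.

```python
from typing import Dict


def allocate(link_kbps: float, demands: Dict[str, float],
             guarantees: Dict[str, float]) -> Dict[str, float]:
    """Honor guarantees first, then share the remainder evenly (max-min fair).

    Assumes the guarantees add up to no more than the link speed.
    """
    # Step 1: each class gets its guarantee, but never more than it asks for.
    alloc = {name: min(demand, guarantees.get(name, 0.0))
             for name, demand in demands.items()}
    remaining = link_kbps - sum(alloc.values())

    # Step 2: split what is left evenly among classes that still want more.
    unsatisfied = {name for name in demands if demands[name] > alloc[name]}
    while remaining > 1e-9 and unsatisfied:
        share = remaining / len(unsatisfied)
        for name in list(unsatisfied):
            extra = min(share, demands[name] - alloc[name])
            alloc[name] += extra
            remaining -= extra
            if demands[name] - alloc[name] <= 1e-9:
                unsatisfied.discard(name)
    return alloc


# A hypothetical 768 kbps fractional T1 with 5768 kbps of offered load.
demands = {"voice": 256, "video": 512, "ftp": 5000}
guarantees = {"voice": 256, "video": 384}
result = allocate(768, demands, guarantees)
print(result)                # voice 256, video 448, ftp 64
print(sum(result.values()))  # 768.0
```

The policy decides who gets what, but the total delivered is still 768 kbps. Voice is protected, video gets most of what it wants, and the FTP transfer crawls along at 64 kbps. QoS chose the winners and losers without making the pipe any bigger.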
QoS is sometimes proffered as a solution to poor WAN performance, when the underlying issue is actually a physical layer problem. QoS can’t fix bit errors, bad frames, and other layer 1/2 anomalies. Yet, “QoS” is often the knee-jerk reaction to any reported WAN issue. A link’s integrity must be proven before the effect of a QoS policy can be accurately measured.
A physical issue that is much harder to detect is packet loss inside a WAN provider’s cloud. Just as QoS can’t resolve a physical layer problem you can see, it isn’t going to fix the problems you can’t see, either.
After physical link issues, the next thing QoS can’t fix is bad network design. For example, if you’ve left spanning tree to form its own topology based on defaults, there’s a good possibility that your network is forwarding across unexpected (and probably slower) links, while blocking faster links you’d assume were in a forwarding state. A bad routing design can likewise send traffic across undesirable links, creating artificial bottlenecks that no amount of QoS can resolve. Similarly, QoS can’t make up for a lack of equal-cost multi-path (ECMP) links for a network switch or router to choose from. Especially in data center design, ECMP is a simple solution that can help relieve congested links. Without it, you are back to roughly the same issue as an undersized pipe: you can’t put 50 Mbps of traffic into a 10 Mbps pipe, and no amount of QoS will make that happen.
Latency is the amount of time it takes traffic to travel between endpoints. Keeping it simple, the farther apart two network endpoints are, the higher the latency. While QoS can be informed by latency (for example, you can anticipate throughput based on math that includes latency as part of the calculation), QoS can’t change physics. The speed of light is a challenge for network communications that doesn’t disappear because a QoS policy has been applied.
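As a rough illustration of the latency math, the Python sketch below estimates propagation delay from distance and then the throughput ceiling of a single TCP flow limited by its window size (window divided by round-trip time). The figures are assumptions for illustration: light in fiber is taken to travel at about 200,000 km/s (roughly two-thirds of its speed in a vacuum), the 4,000 km path is hypothetical, and queuing, serialization, and processing delays are ignored.

```python
SPEED_IN_FIBER_KM_PER_S = 200_000  # assumption: about two-thirds of c


def one_way_propagation_ms(distance_km: float) -> float:
    """Propagation delay over fiber, ignoring queuing and serialization delay."""
    return distance_km / SPEED_IN_FIBER_KM_PER_S * 1000


def window_limited_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one TCP flow: at most one window of data per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000


distance_km = 4000                                   # hypothetical WAN path length
rtt_ms = 2 * one_way_propagation_ms(distance_km)     # out and back
ceiling = window_limited_throughput_mbps(65535, rtt_ms)

print(f"RTT: {rtt_ms:.0f} ms")                       # RTT: 40 ms
print(f"Ceiling: {ceiling:.3f} Mbps")                # Ceiling: 13.107 Mbps
```

With a 64 KB window and a 40 ms round trip, a single flow tops out at about 13 Mbps no matter how fast the link is or what the QoS policy says. The remedies here are things like larger TCP windows or moving the endpoints closer together, not queuing policy.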