Enterprise QoS Part 07 – A consistent QoS strategy: queueing collaboration applications at the WAN edge.

As traffic flows across an enterprise’s network, there often comes a point where some part of the infrastructure is not owned by the enterprise. For example, enterprises with offices spread across several different cities usually rely on a telecommunications provider to connect the offices together. The telecom provider will layer the enterprise’s traffic on top of their own infrastructure, commonly in the form of an L3VPN. To the enterprise, the connection handed off to them by the provider is a private one providing access to their remote offices. The wide area network (WAN) feels like an extension of the local area network (LAN), and in certain regards (IP scheme, security policies, etc.) can be treated just like a connection they own. However, QoS requires special consideration.

  • A WAN provider abstracts away the complexity of their network from their customer, hiding the details of the actual transport from the enterprise. While it might seem to the enterprise that a remote office is just one hop away, the reality is that their traffic is being tunneled across the provider’s infrastructure, most likely with MPLS. That hidden provider infrastructure contains any number of network devices over which the enterprise has no configuration control.
  • The marks an enterprise selects and builds QoS policies around are not necessarily meaningful to the WAN provider. While a WAN provider will in almost all cases preserve DSCP marks as traffic traverses their infrastructure, that does not imply that they are making queueing decisions on those marks during times of congestion, at least not by default.
  • A WAN provider typically has an option to allow enterprise customers to leverage the provider’s QoS scheme. This might come at a premium cost, or might be baked into the contract whether a customer chooses to take advantage of the feature or not.
  • Providers might not have as many traffic classes as an enterprise has created in its QoS scheme. In my experience, providers offer either 4 or 8 traffic classes. While 8 should be enough for just about any enterprise QoS scheme I can imagine (and who would want to manage more than 8 anyway?), 4 might be limiting for some.

In my experience, taking advantage of the provider’s QoS scheme requires two things of the enterprise. First, QoS functionality must be enabled on the circuit being leased to the enterprise. For the fortunate customer, this can be done via the provider’s portal, minimizing provider involvement in the change. Second, the enterprise must mark its traffic in a way that the provider will honor, producing the desired queueing behavior. The provider will supply the details of which marks they honor and how they treat each traffic class.

I’ve found that providers pay attention to IP Precedence (which I’ll abbreviate as PREC) values, although enterprises are typically concerned with DSCP values. While this might seem like a contradiction, PREC and DSCP values can be compatible. In fact, DSCP was designed to be backwards compatible with PREC; the class selector codepoints (CS0 through CS7) exist for exactly that reason. How does this work? PREC values take up the first 3 bits of the IP ToS field, for a total of 8 possible values; DSCP uses those same 3 bits plus 3 more, for a total of 64 possible values. So when a PREC-only device looks at a DSCP value, all it observes is the first 3 bits. In a practical sense, the only thing lost when sending a DSCP value into a PREC-only world is granularity: every group of 8 DSCP values collapses into a single PREC value. Therefore, if you choose your DSCP values carefully, your classes will remain distinct even when the last 3 bits are ignored.
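The bit arithmetic is easy to check. This minimal Python sketch (the function names are my own, purely for illustration) models a PREC-only device as reading just the top 3 of the 6 DSCP bits, which is a right shift by 3. Starting from the full 8-bit ToS byte, which also carries the 2 ECN bits, the shift is 5 instead.

```python
# Sketch: how a PREC-only device "sees" a DSCP value.
# DSCP is the top 6 bits of the ToS/DS byte; IP Precedence is the top 3 bits
# of that same byte, i.e. the top 3 bits of the DSCP value.


def dscp_to_prec(dscp: int) -> int:
    """Return the IP Precedence a PREC-only device reads from a DSCP value."""
    if not 0 <= dscp <= 63:
        raise ValueError(f"DSCP must be 0-63, got {dscp}")
    # Drop the 3 low-order DSCP bits; only the high 3 bits survive.
    return dscp >> 3


def tos_byte_to_prec(tos: int) -> int:
    """Same result, starting from the full 8-bit byte (6 DSCP bits + 2 ECN bits)."""
    if not 0 <= tos <= 255:
        raise ValueError(f"ToS byte must be 0-255, got {tos}")
    return tos >> 5


if __name__ == "__main__":
    for name, dscp in [("EF", 46), ("AF41", 34), ("CS3", 24), ("AF31", 26)]:
        print(f"{name:5} DSCP {dscp:2} = {dscp:06b} -> PREC {dscp_to_prec(dscp)}")
    # Granularity loss: every DSCP value from 40 to 47 reads as PREC 5.
    print(sorted({dscp_to_prec(d) for d in range(40, 48)}))
```

Note that CS3 and AF31 both come out as PREC 3, which is exactly the kind of collision discussed below.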

Let’s talk about DSCP to PREC backwards compatibility in more detail. As we’ve established, DSCP values are made up of 6 bits, and Precedence is only the first 3 of those bits. So if the first three bits of a given DSCP value are 000, the next 3 can be anything from 000 to 111 and the resulting PREC value will always be 0, because only the first 3 bits are observed. Based on that logic, this short chart lists every value the 6 DSCP bits can hold, grouped by their 3-bit PREC equivalent. I also show which PHB-defined values fall into each of the 8 PREC classes.

DSCP BITS     | DSCP VALUE = PREC   | PHBs IN RANGE
000+(000-111) | DSCP 00-07 = PREC 0 | CS0 (Default, best effort)
001+(000-111) | DSCP 08-15 = PREC 1 | CS1, AF11, AF12, AF13
010+(000-111) | DSCP 16-23 = PREC 2 | CS2, AF21, AF22, AF23
011+(000-111) | DSCP 24-31 = PREC 3 | CS3, AF31, AF32, AF33
100+(000-111) | DSCP 32-39 = PREC 4 | CS4, AF41, AF42, AF43
101+(000-111) | DSCP 40-47 = PREC 5 | CS5, EF
110+(000-111) | DSCP 48-55 = PREC 6 | CS6
111+(000-111) | DSCP 56-63 = PREC 7 | CS7
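The chart can also be regenerated from the standard PHB codepoints: CSn is 8n, AFxy is 8x + 2y (class x from 1 to 4, drop precedence y from 1 to 3), and EF is 46, per RFC 2474, RFC 2597, and RFC 3246. This Python sketch, with hypothetical helper names, groups each PHB under the PREC value a provider would see. The PREC 0 row lists CS0, which is the same codepoint as the Default PHB.

```python
# Sketch: rebuild the DSCP-to-PREC chart from the standard PHB codepoints.
#   CSn  = 8 * n          (n = 0..7; CS0 is the Default/best-effort codepoint)
#   AFxy = 8 * x + 2 * y  (class x = 1..4, drop precedence y = 1..3)
#   EF   = 46


def standard_phbs() -> dict[str, int]:
    """Map each standard PHB name to its DSCP value."""
    phbs = {f"CS{n}": 8 * n for n in range(8)}
    for x in range(1, 5):
        for y in range(1, 4):
            phbs[f"AF{x}{y}"] = 8 * x + 2 * y
    phbs["EF"] = 46
    return phbs


def phbs_by_prec() -> dict[int, list[str]]:
    """Group PHB names by the PREC value (top 3 DSCP bits) they fall into."""
    table: dict[int, list[str]] = {prec: [] for prec in range(8)}
    for name, dscp in standard_phbs().items():
        table[dscp >> 3].append(name)
    return table


if __name__ == "__main__":
    for prec, names in phbs_by_prec().items():
        low, high = prec * 8, prec * 8 + 7
        print(f"{prec:03b}+(000-111) | DSCP {low:02}-{high:02} = PREC {prec} | {', '.join(names)}")
```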

Ideally, an enterprise will choose DSCP values that do not collide when mapped to a provider’s PREC scheme. By “collide”, I mean that two marks the enterprise uses for distinct kinds of queueing end up in the same queue when handed off to the provider. Referencing the chart above, CS3 (24) and AF31 (26) collide, because both map to PREC 3. This underscores the importance of thoughtful planning when creating a QoS policy. You could use whatever values you like to mark your traffic and enforce that policy across your enterprise, perhaps starting with 0 and adding 1 for each new class you define, but that scheme will end badly because it deviates from expected marking norms. Every value from 0 to 7, for example, collapses into PREC 0.
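Collisions are simple to catch before traffic ever reaches the provider. This Python sketch takes a hypothetical enterprise marking scheme (class name to DSCP value; the names and values are illustrative, not a recommendation) and reports every PREC value that more than one class lands on.

```python
from collections import defaultdict


def find_prec_collisions(scheme: dict[str, int]) -> dict[int, list[str]]:
    """Map each PREC value shared by 2+ enterprise classes to those class names.

    `scheme` maps a (hypothetical) enterprise class name to its DSCP value (0-63).
    """
    by_prec: dict[int, list[str]] = defaultdict(list)
    for class_name, dscp in scheme.items():
        if not 0 <= dscp <= 63:
            raise ValueError(f"{class_name}: DSCP must be 0-63, got {dscp}")
        by_prec[dscp >> 3].append(class_name)  # PREC = top 3 DSCP bits
    return {prec: names for prec, names in sorted(by_prec.items()) if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical scheme: SIGNALING (CS3) and CRITICAL-DATA (AF31) share PREC 3.
    planned = {"VOICE": 46, "VIDEO": 34, "SIGNALING": 24,
               "CRITICAL-DATA": 26, "BULK": 10, "DEFAULT": 0}
    print(find_prec_collisions(planned))  # {3: ['SIGNALING', 'CRITICAL-DATA']}

    # The "start at 0 and add 1" scheme: every class collapses into PREC 0.
    naive = {f"CLASS-{i}": i for i in range(6)}
    print(find_prec_collisions(naive))
```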

Note that collaboration application vendors tend to comply with PHB norms when marking their traffic, so inventing your own creative marking scheme is a bad idea. For example, Cisco describes their traffic marking scheme for CallManager here, along with links to several related applications.

The Cisco MQC policy below was created for use with a provider that offered four queues, P1 through P4. The provider recommended the mapping scheme: which marks would end up in which of their queues. I found that our DSCP scheme mapped nicely into the provider’s PREC scheme. There were no collisions, and voice traffic ended up in their priority queue. So, for readability, I matched our values at the WAN edge using PREC instead of DSCP.

  • The P1 queue was the provider’s priority queue, suitable for voice traffic. A priority queue minimizes queueing delay (how long packets sit around in the queue) and jitter (variation in the gaps between packets, which should stay consistent for voice traffic).
  • The P2 queue was the provider’s recommendation for applications requiring a guaranteed minimum amount of bandwidth, the value of which was selectable via a variety of canned templates. Notably, I could not customize beyond the provider’s templates without getting a special dispensation from the carrier. For my purposes, streaming video traffic was mapped to PREC 4. Unlike voice traffic, streaming video is somewhat tolerant of delay thanks to buffering, and it is tolerant of jitter as well. As long as the video traffic is delivered, the application can tolerate oddities in the delivery stream. Of course, that’s only true up to a point. But the key here is realizing that while voice can’t sit in a queue for very long and needs to be delivered at regular packet intervals, streaming video will make it as long as enough bandwidth is set aside for it. Cisco offers some insightful comments about the nature of video vs. voice traffic here.
  • The P3 queue was another queue like P2. The key here is to have another queue, separate from P1 and P2, that reserves some bandwidth for call signaling. Call signaling (aka voice control) is marked in this scenario as PREC 3. Reserving a percentage of bandwidth for P3 via the template on the provider’s portal means that call setup traffic should always be delivered. By “call setup”, we’re talking roughly about what happens when a phone is dialed and the call is established. If that doesn’t happen due to dropped or excessively delayed packets, it doesn’t really matter how effective the rest of the policy might be.
  • The P4 queue from my perspective was for “everything else.” That means that unmarked applications, or applications marked with PREC 1, would end up in this queue. From the provider’s perspective, this was a queue just like P2 and P3, with a percentage assigned to it via a template. I wasn’t as concerned about delivery of this traffic; I was more concerned about making sure that voice and video traffic would be delivered across the WAN provider cloud. That’s not to say that this traffic was unimportant to the business. Rather, the unmarked traffic that was going to end up in this queue was the most tolerant of buffering and poor performance when compared to the other traffic classes.
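Taken together, the queue descriptions above and the class-maps in the configuration that follows boil down to a lookup from PREC to provider queue. Here is a Python sketch of that lookup, assuming the provider honors PREC exactly as described (PREC 5 to P1; 4, 6, and 7 to P2; 2 and 3 to P3; 0 and 1 to P4). The dictionary and function names are hypothetical, and the sketch models what the provider does with the marks, not anything the edge router configures.

```python
# Hypothetical model of the provider's 4-queue PREC mapping.
# The edge router does not configure this; it only controls the marks.
PREC_TO_PROVIDER_QUEUE = {
    5: "P1",                     # real-time voice: provider priority queue
    4: "P2", 6: "P2", 7: "P2",   # streaming video (plus PREC 6 and 7)
    2: "P3", 3: "P3",            # call signaling
    0: "P4", 1: "P4",            # best effort / everything else
}


def provider_queue(dscp: int) -> str:
    """Return the provider queue a packet with this DSCP value would land in."""
    if not 0 <= dscp <= 63:
        raise ValueError(f"DSCP must be 0-63, got {dscp}")
    return PREC_TO_PROVIDER_QUEUE[dscp >> 3]  # provider reads only the PREC bits


if __name__ == "__main__":
    for name, dscp in [("EF", 46), ("AF41", 34), ("CS3", 24), ("AF11", 10), ("Default", 0)]:
        print(f"{name:7} (DSCP {dscp:2}) -> {provider_queue(dscp)}")
```

Because all 8 PREC values appear in the table, every possible DSCP value lands in some provider queue; nothing is left unclassified.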

Let’s walk through this Cisco MQC code and review what was happening on the edge router before traffic entered the WAN cloud.

! These class-maps define 4 classes that I named to coincide with the provider’s queues. The router was not somehow magically placing traffic into provider queues; the provider had no sense of how my edge router was queueing traffic. The provider merely looks at the traffic marks and puts the traffic into the corresponding queue in their equipment.
! In conjunction with the explanation above, the class-map code below should be self-explanatory.
class-map match-any WAN-PROVIDER_P1
 description REAL-TIME VOICE
 match  precedence 5
class-map match-any WAN-PROVIDER_P2
 description STREAMING VIDEO
 match  precedence 4  6  7
class-map match-any WAN-PROVIDER_P3
 description CALL SIGNALLING
 match  precedence 2  3
class-map match-any WAN-PROVIDER_P4
 description BEST EFFORT
 match  precedence 0  1

! The policy is named “OUTBOUND.” (As an aside, my habit is to use all capital letters for objects I create in a device configuration. It’s easier to differentiate those objects from OS keywords that way.)

! Traffic in class WAN-PROVIDER_P1 is placed into a low-latency queue (LLQ), as instructed by the keyword “priority”. The LLQ guarantees 300Kbps of bandwidth and also acts as a policer: during congestion, priority traffic beyond 300Kbps is dropped. 300Kbps is a calculated amount based on the codec in use and the number of simultaneous calls expected at peak load, plus a bit of extra. We’ve talked about low-latency queues in this series already, but remember that an LLQ is suitable for voice traffic because it minimizes both delay and jitter, making for happy VoIP conversationalists.

! Traffic in class WAN-PROVIDER_P2 is guaranteed a minimum bandwidth of 750Kbps, but can use more if available. This behavior is class-based weighted fair queueing (CBWFQ), and is appropriate for video traffic. Note that CBWFQ does not introduce a policer like LLQ does. Further note that if we changed the keyword “bandwidth” to “priority”, we would *not* have created a second LLQ. Instead, we would in effect have added video traffic to the same LLQ that voice traffic is in. With rare exceptions, Cisco network gear has a single LLQ, no matter how many traffic classes you map to it. Putting multiple kinds of traffic into the LLQ can have a negative effect on voice traffic, because the voice traffic is no longer getting unique, priority treatment when compared to other traffic classes. Or put another way, if we all drove Rolls-Royce automobiles, it would no longer be special to drive a Roller.

! For traffic in class WAN-PROVIDER_P3, a minimum of 40Kbps of bandwidth is guaranteed, although not reserved; remember that “bandwidth” indicates CBWFQ, not LLQ. This small amount was all that call signaling required in this particular environment.

! In any policy-map, traffic that does not match any of the defined classes falls through to the predefined class “class-default”. Because WAN-PROVIDER_P4 is not referenced in this policy, its PREC 0 and 1 traffic lands in class-default as well. In this policy, the router is instructed to apply fair-queueing to traffic flows that fall into class-default, meaning that all traffic flows should get a roughly equal share of the remaining bandwidth.

! A final note is that you can’t allocate more bandwidth to traffic classes than you actually have available on the interface. The router will complain if you try.
policy-map OUTBOUND
 class WAN-PROVIDER_P1
    priority 300
 class WAN-PROVIDER_P2
    bandwidth 750
 class WAN-PROVIDER_P3
    bandwidth 40
 class class-default
    fair-queue

! The last step in QoS policy creation with MQC is to apply it to an interface. We want this policy to impact traffic that is exiting (egressing) our interface that uplinks to the provider, so the OUTBOUND policy is applied in an “output” direction.
interface GigabitEthernet0/1
 bandwidth 10000
 ip address 10.11.12.2 255.255.255.252
 service-policy output OUTBOUND
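The bandwidth figures in the OUTBOUND policy come from simple arithmetic, and this Python sketch reproduces the two checks involved. Everything in it is an assumption for illustration: the post doesn’t state the codec or call count, so the example uses G.729 at 20 ms packetization (20-byte payload, 50 packets per second) with 40 bytes of IP/UDP/RTP headers, plus hypothetical call and headroom numbers. Whether the LLQ rate should include layer-2 overhead, and how much of the interface bandwidth the router will let you reserve, vary by platform and release, so both are parameters here rather than documented defaults.

```python
# Sketch only: codec, call count, headroom, and reservable fraction are
# illustrative assumptions, not values from the original deployment.

IP_UDP_RTP_HEADER_BYTES = 40  # 20 (IPv4) + 8 (UDP) + 12 (RTP), no header compression


def per_call_kbps(payload_bytes: int, packets_per_second: int,
                  l2_overhead_bytes: int = 0) -> float:
    """Bandwidth of one voice stream in kbps, optionally including L2 overhead."""
    packet_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES + l2_overhead_bytes
    return packet_bytes * 8 * packets_per_second / 1000


def llq_kbps(peak_calls: int, kbps_per_call: float, headroom: float = 0.0) -> float:
    """Priority-queue rate: peak simultaneous calls times per-call rate, plus headroom."""
    return peak_calls * kbps_per_call * (1 + headroom)


def reservations_fit(interface_kbps: int, reservations: dict[str, float],
                     reservable_fraction: float = 1.0) -> tuple[float, bool]:
    """Sum priority/bandwidth reservations and compare them to what may be reserved."""
    total = sum(reservations.values())
    return total, total <= interface_kbps * reservable_fraction


if __name__ == "__main__":
    g729 = per_call_kbps(payload_bytes=20, packets_per_second=50)  # 24.0 kbps at L3
    print(f"G.729 per call: {g729} kbps")
    print(f"LLQ for 10 calls + 20% headroom: {llq_kbps(10, g729, 0.2):.0f} kbps")

    outbound = {"WAN-PROVIDER_P1": 300, "WAN-PROVIDER_P2": 750, "WAN-PROVIDER_P3": 40}
    print(reservations_fit(10000, outbound))  # (1090, True) against "bandwidth 10000"
```

With these numbers the three reservations total 1,090 kbps, well inside the 10,000 kbps set with the interface “bandwidth” command. Dropping interface_kbps below 1,090 models the kind of over-allocation the router complains about.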
