Enterprise QoS Part 09 – A consistent QoS strategy: end-to-end packet walk – congested vs. non-congested.


If you’ve made it this far into the series, I have one simple point about QoS policy effectiveness that I want to bring home in this post before going through a couple of packet walks. The point is this. If an interface isn’t congested, your QoS policy dealing with congestion isn’t impacting traffic. Of course, rate limiters & marking policies will be effective whether your interface is congested or not, as the point of them is to throttle a traffic class to a specific transmission rate or mark traffic with a specific value. But think outside of rate limiters (policers, shapers) & marking for a moment. What do congestion management and congestion avoidance techniques do? What they achieve is implied in the name. They deal with situations of congestion.

Let’s think about class-based weighted fair queueing (CBWFQ). CBWFQ is a congestion management QoS tool. If the link is full and packets to be sent have backed up into buffers, CBWFQ is used to determine which packet should be delivered next. In other words, CBWFQ is a tool that manages a congested state. If an interface is not congested, how does CBWFQ fit into the picture? The answer is that it doesn’t. Traffic passes through the interface first in, first out. No buffers fill with queued packets. No dequeueing policy must be enforced by the network device passing the traffic. Does that make you uneasy? It shouldn’t. Why? If traffic is sent as quickly as it was received, then there is no negative impact to the traffic flow. If the traffic is a voice conversation, it will be forwarded out the egress interface as quickly as it came in. I’m assuming there was no ingress interface congestion either, but you get my point. There’s no need to prioritize traffic, because there’s enough bandwidth for all traffic to be sent. There’s no concern about jitter, because the traffic can only be sent as quickly as it was received – the jitter will be as consistent as the sending device makes it.
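To make the point concrete, here is a minimal Python sketch of a single FIFO interface, modeled in discrete time ticks. The function name `simulate_fifo` and the packet counts are hypothetical, and real interfaces work in bytes and serialization delay rather than whole packets per tick. All it shows is that when arrivals never exceed what the interface can transmit, the queue depth stays at zero, so there is nothing for a queueing policy to reorder.

```python
def simulate_fifo(arrivals, service_per_tick):
    """Return the queue depth (in packets) left over after each tick.

    arrivals: number of packets arriving during each tick.
    service_per_tick: packets the interface can transmit per tick (its line rate).
    """
    depth = 0
    history = []
    for arriving in arrivals:
        depth += arriving
        # Transmit as many queued packets as the line rate allows this tick.
        depth -= min(depth, service_per_tick)
        history.append(depth)
    return history


# Offered load never exceeds the line rate: nothing ever waits in a buffer.
print(simulate_fifo([3, 5, 4, 5], service_per_tick=5))  # [0, 0, 0, 0]
# A burst above the line rate builds a queue; only now would a scheduler matter.
print(simulate_fifo([8, 8, 2, 0], service_per_tick=5))  # [3, 6, 3, 0]
```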

Not to beat the idea to death, but what about weighted random early detection (WRED)? WRED is a congestion avoidance technique. WRED comes into play when packets have queued up in a buffer. WRED tracks the average depth of the queue and, once that average crosses a minimum threshold, begins randomly dropping packets before the queue is full, dropping lower-priority markings (based on IP precedence or DSCP) more aggressively than higher-priority ones. Only when the average depth exceeds the maximum threshold does WRED drop every new packet, which is effectively tail drop. WRED wants to avoid further congestion, and thus drops packets early to encourage TCP senders to slow down. “Hey, I didn’t get an ACK for some of the traffic I sent. I better slow down.” When TCP slows down the transmission rate, congestion should abate, and long-term interface congestion will be avoided. But again, if there are no packets queued up in a buffer, what is there for WRED to act on? Nothing at all. As long as the interface is keeping up with the traffic load, the WRED algorithm will never be summoned.
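Here is a minimal Python sketch of the WRED drop decision. It follows the commonly documented Cisco model (an exponentially weighted average queue depth, a minimum and maximum threshold, and a mark probability denominator), but the function names and the threshold values in the example are hypothetical, not platform defaults.

```python
def wred_average(old_avg, current_depth, weight_exp=9):
    """Exponentially weighted moving average of queue depth.

    Equivalent to old_avg * (1 - 2**-n) + current_depth * 2**-n, where n is the
    exponential weighting constant (weight_exp).
    """
    return old_avg + (current_depth - old_avg) / 2 ** weight_exp


def wred_drop_probability(avg_depth, min_th, max_th, mark_denominator):
    """Probability of dropping an arriving packet for one precedence/DSCP profile."""
    if avg_depth < min_th:
        return 0.0  # below the minimum threshold: never drop
    if avg_depth > max_th:
        return 1.0  # above the maximum threshold: drop everything (tail drop)
    # Between the thresholds, probability rises linearly to 1/mark_denominator.
    return (avg_depth - min_th) / (max_th - min_th) / mark_denominator


# Hypothetical profiles: best-effort traffic starts dropping earlier than priority traffic.
avg = 30
print(wred_drop_probability(avg, min_th=20, max_th=40, mark_denominator=10))  # 0.05
print(wred_drop_probability(avg, min_th=32, max_th=40, mark_denominator=10))  # 0.0
```

Note that with an empty queue the average decays toward zero and the drop probability stays at zero, which is the point of the paragraph above.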

When building and applying QoS policies, this concept of congestion is important to keep in mind. Unless the interface you are applying the policy to is congested, adding a QoS policy isn’t going to change the behavior of transiting traffic. As I mentioned above, the exceptions I can think of are rate limiting and marking. Rate limiters affect traffic flows whether or not an interface is congested. If you are shaping or policing traffic, you’re creating artificial congestion that impacts the matching traffic class. In effect, a rate limiting policy says, “Hey, you can only go this fast, even though there’s more physical bandwidth available.” Marking sets traffic flowing through an interface to a specific value, and it does not require congestion to occur.
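Here is a minimal single-rate token-bucket policer sketch in Python, assuming a hypothetical rate and burst size. Notice that the function never looks at how busy the outgoing interface is; it only compares each packet against the tokens accumulated at the configured rate, which is why a policer acts whether or not the interface is congested. A shaper would instead hold the exceeding packet in a queue and send it later.

```python
def police(packets, rate_bps, burst_bytes):
    """Single-rate token-bucket policer.

    packets: list of (arrival_time_seconds, size_bytes), in arrival order.
    Returns "conform" or "exceed" for each packet; exceeding packets would be dropped.
    """
    tokens = float(burst_bytes)  # the bucket starts full
    last_t = 0.0
    verdicts = []
    for t, size in packets:
        # Refill tokens (in bytes) at the policed rate, capped at the burst size.
        tokens = min(burst_bytes, tokens + (t - last_t) * rate_bps / 8)
        last_t = t
        if size <= tokens:
            tokens -= size
            verdicts.append("conform")
        else:
            verdicts.append("exceed")
    return verdicts


# 8 kbps = 1000 bytes/second of tokens, with a 1500-byte bucket.
print(police([(0.0, 1500), (0.5, 1000), (1.0, 1000), (3.0, 1500)],
             rate_bps=8000, burst_bytes=1500))
# ['conform', 'exceed', 'conform', 'conform']
```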

With this backdrop, let’s do a brief packet walk of traffic, with an emphasis on a real-time voice packet. We’ll do this exercise twice. First, we’ll walk through an uncongested infrastructure. Then, we’ll walk through a partially congested infrastructure. Assume we’re using the AutoQoS policy described earlier in this series for access switches 1 & 2, and the prioritization policy described earlier in this series on WAN routers 1 & 2. For our purposes here, we’ll assume that the core switches trust incoming marks (i.e. will not strip them), but that they are not points of congestion.

We’ll also assume that the IP phones are marking traffic according to the following table:

TRAFFIC                  | IP TOS BYTE    | 802.1Q/802.1p
Real-Time Voice          | DSCP 46 (EF)   | CoS 5
Call Signaling (Control) | DSCP 24 (CS3)  | CoS 3
Video                    | DSCP 34 (AF41) | CoS 4
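The relationship between the columns can be checked with a little bit arithmetic. This Python sketch takes the marks from the table; the helper names are hypothetical, and the ToS byte calculation assumes the two low-order ECN bits are zero.

```python
# Marks from the table above: traffic -> (DSCP, 802.1p CoS).
MARKS = {
    "Real-Time Voice": (46, 5),           # EF
    "Call Signaling (Control)": (24, 3),  # CS3
    "Video": (34, 4),                     # AF41
}


def tos_byte(dscp):
    """DSCP occupies the upper 6 bits of the former ToS byte (ECN bits assumed 0)."""
    return dscp << 2


def ip_precedence(dscp):
    """IP precedence is the top 3 bits of the 6-bit DSCP field."""
    return dscp >> 3


for name, (dscp, cos) in MARKS.items():
    print(f"{name}: DSCP {dscp} -> ToS byte 0x{tos_byte(dscp):02X}, "
          f"precedence {ip_precedence(dscp)}, CoS {cos}")
```

In all three rows the CoS value equals the IP precedence (5, 3, and 4), which is why precedence-based classes such as those in the congested WAN walk below still line up with the phone's marks.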

This is the path the traffic will follow.

  1. PHONE 1
  2. ACCESS SWITCH 1
  3. CORE SWITCH 1
  4. WAN ROUTER 1
  5. PROVIDER CLOUD
  6. WAN ROUTER 2
  7. CORE SWITCH 2
  8. ACCESS SWITCH 2
  9. PHONE 2

Uncongested Packet Walk

Step 1. Phone 1 marks traffic before it sends it into the switch. (See table above.)

Step 2. Marked traffic arrives at the switch on the ingress port, the port that connects the phone to the switch. There is no congestion between the ingress port and the egress port, so no queueing happens. The ingress port was configured with “mls qos trust cos”, so the switch honors the 802.1p value embedded in the 802.1Q tag. The global “mls qos map cos-dscp” command instructs the switch to mark a DSCP value of 24 when CoS is 3, 32 when CoS is 4, and 46 when CoS is 5. However, my understanding is that this command is overridden by the AUTOQOS-SRND4-CISCOPHONE-POLICY attached to the interface. Therefore, the DSCP marks set by the phone will end up being preserved: DSCP EF (46) for real-time voice and DSCP CS3 (24) for call signaling. With all of this complete, the frame is switched to the egress port and forwarded to Core Switch 1.
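The following Python sketch models the two possibilities described in this step. It is a hypothetical model of the logic, not switch code. Only the CoS 3, 4, and 5 entries of the map come from the text above; the other entries assume the usual CoS × 8 values, and the `policy_attached` flag stands in for the understanding that the attached AutoQoS policy overrides the map.

```python
# Hypothetical CoS-to-DSCP map, indexed by CoS value (0-7).
# Entries 3, 4 and 5 match the text; the rest are assumed CoS x 8 values.
COS_TO_DSCP = [0, 8, 16, 24, 32, 46, 48, 56]


def ingress_dscp(cos, phone_dscp, policy_attached):
    """Return the DSCP the access switch would carry forward (simplified model)."""
    if policy_attached:
        # Per the understanding above: the interface policy preserves the phone's DSCP.
        return phone_dscp
    # Otherwise the trusted CoS value is translated through the global map.
    return COS_TO_DSCP[cos]


print(ingress_dscp(5, 46, policy_attached=False))  # 46: voice is EF either way
print(ingress_dscp(4, 34, policy_attached=False))  # 32: video would lose its AF41 mark
print(ingress_dscp(4, 34, policy_attached=True))   # 34: AF41 preserved
```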

Step 3. We are assuming for our purposes that Core Switch 1 is uncongested. The real world might be different, but adding a queueing discussion for this hop doesn’t help us much. Therefore, the frame comes in, is switched or routed, and lands on the ingress port of the WAN router. Core Switch 1 has been configured to trust incoming CoS and DSCP values.

Step 4. Traffic arrives at WAN Router 1, which routes the packet to its egress interface, the one that uplinks the enterprise to the provider cloud. While there is an egress QoS policy applied to that interface, the interface is not congested. Assuming the QoS policy doesn’t include a shaper or policer, the traffic is simply forwarded to the provider. Note that while switches will strip marks from traffic that comes in on an untrusted interface, routers will not. The value set in the IP ToS byte (our DSCP mark) will be preserved and forwarded to the carrier.

Step 5. Even though our LAN was not congested, the provider cloud might be. Assuming a QoS agreement is in place with the provider, the provider will honor your marked traffic, placing it into the correct queue for the desired service level based on your marks. In most scenarios, real-time voice packets (as indicated by DSCP EF) will be individually carried across the provider cloud by unicorn-drawn chariots of platinum that carefully swaddle each datagram in a warm rainbow-colored blanket, then storm across the cloud at astounding speeds, delivering your voice to the other side of the cloud in a shower of majestic, jitter-free sparkles.

Step 6. As with step 4, traffic arrives at an uncongested WAN Router 2, and is forwarded to Core Switch 2.

Step 7. Like Core Switch 1, Core Switch 2 has been configured to trust CoS and DSCP values. We’re assuming no core switch congestion, so traffic is forwarded to Access Switch 2. We could assume that DSCP values are mapped to CoS values, but that’s not necessarily critical here, so long as the DSCP value is preserved.

Step 8. Access Switch 2 receives the traffic, trusting the CoS and/or DSCP values that are included. The frame is switched from the ingress port (facing the core switch) to the egress port (facing Phone 2). As there is neither congestion nor an interface-level egress policy that could affect the traffic, nothing exciting happens; the frame is delivered to the phone.

Step 9. Phone 2 receives the traffic and kicks it up to an application that does something useful with it, like make a noise in someone’s ear.

Congested Packet Walk

In this scenario, we’re not going to re-walk every step. Instead, we’re going to define congestion in two places and see how traffic forwarding is impacted. The first congestion point will be between steps 4 and 5 (i.e. the WAN uplink is full). The second congestion point will be between steps 8 and 9 (i.e. the link between the access switch and the phone is full). That second case is most likely a microburst, as user-facing access ports are rarely full for any length of time, assuming Gigabit Ethernet.

Step 4. The packet arrives at WAN Router 1 and is forwarded to the egress interface, the WAN uplink that connects the enterprise to the provider cloud. Marks have been preserved to this point, so marked traffic still carries its DSCP value. The WAN uplink is congested, and traffic heading across the provider cloud has begun to queue in buffers. Since congestion is present, the QoS egress policy applied to the interface takes effect. This policy defines four traffic classes. Three are matched on IP precedence values (the high-order 3 bits of the 6-bit DSCP field). The fourth equates to “everything else”.

  • The first class is defined as IP precedence 5 and will match DSCP EF (46) traffic. This traffic is serviced by a low latency queue (LLQ). An LLQ is serviced ahead of the other classes until it is empty or until its configured rate is reached, which is what gives it priority treatment. Real-time voice traffic should land in this queue and is therefore assured of a minimal wait in the buffer before being dequeued. The only exception is if traffic landing in this queue exceeds 300Kbps: IP precedence 5 traffic exceeding this rate will be dropped. The drop behavior prevents other traffic classes in the policy map from being starved for bandwidth.
  • The second and third classes are defined by IP precedence values as well. These will be dequeued using class-based weighted fair queueing. The second class will have a guaranteed minimum of 750Kbps (intended for video traffic). The third class will have a guaranteed minimum of 40Kbps (intended for signaling traffic). The guaranteed minimums are not maximums; if more bandwidth is needed and available, the router will allow the needy traffic class to use it.
  • The fourth class is “class-default”, a reserved keyword in Cisco IOS. Traffic that does not match any of the other classes referenced in a policy-map will fall through to class-default. Traffic landing in this queue will be “fair-queued”, meaning that all conversations will be given equal treatment.
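To illustrate how these four classes share a congested link, here is a simplified Python model of one scheduling interval. It is not the IOS scheduler: the 1536 Kbps link rate, the offered loads, the class names, and the choice to give class-default whatever the other guarantees leave over are all assumptions for the example, and spare bandwidth is split in proportion to each class's guarantee. It does reflect the behaviors described above: the priority class goes first but is capped at 300Kbps, the other classes get at least their minimums when they need them, and leftover bandwidth flows to whichever classes still have traffic waiting.

```python
def allocate(link_kbps, offered, priority_class, priority_limit, minimums):
    """Split a congested link's bandwidth among classes (simplified LLQ + CBWFQ model).

    offered: kbps each class is trying to send.
    minimums: guaranteed kbps for each non-priority class (all must be > 0).
    """
    granted = {}
    # 1. The priority queue is served first, but policed to its limit while congested.
    granted[priority_class] = min(offered[priority_class], priority_limit)
    capacity = link_kbps - granted[priority_class]
    # 2. Every other class gets up to its guaranteed minimum.
    for name, minimum in minimums.items():
        granted[name] = min(offered[name], minimum)
    spare = capacity - sum(granted[name] for name in minimums)
    # 3. Spare bandwidth goes to classes that still have demand, in proportion
    #    to their guarantees (minimums are not maximums).
    hungry = {name for name in minimums if offered[name] > granted[name]}
    while spare > 1e-9 and hungry:
        total_weight = sum(minimums[name] for name in hungry)
        handed_out = 0.0
        for name in list(hungry):
            share = spare * minimums[name] / total_weight
            take = min(share, offered[name] - granted[name])
            granted[name] += take
            handed_out += take
            if offered[name] - granted[name] <= 1e-9:
                hungry.discard(name)
        spare -= handed_out
    return granted


LINK = 1536  # hypothetical WAN uplink rate in kbps
minimums = {"video": 750, "signaling": 40}
minimums["class-default"] = LINK - 300 - sum(minimums.values())  # assumed: 446 kbps
offered = {"voice": 350, "video": 900, "signaling": 20, "class-default": 1000}

for name, kbps in allocate(LINK, offered, "voice", 300, minimums).items():
    print(f"{name}: {kbps:.1f} kbps")
# voice: 300.0 kbps
# video: 762.5 kbps
# signaling: 20.0 kbps
# class-default: 453.5 kbps
```

In this run, 50 Kbps of the 350 Kbps of voice is dropped by the priority class's limit, signaling needs only 20 Kbps of its 40 Kbps guarantee, and the 20 Kbps left over is split between video and class-default.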

Step 8. Access Switch 2 receives the traffic, trusting the CoS and/or DSCP values that are included. The frame is switched from the ingress port (facing the core switch) to the egress port (facing Phone 2). Yes, it’s possible that traffic would need to land in the ingress queueing system of the ingress port, but I’m not sure that adds anything but length to the description, as similar logic applies to egress queueing. So, we’re going to assume egress queueing. The egress port is congested at this particular millisecond, and traffic is queued. In our example of a 3750X, there are four egress queues that traffic could be placed into. Instead of an MQC policy governing which traffic goes into which queue, various global “mls qos srr-queue output” mapping statements make the determination. As the egress interface is configured with a “priority-queue out” statement, traffic mapped to queue 1 will receive priority treatment. For this example, let’s assume the DSCP values are the ones that have been preserved.

  • Traffic with a DSCP value of EF (46) will be mapped to output queue 1, the priority queue. Traffic in this queue will be delivered before traffic in other queues. Assuming the traffic in this queue is real-time voice and nothing else, this is entirely appropriate.
  • Traffic with a DSCP value of AF41 (34) will be mapped to output queue 2; tail-drop will happen when threshold 1 is exceeded. Egress queue 2, threshold 1 is set at 125% of maximum. (Yes, thresholds can be oversubscribed. The oversubscription will be utilized if there is enough buffer space available.)
  • Traffic with a DSCP value of CS3 (24) will be mapped to output queue 2; tail-drop will happen when threshold 2 is exceeded. Egress queue 2, threshold 2 is also set at 125% of maximum, the same as threshold 1.
  • Unmarked traffic will be mapped to output queue 3.
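Weighted tail drop on these queues boils down to a per-packet admission check. The Python sketch below is a deliberately simplified, hypothetical model of the two rules described above, with made-up buffer numbers: a packet is dropped if admitting it would push the queue past its class's threshold, and a threshold above 100% can only be used when there are free shared buffers to borrow. The real 3750X buffer accounting has more moving parts than this.

```python
def wtd_admit(depth, allocated, threshold_pct, shared_free):
    """Decide whether one more packet is queued under weighted tail drop (simplified).

    depth: packets already in the queue.
    allocated: buffers (in packets) allocated to this queue.
    threshold_pct: drop threshold for this packet's class, as a % of the allocation.
    shared_free: free buffers the queue could borrow beyond its allocation.
    """
    limit = allocated * threshold_pct / 100
    if depth + 1 > limit:
        return False  # threshold exceeded: tail-drop
    if depth + 1 > allocated and shared_free <= 0:
        return False  # oversubscribed threshold, but nothing left to borrow
    return True


# Hypothetical queue 2 with 100 buffers and the 125% threshold used for AF41 and CS3.
print(wtd_admit(50, 100, 125, shared_free=0))    # True: well under the allocation
print(wtd_admit(110, 100, 125, shared_free=20))  # True: borrowing shared buffers
print(wtd_admit(110, 100, 125, shared_free=0))   # False: nothing to borrow
print(wtd_admit(125, 100, 125, shared_free=20))  # False: past the 125% threshold
```

Because AF41 and CS3 both use 125% in this configuration, they are dropped at the same point; giving the two thresholds different values is what would make one class more drop-prone than the other.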

We have not set any special shaping or sharing weights on the egress queues, so they are treated equally by default. The end result is that traffic in queue 1 will be delivered first. Traffic in queues 2, 3, and 4 will be dequeued equally, the only potential difference being the length of the queues and the likelihood of traffic being tail-dropped if a queue fills past a threshold.
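The resulting dequeue order can be sketched in Python as a strict-priority queue followed by a weighted round robin over the remaining queues. This is a hypothetical, packet-counting approximation with made-up packet labels; the switch's shaped round robin schedules in terms of bandwidth rather than packets, but with equal weights it shows the same ordering principle.

```python
from collections import deque


def egress_order(queues, weights, priority_queue=1):
    """Return the order in which queued packets leave the port (simplified model).

    queues: egress queue number -> packet labels, head of the line first.
    weights: queue number -> packets sent per round for the non-priority queues.
    """
    pending = {q: deque(packets) for q, packets in queues.items()}
    served = [priority_queue] + sorted(weights)
    order = []
    while any(pending.get(q) for q in served):
        # The expedite queue is always emptied before any other queue is served.
        if pending.get(priority_queue):
            order.append(pending[priority_queue].popleft())
            continue
        # Then each remaining queue gets a turn, sending up to `weight` packets.
        for q in sorted(weights):
            for _ in range(weights[q]):
                if pending.get(q):
                    order.append(pending[q].popleft())
    return order


queues = {1: ["voice1", "voice2"], 2: ["sig1", "video1"], 3: ["data1", "data2"], 4: ["bulk1"]}
print(egress_order(queues, weights={2: 1, 3: 1, 4: 1}))
# ['voice1', 'voice2', 'sig1', 'data1', 'bulk1', 'video1', 'data2']
```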

Search for all parts in the Enterprise QoS series.