Introduction
This lab is the continuation of our previous PBR lab, so if you haven't read through it yet, please read
Lab: Policy-based routing (Part 1) first. In Part 1, we took our OSPF topology and forced our "VoIP" traffic across a serial link, leaving the rest of the traffic to use our "point to point wireless" link. In order to accomplish this, we used a concept called Policy-Based Routing (PBR). This allowed us to cherry-pick traffic and force it across a link other than what our OSPF-driven routing table would have chosen.
In today's lab, we'll take it a few steps further. First, let's review why one might choose to avoid PBR, as there are some definite support issues. Since we are manually forcing traffic onto a different path than the routing table would normally choose, we open ourselves up to minor issues like inefficient routing and, in worst-case scenarios, an increased likelihood of routing loops. Furthermore, when PBR is implemented, the normal diagnostic commands we are all used to (show ip route, for instance) become potentially inaccurate. Remember that PBR in effect intercepts the routing decision from the routing table, so it can be tricky to understand what routing is taking place and to track where traffic is actually going.
We'll tackle some of these issues as we walk through the following sections:
- Gain better visibility of traffic flow throughout our topology
- Test failure scenarios to see how PBR reacts
- Configure PBR to behave more dynamically
- Re-test any failure scenarios that previously did not recover
Please reference the Part 1 lab mentioned at the top of this post for configurations up to this point. With any luck, by the time we are done with Part 2, we'll have a better understanding of how we can make our PBR topology a bit more resilient and transparent.
Gaining Visibility
In the previous lab, we used commands like 'show ip policy' and 'show route-map' to see what traffic would be policy-routed. However, if the policy routing is applied to multiple interfaces (as is the case with the Wan2 router), the packet counts in the 'show route-map' output can quickly become useless, since they are not broken out per interface.
One way to gain better visibility is to take advantage of a feature set IOS already has for efficient identification of traffic: Quality of Service (QoS) configuration. While this lab will not dive into QoS, we will take advantage of the QoS command framework in IOS to give us some better visibility into the traffic on our network. In a real-world scenario, we would likely want to use QoS to help protect important traffic, so this visibility would already be in place.
We'll add the following configuration to all interfaces on Wan1, Wan2, and Branch1 to best understand where traffic is entering and exiting these routers. Since these are the three routers where we have PBR implemented, it makes the most sense to view this traffic there. The example below shows adding the QoS class-map and policy-map to the router's global config, and then applying the policy to one particular interface. Remember, we'll be applying this interface-level configuration to each non-loopback, IP-addressed interface on Wan1, Wan2, and Branch1. Also, there is an additional access-list being added; it matches VoIP traffic flowing in the opposite direction from the ACL that was defined for PBR. The example below would be for the Wan1 or Wan2 side.
!
ip access-list extended VoIP-Incoming
10 permit udp 172.16.10.0 0.0.0.255 any range 16384 32768
!
class-map match-any PBR-VoIP
match access-group name PBR-VoIP-to-T1-ACL
match access-group name VoIP-Incoming
!
policy-map PBR-Counters
class PBR-VoIP
class class-default
!
interface FastEthernet2/0
service-policy input PBR-Counters
service-policy output PBR-Counters
!
Now that we have this configuration in place, we can go ahead and test the generation of "VoIP" traffic from Core1, and see where it goes. Remember that for the sake of simplicity in the lab environment, we tricked IOS syslog into sending its logs to the Test PC (172.16.10.10) off Branch1, using a nonstandard UDP port that falls within the VoIP RTP range. Below, witness the "VoIP" traffic being generated by Core1 and received by the Test PC:
! ------ Initiate Traffic from Core1 ------
!
Core1#send log Testing PBR with all links up
Core1#
*Jun 18 13:30:19.047: %SYS-2-LOGMSG: Message from 0(): Testing PBR with all links up
Core1#
!
!
! ------ See Traffic at TestPC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
connect to [172.16.10.10] from (UNKNOWN) [10.0.11.4] 57473
<186>25: *Jun 18 13:30:19.047: %SYS-2-LOGMSG: Message from 0(): Testing PBR with all links up
Now that we have seen that the traffic successfully routes (no surprise), it's time to confirm that the path is as expected. Note that nothing has really changed since the end of Part 1, so all we are accomplishing is validating the path with a different set of commands. The following output shows the "show policy-map interface" command issued on each router in succession. Note that the output is being filtered at the IOS command prompt to make it a bit more legible, and I've also cut the output for completely irrelevant interfaces. The output is a bit long, so pay attention to the highlighted PBR-VoIP classes with non-zero packet counts to help pick out the important pieces.
! ------ Check QoS Stats on Wan1 ------
!
Wan1#show policy-map interface | inc /|Service|Class|,
GigabitEthernet1/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 135 bytes
1 packets, 135 bytes
Class-map: class-default (match-any)
491 packets, 46354 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
1021 packets, 101657 bytes
5 minute offered rate 0 bps, drop rate 0 bps
FastEthernet2/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
478 packets, 45100 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
1004 packets, 96847 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0.1
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
495 packets, 46794 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 135 bytes
1 packets, 135 bytes
Class-map: class-default (match-any)
569 packets, 73868 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Wan1#
!
!
! ------ Check QoS Stats on Wan2 ------
!
Wan2#show policy-map interface | inc /|Service|Class|,
GigabitEthernet1/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
512 packets, 48548 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
1068 packets, 94670 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Serial2/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
498 packets, 42084 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 125 bytes
1 packets, 125 bytes
Class-map: class-default (match-any)
1043 packets, 74844 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0.1
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 135 bytes
1 packets, 135 bytes
Class-map: class-default (match-any)
516 packets, 48976 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
596 packets, 70163 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Wan2#
!
!
! ------ Check QoS Stats on Branch1 ------
!
Branch1#show policy-map interface | inc /|Service|Class|,
FastEthernet1/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
811 packets, 76518 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
1702 packets, 148025 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Serial2/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 125 bytes
0 packets, 0 bytes
1 packets, 125 bytes
Class-map: class-default (match-any)
811 packets, 68473 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
1686 packets, 115744 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
1 packets, 43 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 135 bytes
0 packets, 0 bytes
1 packets, 135 bytes
Class-map: class-default (match-any)
1692 packets, 145181 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Branch1#
By following the blue-colored output (the PBR-VoIP classes with non-zero packet counts), it is apparent which paths were used and which were not used for the "VoIP" traffic. As expected, the path matched the last section of Part 1. The packet followed OSPF from Core1 to Wan1, at which point it matched our PBR policy and, instead of following OSPF directly to Branch1, was routed across to Wan2. Again, due to inbound PBR at Wan2, the packet bypassed the routing table and was routed across the T1 to Branch1.
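Reading these counters by eye gets tedious across three routers. As a rough aid, here is a minimal Python sketch that scans output filtered the same way as above and lists each interface and direction where the PBR-VoIP class saw packets. The function name `voip_hits` and the `SAMPLE` text (trimmed from the Wan1 output above) are illustrative, and the parser assumes that the first counter line after each Class-map line is the class total, as it appears to be in this lab's output.

```python
import re

# Trimmed copy of the filtered Wan1 output shown above (illustrative sample).
SAMPLE = """\
GigabitEthernet1/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 135 bytes
1 packets, 135 bytes
Class-map: class-default (match-any)
491 packets, 46354 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
GigabitEthernet3/0.1
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 135 bytes
1 packets, 135 bytes
"""


def voip_hits(output, class_name="PBR-VoIP"):
    """Return (interface, direction, packets) for each non-zero class total."""
    hits = []
    interface = direction = None
    awaiting_total = False  # True right after a Class-map line we care about
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Service-policy"):
            # "Service-policy input: PBR-Counters" -> "input"
            direction = line.split()[1].rstrip(":")
            awaiting_total = False
        elif line.startswith("Class-map:"):
            awaiting_total = line.split()[1] == class_name
        elif re.match(r"\d+ packets, \d+ bytes$", line):
            if awaiting_total:
                packets = int(line.split()[0])
                if packets:
                    hits.append((interface, direction, packets))
                awaiting_total = False  # skip the per-match counters that follow
        elif "/" in line and "," not in line:
            # Interface names contain a slash; the echoed command also contains
            # a comma, so it is skipped.
            interface = line
            awaiting_total = False
    return hits


for interface, direction, packets in voip_hits(SAMPLE):
    print(f"{interface} {direction}: {packets} packet(s)")
```

Run against each router's output in turn, the non-zero entries give the ingress and egress interface per hop, which is the same path we just traced by hand.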
Failure Scenario: T1 Link Down
Now that we've tested packet traversal while all links are up and working fine, we need to test some failure scenarios to see if our traffic still routes. To create our first and most obvious failure, we will 'shut' the T1 interface from the Branch1 side, simulating loss of link on the T1 line. This is probably the best type of failure to happen, as both Wan2 and Branch1 lose the link from a physical standpoint. There are plenty of topologies where one side could fail and the other stays up. An example of this would be two routers connected via a switch. However, in this case, we luck out because a link failure will be seen immediately on both ends.
As you will soon find out, PBR during failures can be, at its best, a tricky situation to follow. Because of this, I'll walk through this one a bit more step-by-step. First, we will break the T1 connection at the "Branch1" side.
With this link down, we will next generate our test packet and then observe the results. Following the same process, we will start a listener on the Test PC at IP 172.16.10.10 hanging off the Branch1 router. Then, we will generate a syslog message from Core1 to act as our VoIP packet.
! ------ Start listener on Test PC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
!
!
! ------ Generate Syslog on Core1 ------
!
Core1#send log Testing PBR with T1 in 'down' state
Core1#
*Jun 30 09:43:20.199: %SYS-2-LOGMSG: Message from 0(): Testing PBR with T1 in 'down' state
Core1#
!
!
! ------ Observe Syslog on Test PC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
connect to [172.16.10.10] from (UNKNOWN) [10.0.11.4] 51054
<186>22: *Jun 30 09:43:20.199: %SYS-2-LOGMSG: Message from 0(): Testing PBR with T1 in 'down' state
Without any further analysis, we already know the most important part: the traffic made it to its destination! This means that the policy routing is not black-holing traffic in this failure scenario. We should take a few minutes to observe the traffic path and understand why. Core1 will send its traffic to Wan1, as this is the OSPF best route and there is no PBR applied locally on Core1. So, we will start our tracing on Wan1. Traffic is received on Wan1's Gi1/0 interface, which has an inbound PBR policy applied. That policy dictates that traffic will be sent to Wan2 via the Gi3/0.1 interface. Now, we know this is probably a bad idea since the T1 interface is down on Wan2. As Wan1 has no way of knowing this, PBR will continue as expected. Observe the counters on Wan1 below:
Wan1#show policy-map interface | inc /|Service|Class|,
GigabitEthernet1/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 141 bytes
1 packets, 141 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
327 packets, 31398 bytes
5 minute offered rate 0 bps, drop rate 0 bps
...
GigabitEthernet3/0.1
...
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 141 bytes
1 packets, 141 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
385 packets, 47634 bytes
5 minute offered rate 0 bps, drop rate 0 bps
...
Notice that the VoIP packet came in on Gi1/0 (from Core1) as expected, and the policy routing forces the packet out Gi3/0.1 towards Wan2. This shows up in the output direction of Gi3/0.1 above. So now, we have observed PBR still working 'as expected' and sending the packet towards Wan2, even though Wan2 no longer has the T1 link up.
It will be interesting to see what Wan2 does with the packet next. Keep in mind that Wan2 has a PBR route-map on its inbound interface from Wan1, stating that VoIP traffic coming in from Wan1 will be policy-routed out the T1. It is important to understand, though, that PBR will only execute if the next-hop interface is up/up. If this criterion is not met, PBR will instead step aside and the packet will route according to the IP routing table on the router. This is critical to understand: since the T1 "next hop" is currently down, we should expect PBR to have no effect on Wan2's routing decision. Therefore, we should see the packet follow OSPF's selected route right back to Wan1:
Wan2#show policy-map interface | inc /|Service|Class|,
...
Serial2/0
...
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
650 packets, 46763 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0.1
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 141 bytes
1 packets, 141 bytes
Class-map: class-default (match-any)
340 packets, 32396 bytes
5 minute offered rate 0 bps, drop rate 0 bps
...
GigabitEthernet3/0.5
...
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 141 bytes
1 packets, 141 bytes
Class-map: class-default (match-any)
337 packets, 25718 bytes
5 minute offered rate 0 bps, drop rate 0 bps
...
Pay careful attention to the input/output directions of the above output. Notice that, as expected from seeing Wan1's output, Wan2 shows an ingress VoIP packet on Gi3/0.1. Since Serial2/0 is in a down state, our theory proves correct that no VoIP packet is seen egressing that interface. However, we can observe that the VoIP packet leaves Wan2, destined back to Wan1 on interface Gi3/0.5.
That is fairly interesting to observe: the packet came in on the area 0 interface, but left on the area 5 interface. Why? On the way from Wan1 to Wan2, the area 0 interface was hard-set by the policy routing. On the way back from Wan2 to Wan1, since PBR no longer applied, Wan2 followed its IP routing table. Remember that if OSPF has a choice between an inter-area and an intra-area path to reach a destination, it will always choose the intra-area path. Therefore, since Wan2 and Wan1 both share an interface in area 5 (the destination area), this is the link that is chosen.
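Wan2's decision here combines two rules: PBR only applies while its egress interface is up, and the routing table prefers intra-area OSPF paths over inter-area ones. The following Python sketch models just those two rules. It is a simplification, not how IOS implements forwarding; the names `Route` and `choose_egress` and the route costs are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Route:
    out_interface: str
    path_type: str  # "intra-area" or "inter-area"
    cost: int       # hypothetical cost, for illustration only


# OSPF compares path type before cost: intra-area beats inter-area.
PATH_PREFERENCE = {"intra-area": 0, "inter-area": 1}


def best_route(routes):
    """Pick the route the routing table would install."""
    return min(routes, key=lambda r: (PATH_PREFERENCE[r.path_type], r.cost))


def choose_egress(matches_policy, policy_interface, interface_up, routes):
    """Return (egress interface, reason) for a single packet."""
    if matches_policy and interface_up.get(policy_interface, False):
        return policy_interface, "policy-routed"
    # PBR steps aside: fall back to the normal routing decision.
    return best_route(routes).out_interface, "routing table"


# Wan2's two candidate paths towards Branch1's LAN once the T1 is gone.
# The inter-area path is deliberately given the lower (made-up) cost to show
# that path type wins over cost.
WAN2_ROUTES = [
    Route("GigabitEthernet3/0.1", "inter-area", 2),
    Route("GigabitEthernet3/0.5", "intra-area", 3),
]

print(choose_egress(True, "Serial2/0", {"Serial2/0": True}, WAN2_ROUTES))
print(choose_egress(True, "Serial2/0", {"Serial2/0": False}, WAN2_ROUTES))
```

With Serial2/0 up the packet is policy-routed out the T1; with it down, the sketch falls back to the routing table and picks the area 5 subinterface, matching what the counters showed.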
Following our packet, Wan2's IP routing table has led us back to Wan1. Let's check stats to see if we can see where it came in and left again. For those following along and thinking, "Hey, wouldn't I have seen this the first time we checked Wan1?", the answer is yes; this is one of the reasons I chose to omit portions of the output. Otherwise it can be misleading and confusing. Like I said, policy routing can easily lead to confusion! We'll pick up that same show command in its entirety this time, and I'll selectively highlight the portions related to this last leg of the packet's journey to Branch1. Fair warning, there's a lot of output below.
Wan1#show policy-map interface | inc /|Service|Class|,
GigabitEthernet1/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 141 bytes
1 packets, 141 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
327 packets, 31398 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
685 packets, 65100 bytes
5 minute offered rate 0 bps, drop rate 0 bps
FastEthernet2/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
325 packets, 30978 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 141 bytes
1 packets, 141 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
679 packets, 64857 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0.1
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
328 packets, 31360 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 141 bytes
1 packets, 141 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
385 packets, 47634 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0.5
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 145 bytes
1 packets, 145 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
325 packets, 30882 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
329 packets, 26834 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Wan1#
Through all of that, notice the lines colored in blue. We see the same VoIP packet come back to Wan1 for the second time, this time from Wan2. Because there is no PBR policy (route-map) applied to inbound traffic on Gi3/0.5, Wan1 now forwards this packet via its IP routing table, choosing Fa2/0 for its connection to Branch1. As a memory refresher, this is the point-to-point transparent wireless bridge in our lab scenario. While there was definitely sub-optimal routing during this failure scenario, the packet still arrives where it's supposed to. So, with this basic type of failure, our policy-based routing seems to have survived.
Failure Scenario: Semi-failed Interface
This type of failure scenario could be caused by several issues, so note that our simulation efforts are only to mimic the typical symptoms. The generic issue we are tackling in this failure scenario is when a router still shows a link as up/up, but the router on the other end is in a 'down' state of some sort. One of the common examples of this is when two routers form a layer-3 relationship across a layer-2 switch. This means that the switch could drop its link to router A, but leave router B's link up. Therefore, router B still believes it has an interface up to send to a presumed-listening router A.
These partial failures are no issue for dynamic routing protocols to handle; if a neighbor becomes unresponsive for any reason, the neighborship is torn down after a time period and other routes are used. When dealing with static routes and policy-based routing, there is no dynamic protocol to manage a neighbor relationship. This means that we are susceptible to these types of failures. To demonstrate, I'll apply a "deny ip any any" access-list inbound on Branch1's serial interface. This will keep the interface up, but traffic will not be able to pass from Wan2 to Branch1. Note that in order to more effectively troubleshoot these types of issues, we'll want to make the following configuration change to each router:
router ospf 1
log-adjacency-changes detail
This change will allow the routers to log all OSPF state changes. Since the ACL blocks communication in only one direction, we will see a different impact than the typical "dead timer expired" behavior. When the ACL is applied inbound on Branch1's serial interface, Wan2's OSPF hellos stop reaching Branch1. However, Branch1's hellos are still received by Wan2. When Branch1's dead timer expires for Wan2, it removes Wan2 from its neighbor table. Now, when Branch1 sends its next hello towards Wan2, Wan2 sees that it is no longer in Branch1's neighbor list and moves the neighbor state to Init. This is seen below:
! ------ Create and Apply ACL on Branch1 ------
!
Branch1#config t
Enter configuration commands, one per line. End with CNTL/Z.
Branch1(config)#ip access-list extended DenyAll
Branch1(config-ext-nacl)#10 deny ip any any
Branch1(config-ext-nacl)#exit
Branch1(config)#int s2/0
Branch1(config-if)#ip access-group DenyAll in
Branch1(config-if)#end
Branch1#
*Jun 30 19:38:49.073: %SYS-5-CONFIG_I: Configured from console by console
Branch1#
!
!
! ------ After dead timer expires... ------
!
Branch1#
*Jun 30 19:39:15.741: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.2 on Serial2/0 from FULL to DOWN, Neighbor Down: Dead timer expired
Branch1#
!
!
! ------ Wan2 moves Branch1 to Init ------
!
Wan2#
*Jun 30 19:39:15.561: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from FULL to INIT, 1-Way
Wan2#
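The one-way check that drops Wan2's view of Branch1 to Init can be sketched in a few lines of Python. This is a heavily simplified model of OSPF hello processing (it ignores timers, the database exchange states, and everything else in the neighbor state machine); the function name `state_after_hello` is invented for illustration, and the router IDs are taken from the log messages above.

```python
WAN2_ID = "172.16.255.2"
BRANCH1_ID = "172.16.255.3"


def state_after_hello(my_router_id, neighbors_in_hello, current_state):
    """Neighbor state after receiving one hello (simplified one-way check)."""
    if my_router_id not in neighbors_in_hello:
        # The neighbor is not hearing us: communication is one-way only.
        return "INIT"
    if current_state in ("DOWN", "INIT"):
        # We see ourselves in the hello again: two-way communication.
        return "2WAY"
    return current_state


# Healthy link: Branch1's hello lists Wan2, so Wan2 keeps the neighbor FULL.
print(state_after_hello(WAN2_ID, [WAN2_ID], "FULL"))

# After Branch1's dead timer expires, its hellos no longer list Wan2.
print(state_after_hello(WAN2_ID, [], "FULL"))

# Once the ACL is removed, Branch1 hears Wan2 and lists it again.
print(state_after_hello(WAN2_ID, [WAN2_ID], "INIT"))
```

The key point for PBR is the middle case: Serial2/0 is still up/up, yet the hello contents alone are enough for Wan2 to know the neighbor cannot hear it.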
Now that we're deep in our semi-failed state, it is time to revisit PBR to see how it is handling this situation. We'll redo our test VoIP packet and see where it ends up:
! ------ Start listener on Test PC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
!
!
! ------ Generate Syslog on Core1 ------
!
Core1#send log Testing PBR with T1 in 'semi-failed' state
Core1#
*Jun 30 20:50:22.665: %SYS-2-LOGMSG: Message from 0(): Testing PBR with T1 in 'semi-failed' state
Core1#
!
!
! ------ Observe Syslog on Test PC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
As we can observe, there is likely a routing problem here, as evidenced by the fact that the packet never made it to its destination. Knowing the behavior from the previous scenario, we can assume that the VoIP packet will be policy-routed towards Wan2 on the Gi3/0.1 interface. Let's start there in our troubleshooting.
Wan2#show policy-map interface | inc /|Service|Class|,
...
Serial2/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
139 packets, 11120 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 138 bytes
1 packets, 138 bytes
Class-map: class-default (match-any)
429 packets, 31256 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0.1
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 148 bytes
1 packets, 148 bytes
Class-map: class-default (match-any)
143 packets, 13614 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
164 packets, 19002 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0.5
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
143 packets, 13502 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
141 packets, 10651 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Again, note the color-highlighted sections. We can observe that one VoIP packet entered Wan2 on its Gi3/0.1 interface from Wan1 (as expected due to Wan1's policy routing). It is also apparent that the packet left via the serial interface, as opposed to hairpinning back to Wan1 as in the previous link failure scenario. Because it left Wan2 via the T1, the packet met its fate in the form of the DenyAll ACL applied at Branch1. This explains why the packet never made it to the Test PC.
Again, I want to stress that while we are mimicking this behavior by using an ingress ACL on Branch1, this is not the particular scenario we are really worried about. But it does go to prove the point that some partial failure in communication can degrade the ability for two routers to talk, even though a physical interface stays up/up. This example shows a setback in using policy-based routing for a decision point.
Make PBR More Dynamic
In order to make PBR a bit more dynamic, we are going to take a slightly different approach than what is normally used for this. If you look for common ways to make static routing or policy-based routing more dynamic, there are plenty of scenarios that show how to set up IP SLA tracking. This works fine and can definitely do the job. Another method that can solve this problem is Bidirectional Forwarding Detection (BFD), which I may cover at a higher level in a different post. In this post, however, I'd like to introduce another method.
We already have a pretty high confidence level in the neighbor's health because we're running OSPF. We rely on this neighbor state and dynamic route calculation for almost all of our routing, so why not piggy-back on it for our PBR? We can take advantage of EEM scripting to allow us to do this.
! ------ Script for when Neighbor drops ------
!
event manager applet PBR-Down
event syslog pattern ".*%OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from FULL to.*"
action 1.0 cli command "enable"
action 10.0 syslog msg "PBR Removed from Interfaces due to OSPF Nei State S2/0"
action 2.0 cli command "config t"
action 3.0 cli command "interface Gig1/0"
action 4.0 cli command "no ip policy route-map PBR-VoIP-to-T1"
action 5.0 cli command "interface G3/0.1"
action 6.0 cli command "no ip policy route-map PBR-VoIP-to-T1"
action 7.0 cli command "interface G3/0.5"
action 8.0 cli command "no ip policy route-map PBR-VoIP-to-T1"
action 9.0 cli command "end"
!
!
! ------ Script for when Neighbor returns ------
!
event manager applet PBR-Up
event syslog pattern ".*%OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from LOADING to FULL, Loading Done$"
action 1.0 cli command "enable"
action 10.0 syslog msg "PBR Enabled on Interfaces due to OSPF Nei State Full S2/0"
action 2.0 cli command "config t"
action 3.0 cli command "interface Gig1/0"
action 4.0 cli command "ip policy route-map PBR-VoIP-to-T1"
action 5.0 cli command "interface Gig3/0.1"
action 6.0 cli command "ip policy route-map PBR-VoIP-to-T1"
action 7.0 cli command "interface Gig3/0.5"
action 8.0 cli command "ip policy route-map PBR-VoIP-to-T1"
action 9.0 cli command "end"
!
By configuring the above EEM scripts on Wan2, we should now be relying on OSPF neighbor state to control the enforcement of our PBR. If this works, of course we would want to implement a similar configuration on Branch1 to avoid the reverse situation. In any case, it seems as though we are ready to test.
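Before relying on the two syslog patterns, it can be worth sanity-checking them against real log lines. The Python sketch below replays OSPF adjacency messages from Wan2 (copied from this lab's output) through the same two patterns and tracks whether PBR would be applied. Note the assumptions: Python's `re` module is not the regex engine EEM uses, although patterns this simple should behave the same way, and the `replay` helper is invented for illustration.

```python
import re

# The two patterns, copied from the PBR-Down and PBR-Up applets above.
PBR_DOWN = r".*%OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from FULL to.*"
PBR_UP = (r".*%OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 "
          r"from LOADING to FULL, Loading Done$")

PREFIX = "%OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 "

# Adjacency changes as logged on Wan2 during the semi-failed test and recovery.
WAN2_LOG = [
    "*Jun 30 22:24:27.800: " + PREFIX + "from FULL to INIT, 1-Way",
    "*Jun 30 22:57:28.876: " + PREFIX + "from INIT to 2WAY, 2-Way Received",
    "*Jun 30 22:57:28.876: " + PREFIX + "from 2WAY to EXSTART, AdjOK?",
    "*Jun 30 22:57:28.880: " + PREFIX + "from EXSTART to EXCHANGE, Negotiation Done",
    "*Jun 30 22:57:28.896: " + PREFIX + "from EXCHANGE to LOADING, Exchange Done",
    "*Jun 30 22:57:28.896: " + PREFIX + "from LOADING to FULL, Loading Done",
]


def replay(log_lines, pbr_enabled=True):
    """Return the PBR state after each log line, mimicking the two applets."""
    states = []
    for line in log_lines:
        if re.search(PBR_DOWN, line):
            pbr_enabled = False   # PBR-Down would remove the route-map
        elif re.search(PBR_UP, line):
            pbr_enabled = True    # PBR-Up would re-apply the route-map
        states.append(pbr_enabled)
    return states


for line, state in zip(WAN2_LOG, replay(WAN2_LOG)):
    print("PBR on " if state else "PBR off", "|", line.split("Serial2/0 ")[1])
```

Only the first and last messages trigger an applet; the intermediate states leave PBR disabled until the adjacency is fully restored. A message for a different neighbor, such as Branch1's log for Nbr 172.16.255.2, matches neither pattern.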
Retrying Failures with Dynamic PBR
Since the partial failure scenario blew a giant hole in our PBR, we should test it again now that we have tried to make PBR a bit smarter. Let's start by applying the DenyAll ACL to Branch1's T1 interface, in the inbound direction. Note the difference in behavior on Wan2 this time:
! ------ "Break" the T1 at Branch1 ------
!
Branch1#config t
Enter configuration commands, one per line. End with CNTL/Z.
Branch1(config)#int s2/0
Branch1(config-if)#ip access-group DenyAll in
Branch1(config-if)#end
Branch1#
*Jun 30 22:23:55.872: %SYS-5-CONFIG_I: Configured from console by console
Branch1#
*Jun 30 22:24:25.772: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.2 on Serial2/0 from FULL to DOWN, Neighbor Down: Dead timer expired
Branch1#
!
!
! ------ Observe our EEM script at Wan2 ------
!
Wan2#
*Jun 30 22:24:27.800: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from FULL to INIT, 1-Way
*Jun 30 22:24:27.876: %HA_EM-6-LOG: PBR-Down: PBR Removed from Interfaces due to OSPF Nei State S2/0
Wan2#
*Jun 30 22:24:28.052: %SYS-5-CONFIG_I: Configured from console by on vty0 (EEM:PBR-Down)
Wan2#
Now that we've successfully broken the T1 again, let's set up our Test PC to listen and generate a "VoIP" test packet from Core1.
! ------ Start listener on Test PC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
!
!
! ------ Generate Syslog on Core1 ------
!
Core1#send log Testing 'Smart PBR' with T1 in 'semi-failed' state
Core1#
*Jun 30 22:29:44.792: %SYS-2-LOGMSG: Message from 0(): Testing 'Smart PBR' with T1 in 'semi-failed' state
Core1#
!
!
! ------ Observe Syslog on Test PC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
connect to [172.16.10.10] from (UNKNOWN) [10.0.11.4] 51054
<186>24: *Jun 30 22:29:44.792: %SYS-2-LOGMSG: Message from 0(): Testing 'Smart PBR' with T1 in 'semi-failed' state
Alright, that's a good sign! By using EEM to automatically remove the PBR policy when the "target" router loses OSPF neighborship, we have successfully forced Wan2 to hairpin the VoIP traffic right back to Wan1 (just like in the link-failure scenario). Again, the routing is sub-optimal, but it will get the job done. See the output below for confirmation.
Wan2#show policy-map interface | inc /|Service|Class|,
...
GigabitEthernet3/0.1
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 156 bytes
1 packets, 156 bytes
Class-map: class-default (match-any)
95 packets, 8930 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
109 packets, 12677 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0.5
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
97 packets, 9154 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 160 bytes
1 packets, 160 bytes
Class-map: class-default (match-any)
96 packets, 7688 bytes
5 minute offered rate 0 bps, drop rate 0 bps
The last thing to check with this before calling it a day is to make sure that once we restore connectivity, the PBR kicks back in and Wan2 sends the packet out its serial interface. In the output below, we'll remove the ACL on Branch1, and then watch EEM restore our PBR on Wan2:
! ------ Fix the T1 ------
!
Branch1#config t
Enter configuration commands, one per line. End with CNTL/Z.
Branch1(config)#int s2/0
Branch1(config-if)#no ip access-group DenyAll in
Branch1(config-if)#end
Branch1#
*Jun 30 22:57:26.344: %SYS-5-CONFIG_I: Configured from console by console
Branch1#
*Jun 30 22:57:29.108: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.2 on Serial2/0 from LOADING to FULL, Loading Done
Branch1#
!
!
! ------ Watch EEM turn on PBR ------
!
Wan2#
*Jun 30 22:57:28.876: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from INIT to 2WAY, 2-Way Received
*Jun 30 22:57:28.876: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from 2WAY to EXSTART, AdjOK?
*Jun 30 22:57:28.880: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from EXSTART to EXCHANGE, Negotiation Done
*Jun 30 22:57:28.896: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from EXCHANGE to LOADING, Exchange Done
*Jun 30 22:57:28.896: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from LOADING to FULL, Loading Done
*Jun 30 22:57:28.956: %HA_EM-6-LOG: PBR-Up: PBR Enabled on Interfaces due to OSPF Nei State Full S2/0
Wan2#
*Jun 30 22:57:29.144: %SYS-5-CONFIG_I: Configured from console by on vty0 (EEM:PBR-Up)
Wan2#
Wan2#clear counters
Clear "show interface" counters on all interfaces [confirm]
Wan2#
*Jun 30 22:58:38.944: %CLEAR-5-COUNTERS: Clear counter on all interfaces by console
Wan2#
Finally, we can generate our "VoIP" packet once again from Core1 and ensure it not only still arrives at the Test PC off Branch1, but arrives via the T1 link. The output below will walk us through it.
! ------ Initiate Traffic from Core1 ------
!
Core1#send log Testing 'Smart PBR' with T1 in 'semi-failed' state
Core1#
*Jun 30 22:29:44.792: %SYS-2-LOGMSG: Message from 0(): Testing 'Smart PBR' with T1 in 'semi-failed' state
Core1#send log Testing 'Smart PBR' with T1 after 'semi-failed' state
Core1#
*Jun 30 23:01:00.364: %SYS-2-LOGMSG: Message from 0(): Testing 'Smart PBR' with T1 after 'semi-failed' state
Core1#
!
!
! ------ Observe Syslog on Test PC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
connect to [172.16.10.10] from (UNKNOWN) [10.0.11.4] 51054
<186>25: *Jun 30 23:01:00.364: %SYS-2-LOGMSG: Message from 0(): Testing 'Smart PBR' with T1 after 'semi-failed' state
!
!
! ------ View Wan2 Policy-map Counts ------
!
Wan2#show policy-map interface | inc /|Service|Class|,
...
Serial2/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
143 packets, 12012 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 149 bytes
1 packets, 149 bytes
Class-map: class-default (match-any)
300 packets, 20166 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0.1
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 159 bytes
1 packets, 159 bytes
Class-map: class-default (match-any)
143 packets, 13442 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
167 packets, 19082 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Wan2#
As seen above, when the OSPF neighborship came back, PBR came back on. This allowed our VoIP traffic to continue along the T1 as we had wanted.
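To pull the three test cases together, here is a small Python summary of what Wan2 does in each, with and without the EEM applets. It is only a model of the behavior observed in this lab: the function names are invented, and it uses the OSPF neighbor state as a stand-in for "the T1 can actually deliver traffic", which is an assumption rather than a guarantee.

```python
def plain_pbr_active(t1_up):
    """Plain PBR is in effect whenever the T1 interface is up/up."""
    return t1_up


def eem_pbr_active(t1_up, neighbor_state):
    """With the applets, the route-map stays applied only while the neighbor is FULL."""
    return t1_up and neighbor_state == "FULL"


def outcome(pbr_active, neighbor_state):
    """What happens to a VoIP packet arriving at Wan2."""
    if not pbr_active:
        return "routing table"   # hairpins back towards Wan1
    if neighbor_state == "FULL":
        return "via T1"
    return "black-holed"         # sent out a T1 that cannot deliver it


# (T1 interface up?, OSPF state of Branch1 as seen from Wan2)
SCENARIOS = {
    "all links up": (True, "FULL"),
    "T1 shut down": (False, "DOWN"),
    "semi-failed T1": (True, "INIT"),
}

for name, (t1_up, state) in SCENARIOS.items():
    plain = outcome(plain_pbr_active(t1_up), state)
    smart = outcome(eem_pbr_active(t1_up, state), state)
    print(f"{name}: plain PBR -> {plain}; with EEM -> {smart}")
```

The only row where the two columns differ is the semi-failed T1, which is exactly the gap the applets were written to close.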
Conclusion
This has been a long, drawn-out lab, so thanks for sticking with it. I'll take a moment to repeat my sentiments on policy-based routing: it's ugly, hard to troubleshoot, and can get out of hand quickly. However, it can be a necessity in certain scenarios. It should also be noted that the above lab does not solve every issue, especially as it essentially ignored the Branch1 PBR configuration. Also, there are other potential routing loops we did not address. Regardless, this lab should serve as a decent primer on why and how to implement policy-based routing on Cisco routers.