Sunday, June 30, 2013

Lab: Policy-based routing (Part 2)

Introduction

This lab is the continuation of our previous PBR lab, so if you haven't read through it yet, please read Lab: Policy-based routing (Part 1) first. In Part 1, we took our OSPF topology and forced our "VoIP" traffic across a serial link, leaving the rest of the traffic to use our "point to point wireless" link. In order to accomplish this, we used a concept called Policy-Based Routing (PBR). This allowed us to cherry-pick traffic and force it across a link other than what our OSPF-driven routing table would have chosen.

In today's lab, we'll take it a few steps further. First, a quick review of why one might choose to avoid PBR: there are some definite support issues. Because we are manually forcing traffic onto a path other than what the routing table would normally choose, we open ourselves up to inefficient routing at a minimum and, in worst-case scenarios, an increased likelihood of routing loops. Furthermore, when PBR is implemented, the diagnostic commands we are all used to (show ip route, for instance) become potentially inaccurate. PBR is, in effect, intercepting the decision the routing table would have made, so it can be tricky to understand what routing is actually taking place and where traffic is actually going.
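
For reference, here is the short list of commands that do reflect PBR decisions; we'll lean on the first two throughout this lab. (This is just a quick summary written as comments, not device output.)

! ------ PBR-aware commands (quick reference) ------
!
! show ip policy   : which interfaces have which route-maps applied
! show route-map   : match/set clauses, plus counters of policy-routed packets
! debug ip policy  : per-packet policy decisions (lab use only; it gets chatty)
!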

We'll tackle some of these issues as we walk through the following sections:
  • Gain better visibility of traffic flow throughout our topology
  • Test failure scenarios to see how PBR reacts
  • Configure PBR to behave more dynamically
  • Re-test any failure scenarios that previously did not recover
Please reference the Part 1 lab mentioned at the top of this post for configurations up to this point. With any luck, by the time we are done with part 2, we'll have a better understanding of how we can make our PBR topology a bit more resilient and transparent.

Gaining visibility

In the previous lab, we used commands like 'show ip policy' and 'show route-map' to see what traffic would be policy-routed. However, when policy routing is applied to multiple interfaces (as is the case on the Wan2 router), the packet counts in the 'show route-map' output quickly become useless, since the counters are aggregated across every interface that uses the route-map.


One way to gain better visibility is to take advantage of a feature set IOS already has for efficiently identifying traffic: Quality of Service (QoS). While this lab will not dive into QoS itself, we will use the QoS command framework in IOS to give us better visibility into the traffic on our network. In a real-world scenario we would likely deploy QoS anyway to protect important traffic, so this visibility would come for free.

We'll add the following configuration to the interfaces on Wan1, Wan2, and Branch1 so we can see where traffic enters and exits these routers. Since these are the three routers where PBR is implemented, it makes the most sense to watch the traffic there. The example below adds the QoS class-map and policy-map to the router's global config and then applies them to one particular interface; remember, we'll be applying this interface-level configuration to each non-loopback, IP-addressed interface on Wan1, Wan2, and Branch1. Also, note the additional access-list: it matches VoIP traffic in the opposite direction from what was defined for PBR. The example below is for the Wan1 or Wan2 side.

!
ip access-list extended VoIP-Incoming
 10 permit udp 172.16.10.0 0.0.0.255 any range 16384 32768
!
class-map match-any PBR-VoIP
 match access-group name PBR-VoIP-to-T1-ACL
 match access-group name VoIP-Incoming
!
policy-map PBR-Counters
 class PBR-VoIP
 class class-default
!
interface FastEthernet2/0
 service-policy input PBR-Counters
 service-policy output PBR-Counters
!
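
On the Branch1 side, the "incoming" direction is reversed, so its extra ACL would instead match VoIP traffic sourced from the head-end. A sketch is below; the exact source/destination wording is my assumption, simply mirroring the PBR ACL from Part 1, and the class-map, policy-map, and service-policy pieces are identical to the ones above.

!
ip access-list extended VoIP-Incoming
 10 permit udp 10.0.11.0 0.0.0.255 any range 16384 32768
!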


Now that we have this configuration in place, we can go ahead and generate "VoIP" traffic from Core1 and see where it goes. Remember that, for the sake of simplicity in the lab environment, we tricked IOS syslog into sending its logs to the Test PC (172.16.10.10) off Branch1, using a nonstandard UDP port that falls within the VoIP RTP range. Below, witness the "VoIP" traffic being generated by Core1 and received by the Test PC:

! ------ Initiate Traffic from Core1 ------
!
Core1#send log Testing PBR with all links up
Core1#
*Jun 18 13:30:19.047: %SYS-2-LOGMSG: Message from 0(): Testing PBR with all links up
Core1#
!
!
! ------ See Traffic at TestPC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
connect to [172.16.10.10] from (UNKNOWN) [10.0.11.4] 57473
<186>25: *Jun 18 13:30:19.047: %SYS-2-LOGMSG: Message from 0(): Testing PBR with all links up
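
(As a reminder, the "VoIP generator" is nothing fancier than the syslog configuration we added to Core1 in Part 1, repeated here for convenience:)

!
logging source-interface GigabitEthernet1/0
logging host 172.16.10.10 transport udp port 16390
!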


Now that we have seen that the traffic successfully routes (no surprise), it's time to confirm that the path is as expected. Note that nothing has really changed since the end of Part 1, so all we are accomplishing is validating the path with a different set of commands. The following output shows the 'show policy-map interface' command issued on each router in turn. Note that the output is being filtered at the IOS command prompt to make it a bit more legible, and I've also cut a bit of the output for completely irrelevant interfaces. The output is a bit long, so pay attention to the non-zero PBR-VoIP counters to pick out the important pieces.

! ------ Check QoS Stats on Wan1 ------
!
Wan1#show policy-map interface | inc /|Service|Class|,
 GigabitEthernet1/0 
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      1 packets, 135 bytes
        1 packets, 135 bytes
    Class-map: class-default (match-any)
      491 packets, 46354 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      1021 packets, 101657 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
 FastEthernet2/0 
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      478 packets, 45100 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      1004 packets, 96847 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
 GigabitEthernet3/0.1 
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      495 packets, 46794 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      1 packets, 135 bytes
        1 packets, 135 bytes
    Class-map: class-default (match-any)
      569 packets, 73868 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
Wan1#
!
!
! ------ Check QoS Stats on Wan2 ------
!
Wan2#show policy-map interface | inc /|Service|Class|,
 GigabitEthernet1/0 
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      512 packets, 48548 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      1068 packets, 94670 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
 Serial2/0 
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      498 packets, 42084 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      1 packets, 125 bytes
        1 packets, 125 bytes
    Class-map: class-default (match-any)
      1043 packets, 74844 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
 GigabitEthernet3/0.1 
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      1 packets, 135 bytes
        1 packets, 135 bytes
    Class-map: class-default (match-any)
      516 packets, 48976 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      596 packets, 70163 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
Wan2#
!
!
! ------ Check QoS Stats on Branch1 ------
!
Branch1#show policy-map interface | inc /|Service|Class|,
 FastEthernet1/0 
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      811 packets, 76518 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      1702 packets, 148025 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
 Serial2/0 
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      1 packets, 125 bytes
        0 packets, 0 bytes
        1 packets, 125 bytes
    Class-map: class-default (match-any)
      811 packets, 68473 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      1686 packets, 115744 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
 GigabitEthernet3/0 
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      1 packets, 43 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      1 packets, 135 bytes
        0 packets, 0 bytes
        1 packets, 135 bytes
    Class-map: class-default (match-any)
      1692 packets, 145181 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
Branch1#


By following the non-zero PBR-VoIP counters, it is apparent which paths were and were not used for the "VoIP" traffic. As expected, the path matched the last section of Part 1. The packet followed OSPF from Core1 to Wan1, where it matched our PBR policy and, instead of following OSPF directly to Branch1, was routed across to Wan2. Again, due to the inbound PBR at Wan2, the packet ignored the routing table and was routed across the T1 to Branch1.

Failure Scenario: T1 Link Down

Now that we've tested packet traversal while all links are up and working fine, we need to test some failure scenarios to see if our traffic still routes. To create our first and most obvious failure, we will 'shut' the T1 interface from the Branch1 side, simulating loss of link on the T1 line. This is probably the cleanest type of failure to have, as both Wan2 and Branch1 lose the link from a physical standpoint. There are plenty of topologies where one side can fail while the other stays up; a common example is two routers connected via a switch. In this case, however, we luck out because a link failure is seen immediately on both ends.

As you will soon find out, PBR during failures can be tricky to follow at the best of times. Because of this, I'll walk through this one a bit more step-by-step. First, we will break the T1 connection at the Branch1 side, roughly as sketched below.
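
A minimal sketch of that step (I won't paste the full console session; it is nothing more than shutting Serial2/0):

! ------ "Break" the T1 at Branch1 ------
!
Branch1#config t
Branch1(config)#interface Serial2/0
Branch1(config-if)#shutdown
Branch1(config-if)#end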

With this link down, we will next generate our test packet and then observe the results. Following the same process, we will start a listener on the Test PC at IP 172.16.10.10 hanging off the Branch1 router. Then, we will generate a syslog message from Core1 to act as our VoIP packet.

! ------ Start listener on Test PC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
!
!
! ------ Generate Syslog on Core1 ------
!
Core1#send log Testing PBR with T1 in 'down' state
Core1#
*Jun 30 09:43:20.199: %SYS-2-LOGMSG: Message from 0(): Testing PBR with T1 in 'down' state
Core1#
!
!
! ------ Observe Syslog on Test PC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
connect to [172.16.10.10] from (UNKNOWN) [10.0.11.4] 51054
<186>22: *Jun 30 09:43:20.199: %SYS-2-LOGMSG: Message from 0(): Testing PBR with T1 in 'down' state


Without any further analysis, we already know the most important part -- the traffic made it to its destination! This means that the policy routing is not black-holing traffic with this failure scenario. We should take a few minutes to observe the traffic path and understand why. Core1 will send its traffic to Wan1, as this is the OSPF best route, and there is no PBR applied locally on that router. So, we will start our tracing on Wan1. Traffic is received on Wan1's Gi1/0 interface, which has PBR inbound policy set. That policy dictates that traffic will be sent to Wan2 via the G3/0.1 interface. Now, we know this is probably a bad idea since the T1 interface is down on Wan2. As Wan1 has no way of knowing this, PBR will continue as expected. Observe the counters on Wan1 below:

Wan1#show policy-map interface | inc /|Service|Class|,
 GigabitEthernet1/0 
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      1 packets, 141 bytes
        1 packets, 141 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      327 packets, 31398 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
...
 GigabitEthernet3/0.1 
...
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      1 packets, 141 bytes
        1 packets, 141 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      385 packets, 47634 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
...


Notice that the VoIP packet came in on Gi1/0 (from Core1) as expected, and policy routing forced the packet out Gi3/0.1 towards Wan2; this shows up in the output direction on Gi3/0.1 above. So we have observed PBR still working 'as expected' and sending the packet towards Wan2, even though Wan2 no longer has the T1 link up.

It will be interesting to see what Wan2 does with the packet next. Keep in mind that Wan2 has a PBR policy (route-map) applied inbound on its interface from Wan1, stating that VoIP traffic coming in from Wan1 will be policy-routed out the T1. It is important to understand, though, that PBR will only execute if the next-hop interface is up/up. If that condition is not met, PBR steps aside and the packet is routed according to the router's IP routing table. This is critical to understand: since the T1 "next hop" is currently down, we should expect PBR to have no effect on Wan2's routing decision. Therefore, we should see the packet follow OSPF's selected route right back to Wan1:

Wan2#show policy-map interface | inc /|Service|Class|,
...
 Serial2/0 
...
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      650 packets, 46763 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
 GigabitEthernet3/0.1 
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      1 packets, 141 bytes
        1 packets, 141 bytes
    Class-map: class-default (match-any)
      340 packets, 32396 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
...
 GigabitEthernet3/0.5 
...
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      1 packets, 141 bytes
        1 packets, 141 bytes
    Class-map: class-default (match-any)
      337 packets, 25718 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
...


Pay careful attention to the input/output directions of the above output. Notice that, as expected from Wan1's output, Wan2 shows an ingress VoIP packet on Gi3/0.1. Since Serial2/0 is in a down state, no VoIP packet is seen egressing that interface, just as we predicted. However, we can see the VoIP packet leave Wan2, headed back to Wan1, on interface Gi3/0.5.

That is fairly interesting to observe: the packet came in on the area 0 interface but left on the area 5 interface. Why? On the way from Wan1 to Wan2, the area 0 interface was hard-set by the policy routing. On the way back from Wan2 to Wan1, with PBR no longer in play, Wan2 followed its IP routing table. Remember that if OSPF has a choice between an inter-area path and an intra-area path to a destination, it will always prefer the intra-area path. Since Wan1 and Wan2 share a link in area 5 (the destination area), that is the link chosen.
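
If you want a sanity check straight from the routing table, a quick lookup on Wan2 should line up with those counters. I haven't captured the output here, but with the T1 down we would expect the next hop to be Wan1's area 5 address:

! ------ Optional sanity check on Wan2 (output not shown) ------
!
! Wan2# show ip route 172.16.10.10
!   Expect the route to 172.16.10.0/24 to point at 172.16.11.4 (Wan1's Gi3/0.5
!   address in area 5), matching the egress counters observed above.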

Following our packet, Wan2's IP routing table has led us back to Wan1. Let's check the stats to see where it came in and where it left again. For those following along and thinking, "Hey, wouldn't I have seen this the first time we checked Wan1?", the answer is yes; this is one of the reasons I chose to omit portions of the output earlier, as it would otherwise have been misleading and confusing. Like I said, policy routing can easily lead to confusion! We'll pick up that same show command in its entirety this time; pay attention to the portions related to this last leg of the packet's journey to Branch1. Fair warning, there's a lot of output below.

Wan1#show policy-map interface | inc /|Service|Class|,
 GigabitEthernet1/0 
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      1 packets, 141 bytes
        1 packets, 141 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      327 packets, 31398 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      685 packets, 65100 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
 FastEthernet2/0 
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      325 packets, 30978 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      1 packets, 141 bytes
        1 packets, 141 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      679 packets, 64857 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
 GigabitEthernet3/0.1 
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      328 packets, 31360 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      1 packets, 141 bytes
        1 packets, 141 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      385 packets, 47634 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
 GigabitEthernet3/0.5 
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      1 packets, 145 bytes
        1 packets, 145 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      325 packets, 30882 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      329 packets, 26834 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
Wan1#


Through all of that, notice the PBR-VoIP counters on Gi3/0.5 (input) and FastEthernet2/0 (output). We see the same VoIP packet come back to Wan1 for the second time, this time from Wan2. Because there is no PBR policy applied inbound on Gi3/0.5, Wan1 now forwards this packet via its IP routing table, choosing Fa2/0, its connection to Branch1. As a memory refresher, this is the point-to-point transparent wireless bridge in our lab scenario. While there was definitely sub-optimal routing during this failure scenario, the packet still arrives where it's supposed to. So, with this basic type of failure, our policy-based routing seems to have survived.

Failure Scenario: Semi-failed Interface

This type of failure scenario can be caused by several issues, so note that our simulation efforts only mimic the typical symptoms. The generic problem we are tackling here is a router that still shows a link as up/up while the router on the other end is in some sort of 'down' state. A common example is two routers forming a layer-3 relationship across a layer-2 switch: the switch could drop its link to router A but leave router B's link up. Router B still believes its interface is up and keeps sending toward a router A that is no longer listening.

These partial failures are no issue for dynamic routing protocols; if a neighbor becomes unresponsive for any reason, the neighborship is torn down once the dead timer expires and other routes are used. With static routes and policy-based routing, there is no dynamic protocol managing a neighbor relationship, which leaves us exposed to exactly these types of failures. To demonstrate, I'll put a 'deny ip any any' access list ingress on Branch1's serial interface. This will keep the interface up, but traffic will not be able to pass from Wan2 to Branch1. Note that in order to troubleshoot these types of issues more effectively, we'll want to make the following configuration change on each router:

router ospf 1
 log-adjacency-changes detail


This change makes the routers log every OSPF adjacency state change, not just full/down transitions. Since the ACL breaks communication in only one direction, we will see a different impact than the typical "dead timer expired" teardown on both sides. When the ACL is applied inbound on Branch1's serial interface, Wan2's OSPF hellos stop reaching Branch1, while Branch1's hellos are still received by Wan2. When Branch1's dead timer expires for Wan2, it removes Wan2 from its adjacency table. Now, when Branch1 sends its next hello towards Wan2, Wan2 sees that it is no longer listed in Branch1's neighbor list and moves the neighbor state to Init. This is seen below:

! ------ Create and Apply ACL on Branch1 ------
!
Branch1#config t
Enter configuration commands, one per line.  End with CNTL/Z.
Branch1(config)#ip access-list extended DenyAll
Branch1(config-ext-nacl)#10 deny ip any any
Branch1(config-ext-nacl)#exit
Branch1(config)#int s2/0
Branch1(config-if)#ip access-group DenyAll in 
Branch1(config-if)#end
Branch1#
*Jun 30 19:38:49.073: %SYS-5-CONFIG_I: Configured from console by console
Branch1#
!
!
! ------ After dead timer expires... ------
!
Branch1#
*Jun 30 19:39:15.741: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.2 on Serial2/0 from FULL to DOWN, Neighbor Down: Dead timer expired
Branch1#
!
!
! ------ Wan2 moves Branch1 to Init ------
!
Wan2#
*Jun 30 19:39:15.561: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from FULL to INIT, 1-Way
Wan2#
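
A quick way to confirm the asymmetric state from the CLI (output not captured here) is to compare the neighbor tables on both ends:

! ------ Optional check: compare OSPF neighbor tables ------
!
! Wan2#    show ip ospf neighbor Serial2/0   (expect 172.16.255.3 sitting in INIT)
! Branch1# show ip ospf neighbor Serial2/0   (expect no entry for Wan2 once the dead timer fires)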


Now that we're deep in our semi-failed state, it is time to revisit PBR to see how it is handling this situation. We'll redo our test VoIP packet and see where it ends up:

! ------ Start listener on Test PC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
!
!
! ------ Generate Syslog on Core1 ------
!
Core1#send log Testing PBR with T1 in 'semi-failed' state
Core1#
*Jun 30 20:50:22.665: %SYS-2-LOGMSG: Message from 0(): Testing PBR with T1 in 'semi-failed' state
Core1#
!
!
! ------ Observe Syslog on Test PC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...


The packet never made it to its destination, so we clearly have a routing problem. Knowing the behavior from the previous scenario, we can assume that the VoIP packet was policy-routed towards Wan2 on the Gi3/0.1 interface, so let's start our troubleshooting there.

Wan2#show policy-map interface | inc /|Service|Class|,
...
 Serial2/0
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      139 packets, 11120 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      1 packets, 138 bytes
        1 packets, 138 bytes
    Class-map: class-default (match-any)
      429 packets, 31256 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
 GigabitEthernet3/0.1
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      1 packets, 148 bytes
        1 packets, 148 bytes
    Class-map: class-default (match-any)
      143 packets, 13614 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      164 packets, 19002 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
 GigabitEthernet3/0.5
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      143 packets, 13502 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      141 packets, 10651 bytes
      5 minute offered rate 0 bps, drop rate 0 bps


Again, focus on the PBR-VoIP counters. We can see that one VoIP packet arrived at Wan2 from Wan1 on the Gi3/0.1 interface (as expected, due to Wan1's policy routing). It is also apparent that the packet left out the serial interface, as opposed to hairpinning back to Wan1 as in the previous link-failure scenario. Because it left Wan2 out the T1, the packet obviously met its fate in the form of the DenyAll ACL applied at Branch1. This explains why it never made it to the Test PC.

Again, I want to stress that while we are mimicking this behavior with an ingress ACL on Branch1, the ACL itself is not the scenario we are really worried about. It does prove the point, though, that a partial communication failure can break the ability of two routers to talk even while the physical interface stays up/up. This example highlights a real weakness of relying on policy-based routing for the forwarding decision.

Make PBR More Dynamic

In order to make PBR a bit more dynamic, we are going to take a somewhat different approach than what is normally used for this. If you look for common ways to make static routing or policy-based routing more dynamic, you'll find plenty of examples showing how to set up IP SLA tracking. That works fine and can definitely do the job; a rough sketch of what it might look like is included below. Another method that can solve this problem is Bidirectional Forwarding Detection (BFD), which I may cover at a high level in a different post. In this post, however, I'd like to introduce another method.
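
For completeness, here is roughly what the IP SLA approach might look like on Wan2. This is only a sketch and we won't use it in this lab; the exact 'ip sla' syntax also varies a bit between IOS releases. The idea is to probe Branch1's T1 address (172.16.11.3), tie the probe to a track object, and have the route-map set the next hop only while the track is up:

!
ip sla 10
 icmp-echo 172.16.11.3 source-interface Serial2/0
 frequency 5
ip sla schedule 10 life forever start-time now
!
track 10 ip sla 10 reachability
!
route-map PBR-VoIP-to-T1 permit 10
 match ip address PBR-VoIP-to-T1-ACL
 set ip next-hop verify-availability 172.16.11.3 1 track 10
!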

We already have a pretty high confidence level in the neighbor's health because we're running OSPF. We rely on this neighbor state and dynamic route calculation for almost all of our routing, so why not piggy-back on it for our PBR? We can use Embedded Event Manager (EEM) scripting to do exactly that.

! ------ Script for when Neighbor drops  ------

event manager applet PBR-Down 
 event syslog pattern ".*%OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from FULL to.*"
 action 1.0  cli command "enable"
 action 10.0 syslog msg "PBR Removed from Interfaces due to OSPF Nei State S2/0"
 action 2.0  cli command "config t"
 action 3.0  cli command "interface Gig1/0"
 action 4.0  cli command "no ip policy route-map PBR-VoIP-to-T1"
 action 5.0  cli command "interface G3/0.1"
 action 6.0  cli command "no ip policy route-map PBR-VoIP-to-T1"
 action 7.0  cli command "interface G3/0.5"
 action 8.0  cli command "no ip policy route-map PBR-VoIP-to-T1"
 action 9.0  cli command "end"
!
!
! ------ Script for when Neighbor returns  ------

event manager applet PBR-Up 
 event syslog pattern ".*%OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from LOADING to FULL, Loading Done$"
 action 1.0  cli command "enable"
 action 10.0 syslog msg "PBR Enabled on Interfaces due to OSPF Nei State Full S2/0"
 action 2.0  cli command "config t"
 action 3.0  cli command "interface Gig1/0"
 action 4.0  cli command "ip policy route-map PBR-VoIP-to-T1"
 action 5.0  cli command "interface Gig3/0.1"
 action 6.0  cli command "ip policy route-map PBR-VoIP-to-T1"
 action 7.0  cli command "interface Gig3/0.5"
 action 8.0  cli command "ip policy route-map PBR-VoIP-to-T1"
 action 9.0  cli command "end"
!


By configuring the above EEM scripts on Wan2, we are now relying on OSPF neighbor state to control the enforcement of our PBR. If this works, we would of course want to implement a similar configuration on Branch1 to cover the reverse direction; a sketch of what that might look like follows. In any case, it seems as though we are ready to test.
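
For reference, the mirrored applet on Branch1 would watch its neighborship to Wan2 (router ID 172.16.255.2) and pull the policy from the LAN interface. This is only a sketch of the "down" half; the "up" half would mirror the PBR-Up applet in the same way.

! ------ Sketch: equivalent "down" script for Branch1  ------

event manager applet PBR-Down
 event syslog pattern ".*%OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.2 on Serial2/0 from FULL to.*"
 action 1.0 cli command "enable"
 action 2.0 cli command "config t"
 action 3.0 cli command "interface GigabitEthernet3/0"
 action 4.0 cli command "no ip policy route-map PBR-VoIP-to-T1"
 action 5.0 cli command "end"
 action 6.0 syslog msg "PBR Removed from Gi3/0 due to OSPF Nei State S2/0"
!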

Retrying Failures with Dynamic PBR

Since the partial failure scenario blew a giant hole in our PBR, we should test it again now that we have tried to make PBR a bit smarter. Let's start by applying the DenyAll ACL on Branch1's T1 interface, in the inbound direction. Note the difference in behavior on Wan2 this time:

! ------ "Break" the T1 at Branch1  ------

Branch1#config t
Enter configuration commands, one per line.  End with CNTL/Z.
Branch1(config)#int s2/0
Branch1(config-if)#ip access-group DenyAll in
Branch1(config-if)#end
Branch1#
*Jun 30 22:23:55.872: %SYS-5-CONFIG_I: Configured from console by console
Branch1#
*Jun 30 22:24:25.772: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.2 on Serial2/0 from FULL to DOWN, Neighbor Down: Dead timer expired
Branch1#
!
!
! ------ Observe our EEM script at Wan2  ------

Wan2#
*Jun 30 22:24:27.800: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from FULL to INIT, 1-Way
*Jun 30 22:24:27.876: %HA_EM-6-LOG: PBR-Down: PBR Removed from Interfaces due to OSPF Nei State S2/0
Wan2#
*Jun 30 22:24:28.052: %SYS-5-CONFIG_I: Configured from console by  on vty0 (EEM:PBR-Down)
Wan2#


Now that we've successfully broken the T1 again, let's set up our Test PC to listen and generate a "VoIP" test packet from Core1.

! ------ Start listener on Test PC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
!
!
! ------ Generate Syslog on Core1 ------
!
Core1#send log Testing 'Smart PBR' with T1 in 'semi-failed' state
Core1#
*Jun 30 22:29:44.792: %SYS-2-LOGMSG: Message from 0(): Testing 'Smart PBR' with T1 in 'semi-failed' state
Core1#
!
!
! ------ Observe Syslog on Test PC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
connect to [172.16.10.10] from (UNKNOWN) [10.0.11.4] 51054
<186>24: *Jun 30 22:29:44.792: %SYS-2-LOGMSG: Message from 0(): Testing 'Smart PBR' with T1 in 'semi-failed' state


Alright, that's a good sign! By using EEM to automatically remove the PBR policy when the "target" router loses OSPF neighborship, we have successfully forced Wan2 to hairpin the VoIP traffic right back to Wan1 (just like in the link-failure scenario). Again, the routing is sub-optimal, but it will get the job done. See the output below for confirmation.

Wan2#show policy-map interface | inc /|Service|Class|,
...
 GigabitEthernet3/0.1 
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      1 packets, 156 bytes
        1 packets, 156 bytes
    Class-map: class-default (match-any)
      95 packets, 8930 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      109 packets, 12677 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
 GigabitEthernet3/0.5 
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      97 packets, 9154 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      1 packets, 160 bytes
        1 packets, 160 bytes
    Class-map: class-default (match-any)
      96 packets, 7688 bytes
      5 minute offered rate 0 bps, drop rate 0 bps


The last thing to check before calling it a day is to make sure that once we restore connectivity, PBR kicks back in and Wan2 sends the packet out its serial interface. In the output below, we'll remove the ACL on Branch1 and then watch EEM restore our PBR on Wan2:

! ------ Fix the T1  ------

Branch1#config t
Enter configuration commands, one per line.  End with CNTL/Z.
Branch1(config)#int s2/0
Branch1(config-if)#no ip access-group DenyAll in
Branch1(config-if)#end
Branch1#
*Jun 30 22:57:26.344: %SYS-5-CONFIG_I: Configured from console by console
Branch1#
*Jun 30 22:57:29.108: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.2 on Serial2/0 from LOADING to FULL, Loading Done
Branch1#
!
!
! ------ Watch EEM turn on PBR  ------

Wan2#
*Jun 30 22:57:28.876: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from INIT to 2WAY, 2-Way Received
*Jun 30 22:57:28.876: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from 2WAY to EXSTART, AdjOK?
*Jun 30 22:57:28.880: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from EXSTART to EXCHANGE, Negotiation Done
*Jun 30 22:57:28.896: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from EXCHANGE to LOADING, Exchange Done
*Jun 30 22:57:28.896: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from LOADING to FULL, Loading Done
*Jun 30 22:57:28.956: %HA_EM-6-LOG: PBR-Up: PBR Enabled on Interfaces due to OSPF Nei State Full S2/0
Wan2#
*Jun 30 22:57:29.144: %SYS-5-CONFIG_I: Configured from console by  on vty0 (EEM:PBR-Up)
Wan2#
Wan2#clear counters
Clear "show interface" counters on all interfaces [confirm]
Wan2#
*Jun 30 22:58:38.944: %CLEAR-5-COUNTERS: Clear counter on all interfaces by console
Wan2#


Finally, we can generate our "VoIP" packet once again from Core1 and ensure it not only still arrives at the Test PC behind Branch1, but arrives via the T1 link. The output below walks us through it.

! ------ Initiate Traffic from Core1  ------

Core1#send log Testing 'Smart PBR' with T1 in 'semi-failed' state
Core1#
*Jun 30 22:29:44.792: %SYS-2-LOGMSG: Message from 0(): Testing 'Smart PBR' with T1 in 'semi-failed' state
Core1#send log Testing 'Smart PBR' with T1 after 'semi-failed' state
Core1#
*Jun 30 23:01:00.364: %SYS-2-LOGMSG: Message from 0(): Testing 'Smart PBR' with T1 after 'semi-failed' state
Core1#
!
!
! ------ Observe Syslog on Test PC  ------

root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
connect to [172.16.10.10] from (UNKNOWN) [10.0.11.4] 51054
<186>25: *Jun 30 23:01:00.364: %SYS-2-LOGMSG: Message from 0(): Testing 'Smart PBR' with T1 after 'semi-failed' state
!
!
! ------ View Wan2 Policy-map Counts  ------

Wan2#show policy-map interface | inc /|Service|Class|,
...
 Serial2/0 
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      143 packets, 12012 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      1 packets, 149 bytes
        1 packets, 149 bytes
    Class-map: class-default (match-any)
      300 packets, 20166 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
 GigabitEthernet3/0.1 
  Service-policy input: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      1 packets, 159 bytes
        1 packets, 159 bytes
    Class-map: class-default (match-any)
      143 packets, 13442 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
  Service-policy output: PBR-Counters
    Class-map: PBR-VoIP (match-any)
      0 packets, 0 bytes
        0 packets, 0 bytes
    Class-map: class-default (match-any)
      167 packets, 19082 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
Wan2#


As seen above, when the OSPF neighborship came back, PBR came back on. This allowed our VoIP traffic to continue along the T1 as we had wanted.

Conclusion

This has been a long, drawn-out lab, so thanks for sticking with it. I'll take a moment to repeat my sentiments on policy-based routing: it's ugly, hard to troubleshoot, and can get out of hand quickly. However, it can be a necessity in certain scenarios. It should also be noted that the lab above does not solve every issue, especially since it essentially ignored the Branch1 PBR configuration, and there are other potential routing loops we did not address. Regardless, this lab should serve as a decent primer on why and how to implement policy-based routing on Cisco routers.

Saturday, June 22, 2013

OSPF passive-interface

It is common when running OSPF to enable "passive-interface default" in the OSPF sub-configuration. This little command makes every interface passive by default, so no interface will send hellos or form adjacencies until it is explicitly activated with "no passive-interface <interface>". The reason for this is relatively simple: it offers protection against rogue neighborships forming on unexpected ports.
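
For context, the pattern in question looks something like this (the interface names here are just placeholders):

router ospf 1
 passive-interface default
 no passive-interface GigabitEthernet0/0
 no passive-interface GigabitEthernet0/1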

The reason for this post, though, is that I discovered something interesting about the command 'passive-interface default' yesterday. Whereas most commands that show up in a running-configuration can be re-applied without any impact (hey, the command's already there, right?), that's not the case with this one.

When re-applying "passive-interface default" to an OSPF configuration, any previously-defined "no passive-interface <interface>" commands simply drop out of the configuration. The impact? Any neighborships formed on those interfaces will quickly drop, as OSPF will no longer consider them active.

I realize it's a corner case that someone would push this command to a router where it already exists. The fact that it does go against the typical IOS rule-of-thumb that "re-applying config lines is safe" is worth pointing out. So, all you copy-and-pasters, take note that a copy/paste of certain already-applied commands could put you in hot water.

Also, part 2 of the PBR lab is underway and will be published this coming week. Cheers!

Friday, June 14, 2013

Lab: Policy-based routing (Part 1)

Introduction

The purpose of this lab is to provide a potential solution for a problem that occurs when operating outside of ideal conditions. While it is nice to lab up an environment with privately owned dark fiber and low-latency connectivity all over the place, that is often not reality.

Today's lab builds off the previous OSPF lab topology, but adds some interesting twists. Referring to the topology below: for the sake of an interesting lab, we'll say the Fast Ethernet link between Branch1 (remote site) and Wan1 (head-end edge WAN router) is really a transparent point-to-point wireless bridge, and the serial link between Branch1 and Wan2, to add some variety, is a privately leased T1 point-to-point circuit.


A problem appears...

With the network in its final configuration from the previous lab, users at Branch1 are complaining of quality problems with the Voice-over-IP system. However, all their web applications and file transfers seem to be performing within expectations. After digging in, it is apparent that there is some packet loss over the wireless point-to-point bridge, and the UDP-based real-time VoIP system is being impacted. Luckily, TCP is doing a decent enough job at 'hiding' the problem in the other applications.

After looking at bandwidth counters, we can see that during peak business hours the 100 Mbit link between Branch1 and Wan1 runs at a 95th percentile of around 40 Mbps. With that much traffic, using the T1 "backup" link as the primary path is not viable; the link would be far over-saturated.

In review, there's too much traffic to use the T1 as the primary routed link, and packet loss on the 100 Mbit link is causing issues with VoIP. Since we can't run all-or-nothing on either link, what if we could cherry-pick the VoIP traffic to send over the T1 and leave the rest of the traffic on the wireless bridge? To accomplish this, we can attempt to use Policy-Based Routing (PBR).

Alright! So we'll start by reviewing our relevant base configurations in the lab.

Base Lab Configuration


Branch1:

!
interface Loopback0
 ip address 172.16.255.3 255.255.255.255
!
interface FastEthernet1/0
 description To-WAN1
 ip address 172.16.11.1 255.255.255.254
 ip ospf network point-to-point
 duplex auto
 speed auto
!      
interface Serial2/0
 description To-WAN2
 ip address 172.16.11.3 255.255.255.254
 ip ospf network point-to-point
 serial restart-delay 0
!      
interface GigabitEthernet3/0
 description Branch-LAN
 ip address 172.16.10.1 255.255.255.0
 negotiation auto
!
router ospf 1
 router-id 172.16.255.3
 log-adjacency-changes
 auto-cost reference-bandwidth 100000
 area 0.0.0.5 stub
 network 172.16.10.0 0.0.0.255 area 0.0.0.5
 network 172.16.11.0 0.0.0.1 area 0.0.0.5
 network 172.16.11.2 0.0.0.1 area 0.0.0.5
 network 172.16.255.3 0.0.0.0 area 0.0.0.5
!


Wan1:

!
interface Loopback0
 ip address 172.16.255.1 255.255.255.255
!
interface Loopback1
 description Fake branch
 ip address 172.16.12.1 255.255.255.0
!
interface GigabitEthernet1/0
 description To-Core1
 ip address 10.0.11.1 255.255.255.254
 ip ospf network point-to-point
 negotiation auto
!      
interface FastEthernet2/0
 description To-Branch1
 ip address 172.16.11.0 255.255.255.254
 ip ospf network point-to-point
 duplex auto
 speed auto
!      
interface GigabitEthernet3/0
 no ip address
 negotiation auto
!
interface GigabitEthernet3/0.1
 encapsulation dot1Q 1 native
 ip address 10.0.11.6 255.255.255.254
 ip ospf network point-to-point
!
interface GigabitEthernet3/0.5
 encapsulation dot1Q 5
 ip address 172.16.11.4 255.255.255.254
 ip ospf network point-to-point
!
router ospf 1
 router-id 172.16.255.1
 log-adjacency-changes
 auto-cost reference-bandwidth 100000
 area 0.0.0.5 stub no-summary
 area 0.0.0.5 range 172.16.0.0 255.255.240.0
 network 10.0.11.0 0.0.0.1 area 0.0.0.0
 network 10.0.11.6 0.0.0.1 area 0.0.0.0
 network 172.16.11.0 0.0.0.1 area 0.0.0.5
 network 172.16.11.4 0.0.0.1 area 0.0.0.5
 network 172.16.12.0 0.0.0.255 area 0.0.0.5
 network 172.16.255.1 0.0.0.0 area 0.0.0.5
!


Wan2:

!
interface Loopback0
 ip address 172.16.255.2 255.255.255.255
!
interface Loopback1
 description Fake branch2
 ip address 172.16.14.1 255.255.255.0
!
interface GigabitEthernet1/0
 description To-Core2
 ip address 10.0.11.3 255.255.255.254
 ip ospf network point-to-point
 negotiation auto
!      
interface Serial2/0
 description To-Branch1
 ip address 172.16.11.2 255.255.255.254
 ip ospf network point-to-point
 serial restart-delay 0
!      
interface GigabitEthernet3/0
 no ip address
 negotiation auto
!
interface GigabitEthernet3/0.1
 encapsulation dot1Q 1 native
 ip address 10.0.11.7 255.255.255.254
 ip ospf network point-to-point
!
interface GigabitEthernet3/0.5
 encapsulation dot1Q 5
 ip address 172.16.11.5 255.255.255.254
 ip ospf network point-to-point
!
router ospf 1
 router-id 172.16.255.2
 log-adjacency-changes
 auto-cost reference-bandwidth 100000
 area 0.0.0.5 stub no-summary
 area 0.0.0.5 range 172.16.0.0 255.255.240.0
 network 10.0.11.2 0.0.0.1 area 0.0.0.0
 network 10.0.11.6 0.0.0.1 area 0.0.0.0
 network 172.16.11.2 0.0.0.1 area 0.0.0.5
 network 172.16.11.4 0.0.0.1 area 0.0.0.5
 network 172.16.14.0 0.0.0.255 area 0.0.0.5
 network 172.16.255.2 0.0.0.0 area 0.0.0.5
!


Core1:

!
interface Loopback0
 ip address 10.0.255.1 255.255.255.255
!
interface GigabitEthernet1/0
 description To-Core2
 ip address 10.0.11.4 255.255.255.254
 ip ospf network point-to-point
 negotiation auto
!
interface GigabitEthernet2/0
 description To-Wan1
 ip address 10.0.11.0 255.255.255.254
 ip ospf network point-to-point
 negotiation auto
!
router ospf 1
 router-id 10.0.255.1
 log-adjacency-changes
 auto-cost reference-bandwidth 100000
 network 10.0.11.0 0.0.0.1 area 0.0.0.0
 network 10.0.11.4 0.0.0.1 area 0.0.0.0
 network 10.0.255.1 0.0.0.0 area 0.0.0.0
!


Core2:

!
interface Loopback0
 ip address 10.0.255.2 255.255.255.255
!
interface GigabitEthernet1/0
 description To-Core1
 ip address 10.0.11.5 255.255.255.254
 ip ospf network point-to-point
 negotiation auto
!
interface GigabitEthernet2/0
 description To-Wan2
 ip address 10.0.11.2 255.255.255.254
 ip ospf network point-to-point
 negotiation auto
!
router ospf 1
 router-id 10.0.255.2
 log-adjacency-changes
 auto-cost reference-bandwidth 100000
 network 10.0.11.2 0.0.0.1 area 0.0.0.0
 network 10.0.11.4 0.0.0.1 area 0.0.0.0
 network 10.0.255.2 0.0.0.0 area 0.0.0.0
!


Implementing Policy-based Routing

For starters, I want to make sure that this blog post is in no way, shape, or form a glowing endorsement of PBR (neither the routing technique nor the beer). Frankly, it is hard to support and manage, and it has a tendency to get out of control quickly. Whenever possible, I try to avoid using it. However, there are times when PBR is among a short list of viable solutions. In those cases, I believe it is more important to deploy it in a way that can be managed more easily, even if that means traffic flows across a non-optimal route.

In order to successfully implement PBR, it is important to remember the basics of how it works. To influence the routed path beyond what the routing protocols are doing, PBR intervenes by effectively skipping the routing table lookup. The first step is to define a route-map, which contains two parts: a 'match' clause and a 'set' clause. The match clause defines which traffic will be policy-routed, and the set clause defines how the matched traffic should be routed. Because PBR needs to 'intercept' the normal routing decision, it is applied to interfaces in the inbound direction. In other words, to policy-route traffic, the policy has to be applied to the interface on which the traffic enters the router; the PBR-selected outbound interface does not require the policy.

If we want to policy-route the VoIP traffic to use the T1 line, we need to determine at which points we need to manually intervene in the routed path. We can start by picking a direction and working out the details, and then once we are satisfied, we can move on to the opposite direction. I am going to recommend starting with policy-routing traffic sourcing from Branch1, as it is the simpler of the two. First, we will create the route-map that will be used to define the "policy" of our PBR, using an access-list as its criteria. Then, we will apply it to the inbound LAN interface to match appropriate traffic sourcing from the Branch1 local network.

!
ip access-list extended PBR-VoIP-to-T1-ACL
 permit udp 172.16.10.0 0.0.0.255 10.0.11.0 0.0.0.255 range 16384 32768
!
route-map PBR-VoIP-to-T1 permit 10
 match ip address PBR-VoIP-to-T1-ACL
 set ip next-hop 172.16.11.2
!
interface GigabitEthernet3/0
 ip policy route-map PBR-VoIP-to-T1
!
end

Branch1#show route-map
route-map PBR-VoIP-to-T1, permit, sequence 10
  Match clauses:
    ip address (access-lists): PBR-VoIP-to-T1-ACL 
  Set clauses:
    ip next-hop 172.16.11.2
  Policy routing matches: 0 packets, 0 bytes
Branch1#


Note that the output of 'show route-map' includes a line that shows counters for PBR matches. One of the problems with PBR is lack of transparency, as 'show ip route' is no longer 100% accurate. So those counters in 'show route-map' - along with 'show ip policy', which points out which interfaces have which route-maps applied - are pretty much the only way to see what's happening. Now that we have the policy applied, we can test by injecting traffic from our test PC off Branch1 and making sure it routes as expected.

root@bt:~# hping3 10.0.11.10 --udp -p 16386

HPING 10.0.11.10 (tap0 10.0.11.10): udp mode set, 28 headers + 0 data bytes
ICMP Host Unreachable from ip=172.16.11.2 name=UNKNOWN  
ICMP Host Unreachable from ip=172.16.11.2 name=UNKNOWN  
ICMP Host Unreachable from ip=172.16.11.2 name=UNKNOWN  
^C
--- 10.0.0.10 hping statistic ---
3 packets tramitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.0/0.0/0.0 ms


Now that we've sent three UDP packets into the test network, which match the route-map's ACL, let's see if our PBR worked:

Branch1#show ip policy

Interface      Route map
Gi3/0          PBR-VoIP-to-T1
Branch1#show route-map
route-map PBR-VoIP-to-T1, permit, sequence 10
  Match clauses:
    ip address (access-lists): PBR-VoIP-to-T1-ACL
  Set clauses:
    ip next-hop 172.16.11.2
  Policy routing matches: 3 packets, 126 bytes
Branch1#


Success! For kicks, I also injected TCP traffic to the same destination port, and confirmed that the packet count in 'show route-map' did not increase. So, we can safely say that VoIP traffic from the branch office will use the T1, while the TCP applications will go forth using the Fast Ethernet link.
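
If you want to reproduce that negative test, it is just the TCP flavor of the same hping3 probe; something along these lines (a simple SYN probe to the same port is my assumption):

root@bt:~# hping3 10.0.11.10 -S -p 16386 -c 3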

Now that we've tackled the branch office, we can take on the head-end. This one takes a bit more planning as there are more routers and paths involved. With Branch1, all ingress traffic towards the WAN was sourced from one interface, and the outbound connections to the WAN were both directly connected. At the head-end side, there are two different WAN routers, so it's not as clear-cut.

This is probably a great time to point out something that should already be obvious -- even though the methods used to achieve the goals in this lab will work -- it does not mean this is the only way, nor necessarily the best way. It's just one way, and my mindset with PBR is to limit its deployment to the smallest number of devices possible. This may lead to inefficient routing, but as long as it provides the application-specific routing that cannot be achieved otherwise, that's good enough. With all that being said, when considering the routing from the core to Branch1, the traffic is guaranteed to hit either Wan1 or Wan2. Because of this, we'll configure PBR on Wan1 and Wan2 exclusively.

Wan1:

ip access-list extended PBR-VoIP-to-T1-ACL
 permit udp 10.0.11.0 0.0.0.255 172.16.10.0 0.0.0.255 range 16384 32768
!
route-map PBR-VoIP-to-T1 permit 10
 match ip address PBR-VoIP-to-T1-ACL
 set ip next-hop 10.0.11.7
!
interface GigabitEthernet1/0
 description To-Core1
 ip policy route-map PBR-VoIP-to-T1
!

Wan2:

ip access-list extended PBR-VoIP-to-T1-ACL
 permit udp 10.0.11.0 0.0.0.255 172.16.10.0 0.0.0.255 range 16384 32768
!
route-map PBR-VoIP-to-T1 permit 10
 match ip address PBR-VoIP-to-T1-ACL
 set ip next-hop 172.16.11.3
!
interface GigabitEthernet1/0
 description To-Core2
 ip policy route-map PBR-VoIP-to-T1
!
interface GigabitEthernet3/0.1
 description To-Wan1-OSPF-0
 ip policy route-map PBR-VoIP-to-T1
!
interface GigabitEthernet3/0.5
 description To-Wan1-OSPF-5
 ip policy route-map PBR-VoIP-to-T1
!


With this configuration on Wan1 and Wan2, policy-based routing will take traffic coming into either Wan router from the Core and ensure it is sent out Wan2's T1 interface. To keep the amount of PBR involved to a minimum, we are accepting the inefficient path of traffic going from Core1 to Wan1, then across to Wan2 so it can leave via the T1. The following diagram shows where PBR is being used:


Note that however traffic enters Wan2, it will be forwarded out the T1 interface due to PBR.  Now, PBR only works if the next-hop interface is in up/up state.  Otherwise PBR is ignored and the routing table is followed.  Why is this important? If the T1 is down, Wan2 will theoretically follow OSPF telling it to send traffic destined to Branch1 via its point-to-point interface with Wan1. It is because of this that we do not want to put an ingress PBR policy on Wan1's ingress interface from Wan2.  If we did, we would risk a routing loop.

I will also take a moment to point out that this configuration, as it stands, does not do much to protect against potential black-hole scenarios. Part 2 of this post will spend more time making PBR a bit more resilient and giving us a bit more visibility for troubleshooting.

As it stands, we still need to test the policy as it was implemented.  What we really want is a test PC hanging off the "Core" network, sending traffic destined to the Branch1 subnet.  Since we don't have that, we can 'hack' it by forcing the Core1 and Core2 routers to send traffic that matches our VoIP policy. We can add the following configuration to Core1 and Core2, and then test our PBR by generating syslogs. Is this considered cheating? I do believe so. :)

! ------------ Core1 -----------
logging source-interface GigabitEthernet1/0
logging host 172.16.10.10 transport udp port 16390
!
!
! ------------ Core2 -----------
logging source-interface GigabitEthernet1/0
logging host 172.16.10.10 transport udp port 16392
!
! ------------ Test Syslog from Core2 -----------
!
!
Core2#config t
Enter configuration commands, one per line.  End with CNTL/Z.
Core2(config)#^Z
Core2#
*Jun 14 15:50:07.995: %SYS-5-CONFIG_I: Configured from console by console
Core2#
!
! ------------ Check Wan2 Routemap Counters -----------
!
!
Wan2#show route-map
route-map PBR-VoIP-to-T1, permit, sequence 10
  Match clauses:
    ip address (access-lists): PBR-VoIP-to-T1-ACL
  Set clauses:
    ip next-hop 172.16.11.3
  Policy routing matches: 1 packets, 124 bytes
Wan2#
!
! ------------ Test Syslog from Core1 -----------
!
!
Core1#config t
Enter configuration commands, one per line.  End with CNTL/Z.
Core1(config)#^Z
Core1#
*Jun 14 15:52:56.763: %SYS-5-CONFIG_I: Configured from console by console
Core1#
!
! ------------ Check Wan1 Routemap Counters -----------
!
!
Wan1#show route-map
route-map PBR-VoIP-to-T1, permit, sequence 10
  Match clauses:
    ip address (access-lists): PBR-VoIP-to-T1-ACL
  Set clauses:
    ip next-hop 10.0.11.7
  Policy routing matches: 1 packets, 124 bytes
Wan1#
!
! ------------ Check Wan2 Routemap Counters -----------
!
!
Wan2#show route-map
route-map PBR-VoIP-to-T1, permit, sequence 10
  Match clauses:
    ip address (access-lists): PBR-VoIP-to-T1-ACL
  Set clauses:
    ip next-hop 172.16.11.3
  Policy routing matches: 2 packets, 248 bytes
Wan2#

As evidenced here, PBR is working as expected.  We generated syslogs specially crafted to mimic a VoIP packet based on what our ACL considered VoIP, so that we could test PBR on each Wan router.

UDP "VoIP" traffic from Core2 was sent to Wan2 (following OSPF/routing table), where PBR intercepted it and sent it out the T1 link.

Furthermore, the syslog generated by Core1 was sent to Wan1 (following OSPF/routing table), where PBR intercepted it and instead of sending it directly to Branch1, it sent the "VoIP" traffic over to Wan2. Since Wan2 has an inbound PBR policy on that interface, it again intercepted that traffic and changed the next-hop to send it down the T1 towards Branch1.

Conclusion

With the configuration as it stands, PBR is allowing bidirectional VoIP traffic between the head-end and the branch office to flow across the T1, while all other traffic is going through the FastEthernet interface. This meets our lab criteria of allowing the bulk traffic to use the high-bandwidth point-to-point wireless link, since TCP is mitigating the packet loss.  It also allows the VoIP traffic that had been impacted by the packet loss to use the lower bandwidth link that runs without error.

Next time we will take this configuration further by making PBR a bit more dynamic in its decision-making, which will allow for better failure handling.  We will also look at ways to improve visibility into the traffic PBR is handling, since the IP routing table alone no longer tells us how that traffic is routed.