Packet Travel Guide: ospf

Sunday, June 30, 2013

Lab: Policy-based routing (Part 2)

Introduction

This lab is the continuation of our previous PBR lab, so if you haven't read through it yet, please read Lab: Policy-based routing (Part 1) first. In Part 1, we took our OSPF topology and forced our "VoIP" traffic across a serial link, leaving the rest of the traffic to use our "point to point wireless" link. In order to accomplish this, we used a concept called Policy-Based Routing (PBR). This allowed us to cherry-pick traffic and force it across a link other than what our OSPF-driven routing table would have chosen.

In today's lab, we'll take it a few steps further. First, a quick review of why one might choose to avoid PBR: there are some definite support issues. Since we are manually forcing traffic onto a different path than the routing table would normally choose, we open ourselves up to minor issues like inefficient routing and, in worst-case scenarios, an increased likelihood of routing loops. Furthermore, when PBR is implemented, the normal diagnostic commands we are all used to (show ip route, for instance) become potentially inaccurate. Remember that PBR in effect intercepts the forwarding decision before the routing table is consulted, so the routing table alone no longer tells us where traffic is actually going.

We'll tackle some of these issues as we walk through the following sections:

Gain better visibility of traffic flow throughout our topology
Test failure scenarios to see how PBR reacts
Configure PBR to behave more dynamically
Re-test any failure scenarios that previously did not recover

Please reference the Part 1 lab mentioned at the top of this post for configurations up to this point. With any luck, by the time we are done with part 2, we'll have a better understanding of how we can make our PBR topology a bit more resilient and transparent.

Gaining visibility

In the previous lab, we used commands like 'show ip policy' and 'show route-map' to see what traffic would be policy-routed. However, if the policy routing is applied to multiple interfaces (as is the case with the Wan2 router), the packet counts in the 'show route-map' output quickly become useless, since they do not tell us which interface the matched packets arrived on.

One way to gain better visibility is to take advantage of a feature set IOS already has for efficiently identifying traffic: Quality of Service (QoS) configuration. While this lab will not dive into QoS, we will use the QoS command framework in IOS to give us better visibility into the traffic on our network. In a real-world scenario, we would likely deploy QoS to protect important traffic regardless, so this visibility would already be in place.

We'll add the following configuration to all interfaces on Wan1, Wan2, and Branch1 to best understand where traffic is entering and exiting these routers. Since these are the three routers where we have PBR implemented, they are the most useful places to watch this traffic. The example below adds the QoS class-map and policy-map to the router's global config, and then applies the policy to one particular interface. Remember, we'll be applying this interface-level configuration to each non-loopback, IP-addressed interface on Wan1, Wan2, and Branch1. There is also an additional access-list, which matches VoIP traffic flowing in the opposite direction from what was defined for PBR. The example below would be for the Wan1 or Wan2 side.

!
ip access-list extended VoIP-Incoming
 10 permit udp 172.16.10.0 0.0.0.255 any range 16384 32768
!
class-map match-any PBR-VoIP
 match access-group name PBR-VoIP-to-T1-ACL
 match access-group name VoIP-Incoming
!
policy-map PBR-Counters
 class PBR-VoIP
 class class-default
!
interface FastEthernet2/0
 service-policy input PBR-Counters
 service-policy output PBR-Counters
!
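
To make the direction of that extra access-list a little more concrete, here is a minimal Python sketch of what the VoIP-Incoming entry matches. It is only an illustration and not how IOS evaluates ACLs internally. The function name matches_voip_incoming is made up for this example; the subnet and port range are copied from the ACL above.

```python
import ipaddress

# Values copied from: permit udp 172.16.10.0 0.0.0.255 any range 16384 32768
BRANCH_LAN = ipaddress.ip_network("172.16.10.0/24")  # wildcard 0.0.0.255 == /24
PORT_LOW, PORT_HIGH = 16384, 32768                   # "range" is inclusive


def matches_voip_incoming(protocol, src_ip, dst_port):
    """Hypothetical helper: True if a packet would match ACL line 10."""
    return (
        protocol == "udp"
        and ipaddress.ip_address(src_ip) in BRANCH_LAN
        and PORT_LOW <= dst_port <= PORT_HIGH
    )


# Traffic sourced from the Test PC's subnet is matched by this ACL.
print(matches_voip_incoming("udp", "172.16.10.10", 16390))
# A packet sourced from Core1 is not; that direction is left to the Part 1 ACL.
print(matches_voip_incoming("udp", "10.0.11.4", 16390))
```

The point is simply that this entry keys on the branch subnet as the source, which is the reverse of the Core1-to-branch direction we policy-route.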

Now that we have this configuration in place, we can go ahead and generate "VoIP" traffic from Core1 and see where it goes. Remember that for the sake of simplicity in the lab environment, we tricked IOS syslog into sending its logs to the TestPC (172.16.10.10) off Branch1, using a nonstandard UDP port that falls within the VoIP RTP range. Below, witness the "VoIP" traffic being generated by Core1 and received by the Test PC:

! ------ Initiate Traffic from Core1 ------
!
Core1#send log Testing PBR with all links up
Core1#
*Jun 18 13:30:19.047: %SYS-2-LOGMSG: Message from 0(): Testing PBR with all links up
Core1#
!
!
! ------ See Traffic at TestPC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
connect to [172.16.10.10] from (UNKNOWN) [10.0.11.4] 57473
<186>25: *Jun 18 13:30:19.047: %SYS-2-LOGMSG: Message from 0(): Testing PBR with all links up

Now that we have seen that the traffic successfully routes (no surprise), it's time to confirm that the path is as expected. Note that nothing has really changed since the end of Part 1, so all we are accomplishing is validating the path with a different set of commands. The following output shows the "show policy-map interface" command issued on each router in succession. Note that the output is being filtered at the IOS command prompt to make it a bit more legible, and I've also cut the output for completely irrelevant interfaces. The output is a bit long, so pay attention to the color highlighting (the PBR-VoIP classes with non-zero packet counts) to help pick out the important pieces.

! ------ Check QoS Stats on Wan1 ------
!
Wan1#show policy-map interface | inc /|Service|Class|,
GigabitEthernet1/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 135 bytes
1 packets, 135 bytes
Class-map: class-default (match-any)
491 packets, 46354 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
1021 packets, 101657 bytes
5 minute offered rate 0 bps, drop rate 0 bps
FastEthernet2/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
478 packets, 45100 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
1004 packets, 96847 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0.1
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
495 packets, 46794 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 135 bytes
1 packets, 135 bytes
Class-map: class-default (match-any)
569 packets, 73868 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Wan1#
!
!
! ------ Check QoS Stats on Wan2 ------
!
Wan2#show policy-map interface | inc /|Service|Class|,
GigabitEthernet1/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
512 packets, 48548 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
1068 packets, 94670 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Serial2/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
498 packets, 42084 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 125 bytes
1 packets, 125 bytes
Class-map: class-default (match-any)
1043 packets, 74844 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0.1
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 135 bytes
1 packets, 135 bytes
Class-map: class-default (match-any)
516 packets, 48976 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
596 packets, 70163 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Wan2#
!
!
! ------ Check QoS Stats on Branch1 ------
!
Branch1#show policy-map interface | inc /|Service|Class|,
FastEthernet1/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
811 packets, 76518 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
1702 packets, 148025 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Serial2/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 125 bytes
0 packets, 0 bytes
1 packets, 125 bytes
Class-map: class-default (match-any)
811 packets, 68473 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
1686 packets, 115744 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
1 packets, 43 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 135 bytes
0 packets, 0 bytes
1 packets, 135 bytes
Class-map: class-default (match-any)
1692 packets, 145181 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Branch1#

By following the blue-colored output, it is apparent which paths were used and which were not used for the "VoIP" traffic. As expected, the path matched the last section of Part 1. The packet followed OSPF from Core1 to Wan1, at which point it matched our PBR policy and, instead of following OSPF directly to Branch1, was routed across to Wan2. Again, because of the inbound PBR policy at Wan2, the packet ignored the routing table and was routed across the T1 to Branch1.
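
If the highlighting is not available (or the output is being reviewed from a saved text file), the same information can be pulled out with a few lines of script. The following is a rough Python sketch, assuming the filtered output format shown above; the function name voip_hits and the regular expressions are my own and would likely need adjusting for other IOS versions or filters. It reports only the first counter line after each matching Class-map line.

```python
import re

IFACE_RE = re.compile(r"^[A-Za-z][A-Za-z-]*\d+(?:/\d+)*(?:\.\d+)?$")
POLICY_RE = re.compile(r"^Service-policy (input|output): (\S+)$")
CLASS_RE = re.compile(r"^Class-map: (\S+)")
COUNT_RE = re.compile(r"^(\d+) packets, (\d+) bytes")


def voip_hits(output, class_name="PBR-VoIP"):
    """Return (interface, direction, packets, bytes) for non-zero class counters."""
    hits = []
    iface = direction = None
    want_count = False  # True right after a Class-map line for class_name
    for raw in output.splitlines():
        line = raw.strip()
        if IFACE_RE.match(line):
            iface, direction, want_count = line, None, False
            continue
        m = POLICY_RE.match(line)
        if m:
            direction, want_count = m.group(1), False
            continue
        m = CLASS_RE.match(line)
        if m:
            want_count = m.group(1) == class_name
            continue
        m = COUNT_RE.match(line)
        if m and want_count:
            want_count = False  # only the first counter line is the class total
            if int(m.group(1)) > 0:
                hits.append((iface, direction, int(m.group(1)), int(m.group(2))))
    return hits


# Excerpt of the Wan1 output above (Gi1/0 and Gi3/0.1 only).
SAMPLE = """\
GigabitEthernet1/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 135 bytes
1 packets, 135 bytes
Class-map: class-default (match-any)
491 packets, 46354 bytes
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
GigabitEthernet3/0.1
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 135 bytes
1 packets, 135 bytes
"""

for hit in voip_hits(SAMPLE):
    print(hit)
```

Run against each router's output in turn, the non-empty results trace the path hop by hop: in on one interface, out on another.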

Failure Scenario: T1 Link Down

Now that we've tested packet traversal while all links are up and working fine, we need to test some failure scenarios to see if our traffic still routes. To create our first and most obvious failure, we will 'shut' the T1 interface from the Branch1 side, simulating loss of link on the T1 line. This is probably the best type of failure to happen, as both Wan2 and Branch1 lose the link from a physical standpoint. There are plenty of topologies where one side could fail and the other stays up. An example of this would be two routers connected via a switch. However, in this case, we luck out because a link failure will be seen immediately on both ends.

As you will soon find out, PBR during failures can be, at its best, a tricky situation to follow. Because of this, I'll walk through this one a bit more step-by-step. First, we will break the T1 connection at the "Branch1" side.

With this link down, we will next generate our test packet and then observe the results. Following the same process, we will start a listener on the Test PC at IP 172.16.10.10 hanging off the Branch1 router. Then, we will generate a syslog message from Core1 to act as our VoIP packet.

! ------ Start listener on Test PC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
!
!
! ------ Generate Syslog on Core1 ------
!
Core1#send log Testing PBR with T1 in 'down' state
Core1#
*Jun 30 09:43:20.199: %SYS-2-LOGMSG: Message from 0(): Testing PBR with T1 in 'down' state
Core1#
!
!
! ------ Observe Syslog on Test PC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
connect to [172.16.10.10] from (UNKNOWN) [10.0.11.4] 51054
<186>22: *Jun 30 09:43:20.199: %SYS-2-LOGMSG: Message from 0(): Testing PBR with T1 in 'down' state

Without any further analysis, we already know the most important part: the traffic made it to its destination! This means that the policy routing is not black-holing traffic in this failure scenario. We should take a few minutes to observe the traffic path and understand why. Core1 will send its traffic to Wan1, as this is the OSPF best route, and there is no PBR applied locally on Core1. So, we will start our tracing on Wan1. Traffic is received on Wan1's Gi1/0 interface, which has an inbound PBR policy applied. That policy dictates that traffic will be sent to Wan2 via the Gi3/0.1 interface. Now, we know this is probably a bad idea since the T1 interface is down on Wan2. As Wan1 has no way of knowing this, PBR will continue as expected. Observe the counters on Wan1 below:

Notice that the VoIP packet came in on Gi1/0 (from Core1) as expected, and the policy routing forces the packet out Gi3/0.1 towards Wan2. This is observed in the output direction in the above output. So now, we have observed PBR still working 'as expected' and sending the packet towards Wan2, even though Wan2 no longer has the T1 link up.

It will be interesting to see what Wan2 does with the packet next. Keep in mind that Wan2 has a PBR policy on its inbound interface from Wan1, stating that VoIP traffic coming in from Wan1 will be policy-routed out the T1. It is important to understand, though, that PBR will only execute if the next-hop interface is up/up. If this criterion is not met, PBR steps aside and the packet is routed according to the router's IP routing table. This is critical to understand: since the T1 "next hop" is currently down, we should expect PBR to have no effect on Wan2's routing decision. Therefore, we should see the packet follow OSPF's selected route right back to Wan1:

Wan2#show policy-map interface | inc /|Service|Class|,
...
Serial2/0
...
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
650 packets, 46763 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0.1
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 141 bytes
1 packets, 141 bytes
Class-map: class-default (match-any)
340 packets, 32396 bytes
5 minute offered rate 0 bps, drop rate 0 bps
...
GigabitEthernet3/0.5
...
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 141 bytes
1 packets, 141 bytes
Class-map: class-default (match-any)
337 packets, 25718 bytes
5 minute offered rate 0 bps, drop rate 0 bps
...

Pay careful attention to the input/output directions of the above output. Notice that, as expected from seeing Wan1's output, Wan2 shows an ingress VoIP packet on Gi3/0.1. Since Serial2/0 is in a down state, our theory proves correct that no VoIP packet is seen egressing that interface. However, we can observe that the VoIP packet leaves Wan2, destined back to Wan1 on interface Gi3/0.5.

That is fairly interesting to observe; the packet came in on the area 0 interface, but left on the area 5 interface. Why? On the way from Wan1 to Wan2, the area 0 interface was hard-set by the policy routing. On the way back from Wan2 to Wan1, since PBR no longer applied, Wan2 followed its IP routing table. Remember that if OSPF has a choice between an inter-area and an intra-area path to a destination, it will always choose the intra-area path. Therefore, since Wan2 and Wan1 both have an interface in area 5 (the destination area), this is the link that is chosen.
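
That preference rule can be sketched in a few lines of Python. This is a toy model only: the cost values are hypothetical (they are not taken from the lab), and the real OSPF route selection has more cases (NSSA routes, for example) than this shows. The key idea is that route type is compared first and cost only breaks ties within the same type.

```python
# Lower rank wins; cost is only compared between routes of the same type.
ROUTE_TYPE_RANK = {
    "intra-area": 0,
    "inter-area": 1,
    "external-1": 2,
    "external-2": 3,
}


def best_route(candidates):
    """Pick the preferred route: by type first, then by lowest cost."""
    return min(candidates, key=lambda r: (ROUTE_TYPE_RANK[r["type"]], r["cost"]))


# Wan2's two ways back toward the branch network (costs are made up).
candidates = [
    {"via": "GigabitEthernet3/0.1", "type": "inter-area", "cost": 2},
    {"via": "GigabitEthernet3/0.5", "type": "intra-area", "cost": 20},
]

print(best_route(candidates)["via"])
```

Even with a deliberately worse cost on the area 5 path, the intra-area route is the one selected in this model.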

Following our packet, Wan2's IP routing table has led us back to Wan1. Let's check stats to see if we can see where it came in and left again. For those following along and thinking, "Hey, wouldn't I have seen this the first time we checked Wan1?", the answer is yes; this is one of the reasons I chose to omit portions of the output earlier. Otherwise it can be misleading and confusing. Like I said, policy routing can easily lead to confusion! We'll pick up that same show command in its entirety this time, and I'll selectively highlight the portions related to this last leg of the packet's journey to Branch1. Fair warning, there's a lot of output below.

Wan1#show policy-map interface | inc /|Service|Class|,
GigabitEthernet1/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 141 bytes
1 packets, 141 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
327 packets, 31398 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
685 packets, 65100 bytes
5 minute offered rate 0 bps, drop rate 0 bps
FastEthernet2/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
325 packets, 30978 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 141 bytes
1 packets, 141 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
679 packets, 64857 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0.1
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
328 packets, 31360 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 141 bytes
1 packets, 141 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
385 packets, 47634 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0.5
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 145 bytes
1 packets, 145 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
325 packets, 30882 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
329 packets, 26834 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Wan1#

Through all of that, notice the lines colored in blue. We see the same VoIP packet come back to Wan1 for the second time, this time from Wan2. Because there is no PBR policy applied to inbound traffic on Gi3/0.5, Wan1 now forwards this packet via its IP routing table, choosing Fa2/0 for its connection to Branch1. As a memory refresher, this is the point-to-point transparent wireless bridge in our lab scenario. While there was definitely sub-optimal routing during this failure scenario, the packet still arrives where it's supposed to. So, with this basic type of failure, our policy-based routing seems to have survived.

Failure Scenario: Semi-failed Interface

This type of failure scenario could be caused by several issues, so note that our simulation efforts are only to mimic the typical symptoms. The generic issue we are tackling in this failure scenario is when a router still shows a link as up/up, but the router on the other end is in a 'down' state of some sort. One of the common examples of this is when two routers form a layer-3 relationship across a layer-2 switch. This means that the switch could drop its link to router A, but leave router B's link up. Therefore, router B still believes it has an interface up to send to a presumed-listening router A.

These partial failures are no issue for dynamic routing protocols to handle; if a neighbor becomes unresponsive for any reason, the neighborship is torn down after a time period and other routes are used. When dealing with static routes and policy-based routing, there is no dynamic protocol to manage a neighbor relationship. This means that we are susceptible to these types of failures. To demonstrate, I'll put a deny any any access-list ingress on Branch1's serial interface. This will keep the interface up, but traffic will not be able to pass from Wan2 to Branch1. Note that in order to more effectively troubleshoot these types of issues, we'll want to make the following configuration change to each router:

router ospf 1
log-adjacency-changes detail

This change will allow the routers to log all OSPF state changes. Since the ACL blocks communication in only one direction, we will see a different impact than the typical "dead timer expired" behavior on both sides. When the ACL is applied inbound on Branch1's serial interface, Wan2's OSPF hellos stop reaching Branch1. However, Branch1's hellos are still received by Wan2. When Branch1's dead timer expires for Wan2, it removes Wan2 from its neighbor table. Now, when Branch1 sends its next hello towards Wan2, Wan2 sees that it is no longer in Branch1's neighbor list and moves the neighbor state to Init. This is seen below:

! ------ Create and Apply ACL on Branch1 ------
!
Branch1#config t
Enter configuration commands, one per line. End with CNTL/Z.
Branch1(config)#ip access-list extended DenyAll
Branch1(config-ext-nacl)#10 deny ip any any
Branch1(config-ext-nacl)#exit
Branch1(config)#int s2/0
Branch1(config-if)#ip access-group DenyAll in
Branch1(config-if)#end
Branch1#
*Jun 30 19:38:49.073: %SYS-5-CONFIG_I: Configured from console by console
Branch1#
!
!
! ------ After dead timer expires... ------
!
Branch1#
*Jun 30 19:39:15.741: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.2 on Serial2/0 from FULL to DOWN, Neighbor Down: Dead timer expired
Branch1#
!
!
! ------ Wan2 moves Branch1 to Init ------
!
Wan2#
*Jun 30 19:39:15.561: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from FULL to INIT, 1-Way
Wan2#

Now that we're deep in our semi-failed state, it is time to revisit PBR to see how it is handling this situation. We'll redo our test VoIP packet and see where it ends up:

! ------ Start listener on Test PC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
!
!
! ------ Generate Syslog on Core1 ------
!
Core1#send log Testing PBR with T1 in 'semi-failed' state
Core1#
*Jun 30 20:50:22.665: %SYS-2-LOGMSG: Message from 0(): Testing PBR with T1 in 'semi-failed' state
Core1#
!
!
! ------ Observe Syslog on Test PC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...

As we can observe, there is likely a routing problem here, as evidenced by the fact that the packet never made it to its destination. Knowing the behavior from the previous scenario, we can assume that the VoIP packet will be policy-routed towards Wan2 on the Gi3/0.1 interface. Let's start there in our troubleshooting.

Wan2#show policy-map interface | inc /|Service|Class|,
...
Serial2/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
139 packets, 11120 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 138 bytes
1 packets, 138 bytes
Class-map: class-default (match-any)
429 packets, 31256 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0.1
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 148 bytes
1 packets, 148 bytes
Class-map: class-default (match-any)
143 packets, 13614 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
164 packets, 19002 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0.5
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
143 packets, 13502 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
141 packets, 10651 bytes
5 minute offered rate 0 bps, drop rate 0 bps

Again, note the color-highlighted sections. We can observe that 1 VoIP packet arrived on Wan2's Gi3/0.1 interface from Wan1 (as expected due to Wan1's policy routing). It is also apparent that the packet left via the serial interface, as opposed to hairpinning back to Wan1 as in the previous link failure scenario. Because it left Wan2 over the T1, the packet met its fate in the form of the DenyAll ACL applied at Branch1. This explains why the packet never made it to the Test PC.

Again, I want to stress that while we are mimicking this behavior by using an ingress ACL on Branch1, this is not the particular scenario we are really worried about. But it does go to prove the point that some partial failure in communication can degrade the ability for two routers to talk, even though a physical interface stays up/up. This example shows a setback in using policy-based routing for a decision point.
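
The difference between the two failure scenarios can be boiled down to a small toy model. The Python below is only a sketch of the behavior we observed on Wan2, not of IOS internals; the Link class, the forward function, and the field names are invented for illustration. The point it makes is that the PBR decision looks at local interface state only, never at whether the far end is actually listening.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    up_up: bool              # what the local router sees on its own interface
    far_end_listening: bool  # whether the neighbor actually receives packets


def forward(matches_pbr, pbr_link, table_link):
    """Return (egress interface, delivered?) for a single packet."""
    use_pbr = matches_pbr and pbr_link.up_up
    chosen = pbr_link if use_pbr else table_link
    return chosen.name, chosen.far_end_listening


ospf_path = Link("GigabitEthernet3/0.5", up_up=True, far_end_listening=True)

scenarios = {
    "all links up": Link("Serial2/0", up_up=True, far_end_listening=True),
    "T1 link down": Link("Serial2/0", up_up=False, far_end_listening=False),
    "T1 semi-failed": Link("Serial2/0", up_up=True, far_end_listening=False),
}

for label, t1 in scenarios.items():
    print(label, forward(True, t1, ospf_path))
```

Only the semi-failed case ends with the packet leaving on Serial2/0 and not being delivered, which is exactly the gap the next section tries to close.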

Make PBR More Dynamic

In order to make PBR a bit more dynamic, we are going to take a slightly different approach than what is normally used for this. If you look for common ways to make static routing or policy-based routing more dynamic, there are plenty of examples that show how to set up IP SLA tracking. This works fine and can definitely do the job. Another method that can solve this problem is Bidirectional Forwarding Detection (BFD), which I may cover at a higher level in a different post. In this post, however, I'd like to introduce another method.

We already have a pretty high level of confidence in the neighbor's health because we're running OSPF. We rely on this neighbor state and dynamic route calculation for almost all of our routing, so why not piggy-back on it for our PBR? We can take advantage of EEM scripting to allow us to do this.

! ------ Script for when Neighbor drops ------
!
event manager applet PBR-Down
 event syslog pattern ".*%OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from FULL to.*"
 action 1.0 cli command "enable"
 action 10.0 syslog msg "PBR Removed from Interfaces due to OSPF Nei State S2/0"
 action 2.0 cli command "config t"
 action 3.0 cli command "interface Gig1/0"
 action 4.0 cli command "no ip policy route-map PBR-VoIP-to-T1"
 action 5.0 cli command "interface G3/0.1"
 action 6.0 cli command "no ip policy route-map PBR-VoIP-to-T1"
 action 7.0 cli command "interface G3/0.5"
 action 8.0 cli command "no ip policy route-map PBR-VoIP-to-T1"
 action 9.0 cli command "end"
!
!
! ------ Script for when Neighbor returns ------
!
event manager applet PBR-Up
 event syslog pattern ".*%OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from LOADING to FULL, Loading Done$"
 action 1.0 cli command "enable"
 action 10.0 syslog msg "PBR Enabled on Interfaces due to OSPF Nei State Full S2/0"
 action 2.0 cli command "config t"
 action 3.0 cli command "interface Gig1/0"
 action 4.0 cli command "ip policy route-map PBR-VoIP-to-T1"
 action 5.0 cli command "interface Gig3/0.1"
 action 6.0 cli command "ip policy route-map PBR-VoIP-to-T1"
 action 7.0 cli command "interface Gig3/0.5"
 action 8.0 cli command "ip policy route-map PBR-VoIP-to-T1"
 action 9.0 cli command "end"
!

By configuring the above EEM scripts on Wan2, we should now be relying on OSPF neighbor state to control the enforcement of our PBR. If this works, of course we would want to implement a similar configuration on Branch1 to avoid the reverse situation. In any case, it seems as though we are ready to test.
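
Before relying on the applets, it can be worth sanity-checking the two syslog patterns against the log lines we have already seen in this lab. The Python below is a rough offline check, assuming the patterns behave the same under Python's re module as they do in EEM (the regex engines are not identical, but these patterns are simple enough that they should agree). The function name which_applet is made up for this example, and the log lines are copied from the outputs in this post.

```python
import re

# Patterns copied verbatim from the two applets above.
APPLETS = {
    "PBR-Down": r".*%OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from FULL to.*",
    "PBR-Up": r".*%OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from LOADING to FULL, Loading Done$",
}


def which_applet(log_line):
    """Return the name of the first applet whose pattern matches, else None."""
    for name, pattern in APPLETS.items():
        if re.search(pattern, log_line):
            return name
    return None


lines = [
    "%OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from FULL to INIT, 1-Way",
    "%OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from EXCHANGE to LOADING, Exchange Done",
    "%OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from LOADING to FULL, Loading Done",
    "%OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.2 on Serial2/0 from FULL to DOWN, Neighbor Down: Dead timer expired",
]

for line in lines:
    print(which_applet(line), "<-", line)
```

Note that the last line is Branch1's own log message for neighbor 172.16.255.2, so neither of Wan2's applets should react to it; a Branch1 version of the applets would need that neighbor ID instead.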

Retrying Failures with Dynamic PBR

Since the partial failure scenario blew a giant hole in our PBR, we should test it again now that we have tried to make PBR a bit smarter. Let's start by applying the DenyAll ACL to Branch1's T1 interface, in the inbound direction. Note the difference in behavior on Wan2 this time:

! ------ "Break" the T1 at Branch1 ------
!
Branch1#config t
Enter configuration commands, one per line. End with CNTL/Z.
Branch1(config)#int s2/0
Branch1(config-if)#ip access-group DenyAll in
Branch1(config-if)#end
Branch1#
*Jun 30 22:23:55.872: %SYS-5-CONFIG_I: Configured from console by console
Branch1#
*Jun 30 22:24:25.772: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.2 on Serial2/0 from FULL to DOWN, Neighbor Down: Dead timer expired
Branch1#
!
!
! ------ Observe our EEM script at Wan2 ------
!
Wan2#
*Jun 30 22:24:27.800: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from FULL to INIT, 1-Way
*Jun 30 22:24:27.876: %HA_EM-6-LOG: PBR-Down: PBR Removed from Interfaces due to OSPF Nei State S2/0
Wan2#
*Jun 30 22:24:28.052: %SYS-5-CONFIG_I: Configured from console by on vty0 (EEM:PBR-Down)
Wan2#

Now that we've successfully broken the T1 again, let's set up our Test PC to listen and generate a "VoIP" test packet from Core1.

! ------ Start listener on Test PC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
!
!
! ------ Generate Syslog on Core1 ------
!
Core1#send log Testing 'Smart PBR' with T1 in 'semi-failed' state
Core1#
*Jun 30 22:29:44.792: %SYS-2-LOGMSG: Message from 0(): Testing 'Smart PBR' with T1 in 'semi-failed' state
Core1#
!
!
! ------ Observe Syslog on Test PC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
connect to [172.16.10.10] from (UNKNOWN) [10.0.11.4] 51054
<186>24: *Jun 30 22:29:44.792: %SYS-2-LOGMSG: Message from 0(): Testing 'Smart PBR' with T1 in 'semi-failed' state

Alright, that's a good sign! By using EEM to automatically remove the PBR policy when the "target" router loses OSPF neighborship, we have successfully forced Wan2 to hairpin the VoIP traffic right back to Wan1 (just like in the link-failure scenario). Again, the routing is sub-optimal, but it will get the job done. See the output below for confirmation.

Wan2#show policy-map interface | inc /|Service|Class|,
...
GigabitEthernet3/0.1
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 156 bytes
1 packets, 156 bytes
Class-map: class-default (match-any)
95 packets, 8930 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
109 packets, 12677 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0.5
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
97 packets, 9154 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 160 bytes
1 packets, 160 bytes
Class-map: class-default (match-any)
96 packets, 7688 bytes
5 minute offered rate 0 bps, drop rate 0 bps

The last thing to check with this before calling it a day is to make sure that once we restore connectivity, the PBR kicks back in and Wan2 sends the packet out its serial interface. In the output below, we'll remove the ACL on Branch1, and then watch EEM restore our PBR on Wan2:

! ------ Fix the T1 ------
!
Branch1#config t
Enter configuration commands, one per line. End with CNTL/Z.
Branch1(config)#int s2/0
Branch1(config-if)#no ip access-group DenyAll in
Branch1(config-if)#end
Branch1#
*Jun 30 22:57:26.344: %SYS-5-CONFIG_I: Configured from console by console
Branch1#
*Jun 30 22:57:29.108: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.2 on Serial2/0 from LOADING to FULL, Loading Done
Branch1#
!
!
! ------ Watch EEM turn on PBR ------
!
Wan2#
*Jun 30 22:57:28.876: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from INIT to 2WAY, 2-Way Received
*Jun 30 22:57:28.876: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from 2WAY to EXSTART, AdjOK?
*Jun 30 22:57:28.880: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from EXSTART to EXCHANGE, Negotiation Done
*Jun 30 22:57:28.896: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from EXCHANGE to LOADING, Exchange Done
*Jun 30 22:57:28.896: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from LOADING to FULL, Loading Done
*Jun 30 22:57:28.956: %HA_EM-6-LOG: PBR-Up: PBR Enabled on Interfaces due to OSPF Nei State Full S2/0
Wan2#
*Jun 30 22:57:29.144: %SYS-5-CONFIG_I: Configured from console by on vty0 (EEM:PBR-Up)
Wan2#
Wan2#clear counters
Clear "show interface" counters on all interfaces [confirm]
Wan2#
*Jun 30 22:58:38.944: %CLEAR-5-COUNTERS: Clear counter on all interfaces by console
Wan2#

Finally, we can generate our "VoIP" packet once again from Core1 and ensure it not only still arrives at the Test PC in Branch1, but arrives via the T1 link. The output below will walk us through it.

! ------ Initiate Traffic from Core1 ------
!
Core1#send log Testing 'Smart PBR' with T1 in 'semi-failed' state
Core1#
*Jun 30 22:29:44.792: %SYS-2-LOGMSG: Message from 0(): Testing 'Smart PBR' with T1 in 'semi-failed' state
Core1#send log Testing 'Smart PBR' with T1 after 'semi-failed' state
Core1#
*Jun 30 23:01:00.364: %SYS-2-LOGMSG: Message from 0(): Testing 'Smart PBR' with T1 after 'semi-failed' state
Core1#
!
!
! ------ Observe Syslog on Test PC ------
!
root@bt:~# nc -luvvnp 16390
listening on [any] 16390 ...
connect to [172.16.10.10] from (UNKNOWN) [10.0.11.4] 51054
<186>25: *Jun 30 23:01:00.364: %SYS-2-LOGMSG: Message from 0(): Testing 'Smart PBR' with T1 after 'semi-failed' state
!
!
! ------ View Wan2 Policy-map Counts ------
!
Wan2#show policy-map interface | inc /|Service|Class|,
...
Serial2/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
143 packets, 12012 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 149 bytes
1 packets, 149 bytes
Class-map: class-default (match-any)
300 packets, 20166 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0.1
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 159 bytes
1 packets, 159 bytes
Class-map: class-default (match-any)
143 packets, 13442 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
167 packets, 19082 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Wan2#

As seen above, when the OSPF neighborship came back, PBR came back on. This allowed our VoIP traffic to continue along the T1 as we had wanted.
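Reading these counters by eye gets tedious when re-running the test. As a minimal sketch (not part of the lab itself), the filtered "show policy-map interface" text can be parsed to report which interface carried the PBR-VoIP packet outbound. The function names are made up for illustration, the parser assumes the output is shaped exactly like the filtered sample above, and it keeps only the first counter line under each class.

```python
import re

# Trimmed copy of the filtered Wan2 output shown above.
SAMPLE = """\
Serial2/0
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
143 packets, 12012 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 149 bytes
1 packets, 149 bytes
Class-map: class-default (match-any)
300 packets, 20166 bytes
5 minute offered rate 0 bps, drop rate 0 bps
GigabitEthernet3/0.1
Service-policy input: PBR-Counters
Class-map: PBR-VoIP (match-any)
1 packets, 159 bytes
1 packets, 159 bytes
Class-map: class-default (match-any)
143 packets, 13442 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Service-policy output: PBR-Counters
Class-map: PBR-VoIP (match-any)
0 packets, 0 bytes
0 packets, 0 bytes
Class-map: class-default (match-any)
167 packets, 19082 bytes
5 minute offered rate 0 bps, drop rate 0 bps
"""


def parse_policy_counters(text):
    """Return {(interface, direction, class_name): packets} (hypothetical helper)."""
    counters = {}
    interface = direction = class_name = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Service-policy (input|output):", line)
        if m:
            direction, class_name = m.group(1), None
            continue
        m = re.match(r"Class-map: (\S+)", line)
        if m:
            class_name = m.group(1)
            continue
        m = re.match(r"(\d+) packets, \d+ bytes", line)
        if m:
            if class_name is not None:
                # Keep only the first counter line seen for each class.
                counters.setdefault((interface, direction, class_name), int(m.group(1)))
            continue
        # Assumption: interface header lines contain a slash and no spaces.
        if "/" in line and " " not in line:
            interface, direction, class_name = line, None, None
    return counters


def egress_interfaces(counters, class_name="PBR-VoIP"):
    """List interfaces whose outbound counter for class_name is non-zero."""
    return [
        intf
        for (intf, direction, cls), packets in counters.items()
        if direction == "output" and cls == class_name and packets > 0
    ]


if __name__ == "__main__":
    print(egress_interfaces(parse_policy_counters(SAMPLE)))
```

For the sample, the only interface with a non-zero outbound PBR-VoIP count is Serial2/0, which matches the conclusion that the packet left Wan2 over the T1.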

Conclusion

This has been a long, drawn-out lab, so thanks for sticking with it. I'll take a moment to repeat my sentiments on policy-based routing: it's ugly, hard to troubleshoot, and can get out of hand quickly. However, it can be a necessity in certain scenarios. It should also be noted that the above lab does not solve every issue, especially as it essentially ignored the Branch1 PBR configuration. There are also other potential routing loops we did not address. Regardless, this lab should serve as a decent primer on why and how to implement policy-based routing on Cisco routers.

Saturday, June 22, 2013

OSPF passive-interface

It is common when running OSPF to enable "passive-interface default" in the OSPF sub-configuration. This little command makes every interface passive unless told otherwise: OSPF will not send Hellos or form neighborships on an interface until it is explicitly activated via "no passive-interface <interface>". The reason for this is relatively simple; it offers protection against rogue neighborships forming on unexpected ports.

The reason for this post, though, is that I discovered something interesting about the command 'passive-interface default' yesterday. Whereas most commands that show up in a running-configuration can be re-applied without any impact (hey, the command's already there, right?), that's not the case with this one.

When re-applying "passive-interface default" to an OSPF configuration, any previously-defined "no passive-interface <interface>" commands will simply drop out of the configuration. The impact? This means that any neighborships formed will quickly drop, as OSPF no longer will consider those interfaces active.

I realize it's a corner case that someone would push this command to a router where it already exists. The fact that it does go against the typical IOS rule-of-thumb that "re-applying config lines is safe" is worth pointing out. So, all you copy-and-pasters, take note that a copy/paste of certain already-applied commands could put you in hot water.
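For the copy-and-pasters, here is a rough sketch of a pre-paste sanity check. It is a hypothetical helper, not an IOS feature: it assumes a single OSPF process, treats the configs as plain text, and relies only on the behavior described above (re-applying "passive-interface default" drops existing "no passive-interface" lines). The interface names in the usage example are borrowed from the lab routers purely for illustration.

```python
def dropped_by_reapply(running_config, paste):
    """Return active 'no passive-interface' lines that the paste would drop
    and not restore afterwards (hypothetical helper, single OSPF process)."""

    def clean(text):
        return [line.strip() for line in text.splitlines() if line.strip()]

    restored = None  # stays None unless the paste re-applies the default
    for line in clean(paste):
        if line == "passive-interface default":
            restored = set()  # everything restored before this point is lost again
        elif restored is not None and line.startswith("no passive-interface "):
            restored.add(line)

    if restored is None:
        return []  # paste never touches the default, nothing to warn about

    active = [l for l in clean(running_config) if l.startswith("no passive-interface ")]
    return [l for l in active if l not in restored]


if __name__ == "__main__":
    running = """
    router ospf 1
     passive-interface default
     no passive-interface Serial2/0
     no passive-interface FastEthernet1/0
    """
    paste = """
    router ospf 1
     passive-interface default
     no passive-interface Serial2/0
    """
    for line in dropped_by_reapply(running, paste):
        print("would be lost:", line)
```

An empty result means the paste either leaves the default alone or re-states every interface that is currently active.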

Also, part 2 of the PBR lab is underway and will be published this coming week. Cheers!

Monday, June 3, 2013

Lab: Routing to a branch office using OSPF

Introduction

In this lab, we will spend time using OSPF for dynamic routing. OSPF is an open protocol supported by many vendors and is therefore widely deployed in enterprise networks. This lab will focus on using OSPF to dynamically route between a remote branch office and the main office. The goals of this lab are as follows:

Review the basics of OSPF metric calculation
Understand the benefits of area segmentation and where totally stub areas can fit
Configure OSPF route summarization and understand the benefits
Identify design flaws with the lab network that can cause harm in using both totally stub areas and route summarization

We'll walk into this lab with OSPF up and running already. The OSPF topology is split into two areas. There are two Area Border Routers (ABR's) that place the branch office in an area separate from the backbone area. It should be assumed that other similarly configured branch offices would likely be in this area too. Physical connectivity for the branch office is provided by dual uplinks for redundancy. One is a 100Mbit connection -- for fun, let's pretend it's a bridged wireless point-to-point link. The other connection is a T1; for story-telling, we'll say it's a private leased-line connection.

Base Lab Configuration

Branch1:

!
interface Loopback0
ip address 172.16.255.3 255.255.255.255
!
interface FastEthernet1/0
description To-WAN1
ip address 172.16.11.1 255.255.255.254
ip ospf network point-to-point
duplex auto
speed auto
!
interface Serial2/0
description To-WAN2
ip address 172.16.11.3 255.255.255.254
ip ospf network point-to-point
serial restart-delay 0
!
interface GigabitEthernet3/0
ip address 172.16.10.1 255.255.255.0
negotiation auto
!
router ospf 1
router-id 172.16.255.3
log-adjacency-changes
auto-cost reference-bandwidth 100000
network 172.16.10.0 0.0.0.255 area 0.0.0.5
network 172.16.11.0 0.0.0.1 area 0.0.0.5
network 172.16.11.2 0.0.0.1 area 0.0.0.5
network 172.16.255.3 0.0.0.0 area 0.0.0.5
!

Wan1:

!
interface Loopback0
ip address 172.16.255.1 255.255.255.255
!
interface GigabitEthernet1/0
description To-Core1
ip address 10.0.11.1 255.255.255.254
ip ospf network point-to-point
negotiation auto
!
interface FastEthernet2/0
description To-Branch1
ip address 172.16.11.0 255.255.255.254
ip ospf network point-to-point
duplex auto
speed auto
!
router ospf 1
router-id 172.16.255.1
log-adjacency-changes
auto-cost reference-bandwidth 100000
network 10.0.11.0 0.0.0.1 area 0.0.0.0
network 172.16.11.0 0.0.0.1 area 0.0.0.5
network 172.16.255.1 0.0.0.0 area 0.0.0.0
!

Wan2:

!
interface Loopback0
ip address 172.16.255.2 255.255.255.255
!
interface GigabitEthernet1/0
description To-Core2
ip address 10.0.11.3 255.255.255.254
ip ospf network point-to-point
negotiation auto
!
interface Serial2/0
description To-Branch1
ip address 172.16.11.2 255.255.255.254
ip ospf network point-to-point
serial restart-delay 0
!
router ospf 1
router-id 172.16.255.2
log-adjacency-changes
auto-cost reference-bandwidth 100000
network 10.0.11.2 0.0.0.1 area 0.0.0.0
network 172.16.11.2 0.0.0.1 area 0.0.0.5
network 172.16.255.2 0.0.0.0 area 0.0.0.0
!

Core1:

!
interface Loopback0
ip address 10.0.255.1 255.255.255.255
!
interface FastEthernet0/0
no ip address
shutdown
duplex half
!
interface GigabitEthernet1/0
description To-Core2
ip address 10.0.11.4 255.255.255.254
ip ospf network point-to-point
negotiation auto
!
interface GigabitEthernet2/0
description To-Wan1
ip address 10.0.11.0 255.255.255.254
ip ospf network point-to-point
negotiation auto
!
router ospf 1
router-id 10.0.255.1
log-adjacency-changes
auto-cost reference-bandwidth 100000
network 10.0.11.0 0.0.0.1 area 0.0.0.0
network 10.0.11.4 0.0.0.1 area 0.0.0.0
network 10.0.255.1 0.0.0.0 area 0.0.0.0
!

Core2:

!
interface Loopback0
ip address 10.0.255.2 255.255.255.255
!
interface FastEthernet0/0
no ip address
shutdown
duplex half
!
interface GigabitEthernet1/0
description To-Core1
ip address 10.0.11.5 255.255.255.254
ip ospf network point-to-point
negotiation auto
!
interface GigabitEthernet2/0
description To-Wan2
ip address 10.0.11.2 255.255.255.254
ip ospf network point-to-point
negotiation auto
!
router ospf 1
router-id 10.0.255.2
log-adjacency-changes
auto-cost reference-bandwidth 100000
network 10.0.11.2 0.0.0.1 area 0.0.0.0
network 10.0.11.4 0.0.0.1 area 0.0.0.0
network 10.0.255.2 0.0.0.0 area 0.0.0.0
!

I'll point out a couple highlights from the starting configuration in this lab. First, I've used /31 subnets to interconnect routers, as they're all point-to-point connections. Some people prefer /30's, but as long as the devices support /31's I don't see a reason to burn the address space. These interfaces are also set to use the OSPF network type "point-to-point." A quick refresher on this -- OSPF will use multicast Hello's for neighbor discovery, but no DR/BDR election will occur. The latter part is important because it decreases the time to reach an adjacent state. One final note on the starting configuration: I chose to use the dotted decimal notation for OSPF areas. I don't have any particular reason for this choice, but it is important to be aware that this option exists.
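To put numbers on the /31 versus /30 trade-off, here is a small sketch using Python's standard ipaddress module. The helper name is invented for illustration; the prefixes come from the Wan1/Branch1 link in the configuration above.

```python
import ipaddress


def link_summary(cidr):
    """Describe a point-to-point link subnet (illustrative helper)."""
    net = ipaddress.ip_network(cidr)
    if net.prefixlen == 31:
        # On a /31, both addresses are usable on a point-to-point link.
        usable = [str(addr) for addr in net]
    else:
        # Otherwise the network and broadcast addresses are excluded.
        usable = [str(addr) for addr in net.hosts()]
    return {
        "addresses_consumed": net.num_addresses,
        "usable": usable,
        "wildcard": str(net.hostmask),  # the form used in OSPF network statements
    }


if __name__ == "__main__":
    for cidr in ("172.16.11.0/31", "172.16.11.0/30"):
        print(cidr, link_summary(cidr))
```

Both prefixes give two usable addresses, but the /30 consumes four addresses to do it. The /31 wildcard comes out as 0.0.0.1, which is the mask seen in the "network 172.16.11.0 0.0.0.1 area 0.0.0.5" statements.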

Looking at the topology and remembering that OSPF's metric is bandwidth-driven, we should modify our reference-bandwidth to allow for modern interface speeds to be weighted accurately. Note that this really won't have much impact on this lab, but it's good practice. This can be accomplished by adding auto-cost reference-bandwidth 100000 to all routers' configurations. The value is given in Mbit/s, so this effectively sets 100 Gbit as the bandwidth ceiling as far as metric calculation goes. Since our lab has nothing over 1 Gbit, this is more than enough headroom.
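As a quick worked example of what the reference bandwidth does, the sketch below computes interface costs as reference bandwidth divided by interface bandwidth. It assumes the result is truncated to an integer and clamped between 1 and 65535, and it assumes the serial interface uses a bandwidth of 1544 kbit (a T1); the function name is made up for illustration.

```python
def ospf_cost(interface_kbps, reference_mbps=100):
    """Approximate OSPF interface cost: reference bandwidth / interface bandwidth.

    Assumptions: integer truncation, minimum cost 1, maximum cost 65535.
    """
    cost = (reference_mbps * 1000) // interface_kbps
    return max(1, min(cost, 65535))


if __name__ == "__main__":
    links = {
        "T1 serial (assumed 1544 kbit)": 1_544,
        "FastEthernet": 100_000,
        "GigabitEthernet": 1_000_000,
    }
    for name, kbps in links.items():
        default_ref = ospf_cost(kbps)                      # reference of 100 Mbit
        lab_ref = ospf_cost(kbps, reference_mbps=100_000)  # the lab's setting
        print(f"{name}: cost {default_ref} at 100 Mbit reference, {lab_ref} at 100 Gbit")
```

With a 100 Mbit reference, FastEthernet and GigabitEthernet both collapse to a cost of 1, which is why the reference gets raised. With the lab's value they come out at 1000 and 100, matching the costs that "show ip ospf interface brief" reports later in this lab.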

Since OSPF metric is inversely proportional to link bandwidth, it makes sense that with a vanilla OSPF configuration, the 100Mbit link is seen as preferable when compared to the Serial interface. The difference in bandwidth means that the route for Branch1 to reach Wan2's loopback is to go through Wan1 -- pretty significant! Note the output below, showing that every learned OSPF route is avoiding the T1.

Branch1#show ip route ospf
172.16.0.0/16 is variably subnetted, 6 subnets, 3 masks
O IA 172.16.255.2/32 [110/1301] via 172.16.11.0, 00:22:43, FastEthernet1/0
O IA 172.16.255.1/32 [110/1001] via 172.16.11.0, 00:24:13, FastEthernet1/0
10.0.0.0/8 is variably subnetted, 5 subnets, 2 masks
O IA 10.0.11.0/31 [110/1100] via 172.16.11.0, 00:23:54, FastEthernet1/0
O IA 10.0.11.2/31 [110/1300] via 172.16.11.0, 00:22:43, FastEthernet1/0
O IA 10.0.11.4/31 [110/1200] via 172.16.11.0, 00:23:05, FastEthernet1/0
O IA 10.0.255.1/32 [110/1101] via 172.16.11.0, 00:23:54, FastEthernet1/0
O IA 10.0.255.2/32 [110/1201] via 172.16.11.0, 00:23:05, FastEthernet1/0
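The metrics in this table can be reproduced by adding up the outgoing interface costs along each path. The sketch below is only arithmetic on assumed per-link costs (1000 for FastEthernet, 100 for GigabitEthernet, 64766 for the T1 at an assumed 1544 kbit, 1 for a loopback); it does not model how OSPF actually builds inter-area routes from the ABR's summaries, though the totals come out the same here.

```python
# Assumed per-link costs with auto-cost reference-bandwidth 100000.
LINK_COST = {
    ("Branch1", "Wan1"): 1000,   # FastEthernet, 100 Mbit
    ("Branch1", "Wan2"): 64766,  # T1, assuming 1544 kbit
    ("Wan1", "Core1"): 100,      # GigabitEthernet
    ("Core1", "Core2"): 100,     # GigabitEthernet
    ("Core2", "Wan2"): 100,      # GigabitEthernet
}
LOOPBACK_COST = 1


def path_cost_to_loopback(hops):
    """Sum link costs along hops, plus the destination loopback's cost."""
    total = LOOPBACK_COST
    for a, b in zip(hops, hops[1:]):
        total += LINK_COST[(a, b)]
    return total


if __name__ == "__main__":
    via_wireless = path_cost_to_loopback(["Branch1", "Wan1", "Core1", "Core2", "Wan2"])
    via_t1 = path_cost_to_loopback(["Branch1", "Wan2"])
    print("Wan2 loopback via Wan1:", via_wireless)
    print("Wan2 loopback via T1:  ", via_t1)
```

The long way around through Wan1 totals 1301, the metric shown for 172.16.255.2/32 above, while the single-hop T1 path totals 64767. That gap is why every route avoids the serial link.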

Stub Routing

With a branch router like this, it is often a good idea to utilize a separate OSPF area, and utilizing a stub area fits the bill well. There are several advantages to this type of design.

First, segmenting the OSPF domain into multiple areas allows OSPF to scale well. When routers share the same OSPF area, they share all the details of their topology, including Type 1 and Type 2 LSA's. However, to share topology information between areas, the ABR's use Type 3 LSA's. These act as a summary of information about the networks in an area, as opposed to all the minutiae that would be passed along in the Type 1's and 2's. Also, the separation of areas can help contain the effects of a problem within one particular area.

Another reason it is good to create a separate area for a logical "block" of the network is that OSPF allows route summarization at area boundaries. If you are able to plan your IP addressing in such a way that the networks in that area can be summarized, even if only partially, it can help to reduce the overall size of the enterprise routing table.

When considering a stub area versus a normal area, there are more advantages in this design. First, it would be a bad idea to have traffic from the high-bandwidth area 0 network use a low-bandwidth branch office as a transit. This could saturate the site, or in some cases where bandwidth is metered, could cost a significant amount of money. Defining the area for branch offices as a stub area prevents the branch offices from being a transit hop.

Furthermore, using a Totally Stub area allows the ABR's to inject only a default route to the branch routers, as opposed to advertising all other area or externally injected routes. This keeps the routing tables of the stub area routers small, which may allow for a lower-cost model router or multilayer switch to be used for the remote site. These branch site routers would only have routes for their own stub area, plus a default route pointing outward towards one or more ABR's.

In order to take advantage of both stub areas and summarization at area boundaries, some amount of configuration will need to take place. Keep in mind that OSPF has several pieces of information that need to match in order to establish neighborship, and one such piece is OSPF area type. This is important to note because changing the area type will cause the neighborships to fail and re-establish. We'll implement the configuration on the branch router and the ABR's, and analyze afterwards.

Branch1#config t
Enter configuration commands, one per line. End with CNTL/Z.
Branch1(config)#router ospf 1
Branch1(config-router)#area 0.0.0.5 stub
Branch1(config-router)#end
Branch1#
*Jun 2 11:51:38.352: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.2 on Serial2/0 from FULL to DOWN, Neighbor Down: Adjacency forced to reset
*Jun 2 11:51:38.356: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.1 on FastEthernet1/0 from FULL to DOWN, Neighbor Down: Adjacency forced to reset
*Jun 2 11:51:38.704: %SYS-5-CONFIG_I: Configured from console by console
Branch1#

Wan1#config t
Enter configuration commands, one per line. End with CNTL/Z.
Wan1(config)#router ospf 1
Wan1(config-router)#area 0.0.0.5 stub no-summary
Wan1(config-router)#end
Wan1#

Wan2#config t
Enter configuration commands, one per line. End with CNTL/Z.
Wan2(config)#router ospf 1
*Jun 2 11:52:13.376: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from FULL to DOWN, Neighbor Down: Dead timer expired
Wan2(config-router)#area 0.0.0.5 stub no-summary
Wan2(config-router)#end
Wan2#

Note that when changing the area type on Branch1, the neighborships were forced to reset. On Wan1, I reconfigured the OSPF area configuration (also forcing reset, not shown) to match. With Wan2, though, you can see the direct result of resetting the neighborship on Branch1. When the dead timer expired, Wan2 also marked the neighborship down. Also, note that in order to specify the area as totally stub, the ABR configuration needs to contain stub no-summary instead of just stub.

Now that these changes have been made, compare the previous IP routing table (output limited to OSPF routes) for Branch1 to how it is now:

Branch1#show ip route ospf
O*IA 0.0.0.0/0 [110/1001] via 172.16.11.0, 00:05:05, FastEthernet1/0
Branch1#

Perfect! The totally stub area is working to automatically summarize the rest of the network at the ABR's. Now, let's configure summarization from area 5 into area 0. Even though we only have one site defined, for argument's sake, say that all branch offices have subnets that fall into 172.16.0.0/20. We will use that as our summary of routes in area 5. Note that with area summarization, there is no requirement to summarize everything; partial summarization is definitely acceptable.

Wan1#config t
Enter configuration commands, one per line. End with CNTL/Z.
Wan1(config)#router ospf 1
Wan1(config-router)#area 0.0.0.5 range 172.16.0.0 255.255.240.0
Wan1(config-router)#end
Wan1#

Wan2#config t
Enter configuration commands, one per line. End with CNTL/Z.
Wan2(config)#router ospf 1
Wan2(config-router)#area 0.0.0.5 range 172.16.0.0 255.255.240.0
Wan2(config-router)#end
Wan2#

After this identical configuration has been made on the ABR's, a quick verification on an area 0 router will show that the routing table no longer has the 172.16.10.0/24 route and now has a summarized /20 route instead. Also, note that we did not match the loopbacks in the summarization, so there is still an inter-area OSPF route for the Branch1 loopback address.

Core1#show ip route ospf
172.16.0.0/16 is variably subnetted, 4 subnets, 2 masks
O IA 172.16.255.3/32 [110/1101] via 10.0.11.1, 00:00:55, GigabitEthernet2/0
O 172.16.255.2/32 [110/201] via 10.0.11.5, 10:08:11, GigabitEthernet1/0
O 172.16.255.1/32 [110/101] via 10.0.11.1, 10:08:32, GigabitEthernet2/0
O IA 172.16.0.0/20 [110/1100] via 10.0.11.1, 00:00:35, GigabitEthernet2/0
10.0.0.0/8 is variably subnetted, 5 subnets, 2 masks
O 10.0.11.2/31 [110/200] via 10.0.11.5, 10:08:11, GigabitEthernet1/0
O 10.0.255.2/32 [110/101] via 10.0.11.5, 10:08:32, GigabitEthernet1/0
Core1#
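To double-check which prefixes the area range swallows and which it leaves alone, here is a short sketch using Python's standard ipaddress module. The helper name is invented for illustration; the prefixes are the ones from this lab.

```python
import ipaddress

# The range configured on both ABR's: area 0.0.0.5 range 172.16.0.0 255.255.240.0
SUMMARY = ipaddress.ip_network("172.16.0.0/255.255.240.0")


def covered_by_summary(prefix, summary=SUMMARY):
    """True if prefix falls entirely inside the summary range."""
    return ipaddress.ip_network(prefix).subnet_of(summary)


if __name__ == "__main__":
    print("summary:", SUMMARY, "spans", SUMMARY[0], "to", SUMMARY[-1])
    for prefix in ("172.16.10.0/24", "172.16.11.0/31", "172.16.11.2/31", "172.16.255.3/32"):
        state = "summarized" if covered_by_summary(prefix) else "advertised separately"
        print(prefix, "->", state)
```

The /20 covers 172.16.0.0 through 172.16.15.255, so the branch LAN and the /31 links are folded into it, while the 172.16.255.x loopbacks fall outside and keep their own inter-area routes.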

Watching it all unravel...

So far, so good, right? We've successfully segmented our OSPF domain and used route summarization to keep our routing tables lightweight. At first glance, we might just call this project a success and move on. However, a good test plan would include different failure scenarios. Upon testing some, it should become apparent if there are problems with our design as implemented.

Test 1: Fiber cut between Wan1 and Core1

Wan1 is the currently-favored router for Branch1 to route through, and it has only one link in area 0, which leads to Core1. We might assume, then, that if we drop connectivity to Core1, the default route Wan1 floods into area 5 should also be withdrawn, allowing traffic from Branch1 to instead take the T1 link to Wan2.

However, we find that when we shut down the link between Core1 and Wan1, the default route is still appearing! To show the impact, let's test connectivity to Core2's loopback. The following output is taken from the Test PC shown in the network diagram; its source IP is 172.16.10.10 (for reference).

root@bt:~# ping 10.0.255.2
PING 10.0.255.2 (10.0.255.2) 56(84) bytes of data.
From 172.16.11.0 icmp_seq=1 Time to live exceeded
From 172.16.11.0 icmp_seq=2 Time to live exceeded
^C
--- 10.0.255.2 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1001ms

root@bt:~# traceroute -m 5 10.0.255.2
traceroute to 10.0.255.2 (10.0.255.2), 5 hops max, 60 byte packets
1 172.16.10.1 (172.16.10.1) 4.169 ms 14.349 ms 24.523 ms
2 172.16.11.0 (172.16.11.0) 34.796 ms 44.909 ms 55.080 ms
3 172.16.11.1 (172.16.11.1) 65.240 ms 75.494 ms 85.698 ms
4 172.16.11.0 (172.16.11.0) 146.623 ms 156.868 ms 167.097 ms
5 172.16.11.1 (172.16.11.1) 177.256 ms 187.501 ms 197.608 ms
root@bt:~#

Note that we have a routing loop (a TTL exceeded message often points to one, and the traceroute confirms it). Wan1 is now adding the area 5 default route from Wan2 into its routing table (as learned through Branch1), but Wan1 is still advertising a default route into area 5 as well, and its metric is better than Wan2's. This means that Branch1 will send traffic to Wan1, but Wan1 will want to send it through Branch1 to Wan2. While we should definitely note the fact that Branch1 is being used as a transit for area 5 between the ABR's, let's hold that for now. The crux of the issue sits with the fact that Wan1 is advertising a default route when it has no links in area 0. Or are we missing something?

Wan1#show ip ospf interface brief
Interface PID Area IP Address/Mask Cost State Nbrs F/C
Lo0 1 0.0.0.0 172.16.255.1/32 1 LOOP 0/0
Gi1/0 1 0.0.0.0 10.0.11.1/31 100 DOWN 0/0
Fa2/0 1 0.0.0.5 172.16.11.0/31 1000 P2P 1/1
Wan1#

Lo and behold, notice that we have Wan1's loopback sitting in area 0! Let's move that to area 5 and see what happens next.

Wan1#config t
Enter configuration commands, one per line. End with CNTL/Z.
Wan1(config)#router ospf 1
Wan1(config-router)#network 172.16.255.1 0.0.0.0 area 0.0.0.5
Wan1(config-router)#end
Wan1#
*Jun 2 12:56:37.328: %OSPF-6-AREACHG: 172.16.255.1/32 changed from area 0.0.0.0 to area 0.0.0.5
Wan1#

Branch1#show ip route ospf
172.16.0.0/16 is variably subnetted, 5 subnets, 3 masks
O 172.16.255.1/32 [110/1001] via 172.16.11.0, 00:00:14, FastEthernet1/0
O*IA 0.0.0.0/0 [110/64767] via 172.16.11.2, 00:00:20, Serial2/0
Branch1#

root@bt:~# ping -c 1 10.0.255.2
PING 10.0.255.2 (10.0.255.2) 56(84) bytes of data.
64 bytes from 10.0.255.2: icmp_seq=1 ttl=253 time=31.1 ms

With that, it is apparent that having a loopback in area 0 on an ABR configured for a totally stub area is a bad idea. After moving the loopback to area 5, the branch router no longer sees a default route from Wan1, and it allows the test PC to successfully ping Core2's loopback through the T1 interface. Note that the loopback on Wan2 is also in area 0, so this will need to be fixed too.
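To make the Test 1 loop concrete, here is a toy next-hop walk. It is a deliberately simplified model: each router has a single next hop for the destination, the tables are hand-written from the behavior described above rather than taken from real routing tables, and the hop limit of 5 mirrors the "-m 5" used in the traceroute.

```python
def trace(next_hop, start, destination, max_hops=5):
    """Follow one next hop per router until delivery, a dead end, or the hop limit."""
    path = [start]
    node = start
    for _ in range(max_hops):
        if node == destination:
            return path, "delivered"
        node = next_hop.get(node)
        if node is None:
            return path, "no route"
        path.append(node)
    status = "delivered" if node == destination else "ttl exceeded"
    return path, status


# Before the fix: Branch1 prefers Wan1's default route, while Wan1's only
# default route points back through Branch1 (learned from Wan2 via area 5).
BEFORE_FIX = {"Branch1": "Wan1", "Wan1": "Branch1"}

# After moving Wan1's loopback to area 5: Branch1 follows Wan2's default route.
AFTER_FIX = {"Branch1": "Wan2", "Wan2": "Core2"}

if __name__ == "__main__":
    print(trace(BEFORE_FIX, "Branch1", "Core2"))
    print(trace(AFTER_FIX, "Branch1", "Core2"))
```

Before the fix the walk bounces between Branch1 and Wan1 until the hop limit runs out, just as the traceroute alternated between 172.16.11.0 and 172.16.11.1. After the fix it reaches Core2 in two hops.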

Test 2: Failure of one branch office in area

This test will require a slight modification of the configuration in order to simulate having another branch office connected in area 5. To achieve this, we will add a loopback on each Wan router to simulate another branch office. Next, we will take down connectivity between Wan1 and Branch1, leaving the test PC network only uplinked to Wan2.

Wan1(config)#int lo1
Wan1(config-if)#descr Fake branch
Wan1(config-if)#ip address 172.16.12.1 255.255.255.0
Wan1(config-if)#exit
Wan1(config)#router ospf 1
Wan1(config-router)#network 172.16.12.0 0.0.0.255 area 0.0.0.5
Wan1(config-router)#end
Wan1#

Wan2(config)#int lo1
Wan2(config-if)#descr Fake branch2
Wan2(config-if)#ip address 172.16.14.1 255.255.255.0
Wan2(config-if)#router ospf 1
Wan2(config-router)#network 172.16.14.0 0.0.0.255 area 0.0.0.5
Wan2(config-router)#end
Wan2#

Branch1#config t
Enter configuration commands, one per line. End with CNTL/Z.
Branch1(config)#int fa1/0
Branch1(config-if)#shut
Branch1(config-if)#end
Branch1#show ip route ospf
172.16.0.0/16 is variably subnetted, 5 subnets, 3 masks
O 172.16.255.2/32 [110/64767] via 172.16.11.2, 00:00:21, Serial2/0
O 172.16.14.1/32 [110/64767] via 172.16.11.2, 00:00:21, Serial2/0
O*IA 0.0.0.0/0 [110/64767] via 172.16.11.2, 00:00:21, Serial2/0
Branch1#

On the surface, everything looks fine from Branch1. The default route is coming from the T1. However, pinging between Core1's loopback and the test PC network fails. Let's look at why:

Core1#ping 172.16.10.1

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.10.1, timeout is 2 seconds:
U.U.U
Success rate is 0 percent (0/5)
Core1#show ip route ospf
172.16.0.0/16 is variably subnetted, 4 subnets, 2 masks
O IA 172.16.255.3/32
[110/64967] via 10.0.11.5, 00:06:00, GigabitEthernet1/0
O IA 172.16.255.2/32 [110/201] via 10.0.11.5, 00:19:34, GigabitEthernet1/0
O IA 172.16.255.1/32 [110/101] via 10.0.11.1, 00:18:24, GigabitEthernet2/0
O IA 172.16.0.0/20 [110/101] via 10.0.11.1, 00:11:54, GigabitEthernet2/0
10.0.0.0/8 is variably subnetted, 5 subnets, 2 masks
O 10.0.11.2/31 [110/200] via 10.0.11.5, 00:27:38, GigabitEthernet1/0
O 10.0.255.2/32 [110/101] via 10.0.11.5, 00:27:38, GigabitEthernet1/0
Core1#

Wan1#show ip route ospf
172.16.0.0/16 is variably subnetted, 5 subnets, 3 masks
O IA 172.16.255.3/32
[110/65067] via 10.0.11.0, 00:08:33, GigabitEthernet1/0
O IA 172.16.255.2/32 [110/301] via 10.0.11.0, 00:08:33, GigabitEthernet1/0
O 172.16.0.0/20 is a summary, 00:08:33, Null0
10.0.0.0/8 is variably subnetted, 5 subnets, 2 masks
O 10.0.11.2/31 [110/300] via 10.0.11.0, 00:20:51, GigabitEthernet1/0
O 10.0.11.4/31 [110/200] via 10.0.11.0, 00:20:51, GigabitEthernet1/0
O 10.0.255.1/32 [110/101] via 10.0.11.0, 00:20:51, GigabitEthernet1/0
O 10.0.255.2/32 [110/201] via 10.0.11.0, 00:20:51, GigabitEthernet1/0

Notice that Core1 is receiving a summary route from Wan1 for 172.16.0.0/20, as expected, because there is at least one remote site in that summary range that is still up (our fake branch loopback). Core1 therefore forwards the traffic to Wan1. However, Wan1 no longer has a route for 172.16.10.0/24, as its link to Branch1 is down. Its best matching route is the Null0 route from the summary, so the packets are discarded.
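The black hole comes down to longest-prefix matching. The sketch below runs a lookup against a hand-copied subset of Wan1's OSPF routes from the output above; the function name is invented for illustration, and the "fixed" table adds a hypothetical more-specific route (the next hop 172.16.11.5 is an assumed value for the far end of an ABR-to-ABR link) to show how a /24 would win over the summary.

```python
import ipaddress


def longest_match(table, destination):
    """Return (prefix, next_hop) of the most specific matching route, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in table.items()
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    net, hop = max(matches, key=lambda item: item[0].prefixlen)
    return str(net), hop


# Subset of Wan1's table during Test 2, copied from the output above.
WAN1_ROUTES = {
    "172.16.255.3/32": "10.0.11.0",
    "172.16.255.2/32": "10.0.11.0",
    "172.16.0.0/20": "Null0",  # discard route created by the area range
    "10.0.255.2/32": "10.0.11.0",
}

# Hypothetical: a more-specific intra-area route learned over an ABR-to-ABR link.
WAN1_ROUTES_FIXED = dict(WAN1_ROUTES, **{"172.16.10.0/24": "172.16.11.5"})

if __name__ == "__main__":
    print(longest_match(WAN1_ROUTES, "172.16.10.1"))
    print(longest_match(WAN1_ROUTES_FIXED, "172.16.10.1"))
```

With only the summary present, 172.16.10.1 matches the /20 and is sent to Null0. Once any more-specific route for the branch LAN exists on Wan1, it takes precedence and the traffic is forwarded instead.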

How can this issue be solved? We can create a point-to-point connection between the ABR's, with one subinterface on the link for each shared area. Coincidentally, this would also resolve both issues from Test 1. It would have given an alternate area 0 path if the Wan1 / Core1 fiber was lost, and it would also have allowed an alternate path for area 5 without having to transit through a branch router. Below is the configuration added, with an updated topology map to reflect the changes.

! Wan1:
!
interface GigabitEthernet3/0.1
encapsulation dot1Q 1 native
ip address 10.0.11.6 255.255.255.254
ip ospf network point-to-point

!
interface GigabitEthernet3/0.5
encapsulation dot1Q 5
ip address 172.16.11.4 255.255.255.254
ip ospf network point-to-point

!
router ospf 1
network 10.0.11.6 0.0.0.1 area 0.0.0.0
network 172.16.11.4 0.0.0.1 area 0.0.0.5

! Wan2:
!
interface GigabitEthernet3/0.1
encapsulation dot1Q 1 native
ip address 10.0.11.7 255.255.255.254
ip ospf network point-to-point
!
interface GigabitEthernet3/0.5
encapsulation dot1Q 5
ip address 172.16.11.5 255.255.255.254
ip ospf network point-to-point
!
router ospf 1
network 10.0.11.6 0.0.0.1 area 0.0.0.0
network 172.16.11.4 0.0.0.1 area 0.0.0.5

At this point, Core1 will still receive its summarized route from Wan1. The difference is that Wan1 will now have a more specific route in area 5, across to Wan2 on interface Gi3/0.5.

Conclusion

OSPF solutions like area segmentation and route summarization can be quite beneficial, but it is important to understand what impact they could have on ensuring the network is free from routing loops and traffic is being routed across the most optimal path. In many cases, the risks can be properly mitigated -- it's just important to understand what risks you may be facing.