Monday, June 3, 2013

Lab: Routing to a branch office using OSPF

Introduction

In this lab, we will spend time using OSPF for dynamic routing.  OSPF is an open protocol supported by many vendors and is therefore widely deployed in enterprise networks. This lab will focus on using OSPF to dynamically route between a remote branch office and the main office. The goals of this lab are as follows:

  • Review the basics of OSPF metric calculation
  • Understand the benefits of area segmentation and where totally stub areas can fit
  • Configure OSPF route summarization and understand the benefits
  • Identify design flaws in the lab network that can cause harm when totally stub areas and route summarization are both in use



We'll walk into this lab with OSPF already up and running. The OSPF topology is split into two areas.  Two Area Border Routers (ABR's) place the branch office in its own area, separate from the backbone area. It should be assumed that other similarly configured branch offices would likely be in this area too. Physical connectivity for the branch office is provided by dual uplinks for redundancy. One is a 100 Mbit connection -- for fun, let's pretend it's a bridged wireless point-to-point link. The other connection is a T1; for story-telling, we'll say it's a private leased-line connection.

Base Lab Configuration


Branch1:

!
interface Loopback0
 ip address 172.16.255.3 255.255.255.255
!
interface FastEthernet1/0
 description To-WAN1
 ip address 172.16.11.1 255.255.255.254
 ip ospf network point-to-point
 duplex auto
 speed auto
!
interface Serial2/0
 description To-WAN2
 ip address 172.16.11.3 255.255.255.254
 ip ospf network point-to-point
 serial restart-delay 0
!
interface GigabitEthernet3/0
 ip address 172.16.10.1 255.255.255.0
 negotiation auto
!
router ospf 1
 router-id 172.16.255.3
 log-adjacency-changes
 auto-cost reference-bandwidth 100000
 network 172.16.10.0 0.0.0.255 area 0.0.0.5
 network 172.16.11.0 0.0.0.1 area 0.0.0.5
 network 172.16.11.2 0.0.0.1 area 0.0.0.5
 network 172.16.255.3 0.0.0.0 area 0.0.0.5
!


Wan1:

!
interface Loopback0
 ip address 172.16.255.1 255.255.255.255
!
interface GigabitEthernet1/0
 description To-Core1
 ip address 10.0.11.1 255.255.255.254
 ip ospf network point-to-point
 negotiation auto
!
interface FastEthernet2/0
 description To-Branch1
 ip address 172.16.11.0 255.255.255.254
 ip ospf network point-to-point
 duplex auto
 speed auto
!
router ospf 1
 router-id 172.16.255.1
 log-adjacency-changes
 auto-cost reference-bandwidth 100000
 network 10.0.11.0 0.0.0.1 area 0.0.0.0
 network 172.16.11.0 0.0.0.1 area 0.0.0.5
 network 172.16.255.1 0.0.0.0 area 0.0.0.0
!


Wan2:

!
interface Loopback0
 ip address 172.16.255.2 255.255.255.255
!
interface GigabitEthernet1/0
 description To-Core2
 ip address 10.0.11.3 255.255.255.254
 ip ospf network point-to-point
 negotiation auto
!
interface Serial2/0
 description To-Branch1
 ip address 172.16.11.2 255.255.255.254
 ip ospf network point-to-point
 serial restart-delay 0
!
router ospf 1
 router-id 172.16.255.2
 log-adjacency-changes
 auto-cost reference-bandwidth 100000
 network 10.0.11.2 0.0.0.1 area 0.0.0.0
 network 172.16.11.2 0.0.0.1 area 0.0.0.5
 network 172.16.255.2 0.0.0.0 area 0.0.0.0
!      


Core1:

!
interface Loopback0
 ip address 10.0.255.1 255.255.255.255
!
interface FastEthernet0/0
 no ip address
 shutdown
 duplex half
!
interface GigabitEthernet1/0
 description To-Core2
 ip address 10.0.11.4 255.255.255.254
 ip ospf network point-to-point
 negotiation auto
!
interface GigabitEthernet2/0
 description To-Wan1
 ip address 10.0.11.0 255.255.255.254
 ip ospf network point-to-point
 negotiation auto
!      
router ospf 1
 router-id 10.0.255.1
 log-adjacency-changes
 auto-cost reference-bandwidth 100000
 network 10.0.11.0 0.0.0.1 area 0.0.0.0
 network 10.0.11.4 0.0.0.1 area 0.0.0.0
 network 10.0.255.1 0.0.0.0 area 0.0.0.0
!      


Core2:

!
interface Loopback0
 ip address 10.0.255.2 255.255.255.255
!
interface FastEthernet0/0
 no ip address
 shutdown
 duplex half
!
interface GigabitEthernet1/0
 description To-Core1
 ip address 10.0.11.5 255.255.255.254
 ip ospf network point-to-point
 negotiation auto
!
interface GigabitEthernet2/0
 description To-Wan2
 ip address 10.0.11.2 255.255.255.254
 ip ospf network point-to-point
 negotiation auto
!      
router ospf 1
 router-id 10.0.255.2
 log-adjacency-changes
 auto-cost reference-bandwidth 100000
 network 10.0.11.2 0.0.0.1 area 0.0.0.0
 network 10.0.11.4 0.0.0.1 area 0.0.0.0
 network 10.0.255.2 0.0.0.0 area 0.0.0.0
!      


I'll point out a couple of highlights from the starting configuration in this lab.  First, I've used /31 subnets to interconnect routers, as they're all point-to-point connections.  Some people prefer /30's, but as long as the devices support /31's I don't see a reason to burn the address space. These interfaces are also set to use the OSPF network type "point-to-point."  A quick refresher on this -- OSPF will still use multicast Hellos for neighbor discovery, but no DR/BDR election will occur. The latter part is important because it decreases the time needed to reach full adjacency. One final note on the starting configuration: I chose to use dotted decimal notation for OSPF areas. I don't have any particular reason for this choice, but it is important to be aware that this option exists.
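
As a quick illustration of the /31 versus /30 trade-off, here is a minimal Python sketch using the standard ipaddress module. The describe helper is my own name for this example, not anything from the lab. It shows how many addresses each subnet consumes and the wildcard mask that the matching OSPF network statement uses.

```python
import ipaddress


def describe(prefix: str) -> dict:
    """Summarize a point-to-point subnet: its size, wildcard mask, and members."""
    net = ipaddress.ip_network(prefix)
    return {
        "addresses": net.num_addresses,      # total addresses consumed
        "wildcard": str(net.hostmask),       # IOS-style wildcard mask
        "members": [str(a) for a in net],    # every address in the block
    }


slash31 = describe("172.16.11.0/31")
slash30 = describe("172.16.11.0/30")

print(slash31)
print(slash30)
```

The /31 uses two addresses (both assigned to routers, as on the Branch1 to Wan1 link), while a /30 would consume four for the same pair of routers. The 0.0.0.1 wildcard is the one used in the network statements above.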

Looking at the topology and remembering that OSPF's metric is bandwidth-driven, we should modify our reference bandwidth so that modern interface speeds are weighted accurately.  Note that this really won't have much impact on this lab, but it's good practice.  This is accomplished by adding auto-cost reference-bandwidth 100000 under the OSPF process on every router, which the base configuration above already includes.  The value is given in Mbit/s, so this effectively sets 100 Gbit as the bandwidth ceiling as far as metric calculation goes.  Since our lab has nothing over 1 Gbit, this is more than enough headroom.

Since the OSPF metric is inversely proportional to link bandwidth, it makes sense that with a vanilla OSPF configuration, the 100 Mbit link is preferred over the Serial interface.  The difference in bandwidth is large enough that Branch1's best route to Wan2's loopback goes through Wan1 -- pretty significant!  Note the output below, showing that every learned OSPF route is avoiding the T1.


Branch1#show ip route ospf
     172.16.0.0/16 is variably subnetted, 6 subnets, 3 masks
O IA    172.16.255.2/32 [110/1301] via 172.16.11.0, 00:22:43, FastEthernet1/0
O IA    172.16.255.1/32 [110/1001] via 172.16.11.0, 00:24:13, FastEthernet1/0
     10.0.0.0/8 is variably subnetted, 5 subnets, 2 masks
O IA    10.0.11.0/31 [110/1100] via 172.16.11.0, 00:23:54, FastEthernet1/0
O IA    10.0.11.2/31 [110/1300] via 172.16.11.0, 00:22:43, FastEthernet1/0
O IA    10.0.11.4/31 [110/1200] via 172.16.11.0, 00:23:05, FastEthernet1/0
O IA    10.0.255.1/32 [110/1101] via 172.16.11.0, 00:23:54, FastEthernet1/0
O IA    10.0.255.2/32 [110/1201] via 172.16.11.0, 00:23:05, FastEthernet1/0
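
To make the metrics in that output less magical, here is a rough Python sketch of the cost arithmetic. It assumes the usual rule of reference bandwidth divided by interface bandwidth, truncated, with a floor of 1, and it assumes the T1 is configured at the standard 1544 kbit/s. The names are mine and only for illustration.

```python
# auto-cost reference-bandwidth 100000 is in Mbit/s; work in kbit/s to stay in integers.
REFERENCE_BW_KBPS = 100_000 * 1000

# Interface bandwidths in kbit/s. 1544 for the T1 is an assumption (the usual default).
BANDWIDTH_KBPS = {
    "GigabitEthernet": 1_000_000,
    "FastEthernet": 100_000,
    "Serial-T1": 1_544,
}


def ospf_cost(bandwidth_kbps: int) -> int:
    """Reference bandwidth / interface bandwidth, truncated, never below 1."""
    return max(1, REFERENCE_BW_KBPS // bandwidth_kbps)


costs = {name: ospf_cost(bw) for name, bw in BANDWIDTH_KBPS.items()}
print(costs)  # {'GigabitEthernet': 100, 'FastEthernet': 1000, 'Serial-T1': 64766}

LOOPBACK_COST = 1  # loopback interfaces show a cost of 1

# Branch1 -> Wan1 -> Core1 -> Core2 -> Wan2 loopback: one FastEthernet hop, three Gigabit hops.
via_wan1 = costs["FastEthernet"] + 3 * costs["GigabitEthernet"] + LOOPBACK_COST
# Branch1 -> Wan2 loopback straight across the T1.
via_t1 = costs["Serial-T1"] + LOOPBACK_COST

print(via_wan1, via_t1)  # 1301 64767
```

The 1301 matches the metric Branch1 shows for 172.16.255.2/32, and it is far lower than the cost of crossing the T1 directly, which is why the longer path wins.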

Stub Routing

With a branch router like this, it is often a good idea to use a separate OSPF area, and a stub area fits the bill well. There are several advantages to this type of design.

First, segmenting the OSPF domain into multiple areas allows OSPF to scale well. When routers share the same OSPF area, they share all the details of their topology, including Type 1 and Type 2 LSA's.  However, to share topology information between areas, the ABR's use Type 3 LSA's. These act as a summary of information about the networks in an area, as opposed to all the minutiae that would be passed along in the Type 1's and 2's.  Also, the separation of areas can help contain the impact of a problem to the area where it occurs.

Another reason it is good to create a separate area for a logical "block" of the network is that OSPF allows route summarization at area boundaries.  If you are able to plan your IP addressing in such a way that the networks in that area can be summarized, even if only partially, it can help to reduce the overall size of the enterprise routing table.

When considering a stub area versus a normal area, this design has further advantages.  First, it would be a bad idea to have traffic from the high-bandwidth area 0 network use a low-bandwidth branch office as a transit.  This could saturate the site, or in some cases where bandwidth is metered, could cost a significant amount of money.  A stub area cannot carry a virtual link, so defining the branch office area as a stub area prevents it from being used as a transit area for backbone traffic.

Furthermore, using a totally stub area allows the ABR's to inject only a default route into the area, as opposed to advertising all other inter-area or externally injected routes.  This keeps the routing tables of the stub area routers small, which may allow for a lower-cost model router or multilayer switch to be used for the remote site.  These branch site routers would only have routes for their own stub area, plus a default route pointing outward towards one or more ABR's.

In order to take advantage of both stub areas and summarization at area boundaries, some configuration is required.  Keep in mind that OSPF has several pieces of information that need to match in order to establish neighborship, and one such piece is the OSPF area type.  This is important to note because changing the area type will cause the neighborships to fail and re-establish. We'll implement the configuration on the branch router and the ABR's, and analyze afterwards.

Branch1#config t 
Enter configuration commands, one per line.  End with CNTL/Z. 
Branch1(config)#router ospf 1
Branch1(config-router)#area 0.0.0.5 stub
Branch1(config-router)#end
Branch1#
*Jun  2 11:51:38.352: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.2 on Serial2/0 from FULL to DOWN, Neighbor Down: Adjacency forced to reset
*Jun  2 11:51:38.356: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.1 on FastEthernet1/0 from FULL to DOWN, Neighbor Down: Adjacency forced to reset
*Jun  2 11:51:38.704: %SYS-5-CONFIG_I: Configured from console by console
Branch1#


Wan1#config t
Enter configuration commands, one per line.  End with CNTL/Z.
Wan1(config)#router ospf 1
Wan1(config-router)#area 0.0.0.5 stub no-summary
Wan1(config-router)#end
Wan1#


Wan2#config t
Enter configuration commands, one per line.  End with CNTL/Z.
Wan2(config)#router ospf 1
*Jun  2 11:52:13.376: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.255.3 on Serial2/0 from FULL to DOWN, Neighbor Down: Dead timer expired
Wan2(config-router)#area 0.0.0.5 stub no-summary 
Wan2(config-router)#end
Wan2#


Note that when changing the area type on Branch1, the neighborships were forced to reset.  On Wan1, I reconfigured the OSPF area type to match (also forcing a reset, not shown).  With Wan2, though, you can see the direct result of resetting the neighborship on Branch1.  When the dead timer expired, Wan2 also marked the neighborship down.  Also, note that in order to make the area totally stub, the ABR configuration needs to contain stub no-summary instead of just stub.

Now that these changes have been made, compare the previous IP routing table (output limited to OSPF routes) for Branch1 to how it is now:

Branch1#show ip route ospf   
O*IA 0.0.0.0/0 [110/1001] via 172.16.11.0, 00:05:05, FastEthernet1/0
Branch1#


Perfect! The totally stub area is working: the ABR's replace the rest of the network with a single default route.  Now, let's configure summarization from area 5 into area 0.  Even though we only have one site defined, for argument's sake, say that all branch offices have subnets that fall into 172.16.0.0/20.  We will use that as our summary of routes in area 5.  Note that with area summarization, there is no requirement to summarize everything; partial summarization is definitely acceptable.

Wan1#config t
Enter configuration commands, one per line.  End with CNTL/Z.
Wan1(config)#router ospf 1
Wan1(config-router)#area 0.0.0.5 range 172.16.0.0 255.255.240.0
Wan1(config-router)#end
Wan1#


Wan2#config t
Enter configuration commands, one per line.  End with CNTL/Z.
Wan2(config)#router ospf 1
Wan2(config-router)#area 0.0.0.5 range 172.16.0.0 255.255.240.0 
Wan2(config-router)#end
Wan2#


After this identical configuration has been made on the ABR's, a quick verification on an area 0 router will show that the routing table no longer has the 172.16.10.0/24 route and now has a summarized /20 route instead.  Also, note that we did not match the loopbacks in the summarization, so there is still an inter-area OSPF route for the Branch1 loopback address.

Core1#show ip route ospf
     172.16.0.0/16 is variably subnetted, 4 subnets, 2 masks
O IA    172.16.255.3/32 [110/1101] via 10.0.11.1, 00:00:55, GigabitEthernet2/0
O       172.16.255.2/32 [110/201] via 10.0.11.5, 10:08:11, GigabitEthernet1/0
O       172.16.255.1/32 [110/101] via 10.0.11.1, 10:08:32, GigabitEthernet2/0
O IA    172.16.0.0/20 [110/1100] via 10.0.11.1, 00:00:35, GigabitEthernet2/0
     10.0.0.0/8 is variably subnetted, 5 subnets, 2 masks
O       10.0.11.2/31 [110/200] via 10.0.11.5, 10:08:11, GigabitEthernet1/0
O       10.0.255.2/32 [110/101] via 10.0.11.5, 10:08:32, GigabitEthernet1/0
Core1#
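
The same result can be checked offline. Below is a small Python sketch, using only the standard ipaddress module, that tests which of Branch1's area 5 prefixes fall inside the 172.16.0.0/20 range. The function name and the prefix list are my own framing of the lab addressing, not output from any router.

```python
import ipaddress

SUMMARY = ipaddress.ip_network("172.16.0.0/20")

# Area 5 prefixes taken from the Branch1 configuration above.
AREA5_PREFIXES = [
    "172.16.10.0/24",   # Branch1 LAN
    "172.16.11.0/31",   # Branch1 to Wan1 link
    "172.16.11.2/31",   # Branch1 to Wan2 link
    "172.16.255.3/32",  # Branch1 Loopback0
]


def split_by_summary(prefixes, summary):
    """Return (covered, leaked): prefixes inside the range and prefixes outside it."""
    covered, leaked = [], []
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)
        (covered if net.subnet_of(summary) else leaked).append(prefix)
    return covered, leaked


covered, leaked = split_by_summary(AREA5_PREFIXES, SUMMARY)
print("range:", SUMMARY[0], "-", SUMMARY[-1])
print("covered by summary:", covered)
print("still advertised individually:", leaked)
```

Only the loopback falls outside the range, which lines up with the separate inter-area route for 172.16.255.3/32 on Core1.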


Watching it all unravel...

So far, so good, right? We've successfully segmented our OSPF domain and used route summarization to keep our routing tables lightweight. At first glance, we might just call this project a success and move on.  However, a good test plan would include different failure scenarios. Upon testing some, it should become apparent if there are problems with our design as implemented.

Test 1: Fiber cut between Wan1 and Core1

Wan1 is the currently favored router for Branch1 to route through, and it has only one link in area 0, which leads to Core1.  We might assume, then, that if we drop connectivity to Core1, the default route advertised into area 5 should also be withdrawn, allowing traffic from Branch1 to instead take the T1 link to Wan2.

However, we find that when we shut down the link between Core1 and Wan1, the default route still appears! To show the impact, let's test connectivity to Core2's loopback. The following output is taken from the Test PC shown in the network diagram; its source IP is 172.16.10.10 (for reference).

root@bt:~# ping 10.0.255.2
PING 10.0.255.2 (10.0.255.2) 56(84) bytes of data.
From 172.16.11.0 icmp_seq=1 Time to live exceeded
From 172.16.11.0 icmp_seq=2 Time to live exceeded
^C
--- 10.0.255.2 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1001ms

root@bt:~# traceroute -m 5 10.0.255.2
traceroute to 10.0.255.2 (10.0.255.2), 5 hops max, 60 byte packets
 1  172.16.10.1 (172.16.10.1)  4.169 ms  14.349 ms  24.523 ms
 2  172.16.11.0 (172.16.11.0)  34.796 ms  44.909 ms  55.080 ms
 3  172.16.11.1 (172.16.11.1)  65.240 ms  75.494 ms  85.698 ms
 4  172.16.11.0 (172.16.11.0)  146.623 ms  156.868 ms  167.097 ms
 5  172.16.11.1 (172.16.11.1)  177.256 ms  187.501 ms  197.608 ms
root@bt:~# 


Note that we have a routing loop (TTL exceeded often indicates a routing loop, and the traceroute confirms it). Wan1 is now installing the area 5 default route from Wan2 in its routing table (as learned through Branch1), but Wan1 is still advertising a default route into area 5 as well, and its metric is better than Wan2's.  This means that Branch1 will send traffic to Wan1, but Wan1 will want to send it through Branch1 to Wan2.  While we should definitely note the fact that Branch1 is being used as a transit for area 5 between the ABR's, let's hold that for now. The crux of the issue is that Wan1 is advertising a default route when it has no links in area 0.  Or are we missing something?

Wan1#show ip ospf interface brief
Interface    PID   Area            IP Address/Mask    Cost  State Nbrs F/C
Lo0          1     0.0.0.0         172.16.255.1/32    1     LOOP  0/0
Gi1/0        1     0.0.0.0         10.0.11.1/31       100   DOWN  0/0
Fa2/0        1     0.0.0.5         172.16.11.0/31     1000  P2P   1/1
Wan1#


Lo and behold, notice that we have Wan1's loopback sitting in area 0!  Let's move that to area 5 and see what happens next.

Wan1#config t
Enter configuration commands, one per line.  End with CNTL/Z.
Wan1(config)#router ospf 1
Wan1(config-router)#network 172.16.255.1 0.0.0.0 area 0.0.0.5
Wan1(config-router)#end
Wan1#
*Jun  2 12:56:37.328: %OSPF-6-AREACHG: 172.16.255.1/32 changed from area 0.0.0.0 to area 0.0.0.5
Wan1#


Branch1#show ip route ospf
     172.16.0.0/16 is variably subnetted, 5 subnets, 3 masks
O       172.16.255.1/32 [110/1001] via 172.16.11.0, 00:00:14, FastEthernet1/0
O*IA 0.0.0.0/0 [110/64767] via 172.16.11.2, 00:00:20, Serial2/0
Branch1#


root@bt:~# ping -c 1 10.0.255.2
PING 10.0.255.2 (10.0.255.2) 56(84) bytes of data.
64 bytes from 10.0.255.2: icmp_seq=1 ttl=253 time=31.1 ms


With that, it is apparent that having a loopback in area 0 on an ABR configured for a totally stub area is a bad idea.  After moving the loopback to area 5, Wan1 has no active interface left in area 0, so the branch router no longer sees a default route from Wan1, and the test PC can successfully ping Core2's loopback through the T1 interface.  Note that the loopback on Wan2 is also in area 0, so this will need to be fixed too.
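
To replay what happened in Test 1, here is a deliberately simplified Python sketch that follows default-route next hops until the packet is delivered or the TTL runs out. The two next-hop tables are my own reduction of the behavior described above (one entry per router, default route only); they are not taken from router output.

```python
# Hypothetical next-hop tables for traffic leaving area 5 (default route only).
BEFORE_FIX = {"Branch1": "Wan1", "Wan1": "Branch1"}  # Wan1 uplink down, loopback still in area 0
AFTER_FIX = {"Branch1": "Wan2", "Wan2": "Core2"}     # Wan1 no longer advertises a default


def forward(start, next_hop, destination, ttl=5):
    """Follow next hops from start; return (path, status)."""
    path = [start]
    node = start
    while node != destination:
        if ttl == 0:
            return path, "ttl-exceeded"
        if node not in next_hop:
            return path, "no-route"
        node = next_hop[node]
        path.append(node)
        ttl -= 1
    return path, "delivered"


loop_path, loop_status = forward("Branch1", BEFORE_FIX, "Core2")
good_path, good_status = forward("Branch1", AFTER_FIX, "Core2")

print(loop_status, " -> ".join(loop_path))
print(good_status, " -> ".join(good_path))
```

Before the fix the packet bounces between Branch1 and Wan1 until the TTL expires, just like the traceroute. After the fix it reaches Core2 through Wan2.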

Test 2: Failure of one branch office in area

This test will require a slight modification of the configuration in order to simulate having another branch office connected in area 5. To achieve this, we will add a loopback on each Wan router to simulate another branch office.  Next, we will take down connectivity between Wan1 and Branch1, leaving the test PC network only uplinked to Wan2.

Wan1(config)#int lo1
Wan1(config-if)#descr Fake branch
Wan1(config-if)#ip address 172.16.12.1 255.255.255.0
Wan1(config-if)#exit
Wan1(config)#router ospf 1
Wan1(config-router)#network 172.16.12.0 0.0.0.255 area 0.0.0.5
Wan1(config-router)#end
Wan1#


Wan2(config)#int lo1
Wan2(config-if)#descr Fake branch2
Wan2(config-if)#ip address 172.16.14.1 255.255.255.0
Wan2(config-if)#router ospf 1
Wan2(config-router)#network 172.16.14.0 0.0.0.255 area 0.0.0.5
Wan2(config-router)#end
Wan2#


Branch1#config t
Enter configuration commands, one per line.  End with CNTL/Z.
Branch1(config)#int fa1/0
Branch1(config-if)#shut
Branch1(config-if)#end
Branch1#show ip route ospf
     172.16.0.0/16 is variably subnetted, 5 subnets, 3 masks
O       172.16.255.2/32 [110/64767] via 172.16.11.2, 00:00:21, Serial2/0
O       172.16.14.1/32 [110/64767] via 172.16.11.2, 00:00:21, Serial2/0
O*IA 0.0.0.0/0 [110/64767] via 172.16.11.2, 00:00:21, Serial2/0
Branch1#


On the surface, everything looks fine from Branch1.  The default route is coming from the T1. However, pinging between Core1's loopback and the test PC network fails.  Let's look at why:

Core1#ping 172.16.10.1

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.10.1, timeout is 2 seconds:
U.U.U
Success rate is 0 percent (0/5)
Core1#show ip route ospf
     172.16.0.0/16 is variably subnetted, 4 subnets, 2 masks
O IA    172.16.255.3/32 
           [110/64967] via 10.0.11.5, 00:06:00, GigabitEthernet1/0
O IA    172.16.255.2/32 [110/201] via 10.0.11.5, 00:19:34, GigabitEthernet1/0
O IA    172.16.255.1/32 [110/101] via 10.0.11.1, 00:18:24, GigabitEthernet2/0
O IA    172.16.0.0/20 [110/101] via 10.0.11.1, 00:11:54, GigabitEthernet2/0
     10.0.0.0/8 is variably subnetted, 5 subnets, 2 masks
O       10.0.11.2/31 [110/200] via 10.0.11.5, 00:27:38, GigabitEthernet1/0
O       10.0.255.2/32 [110/101] via 10.0.11.5, 00:27:38, GigabitEthernet1/0
Core1#


Wan1#show ip route ospf
     172.16.0.0/16 is variably subnetted, 5 subnets, 3 masks
O IA    172.16.255.3/32 
           [110/65067] via 10.0.11.0, 00:08:33, GigabitEthernet1/0
O IA    172.16.255.2/32 [110/301] via 10.0.11.0, 00:08:33, GigabitEthernet1/0
O       172.16.0.0/20 is a summary, 00:08:33, Null0
     10.0.0.0/8 is variably subnetted, 5 subnets, 2 masks
O       10.0.11.2/31 [110/300] via 10.0.11.0, 00:20:51, GigabitEthernet1/0
O       10.0.11.4/31 [110/200] via 10.0.11.0, 00:20:51, GigabitEthernet1/0
O       10.0.255.1/32 [110/101] via 10.0.11.0, 00:20:51, GigabitEthernet1/0
O       10.0.255.2/32 [110/201] via 10.0.11.0, 00:20:51, GigabitEthernet1/0


Notice that Core1 is receiving a summary route from Wan1 for 172.16.0.0/20, as expected, because there is at least one remote site in that summary range that is still up (our fake branch loopback).  Core1 prefers that summary and forwards the traffic to Wan1.  However, Wan1 no longer has a route for 172.16.10.0/24, as it is no longer connected to Branch1.  Its best matching route is the Null0 route installed with the summary, so the traffic is dropped.
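
The drop comes down to longest-prefix matching on Wan1. Here is a minimal Python sketch of that lookup using the standard ipaddress module. The table is a hand-copied subset of Wan1's routes from the output above, and the connected Loopback1 entry is my assumption based on the fake branch we added.

```python
import ipaddress

# Subset of Wan1's routing table; next-hop strings are descriptive labels only.
WAN1_ROUTES = [
    ("172.16.0.0/20", "Null0 (summary discard route)"),
    ("172.16.255.3/32", "10.0.11.0 via Gi1/0"),
    ("172.16.255.2/32", "10.0.11.0 via Gi1/0"),
    ("172.16.12.0/24", "Loopback1 (connected, assumed)"),
    ("10.0.255.2/32", "10.0.11.0 via Gi1/0"),
]


def lookup(destination, routes):
    """Return the next hop of the longest matching prefix, or None if nothing matches."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(lookup("172.16.10.1", WAN1_ROUTES))   # Branch1 LAN: only the /20 matches
print(lookup("172.16.12.1", WAN1_ROUTES))   # fake branch: the /24 beats the /20
print(lookup("172.16.255.3", WAN1_ROUTES))  # Branch1 loopback: outside the summary
```

With no more specific route for 172.16.10.0/24, the only match for the test PC network is the /20 pointing at Null0.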

How can this issue be solved? We can create a point-to-point connection between the ABR's, with one subinterface on that link for each shared area.  Coincidentally, this would also resolve both issues from Test 1. It would have given an alternate area 0 path if the Wan1 / Core1 fiber was lost, and it would also have allowed an alternate path for area 5 without having to transit through a branch router.  Below is the configuration added, with an updated topology map to reflect the changes.



! Wan1:
!
interface GigabitEthernet3/0.1
 encapsulation dot1Q 1 native
 ip address 10.0.11.6 255.255.255.254
 ip ospf network point-to-point
!
interface GigabitEthernet3/0.5
 encapsulation dot1Q 5
 ip address 172.16.11.4 255.255.255.254
 ip ospf network point-to-point
!
router ospf 1
 network 10.0.11.6 0.0.0.1 area 0.0.0.0
 network 172.16.11.4 0.0.0.1 area 0.0.0.5


! Wan2:
!
interface GigabitEthernet3/0.1
 encapsulation dot1Q 1 native
 ip address 10.0.11.7 255.255.255.254
 ip ospf network point-to-point
!
interface GigabitEthernet3/0.5
 encapsulation dot1Q 5
 ip address 172.16.11.5 255.255.255.254
 ip ospf network point-to-point
!
router ospf 1
 network 10.0.11.6 0.0.0.1 area 0.0.0.0
 network 172.16.11.4 0.0.0.1 area 0.0.0.5

At this point, Core1 will still receive its summarized route from Wan1. The difference is that Wan1 will now have a more specific route in area 5, across to Wan2 on interface Gi3/0.5.

Conclusion

OSPF solutions like area segmentation and route summarization can be quite beneficial, but it is important to understand how they can introduce routing loops, black-holed traffic, or suboptimal paths.  In many cases, the risks can be properly mitigated -- it's just important to understand what risks you may be facing.

