5        Our recommended mechanisms and new ideas

 

5.1    Introduction

This chapter proposes which of the existing drafted OAM mechanisms for MPLS to choose for large backbone networks. A backbone network needs a comprehensive OAM solution that covers most of the OAM functionality operators want. The solutions we propose are LSP connectivity verification, a mechanism for fast switching to a backup LSP when the working LSP fails, and a mechanism for monitoring the MPLS traffic on the LSRs.

 

We also introduce a new mechanism for backbone networks, for which a patent application is planned. The mechanism is called Classifying the traffic. It gives operators better utilization of their backbone and at the same time provides customers with the network performance they require.

 

We have also found a new method that makes failure detection on the MPLS layer feasible for protection switching. The idea is to use a different frequency of LSP connectivity verification traffic on LSPs that need protection switching than on LSPs that do not. In this way, unnecessary bandwidth usage can be avoided.

5.2    Recommended OAM mechanisms for large backbone networks

Some of the OAM mechanisms proposed for MPLS, by both IETF and ITU-T, are more suitable for backbone networks than others. Our recommended OAM mechanisms for large backbone networks cover three core areas of OAM on MPLS: failure detection, mechanisms for a reliable network, and network monitoring. The first is covered by ITU-T’s recommendation on LSP connectivity verification, which provides a good solution for detecting different LSP and node failures and alerting the affected routers. The second area, which includes fast rerouting and protection switching, gives the backbone network reliable packet delivery. Finally, MPLS MIBs give a good management solution by making it possible to monitor the MPLS traffic at the different routers of the backbone.

 

ITU-T’s LSP connectivity verification solution contains all the mechanisms needed for failure detection and alert messages within the MPLS network. Defects due to loss of LSP connectivity, mis-configured LSRs and switched LSPs are all detected. These failures, and even defects that are not MPLS-specific, are signalled to the affected LSRs using, for example, BDI or FDI. Later on it will also be possible to use ITU-T’s ad hoc mechanisms, which have functionality similar to MPLS Ping and Traceroute. Other connectivity verification mechanisms such as MPLS Ping, Traceroute and RSVP node failure detection do not support the same variety of failure detection and alert messages as ITU-T’s recommendation. These mechanisms can be regarded as subsets of ITU-T’s recommendation.

 

Even though detection of link and node failures exists on layers below MPLS, it is not sufficient when it comes to LSP failures. This subject is discussed in chapter 5.3.

 

Fast rerouting and protection switching give the backbone the necessary reduction in packet loss caused by both link and node failures. Some of the mechanisms also protect against LSP failures. This makes the network more reliable and increases the probability that packets reach their destination.

 

When choosing a fast rerouting or protection switching mechanism, one has to take into account how much OAM functionality the backbone needs, since the mechanisms have different properties. The properties to consider are:

 

When these criteria have been decided, the operator can consult the table in Figure 30 to find a suitable fast rerouting or protection switching mechanism. Figure 30 gives a view of how the different mechanisms may affect the backbone network.

 

Type                          Detection layer    Redundancy   Failure detection   Switching time
----------------------------  -----------------  -----------  ------------------  --------------
ITU-T’s LSP 1:1               MPLS and Link      Low          Needed              Medium
ITU-T’s LSP 1+1               MPLS and Link      High         Needed              Medium
Packet 1+1                    Independent        High         Not needed          Very low
IETF’s One to one/exclusive   Link (and MPLS)    High         Needed              Low
IETF’s Facility/Shared        Link (and MPLS)    Medium       Needed              Low

Figure 30: Fast rerouting types

The Type column lists the proposed fast rerouting mechanisms. The Detection layer column states which layer may perform the failure detection. MPLS in parentheses indicates that only the MPLS control plane can be used. The Redundancy column is graded low, medium or high, where low means the lowest amount of LSP redundancy. This property concerns how the backup entity is utilized when it is not carrying working traffic. Mechanisms that let the backup entity transport extra traffic give the lowest redundancy, and shared backup entities give medium redundancy. The Failure detection column states whether the mechanism needs failure detection. Finally, the Switching time column grades how fast the switch to the backup entity is performed. Several of the mechanisms probably differ by only milliseconds, but the figure still indicates which mechanism performs best for a given redundancy requirement.

 

The choice of fast rerouting mechanism is up to the operator. It depends on which alerts have been chosen, the reliability demanded and the LSP redundancy desired for the particular network. If the mechanism giving the lowest packet loss is wanted, the packet 1+1 switching mechanism should be chosen.
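The look-up in Figure 30 can be sketched in code. The following is only an illustration under our own assumptions: the rows are transcribed from Figure 30, while the names `Mechanism`, `FIGURE_30` and `select_mechanisms`, and the idea of treating the qualitative grades as ordered ranks, are hypothetical and not part of any of the drafts.

```python
from typing import List, NamedTuple


class Mechanism(NamedTuple):
    name: str
    detection_layer: str
    redundancy: str         # "Low", "Medium" or "High"
    failure_detection: str  # "Needed" or "Not needed"
    switching_time: str     # "Very low", "Low" or "Medium"


# Rows transcribed from Figure 30.
FIGURE_30 = [
    Mechanism("ITU-T LSP 1:1", "MPLS and Link", "Low", "Needed", "Medium"),
    Mechanism("ITU-T LSP 1+1", "MPLS and Link", "High", "Needed", "Medium"),
    Mechanism("Packet 1+1", "Independent", "High", "Not needed", "Very low"),
    Mechanism("IETF One to one/exclusive", "Link (and MPLS)", "High", "Needed", "Low"),
    Mechanism("IETF Facility/Shared", "Link (and MPLS)", "Medium", "Needed", "Low"),
]

# ASSUMPTION: the grades are only ordered, not measured, so we map them to ranks.
REDUNDANCY_RANK = {"Low": 0, "Medium": 1, "High": 2}
SWITCHING_RANK = {"Very low": 0, "Low": 1, "Medium": 2}


def select_mechanisms(max_redundancy: str = "High",
                      max_switching_time: str = "Medium") -> List[Mechanism]:
    """Return the mechanisms within both limits, fastest switching first."""
    candidates = [
        m for m in FIGURE_30
        if REDUNDANCY_RANK[m.redundancy] <= REDUNDANCY_RANK[max_redundancy]
        and SWITCHING_RANK[m.switching_time] <= SWITCHING_RANK[max_switching_time]
    ]
    # Sort by switching time, then by redundancy, both ascending.
    return sorted(candidates,
                  key=lambda m: (SWITCHING_RANK[m.switching_time],
                                 REDUNDANCY_RANK[m.redundancy]))


if __name__ == "__main__":
    # An operator who accepts at most medium LSP redundancy.
    for m in select_mechanisms(max_redundancy="Medium"):
        print(m.name, "-", m.switching_time)
```

With no limits given, the first candidate returned is packet 1+1, which matches the recommendation above for operators who want the lowest packet loss.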

 

An operator of a backbone network should be able to monitor the MPLS routers and see how they are functioning, in order to compile statistics on how well the backbone is performing. With MPLS MIBs it is possible to observe MPLS-specific properties such as the flow on the different LSPs. Since SNMP is already widely used on IP, and the same protocol is used to retrieve information from MPLS MIBs, the use of MPLS MIBs will be simple to carry out.
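As an illustration of the kind of statistic an operator might derive from such monitoring, the sketch below turns two samples of a per-LSP octet counter into a traffic rate. It assumes that the MIB exposes an octet counter per LSP with SNMP Counter32 semantics (wrapping at 2^32); the SNMP polling itself is not shown, and the function name and sample values are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # an SNMP Counter32 wraps to zero at 2^32


def octet_rate_bps(prev_octets: int, curr_octets: int,
                   interval_seconds: float,
                   modulus: int = COUNTER32_MODULUS) -> float:
    """Return the average rate in bit/s between two counter samples.

    The modulo handles a single counter wrap between the samples.
    More than one wrap within the interval cannot be detected.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = (curr_octets - prev_octets) % modulus
    return delta * 8 / interval_seconds


if __name__ == "__main__":
    # Hypothetical samples taken 10 seconds apart on one LSP.
    print(octet_rate_bps(1_000_000, 2_250_000, 10.0))
```

The polling interval has to be short enough that the counter cannot wrap more than once between two samples, which matters on high-speed backbone links.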

5.3    Differentiation of connectivity verification traffic

There will always be a need for connectivity verification (CV) of LSPs. Common LSPs need CV to detect failures within an appropriate time, after which the LSRs carry out the necessary LSP restoration and alert other affected LSRs. For the protected LSPs in protection switching, the need is quite different. If protection switching has to take place within tenths of a second when an MPLS LSP failure occurs, the requirement for fast failure detection is much higher.

 

The ITU-T’s CV traffic is proposed to be sent on each LSP periodically with a frequency of one packet per second. An LSP failure has occurred when defects on three consecutive CV packets have been detected. This means it takes three seconds before a failure alert for an LSP can be sent.
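The arithmetic behind the three-second figure can be written out as a small sketch. It assumes that the detection time is simply the number of consecutive defective CV packets multiplied by the sending interval, and it ignores propagation and processing delay; the function name and parameters are hypothetical.

```python
def cv_detection_time(packets_per_second: float,
                      consecutive_defects: int = 3) -> float:
    """Return the time in seconds needed to observe the given number
    of consecutive defective CV packets at the given sending rate."""
    if packets_per_second <= 0:
        raise ValueError("packets_per_second must be positive")
    interval = 1.0 / packets_per_second
    return consecutive_defects * interval


if __name__ == "__main__":
    # One CV packet per second and three consecutive defects.
    print(cv_detection_time(1.0))
```

Under the same assumptions, doubling the CV rate halves the detection time, which is the lever discussed in the rest of this chapter.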

 

How often an LSP may fail is open to discussion. Many different failures can occur on an LSP, as described by the defect type codepoints. Errors on the link layer or in nodes are likely to happen more frequently than errors on LSPs, because links and nodes can be affected by external threats such as power or cable failures. Still, LSPs can never be fully trusted: they may be incorrectly configured, the LSP mechanism may work incorrectly, or mis-merging or other errors may occur.

 

The time it takes to detect a failure, plus the time it takes to alert the affected routers, may be too long to serve as a basis for protection switching. A large number of packets can be lost before traffic is switched to a backup LSP. This may be critical for real-time applications such as video conferencing and IP telephony if the backbone has many LSP errors.

 

One way to improve the LSP failure detection time is to increase the frequency of the CV packets. To switch traffic to backup LSPs within seconds, the frequency should be about two or three packets per second. Considering additionally that there are many LSPs on one link, this amounts to considerable bandwidth usage.

 

Because LSPs that need protection switching and LSPs that do not have different demands on failure detection time, we propose to differentiate the frequency of CV packets according to the need of each LSP. On LSPs that need fast rerouting, CV packets may be sent periodically with an interval much smaller than in ITU-T’s proposal so far, so that traffic can be switched onto the backup LSP within tenths of a second. The other, common LSPs would use, say, ITU-T’s suggested interval of one packet per second. In this way we significantly reduce unnecessarily high OAM traffic on LSPs that do not need protection switching for LSP failures, and at the same time achieve fast failure detection for the LSPs that need it.
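The effect of such a differentiation on OAM bandwidth can be illustrated with a rough calculation. All numbers below are hypothetical and chosen only for the example: a CV packet size of 64 bytes, 1000 LSPs of which 100 are protected, and a fast CV rate of 10 packets per second. The class and function names are likewise our own.

```python
from dataclasses import dataclass
from typing import Iterable

# ASSUMPTION: illustrative size of one CV packet on the wire, in bytes.
CV_PACKET_BYTES = 64


@dataclass
class LspClass:
    name: str
    lsp_count: int
    cv_packets_per_second: int

    def cv_load_bps(self, packet_bytes: int = CV_PACKET_BYTES) -> int:
        """CV traffic in bit/s generated by all LSPs of this class."""
        return self.lsp_count * self.cv_packets_per_second * packet_bytes * 8

    def detection_time_s(self, consecutive_defects: int = 3) -> float:
        """Time to see the given number of consecutive defective CV packets."""
        return consecutive_defects / self.cv_packets_per_second


def total_cv_load_bps(classes: Iterable[LspClass]) -> int:
    return sum(c.cv_load_bps() for c in classes)


# Every LSP uses the fast CV rate.
uniform = [LspClass("all LSPs", 1000, 10)]

# Only the protected LSPs use the fast rate; the rest send one packet per second.
differentiated = [
    LspClass("protected LSPs", 100, 10),
    LspClass("common LSPs", 900, 1),
]

if __name__ == "__main__":
    print("uniform:       ", total_cv_load_bps(uniform), "bit/s")
    print("differentiated:", total_cv_load_bps(differentiated), "bit/s")
    for c in differentiated:
        print(c.name, "detect a failure after", c.detection_time_s(), "s")
```

With these assumed numbers, the differentiated scheme carries roughly a fifth of the CV traffic of the uniform fast scheme, while the protected LSPs keep the same detection time. Realistic figures would have to come from measurements on a testbed.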

 

Based on the discussion above, the operator has to decide whether the protection switching mechanisms need LSP failure detection in addition to link and node failure detection.

 

5.4    Classifying the traffic

Fortunately, the contents of this chapter have led to a planned patent application which, at the time of writing, has not yet obtained patent-pending status. We therefore cannot release the contents of this chapter to the public before the information has been accepted by the patent agency. Instead, most of the content of this chapter has been moved to Appendix D as restricted information, available only to Ericsson and the external examiner. The contents may be released to the general public at a later stage.

 

The main purpose of this new mechanism is to show how MPLS technology can be used to detect specific traffic behavior, so that the MPLS backbone handles this traffic more logically. The mechanism gives operators better utilization of their backbone and at the same time provides customers with the network performance they require.


6        Conclusion

In this thesis, we have evaluated existing OAM mechanisms for MPLS backbone networks and compared these mechanisms to IP. This has shown that the MPLS OAM principles fully cover failure and reachability detection, avoidance of congested routers, SNMP features, fast rerouting and protection switching functions, traffic engineering and ad hoc mechanisms like Ping. We have also proposed the ITU-T LSP connectivity verification mechanism, fast rerouting and protection switching, and the use of MPLS MIBs as recommended OAM mechanisms for large backbone networks. In addition, we have presented three new ideas for OAM on MPLS in backbone networks.

 

Firstly, this thesis provides a new mechanism for classifying the traffic. It shows how MPLS technology can be used to detect specific traffic behavior, which makes the MPLS backbone handle the traffic more logically. The mechanism gives operators better utilization of their backbone and at the same time provides customers with the network performance they require. A patent application for this mechanism is planned to be submitted this spring.

 

Secondly, we have found that the connectivity verification traffic load should be differentiated between the LSPs that need protection switching and those that do not. To detect LSP errors faster and thereby achieve better protection switching, a shorter period between LSP connectivity verification packets than the one drafted by ITU-T is needed. This increases the OAM bandwidth usage. At the same time, unnecessary OAM traffic needs to be removed to leave as much bandwidth as possible for working traffic. A well-considered differentiation of connectivity verification traffic results in a reliable network in which MPLS OAM traffic does not use unnecessary bandwidth.

 

Thirdly, a table describing the proposed fast rerouting and protection switching mechanisms is provided. The table shows which layer performs the failure detection, a grading of their redundancy, whether failure detection is needed, and a grading of their switching time. This will ease the operator’s choice of mechanisms to use in large MPLS backbone networks.

 

Additionally, we have studied how MPLS can detect different connectivity failures on LSPs and nodes. When a failure is detected, it is possible to alert affected nodes both upstream and downstream to suppress alarm storms. This feature is important for reducing unnecessary OAM traffic and for letting only the end point of the failed LSP take appropriate action. In contrast to MPLS, where routers inside the backbone network handle the failures, IP lets the source host outside the backbone handle them. The MPLS failure detection mechanisms seem to make MPLS a good choice for future backbone networks.

 

Further work should, as mentioned in Appendix D, test the algorithm presented and find optimal parameters for correct traffic classification. Further research also has to find an appropriate interval for sending connectivity verification packets, using testbeds. This is needed to find the best trade-off between fast failure detection on LSPs for protection switching and limited OAM traffic on backbone networks.


MECHANISMS FOR OAM ON MPLS IN LARGE IP BACKBONE NETWORKS (c) 2002 Hallstein Lohne, Johannes Vea, a graduate thesis written for AUC/ERICSSON