3        A classification of OAM functionalities

 

3.1    Introduction

This chapter classifies the various OAM functionalities that exist or are proposed for IP and MPLS. First, it describes a network management mechanism that can be used to manage both IP and MPLS networks. Network management can be defined as a generic solution for monitoring the network and checking it for errors. The Simple Network Management Protocol (SNMP) was created for this purpose. SNMP retrieves information from routers by accessing the different Management Information Bases (MIBs) on nodes in the network.

 

Second, a classification of the different OAM mechanisms for IP and MPLS is presented. IP does not have any such mechanisms itself; IP extensions like the Internet Control Message Protocol (ICMP), Ping, Traceroute and MIBs are the main functionalities used for this technology. In contrast, MPLS has proposals for many different OAM mechanisms. The LSP connectivity verification mechanism detects different defects on LSPs and offers a number of different packet formats. MPLS ping, traceroute and RSVP node failure detection are other methods for failure detection. Protection switching and fast rerouting give the network reliable packet delivery, while MPLS traffic engineering and MPLS SNMP MIBs provide operational mechanisms.

3.2    Network management

3.2.1     Network Management Architectures

When it comes to network management, there are usually two primary elements: a manager and agents. The manager has two purposes: collecting and visualizing information. It collects information from agents and uses various mechanisms for sorting and picking out relevant data. The agents are responsible for delivering information about the hardware or software. Generally, the agents are used for tasks like monitoring traffic usage, counting the number of connected clients and similar activities.

Figure 18: Network Management Architecture

As one can see in Figure 18, the Network Management System (NMS) contacts the various routers and retrieves the Management Information Base (MIB) information from the routers’ SNMP agents. The NMS can be some sort of network monitoring software running on a normal computer equipped with a network card. Numerous solutions exist for this purpose on the market, and they all have various features.

3.2.2     SNMP

Since it was developed in 1988, the Simple Network Management Protocol (SNMP) has become a common way of monitoring Internet Protocol (IP) networks. SNMP is extensible, allowing vendors to easily add network management functions to their existing products. SNMP runs on top of UDP.

 

The strategy implicit in the SNMP is that the monitoring of network states at any significant level of detail is accomplished primarily by polling for appropriate information for making the best possible management solution. A limited number of unsolicited messages (traps) guide the timing and focus of the polling. Limiting the number of unsolicited messages is consistent with the goal of simplicity and minimizing the amount of traffic generated by the network management function. [30]

 

In other words, SNMP is a set of rules that allows many hardware devices, such as computers and routers, to keep track of statistics that measure important features, such as the number of packets received on an interface. The information SNMP retrieves is kept in a separate database on each device, called the Management Information Base (MIB). Other kinds of equipment make configuration information available through SNMP. SNMP is an Application-layer protocol (see Figure 4) and is used almost exclusively in TCP/IP networks.

The MIB architecture

A large number of different MIBs exist, covering many different aspects of the operation and performance of different devices. Using SNMP, one can connect to these MIBs, locate MIB variables and retrieve or edit them. MIB variables are identified by an Object Identifier (OID) that uses a hierarchical addressing system, like a reversed version of the well-known Domain Name System (DNS). An OID is numeric: the first number is the root of the hierarchy, the second a node below it, and so on. As an example, the address for the sysDescr MIB variable is 1.3.6.1.2.1.1.1. The translated version of this would be:


            .iso(1).org(3).dod(6).internet(1).mgmt(2).mib-2(1).system(1).sysDescr(1).

 

One can see that the root is ISO, and the sub-objects are then located using their well-known numeric path. See Figure 19 for a more descriptive view.

Figure 19: The OID Tree
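The numeric-to-name translation above can be sketched in a few lines. The name table below is hard-coded for just this example path; it is a stand-in for a real MIB tree:

```python
# A minimal sketch (not a full MIB parser): translating the numeric
# sysDescr OID into its symbolic path using a hard-coded name table.
OID_NAMES = {
    (1,): "iso",
    (1, 3): "org",
    (1, 3, 6): "dod",
    (1, 3, 6, 1): "internet",
    (1, 3, 6, 1, 2): "mgmt",
    (1, 3, 6, 1, 2, 1): "mib-2",
    (1, 3, 6, 1, 2, 1, 1): "system",
    (1, 3, 6, 1, 2, 1, 1, 1): "sysDescr",
}

def translate(oid: str) -> str:
    """Turn '1.3.6.1.2.1.1.1' into '.iso(1).org(3)...sysDescr(1)'."""
    numbers = tuple(int(n) for n in oid.split("."))
    parts = []
    for depth in range(1, len(numbers) + 1):
        prefix = numbers[:depth]
        name = OID_NAMES.get(prefix, "?")  # "?" for sub-trees not in the table
        parts.append(f"{name}({prefix[-1]})")
    return "." + ".".join(parts)

print(translate("1.3.6.1.2.1.1.1"))
# .iso(1).org(3).dod(6).internet(1).mgmt(2).mib-2(1).system(1).sysDescr(1)
```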

The SNMP client puts the OID of the MIB variable it wants to retrieve into the request message and sends this message to the node. The server then maps this identifier onto a local variable (for example, a memory location where the value for this variable is stored) and retrieves the current value held in that variable. [18c]
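This agent-side mapping can be sketched as a simple lookup table; the OIDs, device description and error string below are illustrative only, and a real agent would consult its full MIB implementation:

```python
# A toy sketch of the agent-side lookup: the agent maps the OID from a
# Get-Request onto a local variable and returns its current value.
import time

START_TIME = time.time()

MIB = {
    # sysDescr.0: a static string (value invented for illustration)
    "1.3.6.1.2.1.1.1.0": lambda: "Example router, version 1.0",
    # sysUpTime.0: computed on demand, in hundredths of a second
    "1.3.6.1.2.1.1.3.0": lambda: int((time.time() - START_TIME) * 100),
}

def handle_get_request(oid: str):
    getter = MIB.get(oid)
    if getter is None:
        return "noSuchName"   # SNMPv1 error status for unknown OIDs
    return getter()

print(handle_get_request("1.3.6.1.2.1.1.1.0"))  # Example router, version 1.0
```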

 

There are various tools that make use of SNMP and its statistics, storing them in a database or similar. These network management systems mostly use periodic polling, issuing SNMP Get-Requests to read the MIB variables of various routers within the network. These pieces of information can be inserted into a central NMS database, later providing valuable statistical information. As an example output, we have included the bandwidth usage from one of Uninett’s OAM solutions. Uninett monitors a large number of routers in its network and compiles overviews/maps on a periodic basis. It uses a centralized system that sends SNMP traffic-usage requests to the routers. See Figure 20 for the sample output.

 

Figure 20: Oslo-Bergen daily traffic in kbit/s based on SNMP Get-Requests [11]

SNMP on network devices is today becoming almost a requirement. The Internet is the single largest market for SNMP systems. A large portion of SNMP systems will be developed with the Internet as a target environment. Therefore, it may be expected that the Internet's needs and requirements will be the driving force for SNMP. SNMP over UDP/IP is specified as the "Internet Standard" protocol. Therefore, in order to operate in the Internet and be managed in that environment on a production basis, a device must support SNMP over UDP/IP. This situation will lead to SNMP over UDP/IP being the most common method of operating SNMP. Therefore, the widest degree of interoperability and widest acceptance of a commercial product will be attained by operating SNMP over UDP/IP. [39]

Security

To access the SNMP agents, SNMP Get-Requests are used, and they will be accepted or denied according to whether the password sent by the client is correct. This password is defined as a Read-only Community String. Usually the default password is “public”, and some call it the default public community string. Many operators change the default Read-only Community String to keep information for the operators only. On some devices one can also define an IP filter for SNMP connections, thus improving security.

 

There is also an SNMP Set-Request that can set and alter some MIB variables to a specific value. These Set-Requests are protected by the Write Community String, which should be different from “public”.

 

SNMP also defines the SNMP Trap, which is an interrupt from a device to an SNMP console about the state of the device. Traps can indicate link-down/up events and information surrounding power state. These traps can complement polled SNMP information, since some events are not detected when an NMS only sends SNMP requests on a periodic basis.

The structure of MIBs

The different MIBs are built up according to a specified structure. This structure consists of three parts: resource, definition and value. These are explained below [41]:

 

·        Resource: Management of the MIB’s use of system resources
The resource section has objects to manage resource usage by wild carded delta expressions, a potential major consumer of CPU and memory.

·        Definition: Definition of expressions
The definition section contains the tables that define expressions. The expression table, indexed by expression owner and expression name, contains those parameters that apply to the entire expression, such as the expression itself, the data type of the result, and the sampling interval if it contains delta or change values. The object table, indexed by expression owner, expression name and object index within each expression, contains the parameters that apply to the individual objects that go into the expression, including the object identifier, sample type, discontinuity indicator, and such.

·        Value:  Values of evaluated expressions
The value section contains the values of evaluated expressions. The value table, indexed by expression owner, expression name and instance fragment contains a "discriminated union" of evaluated expression results. For a given expression only one of the columns is instantiated, depending on the result data type for the expression. The instance fragment is a constant or the final section of the object identifier that filled in a wildcard.


3.3    OAM on IP

To provide OAM on IP, a system operator can utilize different software management packages or advanced scripts for monitoring a network. These software solutions request information from routers and switches using ping, traceroute and SNMP. SNMP offers connectivity to various MIBs that contain information such as CPU load, traffic load and more.

 

Figure 21: OAM on IP

The computer in Figure 21 collects information and stores it on a periodic basis. This information gives valuable input to the OAM process for detecting failures or unwanted behaviour.

 

Regardless of the size of your network, whether a dozen nodes or thousands, you must establish a way to monitor the status of your network to see where it is working and where it is not. If you do not, you will be in the dark about what is going on, and you will constantly be fighting fires that could have been avoided. [12]

3.3.1     Ping and ICMP

The most common mechanism used for verifying whether routers and other nodes in the network are reachable is Ping. Ping measures the two-way delay between the source and the destination. One can also monitor the response time of various systems using this small program. It makes use of the Internet Control Message Protocol (ICMP) fields to determine the various aspects of failure:


 

Some of the ICMP fields:

Destination Unreachable Message (Type 3):

   Code:

      0 = net unreachable;
      1 = host unreachable;
      2 = protocol unreachable;
      3 = port unreachable;
      4 = fragmentation needed and DF set;
      5 = source route failed.

Echo or Echo Reply Message:

   Type:

      8 for echo message;
      0 for echo reply message.

   Code:

      0

   Identifier: if code = 0, an identifier to aid in matching echoes and replies; may be zero.

Figure 22: ICMP Destination Unreachable Message (type3) and ICMP Echo or Reply Message [19]

ICMP is a message control and error-reporting protocol that operates between a network device and a gateway. It uses datagrams and is actually part of an IP implementation (see Figure 4). The messages are sent back to the requesting host and are not handled further by intermediate routers. This is the easiest way to see if a network device is online, and it is also the lowest level of this type of reachability check. Figure 22 describes two of the most used ICMP message types.
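Each ICMP message carries a checksum: the 16-bit ones’ complement of the ones’ complement sum of the message taken as 16-bit words, with the checksum field set to zero during computation. As an illustration, a minimal echo-request builder (header layout per RFC 792; the identifier, sequence number and payload are arbitrary):

```python
# Build an ICMP echo request (Ping) and verify its checksum.
import struct

def internet_checksum(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"                              # pad odd-length messages
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)     # fold carry back in
    return (~total) & 0xFFFF

def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    # Type 8 (echo), code 0, checksum placeholder 0, then identifier/sequence.
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload

packet = build_echo_request(ident=1, seq=1, payload=b"ping")
# A receiver re-computing the checksum over the whole packet gets zero:
print(internet_checksum(packet))  # 0 for an intact packet
```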

 

ICMP has many error messages that can indicate that the destination host is unreachable (perhaps due to a link failure), that the reassembly process failed, that the TTL reached 0, that the IP header checksum failed, and so on.

 

The various message types for ICMP are: 0 (Echo Reply), 3 (Destination Unreachable), 4 (Source Quench), 5 (Redirect), 8 (Echo), 11 (Time Exceeded), 12 (Parameter Problem), 13 (Timestamp), 14 (Timestamp Reply), 15 (Information Request) and 16 (Information Reply). They all have their own explicit function for determining errors and response times.

 

ICMP Redirect tells the source host that there is a better route to the destination. ICMP Redirects are used in the following situation. Suppose a host is connected to a network that has two routers attached to it, called R1 and R2, and the host uses R1 as its default router. Should R1 ever receive a datagram from the host where, based on its forwarding table, it knows that R2 would have been a better choice for a particular destination address, it sends an ICMP Redirect back to the host, instructing it to use R2 for all future datagrams addressed to that destination. The host then adds this new route to its forwarding table. [18a]

 


Error reporting
While IP is perfectly willing to drop datagrams when the going gets tough – for example, when a router does not know how to forward the datagram or when one fragment of a datagram fails to arrive at the destination – it does not fail silently. [18a]

 

ICMP takes care of these errors, using one of the error messages mentioned earlier, and reports them back to the sending host.

3.3.2     Traceroute

Sometimes one cannot completely rely on Ping. If Ping fails, it does not tell which of the multiple routers between the two endpoints is failing to deliver the packet.

 

Traceroute fixes this problem by finding each intermediate router on the way from host A to host B. It does this by causing each router along the path to send back an ICMP error message. IP packets contain a Time-To-Live (TTL) value that each router decrements as it handles the packet. When this value drops to zero, the router discards the packet and sends an ICMP Time-to-live Exceeded message back to the sender. The first packet traceroute sends has a TTL value of 1. The first router decrements this and sends back the ICMP error message, and traceroute has discovered the first-hop router. It then sends a packet with a TTL value of 2, which the first router decrements and routes, but the second router decrements it to zero, which causes it to send an ICMP error message, and traceroute has learned the second hop. By continuing in this way, traceroute causes each router along the path to send an ICMP error message and identify itself. Ultimately, the TTL gets high enough for the packet to reach the destination host and traceroute is done, or some maximum value (usually 30) is reached and traceroute ends the trace. [12]
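The TTL mechanism described above can be simulated without raw sockets or root privileges; the router names and path below are invented for illustration:

```python
# Simulation of the TTL behaviour traceroute relies on: each hop
# decrements the TTL, and the hop at which it reaches zero "answers"
# with a Time-to-live Exceeded message, identifying itself.
PATH = ["R1", "R2", "R3", "destination-host"]   # hypothetical route

def send_probe(ttl: int) -> str:
    """Return the name of the node that answers a probe with this TTL."""
    for hop in PATH:
        ttl -= 1
        if ttl == 0 and hop != PATH[-1]:
            return hop          # router discards the packet, sends ICMP error
        if hop == PATH[-1]:
            return hop          # probe reached the destination host
    return PATH[-1]

def traceroute(max_ttl: int = 30):
    hops = []
    for ttl in range(1, max_ttl + 1):
        answer = send_probe(ttl)
        hops.append(answer)
        if answer == PATH[-1]:
            break               # destination reached, trace complete
    return hops

print(traceroute())   # ['R1', 'R2', 'R3', 'destination-host']
```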

 

What really matters is that this function can be scripted and used in a larger NMS. However, Traceroute is mostly used manually by system operators to locate errors in their networks. Note that Traceroute cannot be completely trusted for such tasks, since IP packets may travel different routes each time one performs an IP traceroute. Sometimes operators use a pre-tested traceroute (obtained by logging the output to a file) and compare it to the current traceroute to see how their network has been rerouted. If they differ, rerouting might have occurred. One can also use the pre-tested traceroute to locate routers that are unreachable by pinging each hop in the traceroute.

 

A note about Time-To-Live (TTL)
The name Time-To-Live reflects its historical meaning rather than the way the field is commonly used today. The intent of the field is to catch packets that have been going around in routing loops and discard them, rather than let them consume resources indefinitely. Originally, TTL was set to a specific number of seconds that the packet would be allowed to live, and routers along the path would decrement this field until it reached 0. However, since it was rare for a packet to sit for as long as 1 second in a router, and routers did not all have access to a common clock, most routers just decremented the TTL by 1 as they forwarded the packet. Thus, it became more of a hop count than a timer, which is still a perfectly good way to catch packets that are stuck in routing loops. [18a]

3.3.3     IP MIBs

Since the nodes we need to keep track of are distributed, our only real option is to use the network to manage the network. This means we need a protocol that allows us to read, and possibly, write, various pieces of state information on different network nodes. [18c]

 

MIB variables and such often just maintain hardware-specific information for the equipment in question. Manufacturers have a variety of information that can be monitored for their products.

 

Examples of these variables are:

 

There also exist IP-specific variables:

 

 

These variables are adapted from [43] and [44] and are just examples from the jungle of MIBs.

3.3.4     New OAM functions in IPv6

IPv6 supports address autoconfiguration of hosts and routers. There are two types of address autoconfiguration: stateless and stateful. The stateless approach is used when a site is not particularly concerned with the exact addresses hosts use, as long as they are unique and properly routable. The stateful approach is used when a site requires tighter control over exact address assignments. Both stateful and stateless address autoconfiguration may be used simultaneously [34].

 

Stateless autoconfiguration requires no manual configuration of hosts, minimal (if any) configuration of routers, and no additional servers. The stateless mechanism allows a host to generate its own addresses using a combination of locally available information and information advertised by routers. Routers advertise prefixes that identify the subnet(s) associated with a link, while hosts generate an "interface identifier" that uniquely identifies an interface on a subnet. An address is formed by combining the two. But before the new local address can be used, the host must verify that no other host on the link is using, or is about to use, the same address [34].
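The address formation described above can be sketched as follows, assuming the common modified EUI-64 interface identifier derived from a link-layer (MAC) address; the prefix and MAC address below are hypothetical:

```python
# Combining a router-advertised /64 prefix with an interface identifier
# derived from the MAC address (modified EUI-64: insert ff:fe in the
# middle of the MAC, flip the universal/local bit of the first octet).
import ipaddress

def eui64_interface_id(mac: str) -> bytes:
    octets = bytes(int(b, 16) for b in mac.split(":"))
    eui = octets[:3] + b"\xff\xfe" + octets[3:]
    return bytes([eui[0] ^ 0x02]) + eui[1:]       # flip universal/local bit

def stateless_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    net = ipaddress.IPv6Network(prefix)
    iface_id = int.from_bytes(eui64_interface_id(mac), "big")
    return net[iface_id]                           # prefix + interface identifier

addr = stateless_address("2001:db8::/64", "00:11:22:33:44:55")
print(addr)   # 2001:db8::211:22ff:fe33:4455
```

Before using the resulting address, a real host would still run duplicate address detection on the link, as the text notes.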

 

In the stateful autoconfiguration model, hosts obtain interface addresses and/or configuration information and parameters from a server. Servers maintain a database that keeps track of which addresses have been assigned to which hosts [34].

3.3.5     ITU-T’s future OAM on IP

Study Group 13 is developing OAM network techniques that can be used to control and manage IP layer functions required in operations and maintenance, e.g. the Y.17xx Recommendations on MPLS. Study Group 15 is responsible for defining the implementation of these functions in IP network equipment, although much of this work is being done by IETF. Study Group 4 makes use of these OAM facilities to carry out management functions in the transport plane and control plane in concert with the TMN management capabilities. In an IP-based network environment, the distinction between control plane, signalling plane and management plane (TMN) is blurring. [3]

 

ITU-T has said that it will look into how to support mechanisms for the collection of information that can be used for charging users of the resources, specifically the end users of the services. It will also look into supporting mechanisms for the collection of information that can be used for settlement between users of the resources, and mechanisms for the collection of performance and quality of service (QoS) information that can be used to support the management of QoS and service level agreements (SLAs). [3]

 

Also, ITU-T has said that OAM and protection switching issues of IP-based networks are to be considered. Requirements and issues are to be studied first. After requirements are decided, IP OAM functions have to be considered. [40]

 

Question 4/13 [42] at ITU-T describes which areas ITU-T is planning to study:

·        Define the traffic aspects of SLA for IP based services.

·        Specify the IP Transfer Capabilities and associated traffic contract derived from SLA statements. This should allow the support of real-time and non-real-time applications.

·        Policy guidelines for defining traffic aspects in an SLA for IP based services.

·        Specify a traffic and congestion control framework for IP traffic.

·        Specify resource management and congestion control functions.

·        Specify traffic engineering methods and traffic engineering tools for IP.

 

More information on work in this area may appear by the end of 2002. [40]

3.4    OAM on MPLS

 

3.4.1     Current work overview

The ongoing work on OAM for MPLS is at a stage where some drafts have been created but rather few recommendations and specifications exist. ITU-T has pre-published the recommendation Requirements for OAM functionality for MPLS networks, which provides the motivations and requirements for user-plane OAM functionality in MPLS networks. The user-plane refers to the set of traffic forwarding components through which traffic flows [21]. The main motivation for this work has been the network operators’ expressed need for OAM functionality to ensure the reliability and performance of MPLS LSPs. [20]

 

The IETF Network Working Group and Traffic Engineering Working Group have done a lot of research on OAM functionalities, and most of their work in this area is still at the draft stage. Much of their work deals with how MPLS can give the best reliability when failures are detected. There is a need to minimize packet loss when LSPs fail.

3.4.2     LSP connectivity

MPLS introduces a new network architecture, and therefore there will be new failure modes that are only relevant for the MPLS layer. Thus, layers above or below the MPLS layer cannot be used for MPLS-specific OAM needs.

 

User-plane OAM tools are required to verify that LSPs maintain correct connectivity and are thus able to deliver customer data to target destinations according to both the availability and the QoS (Quality of Service) guarantees given in SLAs (Service Level Agreements) [20].

 

Some of the requirements that must be supported by the MPLS OAM functions are [20]:

·        Both on-demand and continuous connectivity verification of LSPs, to confirm that defects do not exist on the target LSPs.

·        A defect event in a given layer should not cause multiple alarm events to be raised simultaneously, or cause unnecessary corrective actions to be taken in the client layers. The client layer is the layer above in the label hierarchy, which uses the current layer as a server layer.

·        Capability to measure the availability and QoS performance of an LSP.

·        At least the following MPLS user-plane defects must be detected [20]:

-         Loss of LSP connectivity due to a server layer failure or a failure within the MPLS layer

-         Swapped LSP trails

-         Unintended replication of one LSP’s traffic into another LSP’s traffic

-         Unintended self-replication

 

Sixteen values of the 20-bit label field have been reserved in the label header for special functions, but not all have been specified yet. One of the proposed functions is the OAM Alert Label, which has been given the numerical value 14 [21].

 

Figure 23: MPLS OAM packet

There are different payloads depending on which OAM function the packet contains, but there is still a common structure for the payloads. At the beginning, each packet has an OAM Function Type field specifying which OAM function is in the payload. Each packet also contains the specific OAM function type data and, at the end of the packet, a Bit Interleaved Parity (BIP16) error detection code. The BIP16 remainder is computed over all the fields of the OAM payload, including the OAM Function Type and the BIP16 positions, which are preset to zero. The payload must be at least 44 octets, both to facilitate ease of processing and to support the minimum packet size required by layer 2 technologies. This is achieved by padding the specific OAM type data field with all “0”s when necessary [21].
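A sketch of this payload layout, treating the BIP16 as even bit-interleaved parity, which is equivalent to XOR-ing the payload as 16-bit words with the BIP16 positions preset to zero; field widths other than the 44-octet minimum are simplified assumptions:

```python
# Build a minimal OAM payload: function type, zero-padded type data,
# and a trailing BIP16 computed with the BIP16 field preset to zero.
import struct

MIN_PAYLOAD = 44  # octets, including function type and BIP16

def bip16(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"
    result = 0
    for (word,) in struct.iter_unpack("!H", data):
        result ^= word            # even parity per bit position = XOR of words
    return result

def build_oam_payload(function_type: int, type_data: bytes) -> bytes:
    body = bytes([function_type]) + type_data
    # Pad with "0"s so body plus the 2-octet BIP16 reaches 44 octets.
    body = body.ljust(MIN_PAYLOAD - 2, b"\x00")
    # Compute BIP16 over the payload with the BIP16 positions set to zero.
    return body + struct.pack("!H", bip16(body + b"\x00\x00"))

payload = build_oam_payload(0x01, b"\x00" * 20)   # CV packet, dummy TTSI data
print(len(payload))     # 44
print(bip16(payload))   # 0: the parity check passes at the receiver
```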

 

OAM packets are differentiated from normal user-plane traffic by an increase of one in the label stack depth at the LSP level at which they are inserted [21]. To ensure that the OAM packets get a Per Hop Behavior (PHB) with the lowest drop probability, the EXP field has to be coded in a certain way: it should be set to all “0”s in the OAM Alert Labeled header, and to whatever the 'minimum loss-probability PHB' is in the preceding normal user-plane forwarding header for that LSP [21].

 

The Time to Live (TTL) field should be set to “1” in the OAM Alert Labeled header. One reason for this is that OAM packets should never travel beyond the LSP trail termination sink point at the LSP level at which they were originally generated. This is possible because the headers are not examined by intermediate label-swapping LSRs and are only observed at LSP sink points [21].
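Putting these values together, the 32-bit label stack entry for the OAM Alert Labeled header could be encoded as follows (standard MPLS label-entry layout; the bottom-of-stack flag here is an assumption, since the original user-plane label follows below):

```python
# Pack an MPLS label stack entry: 20-bit label, 3-bit EXP,
# 1-bit bottom-of-stack flag, 8-bit TTL.
def encode_label_entry(label: int, exp: int, s: int, ttl: int) -> int:
    return (label << 12) | (exp << 9) | (s << 8) | ttl

OAM_ALERT_LABEL = 14          # reserved label value proposed for OAM
entry = encode_label_entry(label=OAM_ALERT_LABEL, exp=0, s=0, ttl=1)
print(f"{entry:08x}")         # 0000e001
```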

 

At the moment, May 2002, six different types of OAM functions have been proposed, with the codepoints shown in Figure 24. So far, the recommendation supports multipoint-to-point LSPs, single-hop LSPs and penultimate hop popping [21a].

 

OAM Function Type codepoint (Hex)
(second octet of OAM packet payload)   Function Type and Purpose

00         Reserved
01         CV – Connectivity Verification
02         P – Performance
03         FDI – Forward Defect Indicator
04         BDI – Backward Defect Indicator
05         LB-Req – Loopback Request
06         LB-Rsp – Loopback Response
07 – FF    Reserved for possible future standardizations

Figure 24: OAM Function Type Codepoints [21a]

It is strongly recommended that CV OAM packets are generated on all LSPs in order to detect all defects and potentially provide protection against traffic leakage both in and out of LSPs. It is also recommended that FDI OAM packets are used to suppress alarm storms. BDI packets are a useful tool for single-ended monitoring of both directions and also in some protection switching cases. However, these are only recommendations and operators can choose to use some or all of the OAM packets as they see fit. [21a]

Connectivity Verification (CV)

The Connectivity Verification function is used to detect and diagnose all types of LSP connectivity defects, sourced either from below or within the MPLS layer networks. The CV flow is generated at the LSP’s ingress LSR with a nominal frequency of one packet per second and transmitted towards the LSP’s egress LSR. The CV OAM packets are transparent to the transit LSRs, meaning the packets are invisible to these LSRs. The CV packet contains the network-unique Trail Termination Source Identifier (TTSI), and this identifier is used to detect all the types of defects explained in chapter 0. This is achieved by the egress LSR checking incoming CV packets per LSP. An LSP enters a defect state when one of the defects described in Figure 28 occurs [21].

 

The LSP TTSI is structured as a 16-octet LSR ID IPv6 address followed by a 4-octet LSP Tunnel ID [21]. According to Neil Harrison and David Allan (both members of the ITU-T Study Group 13 mailing list), and from what we can see, this LSP Tunnel ID is built from the Local LSP_ID for CR-LDP tunnels [27] or the Tunnel ID for RSVP tunnels [26]. It could also be configured manually. The 16 most significant bits (two octets) of the LSP Tunnel ID are currently padded with all “0”s to allow for any future increase of the Tunnel ID field [21]. For LSRs that do not support IPv6 addressing, an IPv4 address can be used for the LSR ID using the format described in [29], IP Version 6 Addressing Architecture. [21]

Figure 25: CV payload structure
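The 20-octet TTSI described above can be sketched as follows; the IPv4-mapped IPv6 form used for IPv4-only LSRs is an assumption standing in for the exact format given in [29]:

```python
# Build a TTSI: 16-octet LSR ID (IPv6) followed by a 4-octet field whose
# upper two octets are zero-padded and whose lower two octets carry the
# Tunnel ID.
import ipaddress
import struct

def build_ttsi(lsr_id: str, tunnel_id: int) -> bytes:
    addr = ipaddress.ip_address(lsr_id)
    if addr.version == 4:
        # Carry an IPv4 LSR ID in the 16-octet field via the
        # IPv4-mapped IPv6 form (an assumption; see [29] for the format).
        addr = ipaddress.IPv6Address(f"::ffff:{lsr_id}")
    return addr.packed + struct.pack("!HH", 0, tunnel_id)

ttsi = build_ttsi("2001:db8::1", tunnel_id=7)   # hypothetical LSR ID
print(len(ttsi))   # 20 octets
```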

Forward Defect Indication (FDI)

Forward Defect Indication is generated by an egress LSR that detects a defect. When the egress LSR detects a failure, it produces an FDI packet and sends it forwards and upwards through any nested LSP stack, also known as the label hierarchy (Figure 12). The FDI OAM packets are generated on a nominal one-per-second basis [21a].

 

The FDI packets’ primary purpose is to suppress alarms in layer networks above the layer at which the defect occurs. To be able to send FDI packets upwards, it is important that the LSP sink point remembers any server-client LSP label mappings that existed prior to the failure. In this way, when higher-level LSPs detect loss of CV flow caused by defects on lower-level LSPs, we achieve correct identification of the source that actually had the defect. The higher-layer clients may not be in the same management domain as the initial defect source. The FDI packet includes fields to indicate the nature of the defect and its location [21].

When an FDI is to be passed from a server layer LSP to its client layer LSP(s), the Defect Location and Defect Type fields should be copied from the server layer LSP FDI into the client layer LSP(s) FDI.

Figure 26: FDI and BDI payload structure [21]

In Figure 26, the Defect Type field is two bytes large, and the values this field can take are listed in Figure 28. The Defect Location (DL) field contains the identity of the network in which the defect has been detected. The identity should be in the form of an Autonomous System (AS) [25] number. [21]

Backward Defect Indication (BDI)

The purpose of the BDI OAM function is to inform the upstream end of an LSP of a downstream defect. BDI is generated at a return path’s trail termination source point in response to a defect being detected at an LSP trail termination sink point in the forward direction [21].

 

To be able to send the BDI (and also LB-Rsp) upstream, it is required to have a return path. A return path could be [21]:

a)      A dedicated return LSP.

b)      A shared return LSP, which is shared between many forward LSPs.

c)      A non-MPLS return path, such as an out-of-band IP path. This option has potential security issues. For example, the return path could be terminated on a different LSR interface, and a malicious user could potentially generate a BDI and send it to the ingress LSR. Therefore, due to the possibility of a DoS attack, additional security measures must be taken. Such techniques are beyond the scope of this thesis.

 

The BDI packets are sent periodically, one packet per second, backwards towards their peer-level LSP trail termination sink point in the reverse direction, and further upwards through any nested LSP stack. The BDI is sent as a mirror of the appropriate FDI; the appropriate FDI is the one generated at the lowest layer where the failure was detected. The Defect Location and Defect Type fields are a direct mapping of those obtained from the appropriate FDI and have formats identical to those described previously for the FDI OAM packet [21].

 


Figure 27 illustrates two things concerning LSP connectivity. The two grey areas in A) describe the way CV OAM packets are distributed from ingress to egress on different LSPs and label stack depths: A) shows how the CV packets are sent using level depth 1 and level depth 2 in the label hierarchy. B) describes what happens when a failure is detected, which LSR detects the failure, and how it tells the others about the failure. The LSRs belong to different LSPs and use a label hierarchy to reach from the ingress to the egress LSR.

Figure 27: How FDI and BDI function when a failure occurs.

Assume the names of the three LSPs in Figure 27 are A, B and C. LSP A from LSR4 to LSR5 has label stack depth one; LSP B from LSR2 through LSR3 and over LSP A to LSR6 uses a label stack depth of two; and finally, LSP C from LSR1 over LSP B through LSR7 to LSR8 uses a label stack depth of three.

 

Consider a failure detected between LSR2 and LSR3. This has consequences for both LSP B and LSP C. Both LSR6 and LSR8 will detect that a failure has occurred, even though the failure actually is on LSP B. To suppress alarms for LSP C at LSR8, LSR6 has to inform this router by sending FDI packets along the same path as LSP C was using before the failure occurred. It is not only necessary to inform the downstream egress LSRs; LSR6 also has to inform LSR2, LSP B’s ingress LSR, which in turn will inform LSR1 about the failure by sending BDI packets. The way the BDI packets are sent, such as finding an alternative return path, is discussed above.

Other OAM functions

Performance “P” packets are for further study at ITU-T. However, the intention of each packet is to provide an ad hoc method of determining packet and octet loss on an LSP in order to aid trouble-shooting [21].

 

Loopback Request and Loopback Response provide an ad hoc capability for verifying the LSP endpoint and for delay measurement [21]. These two functions are likewise for further study.

Defect type codepoint

The defect type (DT) code is encoded in two octets: the first octet indicates the layer, and the second octet the nature of the defect. To be able to detect these defects, an LSP availability state machine (ASM) is needed both at the LSP's ingress LSR and at its egress LSR. The ingress LSR maintains the LSP Trail Far-End Defect State, and the egress LSR the LSP Trail Sink Near-End Defect State [21].

 

Defect Type (DT)    DT code (Hex)   Description
dServer             01 01           Any server layer defect arising below the MPLS layer network
dLOCV               02 01           Simple Loss of Connectivity Verification
dTTSI_Mismatch      02 02           Trail Termination Source Identifier Mismatch defect
dTTSI_Mismerge      02 03           Trail Termination Source Identifier Mismerge defect
dExcess             02 04           Increased rate of CV OAM packets with the expected TTSI above the nominal rate of one per second
dUnknown            02 FF           Unknown defect detected in the MPLS layer
None                00 00           Reserved
None                FF FF           Reserved

Figure 28: Defect Type codepoints in FDI/BDI OAM packets [21]

In Figure 28 there are four MPLS user-plane defects: dLOCV, dTTSI_Mismatch, dTTSI_Mismerge and dExcess. When one of these defects occurs, the ASM enters the LSP Trail Sink Near-End Defect State, which in turn, once the BDI packets have reached the ingress LSR, causes the ingress LSR to enter the LSP Trail Far-End Defect State. The other two defect types deal with defects from outside the MPLS layer and with unknown defects. All the actions invoked on entering the LSP Trail Sink Near-End Defect State are stopped when that state is exited [21].

 

The descriptive meanings of the various defect types are listed in Figure 28.
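The two-octet encoding in Figure 28 can be illustrated with a small lookup. This is a sketch for clarity, not an implementation of the ITU-T proposal; the table contents are taken directly from Figure 28.

```python
# Map (layer octet, nature octet) -> defect name, per Figure 28.
DEFECT_TYPES = {
    (0x01, 0x01): "dServer",
    (0x02, 0x01): "dLOCV",
    (0x02, 0x02): "dTTSI_Mismatch",
    (0x02, 0x03): "dTTSI_Mismerge",
    (0x02, 0x04): "dExcess",
    (0x02, 0xFF): "dUnknown",
}

def decode_defect_type(two_octets: bytes) -> str:
    """Decode the DT field of an FDI/BDI OAM packet."""
    layer, nature = two_octets[0], two_octets[1]
    # 00 00, FF FF and any unassigned codepoints are treated as reserved.
    return DEFECT_TYPES.get((layer, nature), "Reserved")
```

For example, `decode_defect_type(b"\x02\x01")` yields "dLOCV", a simple loss of connectivity verification in the MPLS layer.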

3.4.3     MPLS ping

MPLS ping is a simple and efficient mechanism that can be used to detect data plane failures in MPLS LSPs, failures which cannot always be detected by the MPLS control plane. The mechanism provides a tool that enables users to detect traffic "black holes" or misrouting within a reasonable period of time, as well as a mechanism to isolate faults. It is modelled after the ICMP echo request and reply used by ping and traceroute to detect and localize faults in IP networks [5].

 

The basic idea is to test that packets belonging to a particular Forwarding Equivalence Class (FEC) actually end their MPLS path on an LSR that is an egress for that FEC. Therefore, an MPLS echo request carries information about the FEC whose MPLS path is being verified. The MPLS ping packet is encapsulated in a UDP packet and contains parameters such as a Sequence Number and a Time Stamp. The echo request is forwarded just like any other packet belonging to that FEC. In a basic connectivity check using ping, the packet should reach the end of the path, where it is examined at the control plane of the LSR, which then verifies that it is indeed an egress for the FEC. In traceroute mode, which is the fault isolation mode, the packet is sent to the control plane of each transit LSR, which performs various checks that it is indeed a transit LSR for this path; this LSR also returns further information that helps check the control plane against the data plane, i.e., that forwarding matches what the routing protocols determined as the path [5].

 

An MPLS echo reply is also a UDP packet and must only be sent in response to an MPLS echo request. The source IP address is the Router ID of the replier; the source port is the well-known UDP port for MPLS ping. The destination IP address, UDP port and sequence number are copies of the source IP address, UDP port and sequence number from the echo request packet. The time stamp is set to the time of day at which the echo request is received [5].
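The field-copying rules above can be sketched as follows. The dictionary fields are simplified stand-ins for the real packet layout in [5], and the default port number is an assumption for illustration only.

```python
import time

def build_echo_reply(request: dict, replier_router_id: str,
                     mpls_ping_port: int = 3503) -> dict:
    """Construct an MPLS echo reply from a received echo request.

    Simplified sketch: `request` holds the fields of the incoming
    echo request; the port default is an assumed value.
    """
    return {
        "src_ip": replier_router_id,     # replier's Router ID
        "src_port": mpls_ping_port,      # well-known MPLS ping port
        "dst_ip": request["src_ip"],     # copied from the echo request
        "dst_port": request["src_port"],
        "sequence": request["sequence"],
        "timestamp": time.time(),        # time the request was received
    }
```

Note how the reply's destination addressing is taken entirely from the request's source fields, so the reply reaches the originator without the replier needing any LSP state of its own.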

 

There are two ways to forward the echo reply in the reverse direction towards the echo request source. The first option is to set the Reply Mode to the value Router Alert; when a router sees this option, it must forward the packet as an IP packet. Note that this may not work if some transit LSR does not support MPLS ping. The second option is to send the echo reply via the control plane, which is, at present, only defined for RSVP-TE LSPs [5].

 

One way these tools can be used is to periodically ping a FEC to ensure connectivity. If the ping fails, one can then initiate a traceroute to determine where the fault lies. One can also periodically traceroute FECs to verify that forwarding matches the control plane; however, this places a greater burden on transit LSRs and thus should be used with caution [5].
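The monitoring strategy above can be sketched as a small loop: ping periodically, and only fall back to the heavier traceroute mode when a ping fails. `ping_fec` and `traceroute_fec` are hypothetical callables supplied by the caller, standing in for the actual probe machinery.

```python
def monitor_fec(fec, ping_fec, traceroute_fec, probes=3):
    """Ping a FEC periodically; on failure, isolate the fault.

    ping_fec(fec) -> bool: True if connectivity is verified.
    traceroute_fec(fec): run the costly hop-by-hop isolation mode.
    Returns the traceroute result on failure, or None if all pings pass.
    """
    for _ in range(probes):
        if ping_fec(fec):
            continue                 # connectivity OK, keep probing cheap
        # Ping failed: only now burden the transit LSRs with traceroute.
        return traceroute_fec(fec)
    return None                      # no fault found
```

The design point is the one made in the text: routine verification stays cheap, and the per-hop control-plane load of traceroute is incurred only when a black hole is suspected.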

3.4.4     RSVP node failure detection

The RSVP ‘Hello’ extension enables RSVP nodes to detect when a neighbouring node is not reachable; it provides node-to-node failure detection [26].

 

Neighbour failure detection is accomplished by collecting and storing a neighbour's "instance" value. If a change in value is seen or if the neighbour is not properly reporting the locally advertised value, then the neighbour is presumed to have reset. When a neighbour's value is seen to change or when communication is lost with a neighbour, then the instance value advertised to that neighbour is also changed [26].

 

A node periodically generates a Hello message containing a Hello Request object for each neighbour whose status is being tracked. The periodicity is governed by the hello_interval, which may be configured on a per-neighbour basis; the default value is 5 ms. [26]
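The instance-value logic of [26] can be sketched as a small state machine. The class and method names are illustrative; only the two reset conditions (a changed neighbour instance, or the neighbour not echoing our advertised value) come from the text.

```python
class HelloState:
    """Per-neighbour RSVP Hello tracking (simplified sketch)."""

    def __init__(self, local_instance: int):
        self.local_instance = local_instance   # value we advertise
        self.neighbor_instance = None          # last instance seen from peer

    def on_hello(self, src_instance: int, dst_instance: int) -> bool:
        """Process a received Hello; return True if the neighbour reset.

        src_instance: the neighbour's advertised instance value.
        dst_instance: the neighbour's echo of our value (0 if unknown).
        """
        reset = (
            # Neighbour's instance value changed since we last saw it ...
            (self.neighbor_instance is not None
             and src_instance != self.neighbor_instance)
            # ... or it is not properly reporting our advertised value.
            or (dst_instance not in (0, self.local_instance))
        )
        self.neighbor_instance = src_instance
        if reset:
            self.local_instance += 1   # change the value we advertise
        return reset
```

A fresh neighbour (dst_instance of 0) is accepted; a later Hello with a different src_instance is treated as a reset, and our own instance value changes in response, mirroring the behaviour described above.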

3.4.5     Protection Switching

Protection Switching is the term used by ITU-T, which has recognized that protection switching functionality is important to enhance the availability and reliability of MPLS networks. Protection switching implies that both routing and resources are pre-calculated and allocated to a dedicated protection LSP prior to failure; it therefore offers a strong assurance of being able to re-obtain the required network resources post-failure. This is in contrast to restoration, which has no defined dedicated protection entity, and where neither routing nor resources are pre-calculated or allocated prior to failure. Restoration therefore offers no assurance of being able to re-obtain the required network resources post-failure. [32]

 

At present the functionality for protection switching is limited to point-to-point LSP tunnels, and two types of architecture are proposed: the 1+1 type and the 1:1 type. Other functionalities and architecture types are for further study. In the 1+1 architecture type, a protection LSP is dedicated to each working LSP. At the ingress LSR of the protected domain, the working LSP is bridged onto the protection LSP, and the traffic on the working and protection LSPs is transmitted simultaneously to the egress LSR of the protected domain. When the traffic arrives at the egress LSR of the protected domain, the selection between the working and protection LSP is made based on some predetermined criteria, such as a defect indication. [32]

 

In the 1:1 architecture type, a protection LSP is likewise dedicated to each working LSP, but the working traffic is transmitted on either the working or the protection LSP. The method for selecting between the working and protection LSPs depends on the mechanism and is performed by the ingress LSR of the protected domain. The protection LSP can be used to carry so-called extra traffic when it is not used to transmit the working traffic. [32]

 

Protection switching should be conducted when [32]:

·        Initiated by operator control

·        Signal fail is declared on the connected LSP, working LSP or protection LSP, and is not declared on the other LSP. This failure may be detected by using CV packets.

·        The wait to restore timer expires and signal fail is not declared on the working LSP.

 

The two protection architecture types explained above are LSP protection switching, where a switch from the working entity to the protection entity must be performed once a failure has been detected and signalled. There is also a proposal that differs from the ITU-T's protection switching schemes: a packet-level 1+1 path protection scheme proposed by Lucent Technologies. It provides instantaneous recovery from failures without losing the in-transit packets on the failed LSP. Failure coverage includes any single failure in the physical layer, link layer or MPLS layer. [14]

 

To provide packet 1+1 protection service between two MPLS network edge LSRs, that is, the ingress and egress LSRs, a pair of MPLS LSPs is established along disjoint paths. The packets are dual-fed into the two LSPs at the ingress node, each with a sequence number attached [14]. When the packets arrive at the egress node, one of the two copies is selected; in this way no in-transit packets are lost on the failed LSP.

 

The distinction between packet 1+1 protection and the two traditional protection switching schemes proposed by ITU-T is that there is no need for explicit failure detection, signalling or protection switching between the two LSPs; the scheme treats both LSPs as working LSPs. [14]
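The egress-side selection in packet-level 1+1 protection can be sketched as follows. This is a minimal illustration of the sequence-number idea, not Lucent's actual mechanism: both LSPs carry the same numbered packets, and the egress delivers each sequence number once, whichever copy arrives first.

```python
def select_packets(arrivals):
    """Egress selection for packet-level 1+1 protection (sketch).

    arrivals: iterable of (lsp_id, seq, payload) tuples as packets
    come in from both disjoint LSPs, in arrival order.
    Returns the delivered (seq, payload) stream with duplicates dropped.
    """
    delivered = set()
    out = []
    for lsp_id, seq, payload in arrivals:
        if seq in delivered:
            continue                  # duplicate copy from the other LSP
        delivered.add(seq)
        out.append((seq, payload))
    return out
```

If one LSP fails, its copies simply stop arriving and the surviving LSP's copies are delivered instead, with no failure detection or switching step, which is exactly the distinction drawn above.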

3.4.6     Fast rerouting

In order to meet the needs of real-time applications such as video conferencing, the IETF Network Working Group finds it highly desirable to be able to redirect user traffic onto backup LSP tunnels in tens of milliseconds. This subchapter concerns explicitly routed LSPs. The backup LSPs have to be placed as close to the failure point as possible, since reporting a failure between nodes may add significant delay. There is one backup segment for each link; the segments are calculated and allocated pre-failure and are intended to cover both node and link failures. When an error occurs on a link or node, the traffic on the link is quickly switched to the backup segment and the ingress LSR is informed simultaneously. The ingress LSR then computes an alternate path for the primary LSP, and the traffic is switched onto this new LSP instead of over the backup segment. We use the term local repair for techniques that accomplish this, and refer to an LSP with an associated backup tunnel as a protected LSP. Unidirectional point-to-point LSPs are supported, while point-to-multipoint and multipoint-to-point are for further study for CR-LDP [7]. [35]
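The repair sequence above can be sketched with hypothetical helper callables (the names `switch_to`, `notify_ingress` and `compute_new_lsp` are inventions for illustration): traffic moves to the pre-established backup segment immediately, and only afterwards does the ingress compute and install a new primary path.

```python
def local_repair(failed_link, backup_segments, switch_to, notify_ingress,
                 compute_new_lsp):
    """Sketch of the local-repair sequence for fast rerouting.

    backup_segments: per-link backup segments allocated pre-failure.
    """
    backup = backup_segments[failed_link]  # allocated before the failure
    switch_to(backup)                      # fast switch, tens of ms
    notify_ingress(failed_link)            # inform the ingress LSR
    new_lsp = compute_new_lsp(failed_link) # alternate primary path
    switch_to(new_lsp)                     # leave the backup segment
    return new_lsp
```

The ordering is the point: the latency-critical switch to the backup segment happens locally and first, while the slower path recomputation at the ingress proceeds in the background.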

 

There are two basic strategies for setting up backup tunnels: one-to-one backup and facility backup for RSVP-TE [35], and, for CR-LDP [7], exclusive and shared bandwidth protection, respectively. The traffic is switched onto the backup segment when a failure occurs on the protected LSP and switched back to the protected LSP when it is repaired. [35]

 

The first strategy operates on the basis of a backup LSP for each protected LSP: a label switched path is established that intersects the original tunnel somewhere downstream of the point of link or node failure. For each LSP that is backed up, another backup LSP is established. [35]

 

In the second strategy, instead of creating a separate LSP for every backed-up LSP, a single LSP is created that serves to back up a set of LSPs. We call such an LSP tunnel a bypass tunnel [35].

 

Link failure detection can be performed through a layer-2 failure detection mechanism. Node failure detection can be done through IGP loss of adjacency or through the RSVP Hello message extensions defined in [26].

3.4.7     MPLS and traffic engineering

Operation and management of networks are, as far as we can see, two terms describing the same thing, and many of the tasks of traffic engineering deal with exactly this area. Traffic Engineering (TE) is concerned with performance optimization of operational networks; the aspects of interest concerning MPLS are measurement and control [9]. This gives network operators significant flexibility in controlling the paths of traffic flows across their networks and allows policies to be implemented that can result in the performance optimization of networks. There is, of course, an operational limit to how many LSPs are actually needed: a large number of LSP tunnels allows greater control over the distribution of traffic across the network, but increases network operational complexity. [31]

 

A path from one given node to another must be computed, such that the path can provide QoS for IP traffic and fulfill other requirements the traffic might have. Once the path is computed, traffic engineering, which is a subset of constraint-based routing, is responsible for establishing and maintaining the forwarding state along the path. [37]

 

In order to lower the utilization of congested links and avoid congested resources, an operator may use TE methods to route a subset of traffic away from those links onto less congested topological elements, for instance by creating new LSP tunnels around specific congested areas. [31]

 

MPLS TE methods can be applied to distribute the aggregate traffic workload effectively across parallel links between nodes, making it possible to utilize the network's resources better. One can use LSP bandwidth parameters to control the proportion of demand traversing each link. It is also possible to explicitly configure routes for LSP tunnels to distribute routes across the parallel links, and to use affinities to map different LSPs onto different links. [31]

 


It is sometimes desirable to restrict certain types of traffic to certain types of links, or to explicitly exclude certain types of links from the paths of some types of traffic. This is helpful, for instance, in preventing continental traffic from traversing transoceanic links. Another example might be to exclude certain traffic from a subset of circuits, keeping inter-regional LSPs away from circuits that are reserved for intra-regional traffic. [31]


Figure 29: Traffic engineering example [10]

 

For example, in the traffic-engineering example in Figure 29, there are two paths from Router C to G. If the router selects one of these paths as the shortest path, it will carry all traffic destined for G through that path. The resulting traffic volume on that path may cause congestion, while the other path is under-loaded. To maximize the performance of the overall network, it may be desirable to shift some fraction of traffic from one link to another. While one could, in this simple example, set the cost of Path C-D-G equal to the cost of Path C-E-F-G, such an approach to load balancing becomes difficult, if not impossible, in networks with a complex topology. Explicitly routed paths, implemented using MPLS, can be used as a more straightforward and flexible way of addressing this problem, allowing some fraction of the traffic on a congested path to be moved to a less congested path. [37]
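The load-shifting argument of Figure 29 can be made concrete with a toy calculation. The path names, link tuples and bandwidth figures below are invented for illustration: pinning all traffic to the shortest path concentrates the load on one path, while explicitly routed LSPs split it.

```python
def link_loads(flows, paths):
    """Sum the per-link load when each flow is pinned to an explicit path.

    flows: list of (bandwidth, path_name) pairs.
    paths: map of path name -> list of (node, node) links.
    """
    loads = {}
    for flow_bw, path_name in flows:
        for link in paths[path_name]:
            loads[link] = loads.get(link, 0) + flow_bw
    return loads

# Two paths from C to G, as in Figure 29 (link tuples are assumed).
paths = {"CDG": [("C", "D"), ("D", "G")],
         "CEFG": [("C", "E"), ("E", "F"), ("F", "G")]}

# Shortest-path routing: everything rides C-D-G.
all_on_one = link_loads([(60, "CDG"), (40, "CDG")], paths)

# Explicitly routed LSPs: the demand is split across both paths.
split = link_loads([(60, "CDG"), (40, "CEFG")], paths)
```

With the assumed demands, `all_on_one` puts 100 units on each link of C-D-G while C-E-F-G sits idle, whereas `split` carries 60 and 40 units respectively, which is the congestion relief the paragraph describes.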

3.4.8     MPLS SNMP MIBs

Several Multi Protocol Label Switching Management Information Bases (MPLS MIBs) have been proposed. They are currently under development at the IETF Network Working Group, which has released a number of drafts describing managed objects for modeling MPLS. At the time of writing, these MIBs exist only in draft form. The Traffic Engineering MIB [28] and the Label Switch Router MIB [36] are two MIBs that co-operate; another example of an MPLS MIB is the FEC-To-NHLFE MIB (FTN MIB) [38].

 

These three MIBs cover most of those proposed at the IETF. There are of course many other MIBs, for instance MIBs implemented by Cisco, but they are not described in this thesis.

Traffic Engineering MIB

The Traffic Engineering MIB supports configuration and monitoring of MPLS tunnels, both tunnels created by RSVP-TE or CR-LDP and MPLS tunnels configured manually. Among the features this MIB offers are reconfiguring or removing existing tunnels, setting the resources required for a tunnel and measuring tunnel performance. [28]

Label Switch Router MIB

This MIB is used for modeling MPLS LSRs. The MPLS label switch router MIB (LSR-MIB) is designed to satisfy a number of requirements and constraints for configuring LSRs. The MIB keeps an overview of the MPLS-capable interfaces, both incoming and outgoing, and of their performance. [36]

FEC-To-NHLFE MIB

This MIB resides on any LSR that performs the FTN mapping used to map traffic into the MPLS domain; the mapping is performed at the ingress of the MPLS network. Using this MIB, one can specify the mappings between FECs and NHLFEs and the actions to be taken on matching packets. Another property is performance monitoring for the different FTNs. [38]


MECHANISMS FOR OAM ON MPLS IN LARGE IP BACKBONE NETWORKS (c) 2002 Hallstein Lohne, Johannes Vea, a graduate thesis written for AUC/ERICSSON