4 General requirements on fault management

3GPP TS 12.11: Fault management of the Base Station System (BSS)

GSM fault management follows TMN principles as specified in GSM 12.00 [18]. These principles provide for management of a GSM PLMN based on an information model. This model is developed through the definition of a required set of management services which are then decomposed into service components. These service components are supported by a number of management functions according to M.3400 [4], and an information model is defined to support this set of functions.

In GSM 12.00, the terms reactive maintenance and proactive maintenance are used to identify two different forms of maintenance for a GSM network element. Reactive maintenance is the use of the above management functions to detect a fault and restore all or part of the network element following a failure. Proactive maintenance is the use of the above management functions and manual routine maintenance activities to prevent, as far as possible, the occurrence of a failure. The present document does not explicitly assign the above management functions to either proactive or reactive maintenance, and generically defines the management functions such that they may be used either as part of proactive or reactive maintenance. The remainder of the present document thus makes no further reference to either proactive or reactive maintenance.

Fault Management is a functional area which can support a number of management services (M.3400 [4]). The following list gives examples of the general objectives of the NE’s fault management:

– Inform the operator and/or OS of the current NE condition;

– Provide timely and accurate data regarding any abnormal change in the condition of the NE;

– Maintain synchronisation between the actual conditions in the NEs and the knowledge of the conditions as understood by the OS;

– Provide procedures which allow system recovery either automatically or on operator demand after a fault detection.

To support these objectives the NE shall offer the following set of capabilities:

– Surveillance capability to monitor the system such that faults, defects and anomalies may be detected and reported;

– Fault Localisation capability to identify the one or more replaceable units at fault;

– Fault Correction capability to isolate the faulty units and restore the system to operation;

– Testing capability to verify the proper operation of physical and functional resources in the NE.

To support fault management, the state management capability may also be necessary, for example, to isolate a faulty unit by changing its administrative state, to provide a specific environment for testing etc. The usage of state management for fault management purposes will be described in each clause where it is appropriate.
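As an illustration of isolating a faulty unit through its administrative state, the following minimal sketch models a replaceable unit that is locked for isolation and unlocked again after repair. The class and function names (`ReplaceableUnit`, `isolate`, `restore`) are hypothetical and are not defined by the present document; only the locked and unlocked values of the administrative state are modelled.

```python
from dataclasses import dataclass
from enum import Enum


class AdministrativeState(Enum):
    # Only the two values needed for this sketch are modelled.
    UNLOCKED = "unlocked"
    LOCKED = "locked"


@dataclass
class ReplaceableUnit:
    # Hypothetical model of a replaceable unit inside an NE.
    name: str
    administrative_state: AdministrativeState = AdministrativeState.UNLOCKED
    faulty: bool = False

    def carries_traffic(self) -> bool:
        # A unit is usable only when it is unlocked and not faulty.
        return (
            self.administrative_state is AdministrativeState.UNLOCKED
            and not self.faulty
        )


def isolate(unit: ReplaceableUnit) -> None:
    # Isolation: lock the unit so that it is no longer used.
    unit.administrative_state = AdministrativeState.LOCKED


def restore(unit: ReplaceableUnit) -> None:
    # Restoration: refuse to unlock a unit that is still faulty.
    if unit.faulty:
        raise ValueError(f"{unit.name} is still faulty")
    unit.administrative_state = AdministrativeState.UNLOCKED


unit = ReplaceableUnit("unit-0")
unit.faulty = True                      # a fault is detected
isolate(unit)                           # operator or NE locks the unit
locked_state = unit.administrative_state
usable_while_locked = unit.carries_traffic()
unit.faulty = False                     # the unit is repaired
restore(unit)                           # the unit is unlocked again
usable_after_restore = unit.carries_traffic()
```

The guard in `restore` is a design choice of this sketch, not a requirement stated here: it simply prevents a unit from being returned to service before it has been repaired.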

In addition to the above, the capabilities of other functional management areas are often used in support of fault management. For example, parts of performance management services may be used for the fault detection capability (e.g. counters and gauges for threshold management), and configuration management functions may be used to restore the system to the best operational configuration.
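The use of a gauge for threshold management can be sketched as follows. This is a minimal example under stated assumptions: the class `GaugeThreshold`, its two levels and the numeric values are hypothetical, and the separate clear level (hysteresis) is an assumption of the sketch rather than something this clause specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GaugeThreshold:
    # Hypothetical gauge threshold with hysteresis (illustrative names).
    raise_level: float   # an alarm is raised when the gauge reaches this value
    clear_level: float   # the alarm is cleared when the gauge falls to this value
    alarmed: bool = False

    def update(self, value: float) -> Optional[str]:
        # Return "raise", "clear", or None for a new gauge reading.
        if not self.alarmed and value >= self.raise_level:
            self.alarmed = True
            return "raise"
        if self.alarmed and value <= self.clear_level:
            self.alarmed = False
            return "clear"
        return None


threshold = GaugeThreshold(raise_level=90.0, clear_level=70.0)
events = [threshold.update(v) for v in (50.0, 95.0, 85.0, 65.0)]
# events == [None, "raise", None, "clear"]
```

The reading of 85.0 produces no event because it lies between the two levels; keeping the levels apart avoids repeated alarms when a gauge oscillates around a single value.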

Based on M.3400 [4], GSM fault management requirements can be achieved by means of the following management service components:

– Alarm surveillance service components;

– Fault localisation service component;

– Fault correction service component;

– Testing service component;

– Trouble administration service component.

Of this list, the trouble administration component is not addressed by the present document, as it is too closely related to the operator’s operational procedures and is thus not suitable for standardisation.

The operator and the OS are informed of an NE failure by means of functions provided by the alarm surveillance component: the alarm reporting functions. The information provided by alarm reporting should be sufficient to localise the fault. However, if necessary, the operator may also use the testing capabilities to obtain further details for fault diagnosis. Depending on the type of the detected fault and its impact on the telecommunication services, the fault correction service component provides automatic or manual actions to configure the NE so as to minimise the loss of the telecommunication services. Once the faulty unit or units have been repaired, the fault correction service component again provides automatic or manual actions to restore them to normal operation. To complete the fault management process, the operator is able to perform a final test to certify the behaviour of the repaired system.
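The sequence described in the preceding paragraph can be summarised as an ordered set of phases. The sketch below is only an illustration: the phase names and the helper functions `next_phase` and `walk` are hypothetical, and the optional use of tests for further diagnosis is folded into the localisation phase.

```python
from enum import Enum, auto


class Phase(Enum):
    # Hypothetical phase names, listed in the order given in the text.
    ALARM_REPORTED = auto()      # alarm reporting informs operator and OS
    FAULT_LOCALISED = auto()     # from the alarm data, or from extra tests
    NE_RECONFIGURED = auto()     # minimise the loss of service
    UNIT_REPAIRED = auto()       # the faulty unit or units are repaired
    UNIT_RESTORED = auto()       # the unit returns to normal operation
    FINAL_TEST_PASSED = auto()   # the operator certifies the repaired system


ORDER = list(Phase)  # Enum members iterate in definition order


def next_phase(current):
    # Return the phase that follows `current`, or None at the end.
    index = ORDER.index(current)
    return ORDER[index + 1] if index + 1 < len(ORDER) else None


def walk(start=Phase.ALARM_REPORTED):
    # Collect every phase from `start` to the end of the process.
    path = [start]
    phase = next_phase(start)
    while phase is not None:
        path.append(phase)
        phase = next_phase(phase)
    return path


full_path = walk()
```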

4.1 Overview of the service components

4.1.1 Alarm Surveillance service component

The Alarm Surveillance Service Component performs system monitoring and fault detection in near real time. When a failure occurs in an NE, an alarm record is stored in a log (depending on the filter criteria) and an alarm report is forwarded (depending on the filter and forwarding criteria) as soon as possible to the OS across the Q3 interface. The nature and severity of the faults are determined by the NE. It is important that alarm reports are not lost in case of temporary interruption of communication between the NE and the OS.
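The behaviour described above (logging subject to a filter, forwarding subject to a filter, and no loss of alarm reports during a temporary communication interruption) can be sketched as follows. All names (`Alarm`, `AlarmSurveillance`, `set_link`) are hypothetical, the severity-based filters are a simplifying assumption, and the `sent` list merely stands in for delivery to the OS; the Q3 interface itself is not modelled.

```python
from collections import deque
from dataclasses import dataclass

# Illustrative severity ordering used by the filters in this sketch.
SEVERITY_RANK = {"warning": 1, "minor": 2, "major": 3, "critical": 4}


@dataclass(frozen=True)
class Alarm:
    source: str
    severity: str


class AlarmSurveillance:
    # Hypothetical NE-side alarm handling with log and forwarding filters.

    def __init__(self, log_min="warning", forward_min="major"):
        self.log_min = log_min          # log filter criterion
        self.forward_min = forward_min  # forwarding filter criterion
        self.log = []                   # stored alarm records
        self.pending = deque()          # reports waiting for the link
        self.sent = []                  # stand-in for reports received by the OS
        self.link_up = True

    def _passes(self, alarm, minimum):
        return SEVERITY_RANK[alarm.severity] >= SEVERITY_RANK[minimum]

    def on_failure(self, alarm):
        if self._passes(alarm, self.log_min):
            self.log.append(alarm)
        if self._passes(alarm, self.forward_min):
            self.pending.append(alarm)
            self.flush()

    def flush(self):
        # Deliver queued reports in order while the link is available.
        while self.link_up and self.pending:
            self.sent.append(self.pending.popleft())

    def set_link(self, up):
        self.link_up = up
        self.flush()


ne = AlarmSurveillance()
ne.on_failure(Alarm("unit-1", "minor"))     # logged, not forwarded
ne.set_link(False)                          # communication is interrupted
ne.on_failure(Alarm("unit-2", "critical"))  # logged and queued
queued_during_outage = len(ne.pending)
ne.set_link(True)                           # queued report is delivered
```

Queuing the reports on the NE side is one possible way to avoid loss during an interruption; retrieving missed alarms from the log after reconnection would be another.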

The Alarm Surveillance service component is mandatory, and the refinement of the mandatory and optional parts of this service component is further defined in subsequent clauses.

4.1.2 Fault Localisation service component

The objective of Fault Localisation is to identify the faulty unit by means of the information provided by the NE when it notifies the OS of the failure. If necessary, localisation routines (e.g. tests controlled by the OS) can also be run to obtain more details.

The Fault Localisation service component is mandatory, and the refinement of the mandatory and optional parts of this service component is further defined in subsequent clauses.

4.1.3 Fault Correction service component

After the identification of the fault and the replaceable faulty units, support by the Fault Correction service component is necessary in order to perform system recovery and/or restoration, either automatically by the NE and/or the OS, or manually by the operator. The first fault correction action is the isolation of the faulty unit, to reduce the effect of the fault on other parts, internal or external to the NE.

The Fault Correction service component is optional. If implemented, at least the mandatory parts of this service component shall be provided as further defined in subsequent clauses.

4.1.4 Testing service component

The Testing service component provides support for the other three fault management capabilities. Testing can be carried out in two ways, either uncontrolled or controlled by the management system, and can be performed through periodic scheduling or on demand. Several categories of tests are necessary to cover all the requirements.

The Testing service component is optional. If implemented, at least the mandatory parts of this service component shall be provided as further defined in subsequent clauses.