5 Cell Broadcast Data Coding Scheme

3GPP TS 03.38: Alphabets and language-specific information

The Cell Broadcast Data Coding Scheme indicates the intended handling of the message at the MS, the alphabet/coding, and the language (when applicable). A receiving entity shall assume that any reserved coding indicates the GSM default alphabet (the same as codepoint 00001111). The interpretation of the octet depends on the coding group indicated in bits 7..4, as follows:

Coding Group (Bits 7..4): Use of bits 3..0

0000

Language using the default alphabet

Bits 3..0 indicate the language:

0000 German

0001 English

0010 Italian

0011 French

0100 Spanish

0101 Dutch

0110 Swedish

0111 Danish

1000 Portuguese

1001 Finnish

1010 Norwegian

1011 Greek

1100 Turkish

1101 Hungarian

1110 Polish

1111 Language unspecified

0001

0000 Default alphabet; message preceded by language indication.

The first 3 characters of the message are a two-character representation of the language encoded according to ISO 639 [12], followed by a CR character. The CR character is then followed by 90 characters of text. A Pre-Phase 2+ MS will overwrite the start of the message up to the CR and present only the text.

0001 UCS2; message preceded by language indication

The message starts with a two-character representation of the language, encoded according to ISO 639 [12] using the 7-bit default alphabet. This is padded to the octet boundary with two bits set to 0 and then followed by 40 characters of UCS2-encoded message.

An MS not supporting UCS2 coding will present the two character language identifier followed by improperly interpreted user data.

0010..1111 Reserved for European languages

0010

0000 Czech

0001..1111 Reserved for European Languages using the default alphabet, with unspecified handling at the MS

0011

0000..1111 Reserved for European Languages using the default alphabet, with unspecified handling at the MS


01xx

General Data Coding indication

Bits 5..0 indicate the following:

Bit 5, if set to 0, indicates the text is uncompressed

Bit 5, if set to 1, indicates the text is compressed using the GSM standard compression algorithm (see GSM TS 03.42).

Bit 4, if set to 0, indicates that bits 1 to 0 are reserved and have no message class meaning

Bit 4, if set to 1, indicates that bits 1 to 0 have a message class meaning:

Bit 1 Bit 0 Message Class:

0 0 Class 0

0 1 Class 1 Default meaning: ME-specific.

1 0 Class 2 SIM-specific message.

1 1 Class 3 Default meaning: TE-specific (see GSM TS 07.05 [8])

Bits 3 and 2 indicate the alphabet being used, as follows:

Bit 3 Bit 2 Alphabet:

0 0 Default alphabet

0 1 8 bit data

1 0 UCS2 (16 bit) [10]

1 1 Reserved

1000..1101

Reserved coding groups

1110

Defined by the WAP Forum [15]

1111

Data coding / message handling

Bit 3 is reserved, set to 0.

Bit 2 Message coding:

0 Default alphabet

1 8 bit data

Bit 1 Bit 0 Message Class:

0 0 No message class.

0 1 Class 1 user defined.

1 0 Class 2 user defined.

1 1 Class 3 Default meaning: TE-specific (see GSM TS 07.05 [8])

These codings may also be used for Unstructured SS Data and MMI/display purposes.
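The coding-group table above can be read by a receiver roughly as sketched below. This is an illustrative Python sketch, not a reference implementation: the names `decode_cbs_dcs`, `CbsDcs` and the alphabet labels are hypothetical, group 1110 is only flagged (its coding is left to the WAP Forum [15]), and treating the reserved alphabet value 11 of group 01xx as the default alphabet is an assumption based on the rule for reserved codings.

```python
from dataclasses import dataclass
from typing import Optional

# Coding group 0000: language selected by bits 3..0 (None = unspecified).
GROUP_0000_LANGUAGES = [
    "German", "English", "Italian", "French", "Spanish", "Dutch",
    "Swedish", "Danish", "Portuguese", "Finnish", "Norwegian", "Greek",
    "Turkish", "Hungarian", "Polish", None,
]

# Coding group 01xx: alphabet selected by bits 3..2.
ALPHABETS_01XX = ["default", "8bit", "ucs2", "reserved"]


@dataclass
class CbsDcs:
    alphabet: str                        # "default", "8bit", "ucs2" or "wap"
    language: Optional[str] = None       # language implied by the DCS, if any
    language_prefixed: bool = False      # text begins with an ISO 639 code
    compressed: bool = False             # group 01xx, bit 5
    message_class: Optional[int] = None  # None = no message class


def decode_cbs_dcs(dcs: int) -> CbsDcs:
    """Interpret a Cell Broadcast DCS octet (illustrative sketch)."""
    if not 0 <= dcs <= 0xFF:
        raise ValueError("DCS must fit in one octet")
    group, low = dcs >> 4, dcs & 0x0F

    if group == 0b0000:
        return CbsDcs("default", language=GROUP_0000_LANGUAGES[low])
    if group == 0b0001 and low in (0b0000, 0b0001):
        # 0000: default alphabet, 0001: UCS2; both carry a language prefix.
        return CbsDcs("default" if low == 0 else "ucs2", language_prefixed=True)
    if group == 0b0010 and low == 0b0000:
        return CbsDcs("default", language="Czech")
    if (group & 0b1100) == 0b0100:  # 01xx: general data coding indication
        alphabet = ALPHABETS_01XX[(dcs >> 2) & 0b11]
        if alphabet == "reserved":
            alphabet = "default"  # assumption: handled like any reserved coding
        has_class = bool(dcs & 0x10)  # bit 4
        return CbsDcs(alphabet,
                      compressed=bool(dcs & 0x20),  # bit 5
                      message_class=(dcs & 0b11) if has_class else None)
    if group == 0b1110:
        return CbsDcs("wap")  # coding defined by the WAP Forum; not decoded here
    if group == 0b1111:
        cls = dcs & 0b11  # bits 1..0; 00 means no message class
        return CbsDcs("8bit" if dcs & 0x04 else "default",
                      message_class=cls or None)
    # Every other value is reserved: treat as default alphabet (like 0x0F).
    return CbsDcs("default")
```

For example, `decode_cbs_dcs(0x01)` reports the default alphabet with English as the language, and `decode_cbs_dcs(0x59)` reports UCS2 with message class 1.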

See GSM 04.90 [11] for specific coding values applicable to Unstructured SS Data for MS originated USSD messages and MS terminated USSD messages. USSD messages using the default alphabet are coded with the 7-bit alphabet given in subclause 6.2.1. The message can then consist of up to 182 user characters.

Cell Broadcast messages using the default alphabet are coded with the 7-bit alphabet given in subclause 6.2.1. The message then consists of 93 user characters.
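The character counts above follow from packing 7-bit characters into octets: 82 octets hold 656 bits, enough for 93 septets (651 bits), and a 160-octet USSD string (assumed here, consistent with the 182-character figure) holds 182. The Python sketch below shows that arithmetic together with a septet packer; the function names are hypothetical, and it assumes the least-significant-bit-first packing commonly used for the 7-bit default alphabet.

```python
from collections.abc import Iterable

CBS_PAGE_OCTETS = 82  # octets of user data in one Cell Broadcast page


def septet_capacity(octets: int) -> int:
    """How many 7-bit characters fit into a given number of octets."""
    return octets * 8 // 7


def pack_septets(septets: Iterable[int]) -> bytes:
    """Pack 7-bit character codes into octets, least significant bit first."""
    out = bytearray()
    acc = 0    # pending bits; the lowest bits are emitted first
    nbits = 0  # how many pending bits acc holds
    for s in septets:
        if not 0 <= s <= 0x7F:
            raise ValueError(f"not a 7-bit value: {s}")
        acc |= s << nbits
        nbits += 7
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:  # flush the final partial octet; unused high bits are 0
        out.append(acc)
    return bytes(out)
```

With these definitions, `septet_capacity(CBS_PAGE_OCTETS)` gives 93 and `septet_capacity(160)` gives 182, while packing 93 septets produces exactly 82 octets with 5 spare bits.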

If the 7-bit default alphabet extension mechanism is used, the number of displayable characters is reduced by one for every instance where the 7-bit default alphabet extension table is used.

Cell Broadcast messages using 8-bit data have user-defined coding and are 82 octets in length.
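The effect of the extension mechanism on the 93-character limit can be estimated by counting two septets (the escape plus the character code) for every extension-table character. In the Python sketch below, the function names are hypothetical and the contents of `EXTENSION_CHARS` are an assumption that should be checked against the default alphabet extension table in this specification; the sketch also does not verify that the remaining characters exist in the default alphabet.

```python
# ASSUMPTION: characters reached via the escape to the extension table.
# Check this set against the default alphabet extension table.
EXTENSION_CHARS = frozenset("\f^{}\\[~]|\u20ac")  # \u20ac is the euro sign

CBS_PAGE_SEPTETS = 93  # 7-bit characters in one 82-octet page


def septets_needed(text: str) -> int:
    """Septets needed for text: escape + code for extension characters."""
    return sum(2 if ch in EXTENSION_CHARS else 1 for ch in text)


def fits_in_cbs_page(text: str) -> bool:
    """True if text fits in one default-alphabet Cell Broadcast page."""
    return septets_needed(text) <= CBS_PAGE_SEPTETS
```

A page of 92 ordinary characters therefore has no room left for a single extension-table character, since that character alone consumes two septets.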

The UCS2 alphabet indicates that the message is coded in UCS2 [10]. The general notes specified in subclause 6.1.1 override any contrary specification in UCS2; for example, even in UCS2 a <CR> character will cause the MS to return to the beginning of the current line and overwrite any existing text with the characters which follow the <CR>. Messages encoded in UCS2 consist of 41 characters.
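The UCS2 limits follow from the 82-octet page: 41 characters of 2 octets each, or 40 characters after the 2-octet language prefix used with coding group 0001 (value 0001). The Python sketch below builds such user data; the function names are hypothetical, big-endian byte order is assumed for UCS2, and the prefix packing (least significant bit first, with ISO 639 letters taking their ASCII codes in the default alphabet) is likewise an assumption.

```python
from typing import Optional

CBS_PAGE_OCTETS = 82  # user data octets in one Cell Broadcast page


def encode_ucs2(text: str) -> bytes:
    """UCS2 as 16-bit big-endian code units; BMP characters only."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS2 cannot represent characters outside the BMP")
    return text.encode("utf-16-be")


def ucs2_page(text: str, language: Optional[str] = None) -> bytes:
    """Build UCS2 user data, optionally behind a 2-character language prefix."""
    data = encode_ucs2(text)  # 2 octets per character
    if language is not None:
        if len(language) != 2 or not (language.isascii() and language.isalpha()):
            raise ValueError("expected a two-letter ISO 639 code")
        # ASSUMPTION: letters share their ASCII codes in the 7-bit default
        # alphabet, and the two septets are packed least significant bit
        # first; the top 2 bits of the second octet are the zero padding.
        a, b = (ord(c) for c in language)
        data = bytes([a | ((b & 0x01) << 7), b >> 1]) + data
    if len(data) > CBS_PAGE_OCTETS:
        raise ValueError(f"{len(data)} octets exceed one page")
    return data
```

Under these assumptions, `ucs2_page("a" * 41)` and `ucs2_page("a" * 40, "en")` both fill the page exactly, and one more character in either case raises `ValueError`.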

Class 1 and Class 2 messages may be routed by the ME to user-defined destinations, but the user may override any default meaning and select their own routing.

Class 3 messages will normally be selected for transfer to a TE in cases where an ME supports an SMS/CBS interface to a TE and the TE requests "TE-specific" cell broadcast messages (see GSM 07.05 [8]). The user may be able to override the default meaning and select their own routing.