This is a memo on RFC 5646, which is part of BCP 47.
1 The Language Tag
Language tags are used to help identify languages, whether spoken, written, signed, or otherwise signaled, for the purpose of communication. This includes constructed and artificial languages but excludes languages not intended primarily for human communication, such as programming languages.
1.1 Syntax
- A language tag is composed of a sequence of one or more subtags.
- Each subtag is a sequence of alphanumeric characters that narrows the range of languages identified by the tag.
- Subtags are joined with a hyphen ("-").
The syntax of the language tag in ABNF [RFC5234] is:
 Language-Tag  = langtag             ; normal language tags
               / privateuse          ; private use tag
               / grandfathered       ; grandfathered tags

 langtag       = language
                 ["-" script]
                 ["-" region]
                 *("-" variant)
                 *("-" extension)
                 ["-" privateuse]

 language      = 2*3ALPHA            ; shortest ISO 639 code
                 ["-" extlang]       ; sometimes followed by
                                     ; extended language subtags
               / 4ALPHA              ; or reserved for future use
               / 5*8ALPHA            ; or registered language subtag

 extlang       = 3ALPHA              ; selected ISO 639 codes
                 *2("-" 3ALPHA)      ; permanently reserved

 script        = 4ALPHA              ; ISO 15924 code

 region        = 2ALPHA              ; ISO 3166-1 code
               / 3DIGIT              ; UN M.49 code

 variant       = 5*8alphanum         ; registered variants
               / (DIGIT 3alphanum)

 extension     = singleton 1*("-" (2*8alphanum))

                                     ; Single alphanumerics
                                     ; "x" reserved for private use
 singleton     = DIGIT               ; 0 - 9
               / %x41-57             ; A - W
               / %x59-5A             ; Y - Z
               / %x61-77             ; a - w
               / %x79-7A             ; y - z

 privateuse    = "x" 1*("-" (1*8alphanum))

 grandfathered = irregular           ; non-redundant tags registered
               / regular             ; during the RFC 3066 era

 irregular     = "en-GB-oed"         ; irregular tags do not match
               / "i-ami"             ; the 'langtag' production and
               / "i-bnn"             ; would not otherwise be
               / "i-default"         ; considered 'well-formed'
               / "i-enochian"        ; These tags are all valid,
               / "i-hak"             ; but most are deprecated
               / "i-klingon"         ; in favor of more modern
               / "i-lux"             ; subtags or subtag
               / "i-mingo"           ; combination
               / "i-navajo"
               / "i-pwn"
               / "i-tao"
               / "i-tay"
               / "i-tsu"
               / "sgn-BE-FR"
               / "sgn-BE-NL"
               / "sgn-CH-DE"

 regular       = "art-lojban"        ; these tags match the 'langtag'
               / "cel-gaulish"       ; production, but their subtags
               / "no-bok"            ; are not extended language
               / "no-nyn"            ; or variant subtags: their meaning
               / "zh-guoyu"          ; is defined by their registration
               / "zh-hakka"          ; and all of these are deprecated
               / "zh-min"            ; in favor of a more modern
               / "zh-min-nan"        ; subtag or sequence of subtags
               / "zh-xiang"

 alphanum      = (ALPHA / DIGIT)     ; letters and numbers
Figure 1: Language Tag ABNF
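To make the grammar easier to read, here is a rough Python sketch that transcribes only the `langtag` production into a regular expression. The names `LANGTAG_RE` and `parse_langtag` are made up for this memo and do not come from the RFC. The sketch checks syntax only: it does not consult the IANA registry, does not reject duplicate variants or singletons, and deliberately ignores grandfathered and pure private-use tags.

```python
import re
from typing import Any, Dict, Optional

# Hypothetical transcription of the `langtag` rule from Figure 1.
# Matching is case-insensitive, as language tags are.
LANGTAG_RE = re.compile(
    r"""
    ^
    (?P<language>
        [a-z]{2,3}(?:-[a-z]{3}){0,3}    # ISO 639 code, optional extlang subtags
      | [a-z]{4}                         # reserved for future use
      | [a-z]{5,8}                       # registered language subtag
    )
    (?:-(?P<script>[a-z]{4}))?           # ISO 15924 code
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?  # ISO 3166-1 or UN M.49 code
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
    (?P<extensions>(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*)
    (?:-(?P<privateuse>x(?:-[a-z0-9]{1,8})+))?
    $
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_langtag(tag: str) -> Optional[Dict[str, Any]]:
    """Split a tag into its parts, or return None if it does not match `langtag`."""
    m = LANGTAG_RE.match(tag)
    if m is None:
        return None
    return {
        "language": m.group("language"),
        "script": m.group("script"),
        "region": m.group("region"),
        # Variants are captured as "-a-b"; turn them into a list.
        "variants": [v for v in m.group("variants").split("-") if v],
        # Extensions are kept as one raw string, e.g. "a-bbb".
        "extensions": m.group("extensions").lstrip("-") or None,
        "privateuse": m.group("privateuse"),
    }


print(parse_langtag("zh-cmn-Hans-CN"))
print(parse_langtag("de-CH-1901"))
print(parse_langtag("i-klingon"))  # grandfathered, so not matched by this sketch
```

A full validator would also need the `privateuse` and `grandfathered` alternatives of `Language-Tag` and a lookup against the registry.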
Note:
1.1.1 Formatting of Language Tags
Although language tags are case-insensitive, there are formatting conventions:
- [ISO639-1] recommends that language codes be written in lowercase ('mn' Mongolian).
- [ISO15924] recommends that script codes use lowercase with the initial letter capitalized ('Cyrl' Cyrillic).
- [ISO3166-1] recommends that country codes be capitalized ('MN' Mongolia).
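These conventions can be applied mechanically. The sketch below follows the rule given in RFC 5646 Section 2.1.1: every subtag is lowercase, except that two-letter subtags become uppercase and four-letter subtags become titlecase when they are neither at the start of the tag nor after a singleton. The function name `format_langtag` is made up for this memo, and treating everything after the first singleton as lowercase is my reading of "after singletons", not something the memo's source states in those words.

```python
def format_langtag(tag: str) -> str:
    """Apply the conventional casing to a tag without validating it."""
    out = []
    seen_singleton = False  # assumption: stays True for the rest of the tag
    for i, part in enumerate(tag.split("-")):
        if i > 0 and not seen_singleton and len(part) == 2:
            out.append(part.upper())        # region-like subtag, e.g. "MN"
        elif i > 0 and not seen_singleton and len(part) == 4:
            out.append(part.capitalize())   # script-like subtag, e.g. "Cyrl"
        else:
            out.append(part.lower())
        if len(part) == 1:
            seen_singleton = True
    return "-".join(out)


print(format_langtag("MN-cYRL-mn"))
print(format_langtag("EN-ca-X-CA"))
```

Because case carries no meaning, this only affects presentation; comparisons between tags should still be done case-insensitively.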
1.2 Language Subtag Sources and Interpretation
The namespace of language tags and their subtags is administered by the Internet Assigned Numbers Authority (IANA) according to the rules in Section 5 of RFC 5646. The Language Subtag Registry maintained by IANA is the source for valid subtags: other standards referenced in this section provide the source material for that registry.
1.2.1 Primary Language Subtag
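The primary language subtag is the first subtag in a language tag. It cannot be omitted, except in private use tags (which start with 'x') and some grandfathered tags (such as those starting with 'i'). It is normally a two- or three-letter ISO 639 code.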
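As a small illustration, the following sketch pulls the primary language subtag out of a tag. The helper name `primary_language` is made up for this memo. It only looks at the shape of the first subtag (two or three ASCII letters) and does not check that the code is actually registered.

```python
from typing import Optional


def primary_language(tag: str) -> Optional[str]:
    """Return the lowercased primary language subtag, or None if there is none."""
    first = tag.split("-", 1)[0].lower()
    # Two or three ASCII letters: an ISO 639 code by shape.
    if first.isascii() and first.isalpha() and 2 <= len(first) <= 3:
        return first
    # Single-letter starts such as "x" (private use) or "i" (grandfathered)
    # have no primary language subtag in this sense.
    return None


print(primary_language("mn-Cyrl-MN"))
print(primary_language("x-whatever"))
```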