Links
- Formatting Messages | ICU User Guide
- Pattern_Syntax and Pattern_White_Space - Unicode pattern syntax.
  - argNameOrNumber in MessageFormat (the first argument) is composed of either digits or anything that is NOT from Pattern_Syntax and Pattern_White_Space (so it excludes most punctuation, spaces, etc.)
- JS Lib: SlexAxton/messageformat.js
Patterns and Their Interpretation
Ref: icu-project.org/…/MessageFormat.html, Pattern_Syntax
MessageFormat uses patterns of the following form:
message = messageText (argument messageText)*
argument = noneArg | simpleArg | pluralArg | selectArg | selectordinalArg
noneArg = '{' argNameOrNumber '}'
simpleArg = '{' argNameOrNumber ',' argType [',' argStyle] '}'
pluralArg = '{' argNameOrNumber ',' "plural" ',' pluralStyle '}'
selectArg = '{' argNameOrNumber ',' "select" ',' selectStyle '}'
selectordinalArg = '{' argNameOrNumber ',' "selectordinal" ',' pluralStyle '}'
argNameOrNumber = argName | argNumber
argName = [^[[:Pattern_Syntax:][:Pattern_White_Space:]]]+
argNumber = '0' | ('1'..'9' ('0'..'9')*)
argType = "number" | "date" | "time" | "spellout" | "ordinal" | "duration"
argStyle = "short" | "medium" | "long" | "full" | "integer" | "currency" | "percent" | argStyleText
pluralStyle = [offsetValue] ( explicitValue|pluralKeyword '{' message '}')+ // the "other" pluralKeyword is required.
offsetValue = "offset:" number
explicitValue = '=' number // adjacent, no white space in between
pluralKeyword = 'zero' | 'one' | 'two' | 'few' | 'many' | 'other' | keyword
selectStyle = (selectKeyword '{' message '}')+ // the "other" selectKeyword is required.
selectKeyword = 'other' | keyword
keyword = [^[[:Pattern_Syntax:][:Pattern_White_Space:]]]+
Pattern_White_Space between syntax elements is ignored, except:
- between the {curly braces} and their sub-message
- between the '=' and the number in explicitValue (i.e. there must be no space between them, e.g. "=1")
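As a rough illustration of the argNameOrNumber rule in the grammar above, the following Python sketch classifies a candidate string. It is not ICU's implementation. The function name classify_arg is hypothetical, and the character sets cover only the ASCII members of Pattern_Syntax and Pattern_White_Space (the real Unicode properties contain more code points). Rejecting digit-only strings with a leading zero is also an assumption of this sketch.

```python
import re

# ASCII-only approximation of the two Unicode properties (assumption:
# the full properties also include non-ASCII code points).
ASCII_PATTERN_SYNTAX = set("!\"#$%&'()*+,-./:;<=>?@[\\]^`{|}~")
ASCII_PATTERN_WHITE_SPACE = set("\t\n\x0b\x0c\r ")

# argNumber = '0' | ('1'..'9' ('0'..'9')*)
ARG_NUMBER = re.compile(r"0|[1-9][0-9]*")


def classify_arg(text: str) -> str:
    """Return 'number', 'name', or 'invalid' for an argNameOrNumber candidate."""
    if ARG_NUMBER.fullmatch(text):
        return "number"
    # Assumption: digit-only strings such as "01" are rejected, not treated as names.
    if text.isascii() and text.isdigit():
        return "invalid"
    excluded = ASCII_PATTERN_SYNTAX | ASCII_PATTERN_WHITE_SPACE
    if text and not any(ch in excluded for ch in text):
        return "name"
    return "invalid"
```

For example, classify_arg("12") gives "number", classify_arg("user_name") gives "name" (the underscore is not in Pattern_Syntax), and classify_arg("user name") gives "invalid".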
Plurals
Predefined keywords: 'zero', 'one', 'two', 'few', 'many' and 'other'.
You must always define message text for the other case (it's the fallback).
Matching Priority / Algorithm
- Exact Matches
  - Match the input number against the explicitValue clauses. If found, use that messageText and return.
- Keyword Matches
  - Set keyword = PluralRules(input_number - offset) (offset defaults to 0).
  - Use the clause corresponding to this keyword and return (if found).
- Fallback
  - Use the messageText corresponding to the other clause.
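The three matching steps above can be sketched in Python as follows. This is a minimal illustration, not ICU code. select_plural and english_plural_rule are hypothetical names, the clauses are assumed to be already parsed into a dict, and the plural rule is a simplified stand-in for English PluralRules.

```python
def english_plural_rule(n):
    # Hypothetical stand-in for PluralRules in an English locale:
    # 'one' for exactly 1, 'other' for everything else.
    return "one" if n == 1 else "other"


def select_plural(number, clauses, offset=0, plural_rule=english_plural_rule):
    """Pick a sub-message; clauses maps selectors ('=0', 'one', 'other') to text."""
    # 1. Exact matches: compare the raw input number with each explicitValue.
    for selector, text in clauses.items():
        if selector.startswith("=") and float(selector[1:]) == number:
            return text
    # 2. Keyword matches: apply the plural rule to (number - offset).
    keyword = plural_rule(number - offset)
    if keyword in clauses:
        return clauses[keyword]
    # 3. Fallback: the required 'other' clause.
    return clauses["other"]


clauses = {
    "=0": "nobody",
    "=1": "just {name}",
    "one": "{name} and one other",
    "other": "{name} and # others",
}
```

With offset=1, an input of 1 hits the "=1" clause before any keyword is computed, an input of 2 selects "one" (because 2 - 1 = 1), and an input of 5 falls through to "other".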
Formatting
- Format number-minus-offset using a NumberFormat for the PluralFormat's locale.
  - If you need special number formatting, you have to use a MessageFormat and explicitly specify a NumberFormat argument. (Note that this argument is formatted without subtracting the offset! If you need a custom format and have a non-zero offset, then you need to pass the number-minus-offset value as a separate parameter.)
- Replace an unquoted pound sign (#) in the selected sub-message by the formatted number-minus-offset value from the previous step.
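A small Python sketch of the pound-sign substitution described above. The function name format_plural_submessage is hypothetical. It assumes plain str() in place of a locale-aware NumberFormat, and it treats every "#" as unquoted (no apostrophe handling and no nested arguments).

```python
def format_plural_submessage(submessage, number, offset=0):
    """Replace '#' in an already-selected sub-message with number-minus-offset."""
    # Assumption: str() stands in for a locale-aware NumberFormat.
    formatted = str(number - offset)
    # Assumption: every '#' is unquoted; quoting is not handled in this sketch.
    return submessage.replace("#", formatted)


result = format_plural_submessage("{name} and # others", 5, offset=1)
```

Here result is "{name} and 4 others": the value substituted for "#" is 5 - 1, not the raw input 5, which is why a separately formatted number argument would show a different value unless the caller subtracts the offset itself.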
Gender and "select"
The main use case for the select format (selectArg) is gender based inflection.
When names or nouns are inserted into sentences, their gender can affect pronouns, verb forms, articles, and adjectives. Special care needs to be taken for the case where the gender cannot be determined. The impact varies between languages:
- English has three genders, and unknown gender is handled as a special case. Names use the gender of the named person (if known), nouns referring to people use natural gender, and inanimate objects are usually neutral. The gender only affects pronouns: "he", "she", "it", "they".
- German differs from English in that the gender of nouns is rather arbitrary, even for nouns referring to people ("Mädchen", girl, is neutral). The gender affects pronouns ("er", "sie", "es"), articles ("der", "die", "das"), and adjective forms ("guter Mann", "gute Frau", "gutes Mädchen").
- French has only two genders; as in German the gender of nouns is rather arbitrary - for sun and moon, the genders are the opposite of those in German. The gender affects pronouns ("il", "elle"), articles ("le", "la"), adjective forms ("bon", "bonne"), and sometimes verb forms ("allé", "allée").
- Polish distinguishes five genders (or noun classes): human masculine, animate non-human masculine, inanimate masculine, feminine, and neuter.
- Noun classes: Some other languages have noun classes that are not related to gender, but similar in grammatical use. Some African languages have around 20 noun classes.
The fallback keyword is "other" (just like with pluralization). Some common keywords are: "male", "female", "mixed" (for groups of people) and "unknown".
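The select lookup with its "other" fallback can be sketched in a few lines of Python. select_message is a hypothetical helper, and the clauses are assumed to be already parsed into a dict.

```python
def select_message(keyword, clauses):
    """Return the sub-message for keyword, falling back to the 'other' clause."""
    return clauses.get(keyword, clauses["other"])


pronoun_clauses = {
    "female": "She liked it.",
    "male": "He liked it.",
    "other": "They liked it.",
}
```

A keyword with no clause of its own, such as "unknown" here, selects the "other" text.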
Quoting
- messageText can contain quoted literal strings including syntax characters.
  - A quoted literal string begins with an ASCII apostrophe and a syntax character (usually a curly brace, { or }) and continues until the next single apostrophe.
  - A double ASCII apostrophe inside or outside of a quoted string represents one literal apostrophe.
  - Quotable syntax characters are the curly braces ("{", "}") in all messageText parts, plus the "#" sign in a messageText immediately inside a pluralStyle.
  - See also MessagePattern.ApostropheMode
- In argStyleText, every single ASCII apostrophe begins and ends quoted literal text, and unquoted {curly braces} must occur in matched pairs.
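The messageText quoting rules above can be approximated by the following Python sketch. It only resolves apostrophes in a text fragment and does not parse arguments. unquote_message_text is a hypothetical name, and the behavior for a lone apostrophe that is not followed by a syntax character (kept as a literal) is an assumption based on the rules as listed here.

```python
def unquote_message_text(text, in_plural=False):
    """Resolve apostrophe quoting in a messageText fragment (sketch only)."""
    # '#' is quotable only in a messageText immediately inside a pluralStyle.
    quotable = "{}#" if in_plural else "{}"
    out = []
    in_quote = False
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch != "'":
            out.append(ch)
            i += 1
        elif i + 1 < n and text[i + 1] == "'":
            out.append("'")   # doubled apostrophe: one literal apostrophe
            i += 2
        elif in_quote:
            in_quote = False  # single apostrophe ends the quoted literal
            i += 1
        elif i + 1 < n and text[i + 1] in quotable:
            in_quote = True   # apostrophe + syntax character starts a quote
            i += 1
        else:
            out.append("'")   # assumption: lone apostrophe stays literal
            i += 1
    return "".join(out)
```

For example, unquote_message_text("It''s '{'literal'}'") yields "It's {literal}", and "'#'" yields "#" only when in_plural=True.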
Recommendation: Use the real apostrophe character, «’» (U+2019),
for human-readable text, and use the ASCII
apostrophe, «'» (U+0027), only in program syntax, like
quoting in MessageFormat. See the annotations for U+0027
Apostrophe in The Unicode Standard.