2   Lexical conventions                                          [lex]


1 The text of the program is kept in units called source files  in  this
  International  Standard.   A source file together with all the headers
  (_lib.headers_) and source files included (_cpp.include_) via the pre­
  processing directive #include, less any source lines skipped by any of
  the conditional inclusion (_cpp.cond_)  preprocessing  directives,  is
  called  a  translation  unit.   [Note:  a C++  program need not all be
  translated at the same time.  ]

2 [Note: previously translated translation units and instantiation units
  can  be  preserved individually or in libraries. The separate transla­
  tion units of a program communicate (_basic.link_)  by  (for  example)
  calls  to functions whose identifiers have external linkage, manipula­
  tion of objects whose identifiers have external linkage, or  manipula­
  tion of data files. Translation units can be separately translated and
  then later linked to produce an executable program (_basic.link_).  ]

  2.1  Phases of translation                                [lex.phases]

1 The  precedence  among the syntax rules of translation is specified by
  the following phases.1)

    1 Physical  source file characters are mapped, in an implementation-
      defined manner, to the source character set (introducing  new-line
      characters  for  end-of-line  indicators)  if necessary.  Trigraph
      sequences (_lex.trigraph_) are replaced by  corresponding  single-
      character internal representations.  Any source file character not
      in the basic source character set (_lex.charset_) is  replaced  by
      the universal-character-name that designates that character.2)

    2 Each instance of a new-line character and an immediately preceding
      backslash character is deleted, splicing physical source lines to
      form logical source lines.  If, as a result, a character sequence
      that matches the syntax of a universal-character-name is produced,
      the behavior is undefined.  A source file that is not empty shall
      end in a new-line character, which shall not be immediately
      preceded by a backslash character.

  1) Implementations must behave as if these separate phases occur,
  although in practice different phases might be folded together.
  2) The process of handling extended characters is specified in terms
  of mapping to an encoding that uses only the basic source character
  set, and, in the case of character literals and strings, further
  mapping to the execution character set.  In practical terms, however, any
  internal encoding may be used, so long as an actual extended character
  encountered in the input, and the same extended character expressed in
  the input as a universal-character-name (i.e. using the \uXXXX
  notation), are handled equivalently.

    3 The  source  file  is   decomposed   into   preprocessing   tokens
      (_lex.pptoken_) and sequences of white-space characters (including
      comments).  A source file shall not end in a partial preprocessing
      token or partial comment3).  Each comment is replaced by one space
      character.   New-line  characters  are  retained.   Whether   each
      nonempty sequence of white-space characters other than new-line is
      retained or replaced by one  space  character  is  implementation-
      defined.   The process of dividing a source file's characters into
      preprocessing tokens is context-dependent.  [Example: see the han­
      dling of < within a #include preprocessing directive.  ]

    4 Preprocessing  directives  are  executed and macro invocations are
      expanded.  If a character sequence that matches the  syntax  of  a
      universal-character-name   is   produced  by  token  concatenation
      (_cpp.concat_), the behavior is undefined.  A #include preprocess­
      ing  directive  causes  the named header or source file to be pro­
      cessed from phase 1 through phase 4, recursively.

    5 Each source character set member, escape sequence,  or  universal-
      character-name  in  character literals and string literals is con­
      verted to a member of the execution character set.

    6 Adjacent character string literal tokens are concatenated.   Adja­
      cent wide string literal tokens are concatenated.

    7 White-space characters separating tokens are no longer significant.
      Each preprocessing token is converted into a token (_lex.token_).
      The resulting tokens are syntactically and semantically analyzed
      and translated.

    8 Translated translation units and instantiation units are  combined
      as  follows:  [Note:  some  or all of these may be supplied from a
      library.  ] Each translated translation unit is examined  to  pro­
      duce  a  list of required instantiations.  [Note: this may include
      instantiations    which    have    been    explicitly    requested
      (_temp.explicit_).   ]  The  definitions of the required templates
      are located. It is implementation-defined whether  the  source  of
      the  translation units containing these definitions is required to
      be available.  [Note: an implementation  could  encode  sufficient
      information  into  the translated translation unit so as to ensure
      the source is not required here.  ] All the required
      instantiations are performed to produce instantiation units.
      [Note: these are similar to translated translation units, but
      contain no references to uninstantiated templates and no template
      definitions.  ] The program is ill-formed if any instantiation
      fails.

  3) A partial preprocessing token would arise from a source file ending
  in one or more characters of a multi-character  token  followed  by  a
  "line-splicing"  backslash.   A  partial  comment  would  arise from a
  source file ending with an unclosed /* comment, or a //  comment  line
  that ends with a "line-splicing" backslash.

    9 All external object and function references are resolved.  Library
      components are linked to satisfy external references to  functions
      and  objects  not  defined  in  the  current translation. All such
      translator output is collected into a program image which contains
      information needed for execution in its execution environment.

  2.2  Basic source character set                          [lex.charset]

1 The  basic  source  character set consists of 96 characters: the space
  character, the control characters representing horizontal tab, vertical
  tab, form feed, and new-line, plus the following 91 graphical characters:
          a b c d e f g h i j k l m n o p q r s t u v w x y z
          A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
          0 1 2 3 4 5 6 7 8 9
          _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = ,  " '

2 The universal-character-name construct provides a way  to  name  other
  characters.
          hex-quad:
                  hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit

          universal-character-name:
                  \u hex-quad
                  \U hex-quad hex-quad
  The character designated by the universal-character-name \UNNNNNNNN is
  that character whose encoding in  ISO/IEC  10646  is  the  hexadecimal
  value  NNNNNNNN;  the character designated by the universal-character-
  name \uNNNN is that character whose encoding in ISO/IEC 10646  is  the
  hexadecimal value 0000NNNN.

  2.3  Trigraph sequences                                 [lex.trigraph]

1 Before any other processing takes place, each occurrence of one of the
  following sequences of  three  characters  ("trigraph  sequences")  is
  replaced by the single character indicated in Table 1.

                       Table 1--trigraph sequences

  |trigraph   replacement | trigraph   replacement | trigraph   replacement |
  |  ??=           #      |   ??(           [      |   ??<           {      |
  |  ??/           \      |   ??)           ]      |   ??>           }      |
  |  ??'           ^      |   ??!           |      |   ??-           ~      |

2 [Example:
          ??=define arraycheck(a,b) a??(b??) ??!??! b??(a??)
  becomes
          #define arraycheck(a,b) a[b] || b[a]
   --end example]

3 [Note: no other trigraph sequence exists.  Each ?  that does not begin
  one of the trigraphs listed above is not changed.  ]

  2.4  Preprocessing tokens                                [lex.pptoken]
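          preprocessing-token:
                  header-name
                  identifier
                  pp-number
                  character-literal
                  string-literal
                  preprocessing-op-or-punc
                  each non-white-space character that cannot be one of the above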

1 Each preprocessing token that is converted to  a  token  (_lex.token_)
  shall have the lexical form of a keyword, an identifier, a literal, an
  operator, or a punctuator.

2 A preprocessing token is the minimal lexical element of  the  language
  in  translation  phases  3 through 6.  The categories of preprocessing
  token are: header names, identifiers, preprocessing numbers, character
  literals,  string  literals, preprocessing-op-or-punc, and single non-
  white-space characters that do not lexically match the  other  prepro­
  cessing  token  categories.   If a ' or a " character matches the last
  category, the behavior is undefined.  Preprocessing tokens can be sep­
  arated  by  white space; this consists of comments (_lex.comment_), or
  white-space characters (space, horizontal tab, new-line, vertical tab,
  and  form-feed),  or  both.   As described in Clause _cpp_, in certain
  circumstances during translation phase 4, white space (or the  absence
  thereof)  serves  as  more than preprocessing token separation.  White
  space can appear within a preprocessing token only as part of a header
  name  or  between  the  quotation characters in a character literal or
  string literal.

3 If the input stream has been parsed into preprocessing tokens up to  a
  given  character, the next preprocessing token is the longest sequence
  of characters that could constitute a  preprocessing  token,  even  if
  that would cause further lexical analysis to fail.

4 [Example: The program fragment 1Ex is parsed as a preprocessing number
  token (one that is not a valid floating  or  integer  literal  token),
  even though a parse as the pair of preprocessing tokens 1 and Ex might
  produce a valid expression (for example, if Ex were a macro defined as
  +1).  Similarly, the program fragment 1E1 is parsed as a preprocessing
  number (one that is a valid floating literal token), whether or not  E
  is a macro name.  ]

5 [Example:  The  program  fragment  x+++++y  is  parsed as x ++ ++ + y,
  which, if x and y are of built-in  types,  violates  a  constraint  on
  increment  operators,  even though the parse x ++ + ++ y might yield a
  correct expression.  ]

  2.5  Alternative tokens                                  [lex.digraph]

1 Alternative token representations are provided for some operators  and
  punctuators.4)

2 In all respects of the language, each alternative  token  behaves  the
  same, respectively, as its primary token, except for  its  spelling5).
  The set of alternative tokens is defined in Table 2.

  4)  These  include "digraphs" and additional reserved words.  The term
  "digraph" (token consisting of two characters) is  not  perfectly  de­
  scriptive,  since  one of the alternative preprocessing-tokens is %:%:
  and of course several primary tokens contain two characters.  Nonethe­
  less, those alternative tokens that aren't lexical keywords are collo­
  quially known as "digraphs".
  5)    Thus   [   and   <:   behave   differently   when   "stringized"
  (_cpp.stringize_), but can otherwise be freely interchanged.

                       Table 2--alternative tokens

  |alternative   primary | alternative   primary | alternative   primary |
  |    <%           {    |     and         &&    |   and_eq        &=    |
  |    %>           }    |    bitor         |    |    or_eq        |=    |
  |    <:           [    |     or          ||    |   xor_eq        ^=    |
  |    :>           ]    |     xor          ^    |     not          !    |
  |    %:           #    |    compl         ~    |   not_eq        !=    |
  |   %:%:         ##    |   bitand         &    |                       |

  2.6  Tokens                                                [lex.token]

1 There  are  five  kinds  of tokens: identifiers, keywords, literals,6)
  operators, and other  separators.   Blanks,  horizontal  and  vertical
  tabs, newlines, formfeeds, and comments (collectively, "white space"),
  as described below, are ignored  except  as  they  serve  to  separate
  tokens.   Some  white space is required to separate otherwise adjacent
  identifiers, keywords, and literals.

  2.7  Comments                                            [lex.comment]

1 The characters /* start a comment, which terminates with  the  charac­
  ters  */.  These comments do not nest.  The characters // start a com­
  ment, which terminates with the next new-line character. If there is a
  form-feed  or  a vertical-tab character in such a comment, only white-
  space characters shall appear between it and the new-line that  termi­
  nates  the  comment;  no  diagnostic  is required.  [Note: The comment
  characters //, /*, and */ have no special meaning within a //  comment
  and  are  treated  just like other characters.  Similarly, the comment
  characters // and /* have no special meaning within a /* comment.  ]

  6) Literals include strings and character and numeric literals.

  2.8  Header names                                         [lex.header]
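          header-name:
                  <h-char-sequence>
                  "q-char-sequence"
          h-char-sequence:
                  h-char
                  h-char-sequence h-char
          h-char:
                  any member of the source character set except
                          new-line and >
          q-char-sequence:
                  q-char
                  q-char-sequence q-char
          q-char:
                  any member of the source character set except
                          new-line and "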

1 Header name preprocessing tokens shall only appear within  a  #include
  preprocessing  directive (_cpp.include_).  The sequences in both forms
  of header-names are mapped  in  an  implementation-defined  manner  to
  external source file names as specified in _cpp.include_.

2 If  the characters ', \, ", or /* appear in the sequence between the <
  and > delimiters,  or  between  the  "  delimiters,  the  behavior  is
  undefined.7)

  2.9  Preprocessing numbers                              [lex.ppnumber]
          pp-number:
                  digit
                  . digit
                  pp-number digit
                  pp-number nondigit
                  pp-number e sign
                  pp-number E sign
                  pp-number .

1 Preprocessing number tokens lexically  include  all  integral  literal
  tokens (_lex.icon_) and all floating literal tokens (_lex.fcon_).

2 A  preprocessing  number  does not have a type or a value; it acquires
  both after a successful conversion (as part of  translation  phase  7,
  _lex.phases_)  to  an  integral  literal  token  or a floating literal
  token.

  7)  Thus, sequences of characters that resemble escape sequences cause
  undefined behavior.

  2.10  Identifiers                                           [lex.name]
          identifier:
                  nondigit
                  identifier nondigit
                  identifier digit
          nondigit: one of
                  _ a b c d e f g h i j k l m
                    n o p q r s t u v w x y z
                    A B C D E F G H I J K L M
                    N O P Q R S T U V W X Y Z
          digit: one of
                  0 1 2 3 4 5 6 7 8 9

1 An identifier is an arbitrarily long sequence of letters  and  digits.
  Each universal-character-name in an identifier shall designate a
  character whose encoding in ISO 10646 falls into one of the ranges
  specified in _extendid_.  Upper- and lower-case letters are different.  All
  characters are significant.8)

2 In addition, identifiers containing a double underscore (__) or begin­
  ning  with an underscore and an upper-case letter are reserved for use
  by C++ implementations and standard libraries and shall  not  be  used
  otherwise; no diagnostic is required.

  2.11  Keywords                                               [lex.key]

1 The  identifiers  shown  in  Table  3 are reserved for use as keywords
  (that is, they are unconditionally treated as keywords in phase 7):

  8) On systems in which linkers cannot accept extended  characters,  an
  encoding  of the universal-character-name may be used in forming valid
  external identifiers.  For example, some otherwise unused character or
  sequence  of  characters  may be used to encode the \u in a universal-
  character-name.  Extended characters may produce a long external iden­
  tifier,  but  C++  does  not  place a translation limit on significant
  characters for external identifiers.  In C++,  upper-  and  lower-case
  letters are considered different for all identifiers, including exter­
  nal identifiers.

                            Table 3--keywords

  |asm          do             inline             short         typeid       |
  |auto         double         int                signed        typename     |
  |bool         dynamic_cast   long               sizeof        union        |
  |break        else           mutable            static        unsigned     |
  |case         enum           namespace          static_cast   using        |
  |catch        explicit       new                struct        virtual      |
  |char         extern         operator           switch        void         |
  |class        false          private            template      volatile     |
  |const        float          protected          this          wchar_t      |
  |const_cast   for            public             throw         while        |
  |continue     friend         register           true                       |
  |default      goto           reinterpret_cast   try                        |
  |delete       if             return             typedef                    |

2 Furthermore, the alternative representations shown in Table 4 for cer­
  tain  operators and punctuators (_lex.digraph_) are reserved and shall
  not be used otherwise:

                   Table 4--alternative representations

            |and      and_eq   bitand   bitor   compl    not |
            |not_eq   or       or_eq    xor     xor_eq       |

  2.12  Operators and punctuators                        [lex.operators]

1 The lexical representation of C++ programs includes a number  of  pre­
  processing  tokens which are used in the syntax of the preprocessor or
  are converted into tokens for operators and punctuators:
          preprocessing-op-or-punc: one of
          {       }       [       ]       #       ##      (       )
          <:      :>      <%      %>      %:      %:%:    ;       :       ...
          new     delete  ?       ::      .       .*
          +       -       *       /       %       ^       &       |       ~
          !       =       <       >       +=      -=      *=      /=      %=
          ^=      &=      |=      <<      >>      >>=     <<=     ==      !=
          <=      >=      &&      ||      ++      --      ,       ->*     ->
          and     and_eq  bitand  bitor   compl   not     not_eq  or      or_eq
          xor     xor_eq

  Each preprocessing-op-or-punc is converted to a single token in trans­
  lation phase 7 (_lex.phases_).

  2.13  Literals                                           [lex.literal]

1 There are several kinds of literals.9)

  2.13.1  Integer literals                                    [lex.icon]
          integer-literal:
                  decimal-literal integer-suffixopt
                  octal-literal integer-suffixopt
                  hexadecimal-literal integer-suffixopt
          decimal-literal:
                  nonzero-digit
                  decimal-literal digit
          octal-literal:
                  0
                  octal-literal octal-digit
          hexadecimal-literal:
                  0x hexadecimal-digit
                  0X hexadecimal-digit
                  hexadecimal-literal hexadecimal-digit
          nonzero-digit: one of
                  1  2  3  4  5  6  7  8  9
          octal-digit: one of
                  0  1  2  3  4  5  6  7
          hexadecimal-digit: one of
                  0  1  2  3  4  5  6  7  8  9
                  a  b  c  d  e  f
                  A  B  C  D  E  F
          integer-suffix:
                  unsigned-suffix long-suffixopt
                  long-suffix unsigned-suffixopt
          unsigned-suffix: one of
                  u  U
          long-suffix: one of
                  l  L
1 An integer literal is a sequence of digits that has no period or
  exponent part.  An integer literal may have a prefix that specifies its
  base and a suffix that specifies its type.  The lexically first  digit
  of  the sequence of digits is the most significant.  A decimal integer
  literal (base ten) begins with a digit other than 0 and consists of  a
  sequence  of  decimal  digits.   An octal integer literal (base eight)
  begins with the digit 0 and consists of a sequence of octal digits.10)
  A hexadecimal integer literal (base sixteen) begins with 0x or 0X and
  consists of a sequence of hexadecimal digits, which include the decimal
  digits  and  the letters a or A through f or F with decimal values ten
  through fifteen.  [Example: the number twelve can be written 12,  014,
  or 0XC.  ]

  9)  The  term  "literal"  generally  designates, in this International
  Standard, those tokens that are called "constants" in ISO C.
  10) The digits 8 and 9 are not octal digits.

2 The type of an integer literal depends on its form, value, and suffix.
  If it is decimal and has no suffix, it has the first of these types in
  which  its  value  can  be  represented:  int, long int, unsigned long
  int.11)  If  it  is octal or hexadecimal and has no suffix, it has the
  first of these types in which  its  value  can  be  represented:  int,
  unsigned  int, long int, unsigned long int.  If it is suffixed by u or
  U, its type is the first of these types in which its value can be rep­
  resented:  unsigned int, unsigned long int.  If it is suffixed by l or
  L, its type is the first of these types in which its value can be rep­
  resented:  long  int, unsigned long int.  If it is suffixed by ul, lu,
  uL, Lu, Ul, lU, UL, or LU, its type is unsigned long int.

3 A program is ill-formed if one of its translation  units  contains  an
  integer  literal  that  cannot  be  represented  by any of the allowed
  types.

  2.13.2  Character literals                                  [lex.ccon]
          character-literal:
                  'c-char-sequence'
                  L'c-char-sequence'
          c-char-sequence:
                  c-char
                  c-char-sequence c-char
          c-char:
                  any member of the source character set except
                          the single-quote ', backslash \, or new-line character
                  escape-sequence
                  universal-character-name
          escape-sequence:
                  simple-escape-sequence
                  octal-escape-sequence
                  hexadecimal-escape-sequence
          simple-escape-sequence: one of
                  \'  \"  \?  \\
                  \a  \b  \f  \n  \r  \t  \v
          octal-escape-sequence:
                  \ octal-digit
                  \ octal-digit octal-digit
                  \ octal-digit octal-digit octal-digit
          hexadecimal-escape-sequence:
                  \x hexadecimal-digit
                  hexadecimal-escape-sequence hexadecimal-digit

  11) A decimal integer literal with no suffix never has  type  unsigned
  int.   Otherwise, for example, on an implementation where unsigned int
  values have 16 bits and unsigned long values have strictly  more  than
  17  bits,  we  would have -30000<0, -50000>0 (because 50000 would have
  type unsigned int), and -70000<0 (because 70000 would have type long).

1 A character literal is one  or  more  characters  enclosed  in  single
  quotes,  as  in  'x', optionally preceded by the letter L, as in L'x'.
  Single character literals that do not begin with  L  have  type  char,
  with value equal to the numerical value of the character in the execu­
  tion character set.  Multicharacter literals that do not begin with  L
  have type int and implementation-defined value.

2 A character literal that begins with the letter L, such as L'ab', is a
  wide-character literal.  Wide-character literals have type wchar_t.12)
  Wide-character literals have implementation-defined values, regardless
  of the number of characters in the literal.

3 Certain nongraphic characters, the single quote ', the double quote ",
  the question mark ?, and the backslash \, can be represented according
  to Table 5.

                        Table 5--escape sequences

                   |new-line          NL (LF)   \n    |
                   |horizontal tab    HT        \t    |
                   |vertical tab      VT        \v    |
                   |backspace         BS        \b    |
                   |carriage return   CR        \r    |
                   |form feed         FF        \f    |
                   |alert             BEL       \a    |
                   |backslash         \         \\    |
                   |question mark     ?         \?    |
                   |single quote      '         \'    |
                   |double quote      "         \"    |
                   |octal number      ooo       \ooo  |
                   |hex number        hhh       \xhhh |
  The double quote " and the question mark ? can be represented as
  themselves or by the escape sequences \" and \?  respectively, but the
  single  quote ' and the backslash \ shall be represented by the escape
  sequences \' and \\ respectively.  If the character following a  back­
  slash  is  not  one of those specified, the behavior is undefined.  An
  escape sequence specifies a single character.

4 The escape \ooo consists of the backslash followed  by  one,  two,  or
  three  octal digits that are taken to specify the value of the desired
  character.  The escape \xhhh consists of the backslash followed  by  x
  followed  by  one or more hexadecimal digits that are taken to specify
  the value of the desired character.  There is no limit to  the  number
  of digits in a hexadecimal sequence.  A sequence of octal or
  hexadecimal digits is terminated by the first character that is not an
  octal digit or a hexadecimal digit, respectively.  The value of a
  character literal is implementation-defined if it falls outside of the
  implementation-defined range defined for char (for ordinary literals) or
  wchar_t (for wide literals).

  12) They are intended for character sets where a  character  does  not
  fit into a single byte.

5 A universal-character-name is translated to the encoding, in the  exe­
  cution  character  set,  of  the character named.  If there is no such
  encoding, the universal-character-name is translated to an implementa­
  tion-defined  encoding.   [Note:  in translation phase 1, a universal-
  character-name is introduced whenever an actual extended character  is
  encountered  in  the  source text.  Therefore, all extended characters
  are described in terms  of  universal-character-names.   However,  the
  actual  compiler  implementation may use its own native character set,
  so long as the same results are obtained.  ]

  2.13.3  Floating literals                                   [lex.fcon]
          floating-literal:
                  fractional-constant exponent-partopt floating-suffixopt
                  digit-sequence exponent-part floating-suffixopt
          fractional-constant:
                  digit-sequenceopt . digit-sequence
                  digit-sequence .
          exponent-part:
                  e signopt digit-sequence
                  E signopt digit-sequence
          sign: one of
                  +  -
          digit-sequence:
                  digit
                  digit-sequence digit
          floating-suffix: one of
                  f  l  F  L

1 A floating literal consists of an integer part,  a  decimal  point,  a
  fraction  part,  an e or E, an optionally signed integer exponent, and
  an optional type suffix.  The integer and fraction parts both  consist
  of  a  sequence of decimal (base ten) digits.  Either the integer part
  or the fraction part (not both) can be  omitted;  either  the  decimal
  point  or the letter e (or E) and the exponent (not both) can be omit­
  ted.  The integer part, the optional decimal point  and  the  optional
  fraction  part form the significant part of the floating literal.  The
  exponent, if present, indicates the power of 10 by which the  signifi­
  cant  part  is  to  be scaled.  If the scaled value is in the range of
  representable values for its type, the result is  either  the  nearest
  representable  value,  or  the  larger  or smaller representable value
  immediately adjacent to the nearest representable value, chosen in an
  implementation-defined manner.  The type of a floating literal is dou­
  ble unless explicitly specified by a suffix.  The  suffixes  f  and  F
  specify  float,  the  suffixes  l  and  L specify long double.  If the
  scaled value is not in the range of representable values for its type,
  the program is ill-formed.

  2.13.4  String literals                                   [lex.string]
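          string-literal:
                  "s-char-sequenceopt"
                  L"s-char-sequenceopt"
          s-char-sequence:
                  s-char
                  s-char-sequence s-char
          s-char:
                  any member of the source character set except
                          the double-quote ", backslash \, or new-line character
                  escape-sequence
                  universal-character-name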

1 A   string  literal  is  a  sequence  of  characters  (as  defined  in
  _lex.ccon_) surrounded by double quotes, optionally beginning with the
  letter L, as in "..." or L"...".  A string literal that does not begin
  with L has type "array of n const char" and  static  storage  duration
  (_basic.stc_), where n is the size of the string as defined below, and
  is initialized with the  given  characters.   A  string  literal  that
  begins  with  L,  such  as  L"asdf", is a wide string literal.  A wide
  string literal has type "array of n  const  wchar_t"  and  has  static
  storage  duration, where n is the size of the string as defined below,
  and is initialized with the given characters.

2 Whether all string literals are  distinct  (that  is,  are  stored  in
  nonoverlapping  objects)  is  implementation-defined.   The  effect of
  attempting to modify a string literal is undefined.

3 In translation phase 6 (_lex.phases_), adjacent  string  literals  are
  concatenated and adjacent wide string literals are concatenated.  If a
  string literal token is adjacent to a wide string literal  token,  the
  behavior  is  undefined.   Characters in concatenated strings are kept
  distinct.  [Example:
          "\xA" "B"
  contains the two characters '\xA' and 'B' after concatenation (and not
  the single hexadecimal character '\xAB').  ]

4 After   any   necessary   concatenation,   in   translation   phase  7
  (_lex.phases_), '\0' is appended to every string literal so that  pro­
  grams that scan a string can find its end.

5 Escape sequences and universal-character-names in string literals have
  the same meaning as in character literals  (_lex.ccon_),  except  that
  the  single quote ' is representable either by itself or by the escape
  sequence \', and the double quote " shall be preceded by a  \.   In  a
  non-wide  string  literal,  a universal-character-name may map to more
  than one char element.  The size of a wide string literal is the total
  number of escape sequences, universal-character-names, and other char­
  acters, plus one for the terminating L'\0'.  The size  of  a  non-wide
  string literal is the total number of escape sequences and other char­
  acters, plus at least one for the multibyte encoding of  each  univer­
  sal-character-name, plus one for the terminating '\0'.

  2.13.5  Boolean literals                                    [lex.bool]

1 The  Boolean  literals are the keywords false and true.  Such literals
  have type bool.  They are not lvalues.