(bison.info) Symbols

Info Catalog (bison.info) Grammar Outline (bison.info) Grammar File (bison.info) Rules
 
 Symbols, Terminal and Nonterminal
 =================================
 
    "Symbols" in Bison grammars represent the grammatical classifications
 of the language.
 
    A "terminal symbol" (also known as a "token type") represents a
 class of syntactically equivalent tokens.  You use the symbol in grammar
 rules to mean that a token in that class is allowed.  The symbol is
 represented in the Bison parser by a numeric code, and the `yylex'
 function returns a token type code to indicate what kind of token has
 been read.  You don't need to know what the code value is; you can use
 the symbol to stand for it.
 
    A "nonterminal symbol" stands for a class of syntactically equivalent
 groupings.  The symbol name is used in writing grammar rules.  By
 convention, it should be all lower case.
 
    Symbol names can contain letters, digits (not at the beginning),
 underscores and periods.  Periods make sense only in nonterminals.
 
    There are three ways of writing terminal symbols in the grammar:
 
    * A "named token type" is written with an identifier, like an
      identifier in C.  By convention, it should be all upper case.  Each
      such name must be defined with a Bison declaration such as
      `%token'.  *Note Token Type Names: Token Decl.
 
    * A "character token type" (or "literal character token") is written
      in the grammar using the same syntax used in C for character
      constants; for example, `'+'' is a character token type.  A
      character token type doesn't need to be declared unless you need to
      specify its semantic value data type (*note Data Types of Semantic
      Values: Value Type.), associativity, or precedence (*note Operator
      Precedence: Precedence.).
 
      By convention, a character token type is used only to represent a
      token that consists of that particular character.  Thus, the token
      type `'+'' is used to represent the character `+' as a token.
      Nothing enforces this convention, but if you depart from it, your
      program will confuse other readers.
 
      All the usual escape sequences used in character literals in C can
      be used in Bison as well, but you must not use the null character
      as a character literal because its ASCII code, zero, is the code
      `yylex' returns for end-of-input (*note Calling Convention for
      `yylex': Calling Convention.).
 
    * A "literal string token" is written like a C string constant; for
      example, `"<="' is a literal string token.  A literal string token
      doesn't need to be declared unless you need to specify its semantic
      value data type (*note Value Type::), associativity, or precedence
      (*note Precedence::).
 
      You can associate the literal string token with a symbolic name as
      an alias, using the `%token' declaration (*note Token
      Declarations: Token Decl.).  If you don't do that, the lexical
      analyzer has to retrieve the token number for the literal string
      token from the `yytname' table (*note Calling Convention::).
 
      *WARNING*: literal string tokens do not work in Yacc.
 
      By convention, a literal string token is used only to represent a
      token that consists of that particular string.  Thus, you should
      use the token type `"<="' to represent the string `<=' as a token.
      Bison does not enforce this convention, but if you depart from
      it, people who read your program will be confused.
 
      All the escape sequences used in string literals in C can be used
      in Bison as well.  A literal string token must contain two or more
      characters; for a token containing just one character, use a
      character token (see above).
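
    The `yytname' lookup mentioned above, needed when a literal string
 token has no symbolic alias, can be sketched in C as follows.  This is
 only an illustration under stated assumptions: `demo_yytname' is a
 hand-written stand-in for the table that Bison generates, its contents
 are made up, and `find_literal_string_token' is a hypothetical helper
 name.  The sketch relies on the table storing a literal string token
 together with its surrounding double quotes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the generated `yytname' table.  In a real
   parser Bison writes this table; the entries below are invented for
   the example.  A literal string token is stored with its quotes. */
static const char *const demo_yytname[] = {
    "$end", "error", "NUM", "'+'", "\"<=\"", "\">=\""
};
#define DEMO_NTOKENS (sizeof demo_yytname / sizeof demo_yytname[0])

/* Return the table index of the literal string token whose text is
   TEXT (given without quotes), or -1 if there is no such entry. */
int find_literal_string_token(const char *text)
{
    size_t len = strlen(text);

    for (size_t i = 0; i < DEMO_NTOKENS; i++) {
        const char *name = demo_yytname[i];

        if (name[0] == '"'                        /* opening quote     */
            && strncmp(name + 1, text, len) == 0  /* same characters   */
            && name[len + 1] == '"'               /* closing quote     */
            && name[len + 2] == '\0')             /* nothing after it  */
            return (int) i;
    }
    return -1;
}
```

    With the table above, looking up `<=' finds the entry `"<="', while
 looking up `<' alone finds nothing, because no entry consists of exactly
 that string in quotes.  How the index found relates to the code that
 `yylex' must return is covered in the section on the calling convention
 for `yylex'.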
 
    How you choose to write a terminal symbol has no effect on its
 grammatical meaning.  That depends only on where it appears in rules and
 on when the lexical analyzer function, `yylex', returns that symbol.
 
    The value returned by `yylex' is always one of the terminal symbols
 (or 0 for end-of-input).  Whichever way you write the token type in the
 grammar rules, you write it the same way in the definition of `yylex'.
 The numeric code for a character token type is simply the ASCII code for
 the character, so `yylex' can use the identical character constant to
 generate the requisite code.  Each named token type becomes a C macro in
 the parser file, so `yylex' can use the name to stand for the code.
 (This is why periods don't make sense in terminal symbols.)  *Note
 Calling Convention for `yylex': Calling Convention.
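
    As a minimal sketch of the paragraph above, the following `yylex'
 reads from a string and returns a named token type, a character token
 type, or 0 at end-of-input.  Several things here are assumptions made
 for illustration: the macro `NUM' and its value 258 are defined by hand
 (in a real parser Bison generates the macro), and `input' and the `int'
 declaration of `yylval' are stand-ins for whatever input source and
 semantic value type your parser actually uses.

```c
#include <ctype.h>

/* Assumption: in a real parser this macro comes from Bison, not from
   a hand-written definition; 258 is only an illustrative value. */
#define NUM 258

/* Hypothetical input cursor and semantic value for this sketch. */
static const char *input = "";
static int yylval;

int yylex(void)
{
    /* Skip blanks between tokens. */
    while (*input == ' ' || *input == '\t')
        input++;

    /* End of input is reported as 0, which is why the null character
       cannot serve as a character token. */
    if (*input == '\0')
        return 0;

    /* Named token type: return the macro rather than a number. */
    if (isdigit((unsigned char) *input)) {
        yylval = 0;
        while (isdigit((unsigned char) *input))
            yylval = yylval * 10 + (*input++ - '0');
        return NUM;
    }

    /* Character token type: the code is the character itself, so a
       rule that mentions '+' matches the value returned here. */
    return (unsigned char) *input++;
}
```

    Given the input `12 + 3', successive calls return `NUM' (with
 `yylval' set to 12), `'+'', `NUM' (with `yylval' set to 3), and then 0.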
 
    If `yylex' is defined in a separate file, you need to arrange for the
 token-type macro definitions to be available there.  Use the `-d'
 option when you run Bison, so that it will write these macro definitions
 into a separate header file `NAME.tab.h' which you can include in the
 other source files that need it.  *Note Invoking Bison: Invocation.
 
    The symbol `error' is a terminal symbol reserved for error recovery
 (*note Error Recovery::); you shouldn't use it for any other purpose.
 In particular, `yylex' should never return this value.
 