Context free grammar is defined by a finite set of productions. Each production specifies how higher-level grammar construct (a nonterminal) can be composed from lower-level terminals or nonterminals.
Input to the parser has form of stream of terminals (i.e. terminal symbols or tokens). This stream is usually provided by lexer. Groups of consecutive tokens are matched against productions of the grammar. When a match is found sequence of lower-level elements (terminals or nonterminals) is replaced by single higher-order nonterminal in the process known as reduction. This process continues until all input tokens are reduced to a single topmost-level nonterminal representing entire correctly parsed input.
Production is traditionally written with nonterminal being defined on the left and list of its component elements on the right:
There can be more than one production for the same nonterminal. Alternative productions for the same nonterminal can be written in separate lines or in one row, separated with '|' pipe symbol:
A grammar has exactly one top-level nonterminal which corresponds to entire input being parsed. Production for this topmost nonterminal is traditionally placed first.
Example CFG grammar is shown on Fig. 3.
The grammar must contain explicit declaration of all possible terminals that can appear on input. This declaration is preceded by keyword %token. Declaration of possible terminals (tokens) is a part of contract between lexer and parser: the lexer report tokens only from this predefined set recognized by parser.
Nonterminals don’t have to explicitly declared. It is enough that these appear on left hand side of any production.
In the example above the nonterminal defined in first production is SENTENCE. Since it appears first, it is treated as main (topmost) nonterminal of entire grammar corresponding to entire input text to be matched.
The ADJECTIVES nonterminal has three alternative productions (2),(3),(4) separated by pipes. This means that whenever nonterminal ADJECTIVES appears in the grammar it can be replaced by right-hand-side of either of this three productions ( being Quick or Brown or Quick Brown ). These three alternatives could be written in separate lines as well.
Finally MANYJUMPS nonterminal has two productions (5) and (6). Second of these productions has MANYJUMPS also on its right side. This means that nonterminal MANYJUMPS is recursive. By recursively substituting occurrence of MANYJUMPS in production (6) with the same production, one can generate sequences of tokens from production (6) of any length as shown in Fig. 4.