In formal language theory, a context-free grammar is said to be in Chomsky normal form if all of its production rules are of the form:
A → BC or
A → α or
S → ε
where A, B and C are nonterminal symbols, α is a terminal symbol (a symbol that represents a constant value), S is the start symbol, and ε is the empty string. Also, neither B nor C may be the start symbol, and the third production rule can only appear if ε is in L(G), namely, the language produced by the context-free grammar G.
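The definition above can be checked mechanically. The following is a minimal sketch, not a standard library routine: the rule encoding (a list of `(lhs, rhs)` pairs with `rhs` a tuple of symbols, and the empty tuple standing for ε) and the function name `is_cnf` are assumptions made for illustration.

```python
def is_cnf(rules, start, nonterminals):
    """Return True if every rule has one of the three Chomsky normal forms.

    rules:        iterable of (lhs, rhs) pairs; rhs is a tuple of symbols,
                  and the empty tuple () stands for the empty string ε.
    start:        the start symbol S.
    nonterminals: set of nonterminal symbols; any other symbol is a terminal.
    (This encoding is an assumption chosen for illustration.)
    """
    for lhs, rhs in rules:
        if lhs not in nonterminals:
            return False
        if len(rhs) == 2:
            # A → BC: B and C are nonterminals, and neither is the start symbol.
            if not all(s in nonterminals and s != start for s in rhs):
                return False
        elif len(rhs) == 1:
            # A → α: the single right-hand-side symbol must be a terminal.
            if rhs[0] in nonterminals:
                return False
        elif len(rhs) == 0:
            # S → ε: only the start symbol may produce the empty string.
            if lhs != start:
                return False
        else:
            # Right-hand sides longer than two symbols are never allowed.
            return False
    return True


# Hypothetical example grammar: S → AB | ε, A → a, B → b
example = [("S", ("A", "B")), ("S", ()), ("A", ("a",)), ("B", ("b",))]
print(is_cnf(example, "S", {"S", "A", "B"}))
```

The side condition that S → ε may appear only if ε is in L(G) needs no separate check here, since a grammar containing the rule S → ε necessarily generates ε.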
Every grammar in Chomsky normal form is context-free, and conversely, every context-free grammar can be transformed into an equivalent one which is in Chomsky normal form. Several algorithms for performing such a transformation are known. Transformations are described in most textbooks on automata theory, such as Hopcroft and Ullman, 1979.[1] As pointed out by Lange and Leiß,[2] the drawback of these transformations is that they can lead to an undesirable bloat in grammar size. The size of a grammar is the sum of the sizes of its production rules, where the size of a rule is one plus the length of its right-hand side. Using |G| to denote the size of the original grammar G, the size blow-up in the worst case may range from |G|^2 to 2^(2|G|), depending on the transformation algorithm used.
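To make the size measure concrete, here is a small sketch that computes |G| as defined above and prints the two worst-case bounds. The rule encoding (pairs of a left-hand side and a tuple of right-hand-side symbols) and the sample grammar are illustrative assumptions, not taken from the cited sources.

```python
def rule_size(rhs):
    """Size of one rule: one plus the length of its right-hand side."""
    return 1 + len(rhs)


def grammar_size(rules):
    """Size of a grammar: the sum of the sizes of its production rules."""
    return sum(rule_size(rhs) for _lhs, rhs in rules)


# Hypothetical grammar: S → AbA | B, B → b | c, A → ε (ε is the empty tuple)
sample = [
    ("S", ("A", "b", "A")),
    ("S", ("B",)),
    ("B", ("b",)),
    ("B", ("c",)),
    ("A", ()),
]

n = grammar_size(sample)       # 4 + 2 + 2 + 2 + 1 = 11
print("size:", n)
print("|G|^2:", n ** 2)        # quadratic worst-case bound
print("2^(2|G|):", 2 ** (2 * n))  # exponential worst-case bound
```

Even for this tiny grammar the gap between the two bounds is large, which is why the choice of transformation algorithm matters in practice.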
Converting a Grammar to Chomsky Normal Form
1. Introduce S₀
Introduce a new start variable, S₀, and a new rule S₀ → S, where S is the previous start variable.
2. Eliminate all ε rules
ε rules are rules of the form A → ε, where A ≠ S₀ and A ∈ V, where V is the CFG's variable alphabet.
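Step 1 and the definition of an ε rule just given can be sketched as follows. This is a hedged illustration only: the helper names `add_start` and `epsilon_rules`, the default name `"S0"` for the new start variable, and the tuple-based rule encoding are all assumptions, not part of any particular textbook algorithm.

```python
def add_start(rules, start, new_start="S0"):
    """Step 1: add a fresh start variable and the rule new_start → start.

    rules is a list of (lhs, rhs) pairs, with rhs a tuple of symbols
    (an assumed encoding). Returns the extended rule list and the new
    start symbol.
    """
    used = {lhs for lhs, _ in rules} | {s for _, rhs in rules for s in rhs}
    if new_start in used:
        raise ValueError(f"{new_start!r} is already used in the grammar")
    return [(new_start, (start,))] + list(rules), new_start


def epsilon_rules(rules, start):
    """Return the rules A → ε with A different from the start symbol.

    The empty tuple () stands for ε.
    """
    return [(lhs, rhs) for lhs, rhs in rules if len(rhs) == 0 and lhs != start]


# Hypothetical grammar: S → AbA | B, B → b, A → ε
grammar = [("S", ("A", "b", "A")), ("S", ("B",)), ("B", ("b",)), ("A", ())]
grammar, s0 = add_start(grammar, "S")
print(grammar[0])                  # the new rule S0 → S
print(epsilon_rules(grammar, s0))  # the ε rules that step 2 must eliminate
```

Because the new start variable never appears on a right-hand side, a rule S₀ → ε introduced later does not count as an ε rule under this definition.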
Remove every rule with ε on its right hand side (RHS). For each rule with A in its RHS, add a set of new rules consisting of the different possible combinations of A replaced or not replaced with ε. If a rule B → A has A as a singleton on its RHS, add a new rule B → ε unless that rule has already been removed through this process. For example, examine the following grammar G:
G has one ε rule. When the rule A → ε is removed, we get the following:
Notice that we have to account for all possibilities of A → ε and so