Phase 1: Lexical Analyzer or Scanner
The first phase of the compiler, called the Lexical Analyzer or Scanner, reads the source program one character at a time and carves it into a sequence of atomic units called tokens. The usual tokens are identifiers, keywords, constants, operators and punctuation symbols such as commas and parentheses. Each token is a sub-string of the source program that is to be treated as a single unit. The lexical analyzer examines successive characters in the source program, starting from the first character not yet grouped into a token. It may have to read characters beyond the end of the next token in order to determine what that token actually is.
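The scanning process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token categories follow the list in the text, but the keyword set, the operator set, and the function name `tokenize` are hypothetical choices for a toy language, not part of any particular compiler.

```python
import re

# Hypothetical keyword set for a toy language (an assumption for illustration).
KEYWORDS = {"if", "else", "while", "return"}

# One named pattern per token category. Two-character operators are listed
# before one-character ones so that "<=" is not split into "<" and "=".
TOKEN_SPEC = [
    ("CONSTANT",    r"\d+"),
    ("IDENTIFIER",  r"[A-Za-z_]\w*"),
    ("OPERATOR",    r"<=|>=|==|!=|[+\-*/<>=]"),
    ("PUNCTUATION", r"[(),;]"),
    ("SKIP",        r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (category, text) pairs for the given source string."""
    tokens = []
    pos = 0  # index of the first character not yet grouped into a token
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind = match.lastgroup
        text = match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "IDENTIFIER" and text in KEYWORDS:
            kind = "KEYWORD"  # reserved words look like identifiers until checked
        tokens.append((kind, text))
    return tokens
```

For example, `tokenize("x <= 10")` yields an identifier, the operator `<=`, and a constant. Deciding between `<` and `<=` is a small case of the scanner having to look past the first character of a token before it knows which token it has.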
Phases of a Compiler
Phase 2: Syntax Analyzer or Parser
The second phase of the compiler, called the Syntax Analyzer or Parser, receives the stream of tokens produced by the lexical analyzer. The syntax analyzer groups tokens together into syntactic structures called expressions. Expressions may further be combined to form statements. The syntactic structure can be regarded as a tree, called a parse tree, whose leaves are the tokens. The parser has two functions: (i) Firstly, it checks that the tokens from the lexical analyzer occur in patterns that are permitted by the specification for the source language, and it imposes on the tokens a tree-like structure that is used by the subsequent phases of the compiler. (ii) Secondly, it makes explicit the hierarchical structure of the incoming token stream by identifying which parts of the token stream should be grouped together.
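The two functions of the parser can be illustrated with a minimal recursive-descent sketch in Python. Everything here is an assumption made for illustration: the three-rule expression grammar, the class name `Parser`, and the representation of tokens as `(category, text)` pairs are hypothetical, and the tree is a simplified one in which operators label the interior nodes and tokens form the leaves.

```python
# Assumed toy grammar (not from the text):
#   expr   -> term   (("+" | "-") term)*
#   term   -> factor (("*" | "/") factor)*
#   factor -> IDENTIFIER | CONSTANT | "(" expr ")"

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (category, text) pairs
        self.pos = 0

    def peek(self):
        # Return the next token without consuming it; ("EOF", "") at the end.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        tree = self.expr()
        if self.peek()[0] != "EOF":
            raise SyntaxError(f"unexpected token {self.peek()[1]!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in (("OPERATOR", "+"), ("OPERATOR", "-")):
            op = self.advance()[1]
            node = (op, node, self.term())  # group left operand, operator, right operand
        return node

    def term(self):
        node = self.factor()
        while self.peek() in (("OPERATOR", "*"), ("OPERATOR", "/")):
            op = self.advance()[1]
            node = (op, node, self.factor())
        return node

    def factor(self):
        kind, text = self.advance()
        if kind in ("IDENTIFIER", "CONSTANT"):
            return (kind, text)  # a leaf of the tree is a token
        if (kind, text) == ("PUNCTUATION", "("):
            node = self.expr()
            if self.advance() != ("PUNCTUATION", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")
```

Given the tokens for `a + b * 2`, `Parser(tokens).parse()` groups `b * 2` beneath the `+` node, which makes the hierarchy of the token stream explicit. A token sequence that does not fit the grammar, such as `a +` with nothing after the operator, raises `SyntaxError`, which corresponds to the checking function of the parser.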
Phase 3: Intermediate Code Generation