Lexical Analysis
What is Lexical Analysis?
A lexical analyzer reads the source program character by character to produce tokens. Normally a lexical analyzer does not return a list of all tokens in one shot; it returns one token each time the parser asks for one.
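[Figure: the lexical analyzer reads the source program and returns a token to the parser each time the parser issues "get next token"; the lexical analyzer and the parser both use the symbol table.]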
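The "get next token" interaction can be sketched in Python as follows. This is a minimal illustration, not a real compiler front end: the class name `Lexer`, the method `get_next_token`, and the stand-in `parse` function are hypothetical, and the sketch assumes lexemes are separated by white space.

```python
from typing import List, Optional


class Lexer:
    """Hands out one lexeme at a time, only when asked (hypothetical sketch)."""

    def __init__(self, source: str) -> None:
        self.source = source
        self.pos = 0  # index of the next unread character

    def get_next_token(self) -> Optional[str]:
        # Skip white space in front of the next lexeme.
        while self.pos < len(self.source) and self.source[self.pos].isspace():
            self.pos += 1
        if self.pos >= len(self.source):
            return None  # end of input
        start = self.pos
        # Read character by character up to the next white space.
        # Assumption: lexemes are separated by white space.
        while self.pos < len(self.source) and not self.source[self.pos].isspace():
            self.pos += 1
        return self.source[start:self.pos]


def parse(lexer: Lexer) -> List[str]:
    """Stand-in for a parser: it drives the lexer by requesting tokens."""
    seen = []
    token = lexer.get_next_token()
    while token is not None:
        seen.append(token)
        token = lexer.get_next_token()
    return seen
```

After one call to `get_next_token`, the lexer has consumed only the characters of the first lexeme; the rest of the input is read only as the parser keeps asking.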
The lexical analyzer deals with small-scale language constructs, such as names and numeric literals. The syntax analyzer deals with the large-scale constructs, such as expressions, statements, and program units. The syntax analysis portion consists of two parts:
1. A low-level part called a lexical analyzer (essentially a pattern matcher).
2. A high-level part called a syntax analyzer, or parser.
The lexical analyzer collects characters into logical groupings and assigns internal codes to the groupings according to their structure.
Lexical Analyzer in Perspective
LEXICAL ANALYZER
  Scan input
  Remove white space, …
  Identify tokens
  Create symbol table
  Generate errors
  Send tokens to parser
PARSER
  Perform syntax analysis
  Actions dictated by token order
  Update symbol table entries
  Create abstract representation of source
  Generate errors
Lexical analyzers extract lexemes from a given input string and produce the corresponding tokens. For the statement
sum = oldsum - value / 100;
the tokens and lexemes are:
Token          Lexeme
IDENT          sum
ASSIGN_OP      =
IDENT          oldsum
SUBTRACT_OP    -
IDENT          value
DIVISION_OP    /
INT_LIT        100
SEMICOLON      ;
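A hand-written scanner that produces the token/lexeme pairs of the example above might look like the sketch below. The token names are taken from the example; the function name `tokenize` and the table `SINGLE_CHAR_TOKENS` are hypothetical, and the sketch handles only identifiers, integer literals, and the four single-character symbols that appear in the example.

```python
# Assumption: only the single-character symbols from the example are supported.
SINGLE_CHAR_TOKENS = {
    "=": "ASSIGN_OP",
    "-": "SUBTRACT_OP",
    "/": "DIVISION_OP",
    ";": "SEMICOLON",
}


def tokenize(source):
    """Return a list of (token, lexeme) pairs for a tiny expression language."""
    pairs = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1  # white space separates lexemes and is discarded
        elif ch.isalpha():
            start = i
            # An identifier: a letter followed by letters and digits.
            while i < len(source) and source[i].isalnum():
                i += 1
            pairs.append(("IDENT", source[start:i]))
        elif ch.isdigit():
            start = i
            # An integer literal: a run of digits.
            while i < len(source) and source[i].isdigit():
                i += 1
            pairs.append(("INT_LIT", source[start:i]))
        elif ch in SINGLE_CHAR_TOKENS:
            pairs.append((SINGLE_CHAR_TOKENS[ch], ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return pairs
```

Calling `tokenize("sum = oldsum - value / 100;")` yields the eight pairs listed in the example, in the same order.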
Basic Terminology
What are the major terms for lexical analysis?
TOKEN
A classification for a common set of strings. Examples include identifiers, numbers, etc.
PATTERN
The rules which characterize the set of strings for a token
LEXEME
The actual sequence of characters that matches a pattern and is classified by a token. For identifiers: x, count, name, etc.
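The relationship between the three terms can be illustrated with a short Python sketch: each token is paired with a pattern, and a lexeme is any string the pattern matches. The token names `id` and `num`, the regular expressions, and the function `classify` are assumptions chosen for illustration; a real language would define its own patterns.

```python
import re

# token name -> pattern (a regular expression describing the token's strings)
# Assumption: these two patterns are illustrative, not a full language definition.
PATTERNS = [
    ("num", re.compile(r"\d+(\.\d+)?([Ee][+-]?\d+)?")),
    ("id", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
]


def classify(lexeme):
    """Return the token whose pattern matches the whole lexeme, or None."""
    for token, pattern in PATTERNS:
        if pattern.fullmatch(lexeme):
            return token
    return None
```

Here `classify("count")` returns the token `id` and `classify("3.1416")` returns `num`, while a string that matches no pattern, such as `"+"`, is not a lexeme of either token.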
Token       Sample Lexemes
const       const
if          if
relation    …, >=
id          pi, count, D2
num         3.1416, 0, 6.02E23
literal     "core dumped"
Informal Description of