The lexical analysis phase reads the characters of the source program and groups them into a stream of tokens; each token represents a logically cohesive sequence of characters, such as an identifier, an operator, or a keyword. Typical tokens are keywords such as DO or IF, identifiers such as X or NUM, operator symbols such as <= or +, and punctuation symbols such as parentheses. The character sequence that forms a token is called a "lexeme". The output of the lexical analyzer is a stream of tokens, which is passed to the next phase, the syntax analyzer or parser.
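As a rough sketch of this grouping step, the fragment below scans a line of source text and collects (token type, lexeme) pairs. The token names and patterns are illustrative assumptions, not taken from any particular compiler:

```python
import re

# Hypothetical token specification: the names and patterns here are
# illustrative only; a real language would define many more.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:DO|IF)\b"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("OPERATOR",   r"<=|\+"),
    ("LPAREN",     r"\("),
    ("RPAREN",     r"\)"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Group the characters of `source` into (token_type, lexeme) pairs."""
    tokens = []
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":      # whitespace separates lexemes; discard it
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("IF (X <= NUM)"))
# → [('KEYWORD', 'IF'), ('LPAREN', '('), ('IDENTIFIER', 'X'),
#    ('OPERATOR', '<='), ('IDENTIFIER', 'NUM'), ('RPAREN', ')')]
```

Note that the keyword pattern is tried before the identifier pattern, so IF is recognized as a keyword rather than an identifier.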
The lexical analyzer is the interface between the source program and the compiler. What counts as a token depends on the language at hand and, to some extent, on the discretion of the compiler designer. There are two kinds of tokens:
specific strings, such as IF or a semicolon, and classes of strings, such as identifiers, constants, and labels. We shall treat a token as a pair consisting of two parts: a token type and a token value. A token consisting of a specific string, such as a semicolon, will be treated as having a type but no value. A token such as the identifier MAX has the type "identifier" and a value consisting of the string MAX.
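The two cases can be represented directly as pairs; this is a minimal sketch in which a missing value is modeled as None:

```python
def make_token(token_type, value=None):
    """A token as a (type, value) pair; value is None for specific strings."""
    return (token_type, value)

semicolon = make_token("SEMICOLON")          # specific string: type, no value
max_id    = make_token("IDENTIFIER", "MAX")  # class of strings: type and value

print(semicolon)  # → ('SEMICOLON', None)
print(max_id)     # → ('IDENTIFIER', 'MAX')
```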
The lexical analyzer and the next phase, the parser, are often grouped together into one pass. In that pass, the lexical analyzer operates either under the control of the parser or as a coroutine with the parser: the parser asks the lexical analyzer for the next token whenever it needs one, and the lexical analyzer returns to the parser a code for the token it found. If the token is an identifier or another token with a value, the value is also passed to the parser.
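This on-demand arrangement can be sketched with a generator, which behaves like a coroutine: the lexer produces one token each time the parser asks for the next one, rather than scanning the whole input up front. The token classification below is a deliberately simplified assumption:

```python
def lexer(source):
    """Yield one (token_code, value) pair per request from the parser.
    Classification here is simplified: whitespace-separated words only."""
    for word in source.split():
        if word in ("DO", "IF"):
            yield ("KEYWORD", None)          # specific string: code, no value
        elif word in ("<=", "+"):
            yield ("OPERATOR", word)
        else:
            yield ("IDENTIFIER", word)       # value passed along with the code

def parse(source):
    """A stand-in parser that pulls tokens from the lexer on demand."""
    seen = []
    for token_code, value in lexer(source):  # each iteration asks for the next token
        seen.append((token_code, value))
    return seen

print(parse("IF X <= NUM"))
# → [('KEYWORD', None), ('IDENTIFIER', 'X'), ('OPERATOR', '<='), ('IDENTIFIER', 'NUM')]
```

Because the generator suspends between tokens, lexer and parser interleave within a single pass, exactly as described above.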