Essentials of a Serial Compiler

A compiler is a program that converts a program written in a high-level language into machine-understandable instructions. A compiler differs from an interpreter in that an interpreter does not produce a translated program: it simulates the execution of a program written in the source language, reading and executing the program's instructions one after another.

A compiler for a single-processor computer consists of a front end, which analyses the source program and translates it into an intermediate representation, and a back end, which translates the intermediate representation into machine-understandable instructions. Various optimizations may be applied to minimize the memory requirements and run time of the resulting program.

The sections of a serial compiler are:

Lexical analyzer : The purpose of the lexical analyzer is to partition the input text into the symbols (terminals) specified by the grammar. The lexical analyzer groups the input stream of characters into a stream of tokens (the text matched by a token is its lexeme) and constructs a symbol table, which is used later for contextual analysis.

Lex is a popular tool for developing lexical analyzers. A specification of the tokens of the source language, written as regular expressions, is given as input to lex.
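The grouping of characters into tokens can be sketched by hand. The example below is a minimal illustration, not lex itself: the token classes in `TOKEN_SPEC` and the layout of the symbol table are assumptions made for a toy language.

```python
import re

# Hypothetical token classes for a toy language. Order matters:
# the first alternative that matches at a position wins.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),
    ("ERROR",  r"."),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return (tokens, symbol_table) for the input text."""
    tokens = []
    symbol_table = {}  # identifier -> index of its first occurrence in the token stream
    for match in MASTER_RE.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {lexeme!r} at offset {match.start()}")
        if kind == "IDENT":
            symbol_table.setdefault(lexeme, len(tokens))
        tokens.append((kind, lexeme))
    return tokens, symbol_table

tokens, symbols = tokenize("x = y + 42")
```

Here `tokens` holds (token class, lexeme) pairs and `symbols` records each identifier once. A lex specification expresses the same idea declaratively, as a list of regular expressions with an action attached to each.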

Syntax analysis : The syntax analysis phase checks whether the input adheres to the rules specified by the grammar. An automaton is constructed from the grammar, usually by yacc, and the token stream is run through it.

Context-free grammars are used to define the program structure recognized by a parser. The parser is implemented as a push-down automaton. Yacc and Bison are tools for generating bottom-up parsers in C.
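To make the idea of checking input against a context-free grammar concrete, here is a minimal hand-written parser for arithmetic expressions. Note the swap in technique: this is a top-down, recursive-descent parser, not the bottom-up kind that Yacc and Bison generate. The call stack plays the role of the push-down automaton's stack. The grammar and the nested-tuple tree format are assumptions chosen for illustration.

```python
# Hypothetical grammar:
#   expr   -> term (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NAME | NUMBER | '(' expr ')'

def parse(tokens):
    """Parse a list of string tokens; return a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        token = advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token.isalnum():  # simplification: names and numbers are alphanumeric strings
            return token
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

tree = parse(["a", "+", "b", "*", "2"])
```

Input that does not follow the grammar, such as `["a", "+"]`, is rejected with a `SyntaxError`. Input that does follow it yields a tree in which `*` binds tighter than `+`.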

Intermediate code generation : If no syntax errors are found in the syntax analysis phase, intermediate code is generated; its form is specific to the compiler. A well-designed intermediate representation facilitates the independence of the analysis and synthesis (front-end and back-end) phases. An intermediate representation may be

  • assembly-language-like code, or
  • an abstract syntax tree
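The two forms are related: an abstract syntax tree can be flattened into assembly-like code. The sketch below assumes a nested-tuple tree of the form `(operator, left, right)` and emits three-address instructions, one common assembly-like form. The temporary names `t1`, `t2`, ... are hypothetical.

```python
import itertools

def to_three_address(tree):
    """Flatten a nested-tuple syntax tree into three-address instructions."""
    code = []
    counter = itertools.count(1)

    def emit(node):
        if isinstance(node, str):  # leaf: a variable name or a literal
            return node
        op, left, right = node
        left_name = emit(left)     # operands are computed before the operation
        right_name = emit(right)
        temp = f"t{next(counter)}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = emit(tree)
    return code, result

# Tree for the expression a + b * 2
code, result = to_three_address(("+", "a", ("*", "b", "2")))
```

For this tree, `code` is `["t1 = b * 2", "t2 = a + t1"]` and `result` names the temporary holding the final value.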

Code optimization : To reduce run-time memory and time consumption, many optimizations can be applied to the intermediate code. The idea is to transform the intermediate representation of the parsed program into an equivalent one that needs less time or memory to execute, which often also reduces its size. Optimization techniques include:

  • constant folding (constant reduction)
  • loop optimizations
  • induction variable elimination
  • common sub-expression elimination
  • mathematical optimizations
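As an illustration of one technique from this list, the sketch below evaluates constant sub-expressions at compile time. It assumes a nested-tuple tree `(operator, left, right)` whose leaves are integers or variable names. Division is left out of the hypothetical `OPS` table to sidestep division by zero and rounding questions.

```python
import operator

# Operators that are safe to evaluate at compile time in this sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(node):
    """Return an equivalent tree with constant sub-expressions evaluated."""
    if not isinstance(node, tuple):
        return node  # leaf: an integer constant or a variable name
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, int) and isinstance(right, int) and op in OPS:
        return OPS[op](left, right)  # both operands known: compute now
    return (op, left, right)

# x + 2 * (3 + 4)
folded = fold_constants(("+", "x", ("*", 2, ("+", 3, 4))))
```

The result is the smaller tree `("+", "x", 14)`: the multiplication and the inner addition no longer have to be carried out at run time.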

Code generation : The intermediate code is converted to native code. The output may be an actual executable binary, assembly code or another high-level language. Producing low-level code requires familiarity with machine-level issues such as

  • data handling
  • machine instruction syntax
  • variable allocation
  • program layout
  • registers
  • instruction set

The code generator may be integrated with the parser.
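A minimal sketch of code generation follows. It targets a hypothetical stack machine rather than any real processor, so the instruction names `PUSH`, `LOAD`, `ADD`, `SUB` and `MUL` are assumptions. The small `run` function exists only to check that the generated instructions compute the intended value.

```python
import operator

OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}
ACTIONS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}

def generate(node):
    """Return a list of (opcode, argument) instructions for a syntax tree."""
    if isinstance(node, int):
        return [("PUSH", node)]   # constant operand
    if isinstance(node, str):
        return [("LOAD", node)]   # variable operand
    op, left, right = node
    # Post-order walk: code for both operands, then the operation itself.
    return generate(left) + generate(right) + [(OPCODES[op], None)]

def run(program, variables):
    """Execute the instructions on a simple stack; used only for checking."""
    stack = []
    for opcode, arg in program:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "LOAD":
            stack.append(variables[arg])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(ACTIONS[opcode](left, right))
    return stack.pop()

# a + b * 2
program = generate(("+", "a", ("*", "b", 2)))
value = run(program, {"a": 1, "b": 3})
```

A generator for a real machine would additionally have to deal with the issues listed above, such as register allocation and the target's instruction syntax.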

Table management : A program in most modern languages requires symbol table management, because symbols and their associated information, collected in the various phases, may be needed in later stages of compilation. The symbol table stores the names encountered in the source program together with their relevant attributes. The information in the symbol table is used by the semantic checker when applying the context-sensitive rules, and by the code generator.
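One possible shape for such a table is sketched below. Nested scopes are an assumption added for illustration, since the text does not say how the table is organized. The class and method names are hypothetical.

```python
class SymbolTable:
    """Names and their attributes, organized as a stack of scopes."""

    def __init__(self):
        self.scopes = [{}]  # the first dictionary is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attributes):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"{name!r} is already declared in this scope")
        scope[name] = attributes

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name!r} is not declared")

table = SymbolTable()
table.declare("x", type="int")
table.enter_scope()
table.declare("x", type="float")  # shadows the outer x
inner = table.lookup("x")
table.exit_scope()
outer = table.lookup("x")
```

A semantic checker could use `lookup` to reject undeclared names, and a code generator could read attributes such as the type to decide how much storage a variable needs.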

Error handling : This section reports the errors that may appear during lexical analysis or syntax analysis.
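The sketch below shows one way to report such errors with their positions, collecting all of them instead of stopping at the first. Both rules are assumptions made for a toy language: the set of legal characters, and the requirement that every statement sits on one line and ends with `;`.

```python
import re

def check_source(text):
    """Return a list of (line, column, message) tuples for a toy language."""
    errors = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Lexical check: anything outside the hypothetical alphabet is illegal.
        for match in re.finditer(r"[^\w\s+\-*/=();]", line):
            errors.append((line_no, match.start() + 1, f"illegal character {match.group()!r}"))
        # Syntax check: every non-empty statement must end with ';'.
        stripped = line.rstrip()
        if stripped and not stripped.endswith(";"):
            errors.append((line_no, len(stripped) + 1, "expected ';'"))
    return errors

errors = check_source("x = 1;\ny = x $ 2\n")
```

Each entry carries a line and column number, so the user can find the offending text. The first line of the sample is accepted. The second line produces one lexical error and one syntax error.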

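* The first three sections (lexical analysis, syntax analysis and intermediate code generation) form the front end, the next two (code optimization and code generation) form the back end, and the last two (table management and error handling) run in parallel with the rest.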