'Learn LLVM 12' reading note: Part 2 The structure of a compiler

The structure of a compiler

A compiler:

  • Frontend (deals with the source language)
    • lexer
    • parser
    • semantic analyzer
    • code generator (generate IR)
  • Backend
    • Target-independent optimization
    • Instruction selection
    • Target-dependent optimization
    • Emit assembler code or an object file

 

An arithmetic expression language (calc)

An arithmetic expression language:

  • example: with a, b: a * (4 + b)
  • language design using an EBNF grammar
    • each rule has the form `non_terminal : body ;`, where the body combines non-terminals and tokens
    • example: calc : ("with" ident ("," ident)* ":")? expr ;
    • the EBNF grammar defines how the lexer and parser should work, but it carries no semantic information
  • lexer: create tokens
void Lexer::next(Token& token);
  • parser: create AST
class AST {};
class Expr : public AST {};
class BinaryOp : public Expr {};
    • We create AST visitors to read and modify AST contents.
class ASTVisitor {
public:
    virtual ~ASTVisitor() = default;
    // One overload per AST node type; the defaults do nothing.
    virtual void visit(AST&) {}
    virtual void visit(Expr&) {}
    virtual void visit(BinaryOp&) {}
};
  • semantic analysis: check semantic rules
  • CodeGen
    • Add an AST visitor to generate IR
    • Link with the runtime library to create an executable
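
The parser and visitor steps in this list can be sketched end to end without LLVM. This is only a minimal illustration under assumptions: the note shows the calc rule but not the expression rules, so the expr/term/factor rules in the comments are assumed, and Factor, Parser, Printer and dumpAST are hypothetical names rather than the book's classes. The accept() method is also an addition; it is the usual way to let each node pick the matching visit() overload. Declared identifiers are parsed but not recorded, since checking them belongs to semantic analysis.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

struct Factor;
struct BinaryOp;
struct ASTVisitor {  // one visit() overload per concrete node type
    virtual ~ASTVisitor() = default;
    virtual void visit(Factor&) = 0;
    virtual void visit(BinaryOp&) = 0;
};
struct Expr {
    virtual ~Expr() = default;
    virtual void accept(ASTVisitor& v) = 0;  // double dispatch: the node picks the overload
};
using ExprPtr = std::unique_ptr<Expr>;
struct Factor : Expr {  // leaf node: an identifier or a number, kept as text
    std::string text;
    explicit Factor(std::string t) : text(std::move(t)) {}
    void accept(ASTVisitor& v) override { v.visit(*this); }
};
struct BinaryOp : Expr {
    char op;
    ExprPtr lhs, rhs;
    BinaryOp(char o, ExprPtr l, ExprPtr r) : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    void accept(ASTVisitor& v) override { v.visit(*this); }
};

class Parser {  // scannerless recursive descent: one method per grammar rule
    std::string src;
    std::size_t pos = 0;
    char peek() {  // skip whitespace, then return the next character ('\0' at end of input)
        while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
        return pos < src.size() ? src[pos] : '\0';
    }
    bool eat(char c) { return peek() == c ? (++pos, true) : false; }  // consume c if it is next
    std::string word() {  // simplification: identifiers and numbers are both alphanumeric runs
        peek();
        std::size_t start = pos;
        while (pos < src.size() && std::isalnum(static_cast<unsigned char>(src[pos]))) ++pos;
        return src.substr(start, pos - start);
    }
    ExprPtr factor() {  // factor : ident | number | "(" expr ")"
        if (eat('(')) {
            ExprPtr e = expr();
            if (!eat(')')) throw std::runtime_error("expected ')'");
            return e;
        }
        std::string text = word();
        if (text.empty()) throw std::runtime_error("expected identifier, number or '('");
        return std::make_unique<Factor>(text);
    }
    ExprPtr binary(ExprPtr (Parser::*operand)(), char op1, char op2) {
        ExprPtr lhs = (this->*operand)();
        while (peek() == op1 || peek() == op2) {
            char op = src[pos++];
            ExprPtr rhs = (this->*operand)();
            lhs = std::make_unique<BinaryOp>(op, std::move(lhs), std::move(rhs));  // left-associative
        }
        return lhs;
    }
    ExprPtr term() { return binary(&Parser::factor, '*', '/'); }  // term : factor (("*" | "/") factor)*
    ExprPtr expr() { return binary(&Parser::term, '+', '-'); }    // expr : term (("+" | "-") term)*

public:
    explicit Parser(std::string text) : src(std::move(text)) {}
    ExprPtr parseCalc() {  // calc : ("with" ident ("," ident)* ":")? expr
        std::size_t start = pos;
        if (word() == "with") {
            do {
                if (word().empty()) throw std::runtime_error("expected identifier");
            } while (eat(','));
            if (!eat(':')) throw std::runtime_error("expected ':'");
        } else {
            pos = start;  // no declaration list: rewind and parse a bare expression
        }
        return expr();
    }
};

struct Printer : ASTVisitor {  // read-only visitor: renders the tree fully parenthesized
    std::string out;
    void visit(Factor& f) override { out += f.text; }
    void visit(BinaryOp& b) override {
        assert(b.lhs && b.rhs);
        Printer left, right;
        b.lhs->accept(left);
        b.rhs->accept(right);
        out = "(" + left.out + " " + b.op + " " + right.out + ")";
    }
};

std::string dumpAST(const std::string& text) {
    Printer printer;
    Parser(text).parseCalc()->accept(printer);
    return printer.out;
}
```

Calling dumpAST("with a, b: a * (4 + b)") returns "(a * (4 + b))". A code generator would be another ASTVisitor subclass that emits IR in visit() instead of building a string.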

LLVM knowledge


  • Use <llvm/Support/CommandLine.h> for cmdline argument parsing. Example:
static cl::opt<std::string>
    Input(cl::Positional, cl::desc("<input expression>, default is \"with a, b: 4 + a * (2 + b)\""),
          cl::init("with a, b: 4 + a * (2 + b)"));

  • Common classes for IR generation
Class            Usage
LLVMContext      owns LLVM's global state, such as uniqued types and constants
Module           top-level container for functions and global variables; later emitted as an object file
IRBuilder        create IR instructions (CreateCall, CreateRet, CreateNSWAdd/Sub/Mul, CreateSDiv), CreateInBoundsGetP
Type             value types (getVoidTy, getInt32Ty, getInt8Ty, getPointerTo())
FunctionType     function type (return type and parameter types)
Constant         constant value (ConstantInt, ConstantDataArray)
GlobalVariable   a global variable stored in the module
Function         function declaration or definition
BasicBlock       a list of instructions

  • Add unit tests as part of ctest (run with `ninja test`)
    • unittests/CMakeLists.txt
cmake_minimum_required(VERSION 3.13.4)
enable_testing()

# set LLVM components we need to link our tool against.
llvm_map_components_to_libnames(llvm_libs support core)

include(CTest)

find_package(GTest REQUIRED)
include_directories(${GTEST_INCLUDE_DIRS})

macro(add_tinylang_unittest name)
    add_executable(${name} ${ARGN})
    target_link_libraries(${name}
        tinylangBasic
        ${llvm_libs}
        ${GTEST_LIBRARIES}
        ${GTEST_MAIN_LIBRARIES}
        pthread)
    add_test(NAME ${name}
        COMMAND ${name})
endmacro()

add_tinylang_unittest(TinyLangTests
    VersionTest.cpp
)
  • Add lit test as part of ctest
    • Install lit: `pip install lit`
    • Copy FileCheck to llvm install bin dir
    • test/CMakeLists.txt
if(NOT EXISTS ${LLVM_TOOLS_BINARY_DIR}/FileCheck)
    message(FATAL_ERROR "LLVM wasn't configured with -DLLVM_INSTALL_UTILS=ON, cannot use FileCheck")
endif()

configure_file("${CMAKE_CURRENT_SOURCE_DIR}/lit.cfg.in" "${CMAKE_CURRENT_BINARY_DIR}/lit.cfg")

# provide the test target based on lit (installed by pip install lit)
add_test(NAME runLitTests
        COMMAND lit -v "${CMAKE_CURRENT_BINARY_DIR}"
)

# ENVIRONMENT_MODIFICATION requires CMake 3.22 or newer.
set_property(TEST runLitTests PROPERTY ENVIRONMENT_MODIFICATION
    "PATH=path_list_prepend:${LLVM_TOOLS_BINARY_DIR}:${CMAKE_CURRENT_BINARY_DIR}/../tools/driver")
    • test/lit.cfg.in
import lit.formats

config.name = "Calc tests"
config.test_format = lit.formats.ShTest(True)
config.test_source_root = "@CMAKE_CURRENT_SOURCE_DIR@"
config.suffixes = ['.calc']
config.test_exec_root = "@CMAKE_CURRENT_BINARY_DIR@"

    • test/lexer.calc
% RUN: tinylang --token "with a, b: 4 + (a * b)" | FileCheck %s

with a, b: 4 + (a * b)

; CHECK: Token:KW_with: with
; CHECK: Token:ident: a
; CHECK: Token:comma: ,
; CHECK: Token:ident: b
; CHECK: Token:colon: :
; CHECK: Token:number: 4
; CHECK: Token:plus: +
; CHECK: Token:l_paren: (
; CHECK: Token:ident: a
; CHECK: Token:star: *
; CHECK: Token:ident: b
; CHECK: Token:r_paren: )
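
For reference, a lexer that produces the token stream checked above could look like the following. This is a sketch, not tinylang's actual implementation: the token names KW_with, ident, number, comma, colon, plus, star, l_paren and r_paren come from the CHECK lines and the next(Token&) signature from the note, while eoi, unknown, minus, slash, kindName and dumpTokens are assumptions added for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

struct Token {
    enum Kind { eoi, unknown, ident, number, comma, colon, plus, minus, star, slash, l_paren, r_paren, KW_with };
    Kind kind = unknown;
    std::string text;
};

// Printable names, listed in the same order as Token::Kind.
const char* kindName(Token::Kind kind) {
    static const char* const names[] = {"eoi",   "unknown", "ident", "number",  "comma",   "colon",  "plus",
                                        "minus", "star",    "slash", "l_paren", "r_paren", "KW_with"};
    assert(kind >= Token::eoi && kind <= Token::KW_with);
    return names[kind];
}

class Lexer {
    std::string buffer;
    std::size_t pos = 0;

    static bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
    static bool isAlpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
    static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

    // Fill `token` with the text between `start` and the current position.
    void form(Token& token, Token::Kind kind, std::size_t start) {
        token.kind = kind;
        token.text = buffer.substr(start, pos - start);
    }

public:
    explicit Lexer(std::string input) : buffer(std::move(input)) {}

    // Scan the next token; yields Token::eoi once the input is exhausted.
    void next(Token& token) {
        while (pos < buffer.size() && isSpace(buffer[pos])) ++pos;
        const std::size_t start = pos;
        if (pos == buffer.size()) {
            form(token, Token::eoi, start);
            return;
        }
        if (isAlpha(buffer[pos])) {
            while (pos < buffer.size() && isAlpha(buffer[pos])) ++pos;
            form(token, Token::ident, start);
            if (token.text == "with") token.kind = Token::KW_with;  // the only keyword
            return;
        }
        if (isDigit(buffer[pos])) {
            while (pos < buffer.size() && isDigit(buffer[pos])) ++pos;
            form(token, Token::number, start);
            return;
        }
        Token::Kind kind = Token::unknown;
        switch (buffer[pos++]) {  // single-character tokens
        case ',': kind = Token::comma; break;
        case ':': kind = Token::colon; break;
        case '+': kind = Token::plus; break;
        case '-': kind = Token::minus; break;
        case '*': kind = Token::star; break;
        case '/': kind = Token::slash; break;
        case '(': kind = Token::l_paren; break;
        case ')': kind = Token::r_paren; break;
        default: break;
        }
        form(token, kind, start);
    }
};

// Render one "Token:<kind>: <text>" line per token, mirroring the CHECK lines.
std::string dumpTokens(const std::string& input) {
    Lexer lexer(input);
    std::string out;
    Token token;
    for (lexer.next(token); token.kind != Token::eoi; lexer.next(token))
        out += std::string("Token:") + kindName(token.kind) + ": " + token.text + "\n";
    return out;
}
```

Calling dumpTokens("with a, b: 4 + (a * b)") yields the twelve lines that the CHECK directives expect, in the same order.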
