'Learn LLVM 12' reading note: Part 2 The structure of a compiler
The structure of a compiler
A compiler:
- Frontend (deals with the source language)
  - lexer
  - parser
  - semantic analyzer
  - code generator (generates IR)
- Backend
  - Target-independent optimization
  - Instruction selection
  - Target-dependent optimization
  - Emit assembler code or an object file
An arithmetic expression language (calc)
An arithmetic expression language:
- example: `with a, b: a * (4 + b)`
- language design using an EBNF grammar
  - rule form: `non_terminal : rule ;`, where the rule combines non-terminals and tokens
  - example: `calc : ("with" ident ("," ident)* ":")? expr ;`
  - The EBNF grammar defines how the lexer and parser should work, but it carries no semantic information
- lexer: creates tokens
```cpp
void Lexer::next(Token& token);
```
- parser: creates the AST
```cpp
class AST {};
class Expr : public AST {};
class BinaryOp : public Expr {};
```
- We create AST visitors to read and modify AST contents.
```cpp
class ASTVisitor {
public:
  virtual void visit(AST&) {}
  virtual void visit(Expr&) {}
  virtual void visit(BinaryOp&) {}
};
```
- semantic analysis: checks semantic rules
- CodeGen
  - Add an AST visitor to generate IR
  - Link with runtime libraries to create an executable
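The `ASTVisitor` snippet in this note lists only the `visit` overloads. The other half of the pattern is an `accept` method on each node that calls back into the visitor, so the right overload is picked at run time. Below is a minimal sketch of that dispatch, not the book's code: the `Factor` node, `accept`, `EvalVisitor`, `buildExample` and `evalExample` are names assumed for illustration, the tree is built by hand in place of a parser, and the visitor evaluates the expression instead of emitting LLVM IR so that the example runs without LLVM.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Factor;
struct BinaryOp;

// Visitor interface: one visit overload per concrete node type.
struct ASTVisitor {
  virtual ~ASTVisitor() = default;
  virtual void visit(Factor &) = 0;
  virtual void visit(BinaryOp &) = 0;
};

// Every node forwards to the matching overload (double dispatch).
struct Expr {
  virtual ~Expr() = default;
  virtual void accept(ASTVisitor &v) = 0;
};
using ExprPtr = std::unique_ptr<Expr>;

// Leaf node (assumed name): an identifier or a number literal.
struct Factor : Expr {
  bool isIdent;
  std::string text;
  Factor(bool ident, std::string t) : isIdent(ident), text(std::move(t)) {}
  void accept(ASTVisitor &v) override { v.visit(*this); }
};

struct BinaryOp : Expr {
  char op;
  ExprPtr lhs, rhs;
  BinaryOp(char o, ExprPtr l, ExprPtr r)
      : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
  void accept(ASTVisitor &v) override { v.visit(*this); }
};

// Evaluates the tree. A code generator would emit IR in each visit instead.
class EvalVisitor : public ASTVisitor {
  const std::map<std::string, long> &vars;
  long result = 0;

public:
  explicit EvalVisitor(const std::map<std::string, long> &v) : vars(v) {}
  long run(Expr &e) {
    e.accept(*this);
    return result;
  }
  void visit(Factor &f) override {
    // vars.at throws std::out_of_range for a name without a value.
    result = f.isIdent ? vars.at(f.text) : std::stol(f.text);
  }
  void visit(BinaryOp &b) override {
    long l = run(*b.lhs);
    long r = run(*b.rhs);
    switch (b.op) {
    case '+': result = l + r; break;
    case '-': result = l - r; break;
    case '*': result = l * r; break;
    case '/': result = l / r; break; // no division-by-zero check in this sketch
    }
  }
};

// Hand-built AST for `a * (4 + b)`, standing in for the parser's output.
ExprPtr buildExample() {
  ExprPtr sum = std::make_unique<BinaryOp>(
      '+', std::make_unique<Factor>(false, "4"),
      std::make_unique<Factor>(true, "b"));
  return std::make_unique<BinaryOp>('*', std::make_unique<Factor>(true, "a"),
                                    std::move(sum));
}

long evalExample(long a, long b) {
  std::map<std::string, long> vars{{"a", a}, {"b", b}};
  ExprPtr ast = buildExample();
  return EvalVisitor(vars).run(*ast);
}
```

With these definitions, `evalExample(2, 3)` evaluates `a * (4 + b)` to 14. Keeping the traversal in the visitor means a semantic checker and a code generator can both walk the same node classes without those classes knowing about either.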
LLVM knowledge
- Use `<llvm/Support/CommandLine.h>` for command-line argument parsing. Example:
```cpp
using namespace llvm; // cl:: lives in the llvm namespace
static cl::opt<std::string>
    Input(cl::Positional,
          cl::desc("<input expression>, default is \"with a, b: 4 + a * (2 + b)\""),
          cl::init("with a, b: 4 + a * (2 + b)"));
```
- Common classes for IR generation
| Class | Usage |
|---|---|
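| LLVMContext | owns LLVM's global state, such as the type and constant tables |
| Module | top-level container for functions and global variables (one translation unit, later emitted as an object file) |
| IRBuilder | create IR instructions (CreateCall, CreateRet, CreateNSWAdd/CreateNSWSub/CreateNSWMul, CreateSDiv, CreateInBoundsGetP) |
| Type | value types (getVoidTy, getInt32Ty, getInt8Ty, getPointerTo()) |
| FunctionType | function signature (return type and parameter types) |
| Constant | constant value (ConstantInt, ConstantDataArray) |
| GlobalVariable | a global variable in a module |
| Function | function declaration or definition |
| BasicBlock | a sequence of instructions ending in a terminator |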
- Add unit tests as part of ctest (run with `ninja test`)
  - unittests/CMakeLists.txt
```cmake
cmake_minimum_required(VERSION 3.13.4)
enable_testing()
# set LLVM components we need to link our tool against.
llvm_map_components_to_libnames(llvm_libs support core)
include(CTest)
find_package(GTest REQUIRED)
include_directories(${GTEST_INCLUDE_DIRS})
# helper: build one gtest executable and register it with ctest
macro(add_tinylang_unittest name)
  add_executable(${name} ${ARGN})
  target_link_libraries(${name}
    tinylangBasic
    ${llvm_libs}
    ${GTEST_LIBRARIES}
    ${GTEST_MAIN_LIBRARIES}
    pthread)
  add_test(NAME ${name}
    COMMAND ${name})
endmacro()
add_tinylang_unittest(TinyLangTests
  VersionTest.cpp
)
```
- Add lit tests as part of ctest
  - Install lit: `pip install lit`
  - Copy FileCheck to the LLVM install bin dir (or configure LLVM with `-DLLVM_INSTALL_UTILS=ON` so that it gets installed)
  - test/CMakeLists.txt
```cmake
if(NOT EXISTS ${LLVM_TOOLS_BINARY_DIR}/FileCheck)
  message(FATAL_ERROR "LLVM wasn't configured with -DLLVM_INSTALL_UTILS, cannot use FileCheck")
endif()
configure_file("${CMAKE_CURRENT_SOURCE_DIR}/lit.cfg.in" "${CMAKE_CURRENT_BINARY_DIR}/lit.cfg")
# provide the test target based on lit (installed by pip install lit)
add_test(NAME runLitTests
  COMMAND lit -v "${CMAKE_CURRENT_BINARY_DIR}"
)
#message(FATAL_ERROR "CMAKE_CURRENT_BINARY_DIR=${CMAKE_CURRENT_BINARY_DIR}")
# put FileCheck and the driver on PATH (ENVIRONMENT_MODIFICATION needs CMake 3.22+)
set_property(TEST runLitTests PROPERTY ENVIRONMENT_MODIFICATION
  "PATH=path_list_prepend:${LLVM_TOOLS_BINARY_DIR}:${CMAKE_CURRENT_BINARY_DIR}/../tools/driver")
```
  - test/lit.cfg.in
```python
import lit.formats
config.name = "Calc tests"
config.test_format = lit.formats.ShTest(True)
config.test_source_root = "@CMAKE_CURRENT_SOURCE_DIR@"
config.suffixes = ['.calc']
config.test_exec_root = "@CMAKE_CURRENT_BINARY_DIR@"
```
  - test/lexer.calc
```
% RUN: tinylang --token "with a, b: 4 + (a * b)" | FileCheck %s
with a, b: 4 + (a * b)
; CHECK: Token:KW_with: with
; CHECK: Token:ident: a
; CHECK: Token:comma: ,
; CHECK: Token:ident: b
; CHECK: Token:colon: :
; CHECK: Token:number: 4
; CHECK: Token:plus: +
; CHECK: Token:l_paren: (
; CHECK: Token:ident: a
; CHECK: Token:star: *
; CHECK: Token:ident: b
; CHECK: Token:r_paren: )
```
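The CHECK lines suggest that `tinylang --token` prints one `Token:<kind>: <text>` line per token. The driver itself is not shown in this note, so the following is only a hypothetical sketch of a lexer that would produce such lines. The function name `dumpTokens` and the kind names `minus`, `slash` and `unknown` are assumptions; the other kind names are taken from the CHECK lines.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical sketch: returns one "Token:<kind>: <text>" line per token.
std::vector<std::string> dumpTokens(const std::string &src) {
  std::vector<std::string> lines;
  std::size_t pos = 0;
  while (pos < src.size()) {
    unsigned char c = src[pos];
    if (std::isspace(c)) { // whitespace separates tokens and is skipped
      ++pos;
      continue;
    }
    std::size_t start = pos;
    std::string kind;
    if (std::isalpha(c)) {
      // identifier or keyword: a run of letters
      while (pos < src.size() && std::isalpha((unsigned char)src[pos]))
        ++pos;
      kind = src.compare(start, pos - start, "with") == 0 ? "KW_with" : "ident";
    } else if (std::isdigit(c)) {
      // number: a run of digits
      while (pos < src.size() && std::isdigit((unsigned char)src[pos]))
        ++pos;
      kind = "number";
    } else {
      // single-character punctuation
      ++pos;
      switch (c) {
      case ',': kind = "comma"; break;
      case ':': kind = "colon"; break;
      case '+': kind = "plus"; break;
      case '-': kind = "minus"; break; // name assumed
      case '*': kind = "star"; break;
      case '/': kind = "slash"; break; // name assumed
      case '(': kind = "l_paren"; break;
      case ')': kind = "r_paren"; break;
      default:  kind = "unknown"; break; // name assumed
      }
    }
    lines.push_back("Token:" + kind + ": " + src.substr(start, pos - start));
  }
  return lines;
}
```

For the input used in the test, `dumpTokens("with a, b: 4 + (a * b)")` returns twelve lines, starting with `Token:KW_with: with` and ending with `Token:r_paren: )`, which is the order the CHECK directives expect.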