Creating a function using LLVM IRBuilder

Tips:

Code snippets are shown in one of three ways throughout this environment:

Code that looks like this is a sample snippet, usually part of an explanation.
Code that appears in a box like the one below can be clicked on, and it will automatically be typed into the appropriate terminal window:
```
vim readme.txt
```
Code appearing in windows like the one below is code that you should type in yourself. Usually there will be a unique ID or other piece you need to enter that we cannot supply. Items appearing in <> are the pieces you should substitute based on the instructions.
```
Add your name here - <name>
```

1. Overview

The goal of this tutorial is to learn how to use IRBuilder to create various LLVM IR objects. We assume you have already completed the earlier, more fundamental tutorials.

2. Test Clang/LLVM

Clang/LLVM has already been installed in the docker-based online terminal in the right panel.

To test clang, try the following command line:

clang --version

You should see the version information after running the command above.

3. Obtain Example Source Files

git clone https://github.com/freeCompilerCamp/code-for-llvm-tutorials.git
cd code-for-llvm-tutorials/first-function
ls

This git repository contains:

mul_add.c: an example program showing a function to be built,
tut1.cpp: the LLVM source file to build LLVM IR for the function,
makefile: makefile with targets to dump LLVM IR and build the program.

4. Look Into the Function to be Built

You can use cat to look into the source file with a simple function:

cat mul_add.c

You should see the following content:

int mul_add(int x, int y, int z) {
return x * y + z;
}

This source file contains a simple mul_add function that performs a multiply-add operation on three parameters x, y, and z. Before building the LLVM IR for this function by hand, we can use clang to dump its LLVM IR as a reference.

Type the following command line to generate mul_add.ll, the text output of LLVM IR of mul_add.c:

make mul_add.ll

You should see the following screen output:

clang -S -O3 -emit-llvm mul_add.c

The command line above generates an optimized (-O3) version of the LLVM IR, which is easier to understand.

Now let’s look at the text output of the function’s LLVM IR:

cat mul_add.ll

You should see the following screen output (excerpt for the function only):

; Function Attrs: norecurse nounwind readnone uwtable
define dso_local i32 @mul_add(i32 %0, i32 %1, i32 %2) local_unnamed_addr #0 {
 %4 = mul nsw i32 %1, %0
 %5 = add nsw i32 %4, %2
 ret i32 %5
}

If you have finished Getting Familiar with LLVM IR, you can easily understand the IR above with the help of the LLVM Language Reference Manual. To simplify this tutorial, we will build the add and mul instructions without the nsw (“No Signed Wrap”) flag. nsw indicates that the result of the instruction is a poison value if signed overflow occurs.
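
To make the meaning of nsw concrete, the following plain C++ sketch (no LLVM required) checks whether either step of x * y + z would overflow 32-bit signed arithmetic, which is the situation in which the flagged instructions would yield poison. The names mulAddWouldSignedWrap and exampleUsage are hypothetical and not part of the tutorial sources.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not in the tutorial sources): reports whether
// x * y + z overflows 32-bit signed arithmetic at either step, i.e.
// whether "mul nsw" or "add nsw" would produce a poison value.
bool mulAddWouldSignedWrap(std::int32_t x, std::int32_t y, std::int32_t z) {
  const std::int64_t lo = std::numeric_limits<std::int32_t>::min();
  const std::int64_t hi = std::numeric_limits<std::int32_t>::max();

  // Compute in 64 bits, where these products and sums cannot overflow.
  const std::int64_t product = static_cast<std::int64_t>(x) * y;
  if (product < lo || product > hi)
    return true; // the multiply would wrap

  const std::int64_t sum = product + z;
  return sum < lo || sum > hi; // the add would wrap
}

// Usage sketch: small operands stay in range, INT32_MAX * 2 does not.
void exampleUsage() {
  assert(!mulAddWouldSignedWrap(2, 3, 4));
  assert(mulAddWouldSignedWrap(std::numeric_limits<std::int32_t>::max(), 2, 0));
}
```

Without nsw, the plain mul and add instructions built in this tutorial simply wrap around modulo 2^32 in those cases.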

5. Look Into the LLVM Program

Now let’s look at the program building a module and a function.

vim tut1.cpp

You should see the following content:

#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"
#include "llvm/IR/CallingConv.h"
#include "llvm/IR/Verifier.h"
#include "llvm/IR/IRPrintingPasses.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Bitcode/BitcodeWriter.h"
#include <stdio.h>

using namespace llvm;

Module *makeLLVMModule(LLVMContext &Context);

int main(int argc, char **argv)
{
 LLVMContext Context;
 Module *Mod = makeLLVMModule(Context);

 raw_fd_ostream r(fileno(stdout), false);
 verifyModule(*Mod, &r);

 //Prints the module IR
 ModulePass *m = createPrintModulePass(outs(), "Module IR printer");
 legacy::PassManager PM;
 PM.add(m);
 PM.run(*Mod);

  // Write IR to a bitcode file
 FILE* mul_add_file = fopen("mul_add.bc", "w+");
 raw_fd_ostream bitcodeWriter(fileno(mul_add_file), true);
 WriteBitcodeToFile(*Mod, bitcodeWriter);

 delete Mod;
 return 0;
}

Module *makeLLVMModule(LLVMContext &Context)
{
 Module *mod = new Module("mul_add", Context);

 FunctionCallee mul_add_fun = mod->getOrInsertFunction("mul_add",
     Type::getInt32Ty(Context),
     Type::getInt32Ty(Context),
     Type::getInt32Ty(Context),
     Type::getInt32Ty(Context));
 Function *mul_add = cast<Function> (mul_add_fun.getCallee());

 mul_add->setCallingConv(CallingConv::C);
 Function::arg_iterator args = mul_add->arg_begin();
 Value *x = args++;
 x->setName("x");
 Value *y = args++;
 y->setName("y");
 Value *z = args++;
 z->setName("z");

 BasicBlock *block = BasicBlock::Create(Context, "entry", mul_add);
 IRBuilder<> builder(block);
 Value *tmp = builder.CreateBinOp(Instruction::Mul, x, y, "tmp");
 Value *tmp2 = builder.CreateBinOp(Instruction::Add, tmp, z, "tmp2");
 builder.CreateRet(tmp2);

 return mod;
}

Lines 1 through 12 of the program include the required header files (the LLVM headers plus <stdio.h>).

We declare a makeLLVMModule() function (line 16), which will do the real work of creating the module.

Inside the main function, the first segment is pretty simple: it creates an LLVM “module” (line 21). In LLVM, a module represents a single unit of code that is to be processed together. A module contains things like global variables, function declarations, and function implementations.

Line 24 runs the LLVM module verifier on our newly created module. While this probably isn’t really necessary for a simple module like this one, it’s always a good idea, especially if you’re generating LLVM IR based on some input. The verifier will print an error message if your LLVM module is malformed in any way.

Next, lines 27 through 30 instantiate an LLVM PassManager and run the PrintModulePass on our module. LLVM uses an explicit pass infrastructure to manage optimizations and various other things. A PassManager, as its name suggests, manages passes: it is responsible for scheduling them, invoking them, and ensuring their proper disposal after we’re done with them. For this example, we’re just using a trivial pass that prints out our module in textual form.

 //Prints the module IR
 ModulePass *m = createPrintModulePass(outs(), "Module IR printer");
 legacy::PassManager PM;
 PM.add(m);
 PM.run(*Mod);

Finally, lines 33 through 35 write the created module containing the function into a bitcode file named mul_add.bc.

  // Write IR to a bitcode file
 FILE* mul_add_file = fopen("mul_add.bc", "w+");
 raw_fd_ostream bitcodeWriter(fileno(mul_add_file), true);
 WriteBitcodeToFile(*Mod, bitcodeWriter);

Now onto the interesting part: creating and populating a module inside makeLLVMModule():

Line 43 creates a new Module object.
Lines 45 through 49 construct the function by calling getOrInsertFunction() on our module, passing in the name, return type, and argument types of the function. In the case of our mul_add function, that means one 32-bit integer for the return value and three 32-bit integers for the arguments.

 FunctionCallee mul_add_fun = mod->getOrInsertFunction("mul_add",
     Type::getInt32Ty(Context),
     Type::getInt32Ty(Context),
     Type::getInt32Ty(Context),
     Type::getInt32Ty(Context));
 Function *mul_add = cast<Function> (mul_add_fun.getCallee());

The details of all LLVM classes and member functions can be found at https://llvm.org/doxygen/index.html . For example, https://llvm.org/doxygen/classllvm_1_1Module.html documents llvm::Module, including getOrInsertFunction().

Module::getOrInsertFunction() looks up the specified function in the module symbol table. There are several possibilities:

If it does not exist, add a prototype for the function and return it.
Otherwise, if the existing function has the correct prototype, return the existing function.
Finally, if the function exists but has the wrong prototype, return the function with a constantexpr cast to the right prototype.

In all cases, the returned value is a FunctionCallee wrapper around the FunctionType passed in, together with a Value that is either the Function or the bitcast of the function. So at line 50, we get the callee of mul_add_fun and cast it to a pointer to Function.

Line 52 sets the calling convention for our new function to be the C calling convention. This isn’t strictly necessary, but it ensures that our new function will interoperate properly with C code.

The following code segment gives names to the parameters. This also isn’t strictly necessary (LLVM will generate names for them if you don’t specify them), but it makes the output more pleasant to read. To name the parameters, we iterate over the arguments of our function and call setName() on them. We also keep the pointers to x, y, and z around, since we’ll need them when we create the instructions.

 Function::arg_iterator args = mul_add->arg_begin();
 Value *x = args++;
 x->setName("x");
 Value *y = args++;
 y->setName("y");
 Value *z = args++;
 z->setName("z");

So far, we have created a function with a parameter list. The next step is to create its body populated with some instructions. The LLVM IR, being an abstract assembly language, represents control flow using jumps (we call them branches), both conditional and unconditional. The straight-line sequences of code between branches are called basic blocks, or just blocks. To create a body for our function, we fill it with blocks:

We create a new basic block at line 61 by calling the BasicBlock::Create() factory function. All we need to give it is the context, the block’s name, and the function to which it belongs. In addition, we’re creating an IRBuilder object, which is a convenience interface for creating instructions and appending them to the end of a block. Instructions can be created through their constructors as well, but some of their interfaces are quite complicated. Unless you need a lot of control, using IRBuilder will make your life simpler.

 BasicBlock *block = BasicBlock::Create(Context, "entry", mul_add);
 IRBuilder<> builder(block);
 Value *tmp = builder.CreateBinOp(Instruction::Mul, x, y, "tmp");
 Value *tmp2 = builder.CreateBinOp(Instruction::Add, tmp, z, "tmp2");
 builder.CreateRet(tmp2);

The final step in creating our function is to create the instructions that make it up. Our mul_add function is composed of just three instructions: a multiply, an add, and a return. IRBuilder gives us a simple interface for constructing these instructions and appending them to the “entry” block. Each of the calls to IRBuilder returns a Value* that represents the value yielded by the instruction. Note that x, y, and z above are also Values, so instructions both consume and produce Values.

6. Build and Test the Program

This tutorial’s source tree lives independently of LLVM.
It uses a makefile to build the executable against an installed copy of LLVM. To build the executable, type the following command line:

make tut1

You should see the following output:

clang++ -g tut1.cpp `llvm-config --cxxflags --ldflags --libs core BitWriter --system-libs` -lpthread -o tut1

clang++ compiles tut1.cpp using the cxxflags and ldflags exposed by llvm-config. The core and BitWriter libraries are linked in because the code uses BitWriter to write the created module into a bitcode file (lines 33-35). Pthreads is also needed for BitWriter.

Finally, we run the program:

./tut1

You should see the following content:

Module IR printer
; ModuleID = 'mul_add'
source_filename = "mul_add"

define i32 @mul_add(i32 %x, i32 %y, i32 %z) {
entry:
  %tmp = mul i32 %x, %y
  %tmp2 = add i32 %tmp, %z
  ret i32 %tmp2
}

The module IR is exactly what we wanted to create. Additionally, a mul_add.bc file is created in the current directory. You can use llvm-dis to convert it to a text file and check its content. It should be the same as the text output we just saw above.

rm -rf mul_add.ll 
llvm-dis mul_add.bc
cat mul_add.ll

7. References

This tutorial is based on the content from https://releases.llvm.org/2.6/docs/tutorial/JITTutorial1.html.
