Modifying LLVM IR

Tips:

Code snippets are shown in one of three ways throughout this environment:

Code that looks like this is a sample snippet, usually part of an explanation.
Code that appears in a box like the one below can be clicked, and it will automatically be typed into the appropriate terminal window:
```
vim readme.txt
```
Code appearing in windows like the one below is code that you should type in yourself. Usually there will be a unique ID or some other piece you need to enter that we cannot supply. Items appearing in <> are the pieces you should substitute based on the instructions.
```
Add your name here - <name>
```

1. Overview

The goal of this tutorial is to learn how to use IRBuilder to modify LLVM IR, using a simple example program. We assume you have already completed the earlier, more fundamental tutorials.

2. Test Clang and LLVM

Clang and LLVM have already been installed in the Docker-based online terminal in the right panel.

To test Clang and LLVM's optimizer, opt, try the following commands:

clang --version

and

opt --version

You should see the version information after the commands above.

3. Obtain Example Source Files

git clone --single-branch --branch mutate https://github.com/chunhualiao/llvm-pass-skeleton.git
cd llvm-pass-skeleton/

This git repository contains:

An example skeleton LLVM pass that finds a binary operator in each function and replaces it with a multiply
CMakeLists.txt files to build the program
Example programs to use as input to the LLVM pass

4. Look Into the Source File

You can use vim to look at the source:

vim skeleton/Skeleton.cpp

You should see the following content:

#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
using namespace llvm;

namespace {
 struct SkeletonPass : public FunctionPass {
   static char ID;
   SkeletonPass() : FunctionPass(ID) {}

   virtual bool runOnFunction(Function &F) {
     for (auto &B : F) {
       for (auto &I : B) {
         if (auto *op = dyn_cast<BinaryOperator>(&I)) {
           // Insert at the point where the instruction `op` appears.
           IRBuilder<> builder(op);

           // Make a multiply with the same operands as `op`.
           Value *lhs = op->getOperand(0);
           Value *rhs = op->getOperand(1);
           Value *mul = builder.CreateMul(lhs, rhs);

           // Everywhere the old instruction was used as an operand, use our
           // new multiply instruction instead.
           for (auto &U : op->uses()) {
             User *user = U.getUser();  // A User is anything with operands.
             user->setOperand(U.getOperandNo(), mul);
           }
           // TODO: remove the old instruction (mind iterator invalidation); omitted for brevity.
           // We modified the code.
           return true;
         }
       }
     }

     return false;
   }
 };
}

char SkeletonPass::ID = 0;

// Automatically enable the pass.
// http://adriansampson.net/blog/clangpass.html
static void registerSkeletonPass(const PassManagerBuilder &,
                        legacy::PassManagerBase &PM) {
 PM.add(new SkeletonPass());
}
static RegisterStandardPasses
 RegisterMyPass(PassManagerBuilder::EP_EarlyAsPossible,
                registerSkeletonPass);

This program derives SkeletonPass from FunctionPass and overrides its member function runOnFunction(), as shown in line 16; LLVM calls this function once for each function in the program. Inside it, a nested loop iterates over all instructions within all basic blocks (lines 17 and 18).

Line 19 uses dyn_cast to check whether an instruction is a binary operator. If it is, an IRBuilder is constructed with that instruction as the insertion point, so new instructions are placed immediately before it, and the builder creates a multiply instruction from the instruction's two existing operands (lines 21 through 26).
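The dyn_cast check does not rely on C++ RTTI (the pass is compiled with -fno-rtti). Instead, LLVM classes carry a kind tag and a static classof predicate. The following is a minimal sketch of that idea on a toy hierarchy. ToyInst, ToyBinaryOp, ToyCall, toy_dyn_cast, and countBinaryOps are hypothetical names made up for illustration; they are not LLVM's actual classes.

```cpp
#include <cassert>

// Toy stand-ins for LLVM's instruction classes (hypothetical, not the real API).
struct ToyInst {
  enum Kind { BinaryOpKind, CallKind };
  explicit ToyInst(Kind k) : kind(k) {}
  Kind kind;  // tag set once at construction and checked instead of C++ RTTI
};

struct ToyBinaryOp : ToyInst {
  ToyBinaryOp(int l, int r) : ToyInst(BinaryOpKind), lhs(l), rhs(r) {}
  // Each subclass states which tags it accepts.
  static bool classof(const ToyInst *i) { return i->kind == BinaryOpKind; }
  int lhs, rhs;
};

struct ToyCall : ToyInst {
  ToyCall() : ToyInst(CallKind) {}
  static bool classof(const ToyInst *i) { return i->kind == CallKind; }
};

// Returns the pointer downcast to To* when the tag matches, nullptr otherwise.
template <typename To>
To *toy_dyn_cast(ToyInst *i) {
  assert(i && "toy_dyn_cast expects a non-null pointer");
  return To::classof(i) ? static_cast<To *>(i) : nullptr;
}

// Counts the binary operators in an array of instructions, mirroring the
// `if (auto *op = dyn_cast<BinaryOperator>(&I))` test in the pass.
int countBinaryOps(ToyInst *const *insts, int n) {
  int count = 0;
  for (int k = 0; k < n; ++k)
    if (toy_dyn_cast<ToyBinaryOp>(insts[k]))
      ++count;
  return count;
}
```

Calling toy_dyn_cast<ToyBinaryOp> on a ToyCall yields nullptr, which is why the result can be used directly as the condition of an if statement.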

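Once the new instruction is created, another loop (line 30) visits every use of the original binary instruction. For each user, the operand that referred to the old instruction is reset to the newly created multiply (line 32). Because the function returns true right after this loop, only the first binary operator found in each function is rewritten.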
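Resetting an operand moves that use from the old value's use list to the new value's, so changing uses while walking the list needs some care. LLVM offers Value::replaceAllUsesWith for this kind of wholesale redirection. The sketch below illustrates the bookkeeping on a toy model only: ToyValue, ToyUse, and replaceAllUses are hypothetical names, and the snapshot-based loop is one possible approach, not how LLVM implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (hypothetical, not LLVM's classes): each value remembers which
// (user, operand index) slots currently refer to it.
struct ToyValue;

struct ToyUse {
  ToyValue *user;         // the value whose operand list holds the reference
  std::size_t operandNo;  // which operand slot of `user` it is
};

struct ToyValue {
  std::vector<ToyValue *> operands;  // values this one reads
  std::vector<ToyUse> uses;          // slots that read this value

  // Points operand slot `i` at `v` and keeps both use lists in sync.
  void setOperand(std::size_t i, ToyValue *v) {
    assert(i < operands.size());
    ToyValue *old = operands[i];
    if (old) {
      for (std::size_t k = 0; k < old->uses.size(); ++k) {
        if (old->uses[k].user == this && old->uses[k].operandNo == i) {
          old->uses.erase(old->uses.begin() + static_cast<std::ptrdiff_t>(k));
          break;
        }
      }
    }
    operands[i] = v;
    if (v) v->uses.push_back(ToyUse{this, i});
  }
};

// Redirects every use of `from` to `to`. It walks a snapshot because
// setOperand() edits `from->uses` during the loop.
void replaceAllUses(ToyValue *from, ToyValue *to) {
  std::vector<ToyUse> snapshot = from->uses;
  for (const ToyUse &u : snapshot)
    u.user->setOperand(u.operandNo, to);
}
```

After replaceAllUses(&add, &mul), the toy add value has an empty use list and every former user reads mul instead, which is the state the pass aims for before the old instruction can be removed.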

For brevity, the program does not implement the removal of the replaced binary operation; the old instruction is left in place.
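One common way to handle the iterator invalidation mentioned in the TODO is to advance the iterator before erasing the current element. In LLVM itself, Instruction::eraseFromParent() unlinks and deletes an instruction, so the loop must not hold an iterator to it afterwards. The sketch below shows the pattern on a std::list, which, like a basic block's instruction list, is a linked list. ToyInstr and eraseDeadAdds are hypothetical names, and this is not the repository's code.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Toy instruction (hypothetical): an opcode name plus a "has users" flag.
struct ToyInstr {
  std::string opcode;
  bool hasUsers;
};

// Erases every "add" that no longer has users; returns how many were removed.
std::size_t eraseDeadAdds(std::list<ToyInstr> &block) {
  const std::size_t before = block.size();
  std::size_t erased = 0;
  for (auto it = block.begin(); it != block.end();) {
    auto current = it++;  // step forward first, so `it` survives the erase
    if (current->opcode == "add" && !current->hasUsers) {
      block.erase(current);  // invalidates only `current`
      ++erased;
    }
  }
  assert(block.size() + erased == before);  // sanity check on the bookkeeping
  return erased;
}
```

The same shape applies when the range-based for loops of the pass are rewritten with explicit iterators: take a copy of the iterator, increment the loop iterator, and only then remove the element the copy points to.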

5. Build the LLVM Pass

This tutorial’s source tree lives independently of LLVM. It relies on the CMake build system’s support for exporting LLVM libraries as importable CMake targets. In other words, the pass is built against an installed copy of LLVM.

Two CMakeLists.txt files are used. The first one is located at the top level project directory.

cat CMakeLists.txt

You should see the following content:

cmake_minimum_required(VERSION 3.1)
project(Skeleton)

# support C++14 features used by LLVM 10.0.0
set(CMAKE_CXX_STANDARD 14)

find_package(LLVM REQUIRED CONFIG)
add_definitions(${LLVM_DEFINITIONS})
include_directories(${LLVM_INCLUDE_DIRS})
link_directories(${LLVM_LIBRARY_DIRS})
add_subdirectory(skeleton)  # Use your pass name here.

LLVM ships a CMake package configuration, so find_package(LLVM REQUIRED CONFIG) locates the installed LLVM automatically and provides the definitions, include paths, and library paths used above.

The second CMakeLists.txt is located in the skeleton subfolder:

cat skeleton/CMakeLists.txt

You should see the following content:

add_library(SkeletonPass MODULE
  # List your source files here.
  Skeleton.cpp
)

# Use C++11 to compile our pass (i.e., supply -std=c++11).
target_compile_features(SkeletonPass PRIVATE cxx_range_for cxx_auto_type)

# LLVM is (typically) built with no C++ RTTI. We need to match that;
# otherwise, we'll get linker errors about missing RTTI data.
set_target_properties(SkeletonPass PROPERTIES
  COMPILE_FLAGS "-fno-rtti"
)
# ... rest is omitted

The source file of this pass is compiled as a loadable module library (lines 1-4). Additional compiler features and flags are specified for compiling the source file (line 7 and lines 11-13).

Now build the pass:

mkdir build
cd build/
cmake ../.
make

You should see output similar to the following:

Scanning dependencies of target SkeletonPass
[ 50%] Building CXX object skeleton/CMakeFiles/SkeletonPass.dir/Skeleton.cpp.o
[100%] Linking CXX shared module libSkeletonPass.so
[100%] Built target SkeletonPass

6. Run the LLVM Pass

We first test the input program to see its behavior before the LLVM IR modification.

cd ..
cat something.c

You should see the following content:

#include <stdio.h>
int main(int argc, const char** argv) {
  int num=10;
  printf("%i\n", num + 2);
  return 0;
}

Using GCC, we compile and run it.

gcc something.c
./a.out

The execution result should be 12, since line 4 of the code prints num (10) + 2.

We now compile the same program with clang, loading the LLVM pass we just built.

clang -Xclang -load -Xclang build/skeleton/libSkeletonPass.so something.c
./a.out

Now the execution result should be 20, since the LLVM pass replaces the + operator in (10 + 2) with *, resulting in (10 * 2).
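The difference between the two runs comes down to a single opcode applied to the same two operands. As a plain C++ sketch of that arithmetic (beforePass and afterPass are names made up for illustration and do not appear in the repository):

```cpp
#include <cassert>

// What something.c computes when built without the pass: num + 2.
int beforePass(int num) { return num + 2; }

// What it computes once the pass has swapped the add for a multiply that
// takes the same two operands: num * 2.
int afterPass(int num) { return num * 2; }
```

With num set to 10 as in something.c, beforePass gives 12 and afterPass gives 20, matching the two program runs above.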

7. References

The following links are useful for further information:

This tutorial is based on the content from http://www.cs.cornell.edu/~asampson/blog/llvm.html .
https://llvm.org/docs/CMake.html#embedding-llvm-in-your-project : how to build your project using an installed version of LLVM.
