Using FIT (the Flexible Instrumentation Toolkit)

by Vince Weaver (vince _at_ csl.cornell.edu)

Background

FIT is an atom-like binary instrumentation tool developed at Ghent University in Belgium.

It aims to be atom-compatible, so if you are familiar with atom, moving to FIT should be fairly easy. The main problem is stripping your atom code of unsupported features.

Limitations

FIT has many improvements over atom; see the paper on their website. Besides being more accurate in many ways, it is also open-source and will eventually run on many platforms. Currently x86-Linux and ARM-Linux are best supported.

Unlike atom, FIT requires a custom toolchain (available at their website). You also need to have all of the object files available for the binaries you are instrumenting. This means that in general you can only instrument programs for which you have the source code.

Installing

Note: if you are running fit on sampaka this is not needed. It's already installed there.

This describes setting up a generic version of FIT. In order to run cache_stat you need a version with my specific patches applied. Those _should_ be included in fit-0.2. If you need to use the tool and fit-0.2 is not out yet, then e-mail me and I'll include the patches in this distribution.

The fit homepage is located here: http://www.elis.ugent.be/fit/index.html

Download the modified toolchain linked off the page. It's a 100+MB file. Install it somewhere but note the location. You can patch and compile a toolchain yourself, but that's a lot harder than using the pre-supplied one.

Download the most recent version of fit. As noted above, you might need to apply some patches.

Modify the fit Makefile to have ARCH be i386 and CROSSCOMPILER point to the location of gcc in the fit toolchain.

Run "make" and it should create the "fit" binary for you.

Sample Instrumenting

First, compile the program to be instrumented with the fit modified toolchain. On sampaka this is /opt/tools/fit/toolchain-i386-glibc/bin/gcc

For the final linking step, you'll need to have the options
-static -Wl,-Map,smg2000.map
-static means we have a static binary (fit can't handle dynamic ones). The -Map option makes the linker write a map file that fit needs. Name it so that it matches your filename (so don't call it smg2000.map like above unless you are instrumenting smg2000).

Now that you have your file to be instrumented, you need to do the actual instrumentation. You'll need an instrumentation file and an analysis file. These files are exactly like the ones you create for atom.

Name the files my_file.inst.c and my_file.anal.c

Finally, instrument the file by running the binary created by fit. For example,
"my_file -o binary.fit binary"
creates a new instrumented file called "binary.fit", which is an instrumented version of "binary" using the "my_file" instrumentation routines.

When instrumenting, you will need all of the .o files generated by the build of binary to still be around.

WARNING! Fit takes a _lot_ of memory. At least 512MB. And instrumentation can take a long time, especially on older machines.

Various Hints

If you are generating a trace that a tool will process only once, you don't need any disk space to hold a large trace file. Instead, use the "mkfifo" command to create a named fifo, point both the instrumented file and the tool at it, and run the two at the same time.
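
As a rough sketch of the fifo approach, the following C fragment creates a named fifo with mkfifo(3) and streams a made-up trace through it. The producer stands in for the instrumented binary and the consumer for the analysis tool. The function names, the record format (one hex address per line) and the fifo path are all invented for illustration and are not part of FIT.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Producer: stands in for the instrumented binary writing its trace.
   Returns 0 on success, -1 if the fifo could not be opened. */
static int write_trace(const char *path, int count) {
    FILE *out = fopen(path, "w");   /* blocks until a reader opens the fifo */
    int i;

    if (out == NULL) return -1;
    for (i = 0; i < count; i++)
        fprintf(out, "0x%x\n", 0x1000 + 4 * i);   /* made-up addresses */
    fclose(out);
    return 0;
}

/* Consumer: stands in for the tool that processes the trace.
   Returns the number of records read, or -1 on error. */
static int read_trace(const char *path) {
    FILE *in = fopen(path, "r");    /* blocks until a writer opens the fifo */
    unsigned int addr;
    int records = 0;

    if (in == NULL) return -1;
    while (fscanf(in, "%x", &addr) == 1)
        records++;
    fclose(in);
    return records;
}

/* Create the fifo, run producer and consumer at the same time, and
   return how many records made it through (-1 on error). */
int stream_trace_through_fifo(const char *path, int count) {
    pid_t pid;
    int records;

    unlink(path);                        /* drop any stale fifo */
    if (mkfifo(path, 0600) != 0) return -1;

    pid = fork();
    if (pid < 0) { unlink(path); return -1; }
    if (pid == 0)                        /* child: the producer */
        _exit(write_trace(path, count) == 0 ? 0 : 1);

    records = read_trace(path);          /* parent: the consumer */
    if (records < 0) kill(pid, SIGTERM); /* don't leave the child blocked */
    waitpid(pid, NULL, 0);
    unlink(path);
    return records;
}
```

Calling stream_trace_through_fifo("/tmp/trace.fifo", 1000) should return 1000 without a trace file ever being stored on disk. With real tools you get the same effect from the shell: create the fifo with mkfifo, then start the tool reading from it and the instrumented binary writing to it.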

Instrumenting malloc and friends

Instrumenting something like malloc is a bit non-trivial. You probably want to gather the parameters being passed and the end result.

The best way I've found to do this is the following. It will work for getting the parameters and results of any function, not just malloc, calloc, free, etc. Those are just what I needed it for.

In the instrumentation file add the following:
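     AddCallProto("BeforeCalloc(REGV)");
     AddCallProto("AfterCalloc(REGV)");

     Proc *proc;

     /* In a FIT binary calloc shows up as __libc_calloc */
     proc=GetNamedProc("__libc_calloc");
     if (proc!=NULL) {
        /* On entry, pass the stack pointer so the parameters can be read */
        AddCallProc(proc,ProcBefore,"BeforeCalloc",REG_I386_ESP);
        /* On exit, pass EAX, which holds the return value */
        AddCallProc(proc,ProcAfter,"AfterCalloc",REG_I386_EAX);
     }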

Note that we get the procedure "__libc_calloc" instead of just calloc. This is because with FIT the C library is statically linked into the executable, and this is what the procedure ends up being called.
Also note that we pass the stack pointer along to the ProcBefore call. This is because parameters are passed on the stack on x86.
Finally, note that we pass the value of EAX to the ProcAfter call. This is because the return value of the function ends up in EAX.

In the analysis code we have this:
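/* fff is the output FILE *, assumed to be opened elsewhere in the analysis file */

void BeforeCalloc(int esp) {

   int *pointer;

   /* Treat the stack pointer value as a pointer to 32-bit stack slots */
   pointer=(int *)esp;

   /* Slot +0 is the return address; the arguments follow it */
   fprintf(fff,"Calloc: num=%i size=%i\n",*(pointer+1),*(pointer+2));

}

void AfterCalloc(int address) {

   fprintf(fff,"Calloc: address=0x%x\n",address);

}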

AfterCalloc should be fairly self-explanatory. In BeforeCalloc we have to cast the stack pointer value to a pointer. The parameters we want are then at +1 and +2 (the return address is at +0).
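
To make the +0/+1/+2 layout concrete, here is a small stand-alone sketch that does not need FIT at all. It models the stack at function entry as an array of 32-bit words and pulls the two calloc arguments out of it the same way BeforeCalloc does. The struct and function names are made up for illustration, and the layout assumes the usual 32-bit x86 convention of 4-byte stack slots with the return address on top.

```c
#include <stdint.h>

/* The two calloc arguments as found on the stack (hypothetical type). */
struct calloc_args {
    uint32_t num;
    uint32_t size;
};

/* stack_pointer models ESP on entry to the function:
     stack_pointer[0] = return address
     stack_pointer[1] = first argument  (num)
     stack_pointer[2] = second argument (size)
   This assumes 32-bit x86 with arguments passed on the stack. */
struct calloc_args decode_calloc_args(const uint32_t *stack_pointer) {
    struct calloc_args args;

    args.num  = stack_pointer[1];
    args.size = stack_pointer[2];
    return args;
}
```

For example, given a fake stack image of { 0x08048123, 10, 64 }, decode_calloc_args returns num=10 and size=64, skipping the return address in slot 0. The same indexing works for any function whose arguments are passed on the stack. Only the number of slots you read changes.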