Finally, instrument the file by running the binary created by FIT.
For example,
"my_file -o binary.fit binary"
creates a new file called "binary.fit", which is an instrumented
version of "binary" built using the "my_file" instrumentation
routines.
All of the .o files generated when "binary" was built must still be
present when you instrument it.
WARNING!  FIT takes a _lot_ of memory, at least 512MB.  Instrumentation
can also take a long time, especially on older machines.
Various Hints
If a trace will be used only once, as input to a tool that processes it
and generates output, you can create a named fifo instead of a trace
file.  The instrumented file and the tool then run at the same time,
and no disk space is needed to hold a large trace.
To do this use the "mkfifo" command to create a named fifo, and then
point both the tool and the instrumented file at it.
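To show why the fifo avoids the disk, here is a minimal sketch in C.  It
is only an illustration: the function name, the fifo path, and the
three trace lines are made up, the child process stands in for the
instrumented file, and the parent stands in for the tool.  With FIT
itself you would just run the "mkfifo" command and start the two real
programs.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: stream a few trace lines from a writer process to
 * a reader process through a named fifo.  Returns the number of lines
 * the reader saw, or -1 on error. */
int stream_trace_through_fifo(const char *path)
{
    FILE *in;
    char line[128];
    int lines = 0;
    pid_t pid;

    unlink(path);                     /* remove any stale fifo */
    if (mkfifo(path, 0600) != 0)      /* C equivalent of "mkfifo path" */
        return -1;

    pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }

    if (pid == 0) {
        /* Writer (stand-in for the instrumented file).  The open
         * blocks until the reader has opened the other end. */
        FILE *out = fopen(path, "w");
        if (out == NULL)
            _exit(1);
        fprintf(out, "Calloc: num=1 size=16\n");
        fprintf(out, "Calloc: num=4 size=8\n");
        fprintf(out, "Calloc: num=2 size=32\n");
        fclose(out);
        _exit(0);
    }

    /* Reader (stand-in for the tool).  Lines are consumed as they
     * arrive; the trace is never stored on disk. */
    in = fopen(path, "r");
    if (in == NULL) {
        kill(pid, SIGKILL);           /* writer may be stuck in open */
        waitpid(pid, NULL, 0);
        unlink(path);
        return -1;
    }
    while (fgets(line, sizeof line, in) != NULL)
        lines++;                      /* a real tool would parse here */
    fclose(in);

    waitpid(pid, NULL, 0);
    unlink(path);
    return lines;
}
```

Calling stream_trace_through_fifo("/tmp/fit_demo_fifo") should report
three lines.  The writer cannot run far ahead of the reader, so both
programs really do have to be running at the same time.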
Instrumenting malloc and friends
Instrumenting something like malloc is a bit non-trivial.  You probably
want to gather the parameters being passed and the return value.
The best way I've found to do this is the following.  It works for
getting the parameters and result of any function, not just malloc,
calloc, free, etc.; those just happen to be what I needed it for.
In the instrumentation file add the following:
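     Proc *proc;

     /* Declare the prototypes of the analysis routines. */
     AddCallProto("BeforeCalloc(REGV)");
     AddCallProto("AfterCalloc(REGV)");

     /* Look up calloc under its statically linked libc name. */
     proc=GetNamedProc("__libc_calloc");
     if (proc!=NULL) {
        /* Pass ESP before the procedure runs and EAX after it. */
        AddCallProc(proc,ProcBefore,"BeforeCalloc",REG_I386_ESP);
        AddCallProc(proc,ProcAfter,"AfterCalloc",REG_I386_EAX);
     }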
				  
Note that we look up the procedure "__libc_calloc" instead of just
"calloc".  With FIT the C library is statically linked into the
executable, and this is the name the procedure ends up with.
Also note that we pass the stack pointer (ESP) to the ProcBefore call,
because parameters are passed on the stack on x86.
Finally, we pass the value of EAX to the ProcAfter call, because the
return value of the function ends up in EAX.
In the analysis code we have this:
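/* fff is assumed to be a FILE * opened elsewhere in the analysis code. */

/* Called before calloc runs, with the value of ESP. */
void BeforeCalloc(int esp) {

   int *pointer;

   /* On 32-bit x86 the stack pointer fits in an int. */
   pointer=(int *)esp;

   /* pointer+0 is the return address; the arguments follow it. */
   fprintf(fff,"Calloc: num=%i size=%i\n",*(pointer+1),*(pointer+2));

}

/* Called after calloc returns, with the value of EAX. */
void AfterCalloc(int address) {

   fprintf(fff,"Calloc: address=0x%x\n",(unsigned int)address);

}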
AfterCalloc should be fairly self-explanatory.  In BeforeCalloc we
have to cast the stack pointer value to an int pointer; the parameters
we want are then at +1 and +2 (the return address is at +0).
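The +0/+1/+2 layout is easy to get wrong, so here is a small sketch
that can be run without FIT.  It builds a fake three-word "stack" and
decodes it the same way BeforeCalloc does.  The struct, the helper
names, and the return address value are all made up for illustration,
and the byte offsets in the comments assume a 4-byte int, as on 32-bit
x86.

```c
#include <stdio.h>

/* Hypothetical container for calloc's two arguments. */
struct calloc_args {
    int num;
    int size;
};

/* Decode the arguments from a pointer to the top of the stack as it
 * looks on entry to the procedure (before any frame is set up). */
struct calloc_args decode_calloc_args(const int *stack_top)
{
    struct calloc_args args;

    /* stack_top[0] is the return address pushed by the call. */
    args.num  = stack_top[1];   /* first argument,  ESP + 4 bytes */
    args.size = stack_top[2];   /* second argument, ESP + 8 bytes */
    return args;
}

/* Build a fake stack for calloc(10, 4) and decode it.
 * Returns num * size so the result is easy to check. */
int demo_fake_stack(void)
{
    int fake_stack[3];
    struct calloc_args args;

    fake_stack[0] = 0x08048123; /* made-up return address */
    fake_stack[1] = 10;         /* num  */
    fake_stack[2] = 4;          /* size */

    args = decode_calloc_args(fake_stack);
    printf("Calloc: num=%i size=%i\n", args.num, args.size);
    return args.num * args.size;
}
```

The same indexing should carry over to other functions: a third
parameter would be at +3, and so on, as long as every parameter is one
word wide and passed on the stack.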