qemu-trace -- a tool to generate traces for cache simulators

	   by:
	        Vince Weaver    vince _at_ csl.cornell.edu
	   
           instruction tracing based on some code posted by:
                Stuart Brady    sdbrady _at_ ntlworld.com

Note
~~~~
   This patch only works for the qemu-mips target (for running
   MIPS binaries).  Using the patch it should be straightforward
   to see what changes need to be made to get similar output
   for other architectures.

Background
~~~~~~~~~~
   Qemu (http://fabrice.bellard.free.fr/qemu/) is a simulator that
   dynamically translates programs while executing them.
   
   Qemu is primarily used for full-system simulation, but it can
   also do user-space simulation by translating system calls.
   
   While not overtly designed for computer architecture simulations,
   the code can be patched to enable useful results (usually with
   a performance penalty).
   
   In our case, this patch modifies Qemu to generate
   Dinero IV (http://pages.cs.wisc.edu/~markhill/DineroIV/)
   cache trace information.
   
 
Compiling
~~~~~~~~~
   + Obtain the source from the Qemu website.  The code here was
     made against the qemu-snapshot-2007-08-14_05 tree, but
     it should work with the 0.9.0 release.
     
   + Build qemu according to the directions that come with it.
   
   + Enter the qemu directory, and patch the source code:
       patch -p0 < qemu-mips-dinero.patch
     Where qemu-mips-dinero.patch is the patch file that came with
     this README.  You might want to try the "--dry-run" option
     to patch first to make sure everything looks good before
     applying it.
     
   + Now run "make".  The ./mips-linux-user directory should then
     contain a file "qemu-mips" that is patched to generate the
     trace output.
     
Using
~~~~~

   + You will need statically linked 32-bit big-endian Linux MIPS
     binaries.  You can generate these on a real Linux-MIPS machine,
     or else with a cross-compiler (setting up a cross compiling
     environment is a bit beyond the scope of this document).
     
     Here is a sample compile command-line on a MIPS machine:
       gcc -O2 -Wall -static -o hello.mips hello.c
       
     Note that optimizations are on.  You'll probably want this because
       gcc without optimizations generates a lot of extraneous
       load and store instructions.
       
   + Once you have your MIPS executable (you can use the "file" command
     to make sure it is MIPS, e.g. "file hello.mips"), you can test it by
     running an unmodified qemu on it.
     
     You generate an unmodified version of qemu by building from source
     as before, just without the patch applied.  If all goes well,
     running
             ./qemu-mips hello.mips
     should generate the output you expect.
   
   + Now that you know your program works, run the modified qemu on it.
     In our example, we have renamed "qemu-mips" to be called
     "qemu-mips-cachetrace" to avoid confusion with standard qemu-mips.
     
        ./qemu-mips-cachetrace hello.mips
	
     This will generate a file called "trace.mem" in your current
     directory that has a dinero-format memory trace of all data
     accesses in your program.

     Right now the output file is hard coded to be "trace.mem".  I know
     this isn't very user friendly, but this whole project is currently
     just a quick hack.
    
     Be sure to rename "trace.mem" before tracing another program, or
     the file will be overwritten!
    
     Now that you have "trace.mem", you can run dinero on it.
        ./dineroIV -l1-isize 8k -l1-dsize 8k -l1-ibsize 32 \
                   -l1-dbsize 32 < trace.mem
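
     Because the output name is fixed, tracing several programs in a row
     means renaming "trace.mem" after each run.  Below is a minimal shell
     sketch of that step.  The function name trace_one, the TRACER
     variable, and the "<program>.trace.mem" naming scheme are
     illustrative assumptions and are not part of the patch:

```shell
# Hypothetical helper: run the patched qemu on one binary, then move the
# hard-coded "trace.mem" to a per-program name so that the next run
# cannot overwrite it.
# TRACER (an assumed variable) defaults to the renamed binary from this README.
trace_one() {
    prog="$1"
    tracer="${TRACER:-./qemu-mips-cachetrace}"

    # Run the traced program and remember its exit status.
    "$tracer" "$prog"
    status=$?

    # Keep whatever trace was written, even if the program exited non-zero.
    if [ -f trace.mem ]; then
        mv trace.mem "$(basename "$prog").trace.mem"
    fi
    return "$status"
}
```

     For example, "trace_one ./hello.mips" would leave the trace in
     "hello.mips.trace.mem" in the current directory.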
       
Things to Note
~~~~~~~~~~~~~~
  + The traces can be *huge* if you are tracing a long program.
    Also, right now it might be impossible to generate a trace file
    larger than 2GB.
    You might want to compress the trace files once you have them:
        gzip trace.mem
    You can then feed them to dinero like this:
        zcat trace.mem.gz | dineroIV -l1-isize 8k ...
	
  + The qemu-mips-cachetrace program only works with 32-bit
    big-endian binaries for now.  Also it might not handle
    unaligned loads ("lwl") and ll/sc locking instructions properly.
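
  Before feeding a large trace to dinero, it can be useful to
  sanity-check it.  The sketch below counts records by access type.
  It assumes the trace uses the traditional din format (one
  "<label> <hex address>" pair per line, with label 0 for a data read,
  1 for a data write, and 2 for an instruction fetch).  The function
  name din_summary is hypothetical:

```shell
# Count records by access type in a din-format trace read from stdin.
# Assumption: each line is "<label> <hex address>", where label
# 0 = data read, 1 = data write, 2 = instruction fetch.
din_summary() {
    awk '
        $1 == 0 { reads++ }
        $1 == 1 { writes++ }
        $1 == 2 { fetches++ }
        END {
            printf "reads=%d writes=%d fetches=%d total=%d\n",
                   reads, writes, fetches, NR
        }'
}
```

  It reads from standard input, so it works on compressed traces too:
      zcat trace.mem.gz | din_summary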
    

Newest Version
~~~~~~~~~~~~~~
  The newest version of this utility should always be available here:
     http://www.csl.cornell.edu/~vince/projects/qemu-trace/


-- vmw  16 August 2007