RPrism: Efficient Regression Analysis Using View-Based Trace Differencing
Kevin Hoffman, Patrick Eugster, Suresh Jagannathan

RPrism is a dynamic tracing and analysis framework built upon our view-based model of execution traces. It can trace the execution of any Java program, recording execution events (such as method calls and field accesses) at a semantic level. Optionally, dynamic state can also be preserved in the traces.

Rather than simply recording a linear sequence of execution events, RPrism captures the execution of the program from several different viewpoints simultaneously. It then forms links between views that observe the same execution event at the same time. This allows analysis algorithms to traverse the execution of a program from many different semantic perspectives.
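The following minimal Java sketch shows one way a view-based structure could be organized. Everything in it is an assumption for illustration and does not reflect RPrism's actual data structures: the class ViewModelSketch, view names such as "thread:main" or "object:Foo#1", and the use of a global timestamp to link the copies of an event are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of a view-based trace (not RPrism's actual classes).
 * Each event is stored in every view that observes it. The copies are linked through
 * the event's global timestamp, so a traversal can switch views at any event.
 */
final class ViewModelSketch {
    /** One execution event; the timestamp is a global sequence number. */
    record Event(long time, String description) {}

    /** Where an event appears inside a particular view. */
    record Position(String view, int index) {}

    private final Map<String, List<Event>> views = new LinkedHashMap<>();
    private final Map<Long, List<Position>> links = new HashMap<>();

    /** Appends the event to each observing view and links all of its copies. */
    void observe(Event event, String... observingViews) {
        for (String name : observingViews) {
            List<Event> view = views.computeIfAbsent(name, k -> new ArrayList<>());
            links.computeIfAbsent(event.time(), k -> new ArrayList<>())
                 .add(new Position(name, view.size()));
            view.add(event);
        }
    }

    List<Event> view(String name) {
        return views.getOrDefault(name, List.of());
    }

    /** Follows the link from an event in one view to the same event in another view; -1 if absent. */
    int crossTo(String fromView, int index, String toView) {
        Event event = views.get(fromView).get(index);
        for (Position p : links.get(event.time())) {
            if (p.view().equals(toView)) {
                return p.index();
            }
        }
        return -1;
    }
}
```

Every copy of an event shares the same timestamp. A traversal walking the thread view can therefore jump to the same event in an object or method view with a single lookup.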

We have currently implemented one analysis algorithm in RPrism, which calculates a set of semantic differences between captured execution traces. When this algorithm is applied to execution traces of non-regressing and regressing program versions and test cases, one can find the likely root cause of a regression with few or no false positives.

RPrism is short for 'regression prism.' Just as a prism splits white light into a spectrum of colors, so RPrism splits execution traces into many different views. The additional perspective offered by this 'spectrum' of information allows the tool to discover the likely root cause of regressions.

First, install AspectJ 5 and make sure the aj5 script is in your path. Then download SSSDynTracer.jar and sssdyntracer and place them somewhere in your path.

To trace a program, start it with the sssdyntracer command instead of the java command. Be sure to set the classpath explicitly with the -cp command-line option. Options that affect weaving and trace segmentation go before the name of the class containing the main method. You can get help on these options by running 'sssdyntracer -help'. The script is used as follows:

sssdyntracer [--tsegment=(pat)]* [--dynt-exclude=(pat)]* [--dynt-include=(pat)]*
                                          [--weave-verbose] (arguments for java)
  Invokes java and integrates SSSDynTracer using AspectJ 5 load time weaving

  --tsegment=(pointcut)    A pointcut that designates a trace segment. Specify at least one.
                           For example: --tsegment="execution(* *.main(..))"
  --dynt-exclude=(pat)     Allows you to exclude certain pieces of code from tracer
  --dynt-include=(pat)     Allows you to include certain pieces of code for tracer
  --weave-verbose          The load time weaver will show debugging messages

A (pat) matches a "within" AspectJ 5 pattern.
If you use --dynt-include then only those classes matching your pattern will be
included. You should always include your class containing main in the list of
include patterns, or the trace output will not be written.
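The include/exclude rules in the help text above can be roughly approximated in code. This is a simplified sketch and not AspectJ's matcher. TracerFilter is a hypothetical class, and it handles only '*' and '..' in type-name patterns, ignoring '+', annotations, and boolean combinations. Two behaviours are assumptions rather than documented facts: with no --dynt-include patterns every class is a candidate, and an exclude pattern wins over an include pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical approximation of --dynt-include / --dynt-exclude filtering.
 * Supports only '*' and '..' in type-name patterns; this is not AspectJ's matcher.
 */
final class TracerFilter {
    private final List<Pattern> includes = new ArrayList<>();
    private final List<Pattern> excludes = new ArrayList<>();

    TracerFilter include(String typePattern) {
        includes.add(toRegex(typePattern));
        return this;
    }

    TracerFilter exclude(String typePattern) {
        excludes.add(toRegex(typePattern));
        return this;
    }

    /** '*' alone matches any type; elsewhere '*' stays within one name segment and '..' spans packages. */
    static Pattern toRegex(String typePattern) {
        if (typePattern.equals("*")) {
            return Pattern.compile(".*");
        }
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < typePattern.length(); i++) {
            char c = typePattern.charAt(i);
            if (typePattern.startsWith("..", i)) {
                re.append("\\.(?:.*\\.)?"); // any character sequence that starts and ends with '.'
                i++;                         // skip the second dot
            } else if (c == '*') {
                re.append("[^.]*");
            } else {
                re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(re.toString());
    }

    /** Assumptions: no include patterns means "include everything", and excludes win. */
    boolean isTraced(String className) {
        boolean included = includes.isEmpty()
                || includes.stream().anyMatch(p -> p.matcher(className).matches());
        return included && excludes.stream().noneMatch(p -> p.matcher(className).matches());
    }
}
```

For example, with include "mypackage..*" and exclude "mypackage.gen..*", mypackage.MyMainClass is traced and mypackage.gen.Parser is not. This mirrors the advice above to keep the main class inside the include patterns.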

The only required argument is the --tsegment option, which indicates the method executions or calls where a trace should begin and end. For example, --tsegment="execution(* *.main(..))" starts the trace when the main method starts and ends it when the main method ends.

The dynamic tracing itself can be adjusted by specifying further options *after* the name of the main class. These options are as follows:

=================================== DYNTRACER ========================================
DynTracer aspect usage:
  <program> --dynt-testcaseid <caseID> --dynt-runid <runID> [--dynt-prev-runid <prevRunID>]
                    [--dynt-outdir <outdir>] [--dynt-cslen <int>]
                    [--dynt-flatten-aspects <yes|no>] [--dynt-maxrunsegs <int>]
                    <program args>

  dynt-testcaseid:  The ID of the test case. Use to distinguish between test cases.
  dynt-runid:       The run ID. Use to distinguish between versions of the program.
  dynt-prev-runid:  The previous run ID. When specified, a .map will be generated.
  dynt-outdir:      The output directory to store trace files. Defaults to
                    the <current directory>/dynt directory.
  dynt-cslen:       Specify the maximum call string length for analysis context.
  dynt-flatten-aspects: Whether or not to ignore aspect-related stack frames when
                    searching for a parent container for trace points.
  dynt-maxrunsegs:  Maximum number of times to run each trace segment. Defaults to 10.
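A trace segment starts when an execution matched by --tsegment begins and ends when that execution finishes. The --dynt-maxrunsegs option caps how many times each segment is handled. The sketch below models that bookkeeping for a single thread, under several assumptions. SegmentTracker is hypothetical and not taken from SSSDynTracer's source. Recursive re-entry of the same segment is assumed to stay inside the already open segment. "Run" in the --dynt-maxrunsegs description is read as "record".

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, single-threaded sketch of trace-segment bookkeeping (not SSSDynTracer code).
 * A segment opens when a matching execution starts and closes when it ends. Each segment
 * is recorded at most maxRunSegs times, echoing --dynt-maxrunsegs (default 10).
 * Entering a different segment while one is open is ignored in this sketch.
 */
final class SegmentTracker {
    private final int maxRunSegs;
    private final Map<String, Integer> recordedRuns = new HashMap<>();
    private String openSegment; // segment currently open, or null
    private int depth;          // nested executions of openSegment still running
    private boolean recording;  // whether events in the open segment are being kept

    SegmentTracker(int maxRunSegs) {
        this.maxRunSegs = maxRunSegs;
    }

    SegmentTracker() {
        this(10);
    }

    /** Called when an execution matched by a --tsegment pointcut starts. */
    void enter(String segment) {
        if (openSegment == null) {
            openSegment = segment;
            depth = 0;
            recording = recordedRuns.getOrDefault(segment, 0) < maxRunSegs;
        }
        if (segment.equals(openSegment)) {
            depth++; // recursive re-entry stays inside the open segment
        }
    }

    /** Called when that execution finishes, normally or by throwing. */
    void exit(String segment) {
        if (segment.equals(openSegment) && --depth == 0) {
            if (recording) {
                recordedRuns.merge(segment, 1, Integer::sum);
            }
            openSegment = null;
            recording = false;
        }
    }

    boolean isRecording() {
        return recording;
    }

    int runsRecorded(String segment) {
        return recordedRuns.getOrDefault(segment, 0);
    }
}
```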

By default, trace output is written to the dynt directory under the current directory. Unless you give each run a new run ID or test case ID (using --dynt-runid or --dynt-testcaseid), delete the dynt directory before starting the program. Otherwise it will contain merged, incomplete, or incorrect trace information.

The analysis program accepts as input the path to a directory containing the full view trace data described above. Alternatively, you can package this directory into a single JAR file and use that file directly as input for analysis.
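The usual way to package the directory is the JDK's jar tool, for example 'jar cf traces.jar -C dynt .'. To do the same from Java, a minimal sketch using java.util.jar follows. It assumes that the trace files should sit at the JAR root, with paths relative to the dynt directory; the analyzer's documentation does not confirm this. TraceJarPacker is a hypothetical name. Unlike the jar tool, the sketch does not write a META-INF/MANIFEST.MF entry.

```java
import java.io.*;
import java.nio.file.*;
import java.util.List;
import java.util.jar.*;
import java.util.stream.*;

/** Hypothetical helper: packs a trace directory into one JAR, entries relative to that directory. */
final class TraceJarPacker {
    private TraceJarPacker() {}

    static void pack(Path traceDir, Path jarFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(traceDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jarFile))) {
            for (Path file : files) {
                // JAR entry names always use '/', whatever the host OS separator is.
                String name = traceDir.relativize(file).toString().replace(File.separatorChar, '/');
                out.putNextEntry(new JarEntry(name));
                Files.copy(file, out);
                out.closeEntry();
            }
        }
    }
}
```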

Putting it all together, tracing a program looks something like:

#(for programs with a main function)
rm -Rf dynt; sssdyntracer --tsegment="execution(* *.main(..))" --dynt-include="*" -cp . \
                    mypackage.MyMainClass --dynt-runid 1 <other program arguments>
#(for JUnit test cases -- you may have to adjust the tsegment pointcut)
rm -Rf dynt; sssdyntracer --tsegment="execution(* junit.framework.TestCase+.test*(..))" --dynt-include="*" \
                    -cp .:junit.jar testpackage.TestDriver --dynt-runid 1 <other program arguments>
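The 'rm -Rf dynt' step assumes a Unix shell. Where rm is not available, the portable Java sketch below does the same job. TraceDirCleaner is a hypothetical helper that deletes the directory tree bottom-up using java.nio.file.Files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: a portable equivalent of 'rm -Rf dynt'. */
final class TraceDirCleaner {
    private TraceDirCleaner() {}

    /** Deletes dir and everything under it; does nothing if dir does not exist. */
    static void deleteRecursively(Path dir) throws IOException {
        if (Files.notExists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse path order places every child before its parent directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```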

Download the following file to your desktop, then double-click it to start the application:
Start SSS Trace Browser.

IMPORTANT: If you don't know how to access the text console for Java Web Start applications, start the application from the command line; otherwise you won't be able to see the output of the frequent itemset miner (FIM). To do this, open a console and run the following command:

javaws http://www.kevinjhoffman.com/dynt/SSSTraceBrowser.jnlp

Alternatively, you can download the command-line version here (now the preferred way of running the program). Untar it into a directory and run the sssanalyzer script; running the script without arguments shows its usage. The tool can run the MAFIA frequent itemset miner on its output, but this data is currently not needed to diagnose the root cause of regressions. If you do not want to use MAFIA, give the script the path to /bin/true instead of the path to the mafia binary.

In addition to the path to mafia and the directories or JAR files containing the old and new traces, the script accepts several options that affect the trace differencing algorithm. You can tell it to ignore dynamic state information, adjust window sizes for the view-based differencing, and use a finer granularity when working with runs of differences. The finer granularity is useful if the initial analysis doesn't mark any differences as likely causes: the analysis then works with trace words instead of runs of differences. The script can also be told to use the real LCS algorithm instead of the view-based differencing algorithm (WARNING: using this option on long execution traces takes a LONG time).
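To give some intuition for the real LCS option and for runs of differences, the following is a textbook LCS diff over two event sequences. It groups consecutive unmatched events into runs. It is an illustration only, not RPrism's implementation. The class LcsTraceDiff and its Run record are hypothetical, and events are reduced to strings. The (n+1) x (m+1) table it fills explains why full LCS becomes slow and memory-hungry on long traces.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Textbook LCS differencing of two event sequences (illustrative; not RPrism's code).
 * Consecutive events that the LCS alignment leaves unmatched are grouped into runs.
 */
final class LcsTraceDiff {
    private LcsTraceDiff() {}

    /** One maximal run of differences: events only in the old trace and only in the new one. */
    record Run(List<String> removed, List<String> added) {}

    static List<Run> diff(List<String> oldTrace, List<String> newTrace) {
        int n = oldTrace.size();
        int m = newTrace.size();
        // len[i][j] = LCS length of oldTrace[i..] and newTrace[j..]; O(n*m) time and memory.
        int[][] len = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                len[i][j] = oldTrace.get(i).equals(newTrace.get(j))
                        ? len[i + 1][j + 1] + 1
                        : Math.max(len[i + 1][j], len[i][j + 1]);
            }
        }
        List<Run> runs = new ArrayList<>();
        List<String> removed = new ArrayList<>();
        List<String> added = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n || j < m) {
            if (i < n && j < m && oldTrace.get(i).equals(newTrace.get(j))) {
                // A matched event ends the current run of differences, if any.
                if (!removed.isEmpty() || !added.isEmpty()) {
                    runs.add(new Run(removed, added));
                    removed = new ArrayList<>();
                    added = new ArrayList<>();
                }
                i++;
                j++;
            } else if (j == m || (i < n && len[i + 1][j] >= len[i][j + 1])) {
                removed.add(oldTrace.get(i++)); // event only in the old trace
            } else {
                added.add(newTrace.get(j++));   // event only in the new trace
            }
        }
        if (!removed.isEmpty() || !added.isEmpty()) {
            runs.add(new Run(removed, added));
        }
        return runs;
    }
}
```

For the traces [a, b, c, d] and [a, x, c, d, e], this yields two runs: b replaced by x, and e appended at the end.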

Two very important options are --exclude-diffsets and --require-diffsets. They tell the analysis to first subtract previously generated sets of differences (--exclude-diffsets) and/or to intersect the calculated difference set with another saved set of differences (--require-diffsets). Using these operations as building blocks, the procedure for finding the root cause of a regression is as follows:

  1. Prepare traces for the following runs of the program:
    1. Non-regressing version, non-regressing test case
    2. Non-regressing version, regressing test case
    3. Regressing version, non-regressing test case
    4. Regressing version, regressing test case
  2. D1. Compute differences between the two versions of the program for the non-regressing test case.
  3. D2. Compute differences between the non-regressing and regressing test cases for the regressing version of the program.
  4. D3. Compute differences between the two versions of the program for the regressing test case. Use --exclude-diffsets and give it the out-diffsets.txt file from D1. Use --require-diffsets and give it the out-diffsets.txt file from D2.
  5. Inspect out-diffsets.txt and out-sequences.txt from D3 to see the differences that are most likely to be the cause of the regression. These differences are also marked in the old-thread-* and new-thread-* text files, which show exactly where these causes manifested in the execution of the program (including any dynamic state). Search for the text '==== DIFF' in these files to quickly find any execution events marked as likely regression causes.
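Step 4 can be summarized as D3 = (differences between the versions on the regressing test case, minus D1) intersected with D2. The minimal Java sketch below computes this, treating each difference as an opaque string. RegressionDiffSets and its methods are hypothetical names. The sketch deliberately does not parse out-diffsets.txt, whose format is not described here.

```java
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of the set algebra behind --exclude-diffsets and --require-diffsets.
 * Differences are opaque strings here; the out-diffsets.txt file format is not modelled.
 */
final class RegressionDiffSets {
    private RegressionDiffSets() {}

    /** --exclude-diffsets: keep differences that do not appear in the excluded set. */
    static Set<String> exclude(Set<String> diffs, Set<String> excluded) {
        Set<String> result = new TreeSet<>(diffs);
        result.removeAll(excluded);
        return result;
    }

    /** --require-diffsets: keep only differences that also appear in the required set. */
    static Set<String> require(Set<String> diffs, Set<String> required) {
        Set<String> result = new TreeSet<>(diffs);
        result.retainAll(required);
        return result;
    }

    /** Step 4: (version differences on the regressing test, minus D1) intersected with D2. */
    static Set<String> likelyCauses(Set<String> versionDiffsOnRegressingTest,
                                    Set<String> d1, Set<String> d2) {
        return require(exclude(versionDiffsOnRegressingTest, d1), d2);
    }
}
```

Subtracting D1 removes differences that the version change produces even when the test does not regress. Intersecting with D2 keeps only differences tied to the behaviour of the regressing test case. For example, with version differences {A, B, C}, D1 = {A} and D2 = {B, Z}, only B survives.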

If you are using the GUI for analysis, open trace data on the left and right sides (either a directory or a JAR file). Then choose Compare menu -> Compare All Traces. This analyzes all trace runs with matching names and collects all the data for differencing analysis (and also for input into the mafia frequent itemset miner). Currently the GUI is best used to view and browse traces and their differences. The command-line version is best for performing the actual analysis, because it allows more configuration of the analysis parameters.

The final output, listing the most relevant frequent itemsets found, is printed on the console. You can also visually browse the differences between the traces in the new windows that open for each trace dataset. The command-line version writes several files whose names start with 'out-'. It also writes text files that record all of the thread execution traces with the differences marked, which makes it easy to inspect exactly where the differences manifested in the program executions.
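Searching the marked thread files can be scripted. The sketch below collects every line containing '==== DIFF' from the old-thread-* and new-thread-* files and reports each as file:line:text, like grep. DiffMarkerScanner is a hypothetical helper. It assumes these files sit directly in the analysis output directory. On Unix, 'grep -n "==== DIFF" old-thread-* new-thread-*' does the same job.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper that reports every "==== DIFF" line in the old-thread-* and
 * new-thread-* files of an analysis output directory, as file:line:text.
 */
final class DiffMarkerScanner {
    static final String MARKER = "==== DIFF";

    private DiffMarkerScanner() {}

    static List<String> scan(Path outputDir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(outputDir, "{old,new}-thread-*")) {
            stream.forEach(files::add);
        }
        Collections.sort(files); // directory listings are unordered; sort for stable output
        List<String> hits = new ArrayList<>();
        for (Path file : files) {
            // ISO-8859-1 decodes any byte sequence, so unusual bytes in recorded state cannot abort the scan.
            List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
            for (int i = 0; i < lines.size(); i++) {
                if (lines.get(i).contains(MARKER)) {
                    hits.add(file.getFileName() + ":" + (i + 1) + ":" + lines.get(i));
                }
            }
        }
        return hits;
    }
}
```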

Download all data here (270 MB)
