Data Flow Analysis: Using Soot

Soot is a product of the Sable Research Group from McGill University, analyze, instrument, optimize and visualize Java and Android applications. Here we use the tool for Data Flow Analysis on simple Java programs.

Soot Home [outdated]
Soot Home
Github Soot Wiki
Soot Data Flow Analysis Tutorial [outdated]

Setting up Soot

You can download the latest version of Soot from their website . I recommend downloading the latest nightly build since the latest release version is over 5 years old. Read the tutorial on their github wiki on how to use it. Get familiar with using Soot as a command line tool. Learn what the command line options mean; we would especially be using the options -cp, -pp, -w, -src-prec, -main-class, -f, and -d. You can run Soot on the command line as:

	java -cp soot.jar soot.Main -cp path/to/test/class -pp -w -f J test_class

This would convert the test class to a Jimple file (due to the -f J option) and put it in a directory called sootOutput. You can convert to other output formats using the -f option and select the output directory with -d. Run your own test cases to make sure Soot is working. Depending on your OS, version of Soot, and version of Java tyou may need to some modifications. If you are experiencing errors try updating to the latest nightly build of Soot (present on the Soot homepage

), and to the latest version of Java.

Data Flow Analysis

Soot provides FlowAnalysis classes for fixed-point computation of static data flow analysis (using the worklist algorithm). Ther are classes for ForwardFlowAnalysis and BackwardFlowAnalysis depending on what kind of analysis is required. We can create a new class for data flow analysis by extending any of these class and filling in the approrpiate methods:

newInitialFlow(): Set an initial value for the IN and OUT FlowSets of all nodes in the graph. (ususally for may analysis the sets are initialized to empty, while for must analysis the sets are initialized to the universal set- i.e. all possible values which can be part of the set). You can also set different values for each node by overriding the customizeInitialFlowGraph() function.

entryInitialFlow(): Set the initial value of the IN FlowSets for the entry node(s) in the graph. Note that for BackwardFlowAnalysis the entry node(s) is the last node(s) in the graph (there can be multiple last nodes).

merge(inSet1, inSet2, outSet): Compute the merge of the two inSets and put it in the outSet. Used for the MEET operator of sets when a set has several predecessors. For a may analysis the merge can be implemented by union and for a must analysis by intersection. However the exact implementation must take care of aliasing, as well as if the FlowSet implementation performs union/intersection based on values or based on reference.

copy(inSet, outSet): Make a copy of inSet into the outSet. (Used for copying sets in the algorithm.)

flowThrough(inSet, node, outSet): Compute the data flow thorough a single node. Used as the transfer function for the data flow set across the node (or statement). Given the FlowSet before a node compute the FlowSet after the node based on the statements within the node.

Constructor: A constructor must be implemented that at least takes a DirectedGraph as an an argument. In this constructor you must first call the contructor of the super-class and pass the graph to the super-constructor. Also, you should call doAnalysis() at the end of your constructor, which actually executes the data flow analysis.

Look into the detailed description of intraprocedural data flow analysis on the github wiki.

Exercise: Live Variable Analysis

A variable is live at a program point if its value may be read later in some execution of the program. Live Variable Analysis is useful in compiler optimizations such as dead code elimination. Implement Live Variable Analysis using Soot and run it on some examples.

[Complete Exercise]

Assignment: Upward Exposed Uses

Upwards exposed uses serves a similar purpose to reaching definitions analysis. Instead of finding which definitions reach each point, we find which uses of a variable have not yet been matched with one or more definitions. Given a control flow graph the use of the variable x at vertex v is upwards exposed at a vertex u if x is read at v,and there exists a path from u to v along which which no vertex assigns to x.

[Complete Problem] [Base Code]

[ << Prev ]

[ Next >> ]