Provenance Tracking in Python

1. The Code

You can find my code and paper on GitHub. Special thanks to Thomas Witte for advising me throughout the project.

2. Demonstration

Suppose we have some function in Python which we don't necessarily know the implementation of.

def blackbox(x):
    return 3 + (x * 7)

Using a wrapper provided from the library, you can call the function and calculate what input you have to give to the function in order to result in a desired output.

>>> from rosslt import Tracked
>>>
>>> expr = blackbox(Tracked(0.0)).get_expression()
>>> param = expr.reverse()(42.0)
>>> print(f"blackbox({param}) = {blackbox(param)}")
blackbox(5.571428571428571) = 42.0

This works as long as the function is deterministic and the operations applied to the value does not change.

3. The Goal

We want to know what to input as x, so that the returned value would be 42. Now obviously this can be done by looking at the function implementation code and figuring out that you have to first subtract 3 from 42 and then divide the result by 7 to get the solution. However, this can be done in an automated way.

4. Provenance Tree

Figure 1: blackbox(x) in a tree for x = α

By creating a wrapper class overriding every operator, we can record all operations and store them in a tree structure. Python makes this easy, however there are quite a few operators to be implemented.

5. Tree Inversion

Provenance trees can be inverted relatively easy, though there are many cases to consider, especially because some operators are not commutative.

Figure 2: inverted provenance tree

6. Tree Evaluation

If we now evaluate the result we want of our blackbox function with the inverted provenance tree, we can calculate the required input in an automated fashion.

f(β) = (β - 3) / 7
f(42) = (42 - 3) / 7 = ~5.57

While this works for most cases, in some cases the evaluation of the provenance tree can have multiple solutions, such as when the value is multiplied by zero.

7. Optimizations

To make the library more efficient, there have been a few optimizations implemented.

7.1. Postfix Notation

The parameter count of all operations is fixed, which means the whole tree can be stored in postfix (RPN) notation within a linear list.

7.2. Serialization

By giving each operator a code and encoding the numbers, we can efficiently store the operations in a binary format.

ID	Operator	Identifier
0	`SWAP`	swap
1	`ADD`	+
2	`SUB`	-
3	`MUL_INT`	*
4	`MUL`	*
5	`DIV`	/
6	`DIV_FLOOR`	//
7	`SIN`	sin
8	`COS`	cos
9	`ASIN`	asin
10	`ACOS`	acos
11	`POW`	pow
12	`IPOW`	ipow

To properly invert a floor division, we added an integer multiplication operator. There are also a few frequently basic trigonometric functions implemented.

Code	Name	Size
0x01	INT32	4 bytes
0x02	INT64	8 bytes
0x03	DOUBLE	8 bytes
0x04	COMPLEX	16 bytes
0x05	STRING	variable

Storing the values themselves works by defining a few data type codes. We can store both the operator and value codes in a single byte by using 4 bits each.

Name	Data	Size
Expression	-1.3833333333333333;swap;-;4.2;*;sin	36 bytes
Value Array	222222222222F6BFCDCCCCCCCCCC1040	16 bytes
Code Array	034042034447	6 bytes

8. Location Tree

Figure 3: the concept

To properly support classes, we have to keep track of all the attributes, even when nested. The solution is to override the dot operator to return a temporary wrapper that keeps track of how nested it is and which path it has. Using a reference to the parent object it can record everything as necessary. I will not go into the details here, however.

9. Monitoring

For introspection and debugging purposes in ROS nodes, the library features a monitoring tool visualizing the provenance history as an expression string, with the ability to force specific values by notifying the value wrapper to override the source value with the automatically calculated required value. This works by evaluation the inverted provenance tree as shown above.

Figure 4: monitoring a ROS topic

10. Usage in ROS Nodes

10.1. Publisher

# create tracked instance
marker_value = self.location(Marker())

# supports setting initial values outside instance creation
marker_value.pose.position.x = 1.5
marker_value.pose.position.y = 2.5
marker_value.pose.position.z = 3.0

# apply calculation
marker_value.pose.position.x += 1
marker_value.pose.position.y *= 2
marker_value.pose.position.z /= 3

# publish message
self.publish(self.pub_marker, marker_value)

10.2. Subscriber

marker = Tracked.from_msg(msg)
pos = marker.pose.position

self.force_value(pos.x, pos.x + 1)
self.force_value(pos.y, pos.y + 2)
self.force_value(pos.z, pos.z + 3)