Provenance Tracking in Python

1. The Code

You can find my code and paper on GitHub. Special thanks to Thomas Witte for advising me throughout the project.

2. Demonstration

Suppose we have some function in Python which we don't necessarily know the implementation of.

def blackbox(x):
    return 3 + (x * 7)

Using a wrapper provided from the library, you can call the function and calculate what input you have to give to the function in order to result in a desired output.

>>> from rosslt import Tracked
>>>
>>> expr = blackbox(Tracked(0.0)).get_expression()
>>> param = expr.reverse()(42.0)
>>> print(f"blackbox({param}) = {blackbox(param)}")
blackbox(5.571428571428571) = 42.0

This works as long as the function is deterministic and the operations applied to the value does not change.

3. The Goal

We want to know what to input as x, so that the returned value would be 42. Now obviously this can be done by looking at the function implementation code and figuring out that you have to first subtract 3 from 42 and then divide the result by 7 to get the solution. However, this can be done in an automated way.

4. Provenance Tree

pt_example.png

Figure 1: blackbox(x) in a tree for x = α

By creating a wrapper class overriding every operator, we can record all operations and store them in a tree structure. Python makes this easy, however there are quite a few operators to be implemented.

5. Tree Inversion

Provenance trees can be inverted relatively easy, though there are many cases to consider, especially because some operators are not commutative.

pt_inverted.png

Figure 2: inverted provenance tree

6. Tree Evaluation

If we now evaluate the result we want of our blackbox function with the inverted provenance tree, we can calculate the required input in an automated fashion.

f(β) = (β - 3) / 7
f(42) = (42 - 3) / 7 = ~5.57

While this works for most cases, in some cases the evaluation of the provenance tree can have multiple solutions, such as when the value is multiplied by zero.

7. Optimizations

To make the library more efficient, there have been a few optimizations implemented.

7.1. Postfix Notation

The parameter count of all operations is fixed, which means the whole tree can be stored in postfix (RPN) notation within a linear list.

pt_postfix.png

7.2. Serialization

By giving each operator a code and encoding the numbers, we can efficiently store the operations in a binary format.

ID Operator Identifier
0 SWAP swap
1 ADD +
2 SUB -
3 MUL_INT *
4 MUL *
5 DIV /
6 DIV_FLOOR //
7 SIN sin
8 COS cos
9 ASIN asin
10 ACOS acos
11 POW pow
12 IPOW ipow

To properly invert a floor division, we added an integer multiplication operator. There are also a few frequently basic trigonometric functions implemented.

Code Name Size
0x01 INT32 4 bytes
0x02 INT64 8 bytes
0x03 DOUBLE 8 bytes
0x04 COMPLEX 16 bytes
0x05 STRING variable

Storing the values themselves works by defining a few data type codes. We can store both the operator and value codes in a single byte by using 4 bits each.

Name Data Size
Expression -1.3833333333333333;swap;-;4.2;*;sin 36 bytes
Value Array 222222222222F6BFCDCCCCCCCCCC1040 16 bytes
Code Array 034042034447 6 bytes

pt_serialization.png

8. Location Tree

pt_ltree.png

Figure 3: the concept

To properly support classes, we have to keep track of all the attributes, even when nested. The solution is to override the dot operator to return a temporary wrapper that keeps track of how nested it is and which path it has. Using a reference to the parent object it can record everything as necessary. I will not go into the details here, however.

9. Monitoring

For introspection and debugging purposes in ROS nodes, the library features a monitoring tool visualizing the provenance history as an expression string, with the ability to force specific values by notifying the value wrapper to override the source value with the automatically calculated required value. This works by evaluation the inverted provenance tree as shown above.

pt_monitoring.png

Figure 4: monitoring a ROS topic

10. Usage in ROS Nodes

10.1. Publisher

# create tracked instance
marker_value = self.location(Marker())

# supports setting initial values outside instance creation
marker_value.pose.position.x = 1.5
marker_value.pose.position.y = 2.5
marker_value.pose.position.z = 3.0

# apply calculation
marker_value.pose.position.x += 1
marker_value.pose.position.y *= 2
marker_value.pose.position.z /= 3

# publish message
self.publish(self.pub_marker, marker_value)

10.2. Subscriber

marker = Tracked.from_msg(msg)
pos = marker.pose.position

self.force_value(pos.x, pos.x + 1)
self.force_value(pos.y, pos.y + 2)
self.force_value(pos.z, pos.z + 3)