alignment, edit distance, graph, levenstein, matching, python |
Pipelines became a quite popular way to represent a workflow. Many tools such as RapidMiner, Orange or Weka use graphs to represent the sequence of processes data follow. Most of the time, between two experiments, we just copy paste a previous one, we add or delete a few nodes. After a while, it becomes uneasy to find out what modifications were made. I was wondering how it would be possible to automate such a task. How to find what modifications were introduced in a graph ?
Graph 1 | Graph 2 | Two graphs we would like to merge into a single one. Vertices 00 and 11 represents the only root and the only leave. |
The following paper gives an answer to those questions for graphs without cycles. The outcome is a way to match two graphs altogether and to build an alignment between edges and vertices from both graphs. This method is based on Levenstein's edit distance.
Graph 1 | Graph 2 | Merged graph |
You can find an implementation here: graph_distance.py, importme.py. This algorithm is still very slow and could be improved. Maybe for another blog post.
<-- --> |