XD blog

blog page

edit distance, python, sequence alignment


2013-02-10 Edit distance and sequence alignment

Some algorithms, you cannot avoid implementing them a couple of times every year. When you work for a search company, you always deal with text. You use regular expressions almost every week. The edit distance is one of these algorithms you never find in the company's wiki or not in the programming language you expect. The code references some weird assembly you don't need and don't want. Or simply, this time, you don't need the distance but the path between the two sequences. You expect to find this algorithm to be somewhere present, accessible, somewhere you don't have to look for, somewhere on the company's wiki. But surprisingly, everybody did his own, not documented, including unexpected paramters. I wish I could even keep myself some versions I made but I often implemented the edit distance in a way that it was faster for my specific needs at that time. Even for this blog (How to normalize an edit distance ?), I did it again. I remember one time doing it to prove my students the edit distance was another versions of the shortest path in a graph. I did it again today: edit_distance_alignment.py. Maybe it will be the last time?


<-- -->

Xavier Dupré