Skip to main content

Ruby implementation of the famous Knuth–Morris–Pratt Algorithm

today just decided to write the  Knuth–Morris–Pratt string searching algorithm , implementation in ruby. 



As you can see that the algorithm is divided into two parts , one is the preprocess , which is the crux of the Algo and the actual string searching using the preprocessed metadata.


The preprocess function is run only on the patterns, and it calculates π[i] = the longest proper prefix of pattern[0..i] , which is also a suffix of pattern[0..i].


e.g. if the pattern you are searching for is 

P = [ ‘A’,'B’,'A’,'B’,'A’,'C’,'A’]


the π array will look like


π = [ 0,0,1,2,3,0,1]


which shows that longest running prefix of the pattern is ABA . Buy doing the above calculations the algo saves itself some of the comparisons which it had to do in the naive , brute force approach.



if you get the preprocessing function then the rest of the things are straight forward , the algorithm is intelligent enough to understand that if next char in the seq does not match , but if it is part of the substring which will match the entire pattern then it will not do the additional comparisons.



As you can see that the time complexity of the search function is O(n) , and that of the prepropcess should be O(k) where k is the length of the pattern.


In my case the time complexity of preprocess is more than O(K) because of the additional running_sequence method , but that could be removed if you maintained the seq to match the prefix in the preprocess loop itself.



 

Comments

Popular posts from this blog

It was a great experience to talk to a huge audience in Mumbai and Delhi about how to start your ML journey at Google Cloud Summit ’18 India

Fractals and Mandelbrot Set

While mathematics is in itself quite interesting and forms the basis of any modern day research, be it computational biology, machine learning or building complex structure, it can be quite a challenge to decide where to start.  That is why i decided to explore Fractals, thinking of it as a bridge between the nature and science. It brings in some really fascinating concepts which should be good enough for me as a gateway go deeper.  Fractals are in simple language never ending patterns which keep on repeating without an end, because fractals are never ending they have an infinite perimeter but finite area.  Since the patterns repeats indefinitely but if you draw a circle around the peremeter the area will remain finite.  It is like adding 1+0.1+0.01+0.001 and never making 2 This video explains the basic concept really well  Fractals are found everywhere nature in Trees, Rivers, Branching patterns, Hurricanes and Galaxies. It tries to bring order and understanding to the patterns that w