Skip to main content

Suffix Trees , a bit complicated but really powerful data structure

Continuing with my obsession with the string matching algorithms , after 
KMP algorithm which does string matching in O(n +k) time , where n is length of the string and k is the length of the pattern. I tried exploring another data structure called as “Suffix Trees” , Suffix trees are nothing but compressed tries , where you model all the suffix present in a string in an optimized tree/trie structure.



this is how a suffix tree for word “BANANA” will look like




some of the other common use cases , where suffix trees can be used is 



  • Find the longest common substring of the strings S and T. This can be done in time O(|S| + |T|)


  • Find the longest substring of S that appears at least m > 1 times. This query can be answered in O(|S|) time.




Traverse the suffix tree labeling the branch nodes with the sum of the label lengths from the root and also with the number of information nodes in the subtrie.


Traverse the suffix tree visiting branch nodes with information node count >= m. Determine the visited branch node with longest label length.




  • Find all strings that contain a pattern P. Suppose we have a collection S1, S2, … Sk of strings and we wish to report all strings that contain a query pattern P. For example, the genome databank contains tens of thousands of strings, and when a researcher submits a query string, we are to report all databank strings that contain the query string. To answer queries of this type efficiently, we set up a compressed trie (we may call this a multiple string suffix tree) that contains the suffixes of the string S1$S2$…$Sk#, where $ and # are two different digits that do not appear in any of the strings S1, S2, …, Sk. In each node of the suffix tree, we keep a list of all strings Si that are the start point of a suffix represented by an information node in that subtrie.

i have referred to http://stackoverflow.com/questions/9452701/ukkonens-suffix-tree-algorithm-in-plain-english/22126672#22126672


to write an implementation for the suffix trees , the above link explains it very nicely on how to build your own suffix trees.





Comments

Popular posts from this blog

Fractals and Mandelbrot Set

While mathematics is in itself quite interesting and forms the basis of any modern day research, be it computational biology, machine learning or building complex structure, it can be quite a challenge to decide where to start.  That is why i decided to explore Fractals, thinking of it as a bridge between the nature and science. It brings in some really fascinating concepts which should be good enough for me as a gateway go deeper.  Fractals are in simple language never ending patterns which keep on repeating without an end, because fractals are never ending they have an infinite perimeter but finite area.  Since the patterns repeats indefinitely but if you draw a circle around the peremeter the area will remain finite.  It is like adding 1+0.1+0.01+0.001 and never making 2 This video explains the basic concept really well  Fractals are found everywhere nature in Trees, Rivers, Branching patterns, Hurricanes and Galaxies. It tries to bring order and understanding to the patterns that w
It was a great experience to talk to a huge audience in Mumbai and Delhi about how to start your ML journey at Google Cloud Summit ’18 India

COVID-19 Lockdown

It is May 11th 2020 and most of India has been under lockdown for 46 straight days. These have been tough times for everyone. While some of the folks i know have been cribbing about being bored to death in this lockdown, for me it has been a busy time. My current field of work which is "Cloud Technology" picked up as everyone was locked up with nothing but technology to entertain and survive and hence there was a lot of work. While me and my family was trying to navigate the chaos and madness of ordering everything online, i would say we were able to deal with the situation without going out much.  This was until my phone broke, crashed, bricked, it was gone.  While i had a spare 4 year old iPhone to fall back on but you know about spare phones, they were too old to trade for a newer one is why you decided to label it as a spare and keep it. They almost never work. So after struggling to get my office and personal stuff to work on an old phone and tablet, i decided i needed t