
I need to get all the unique substrings of a string. I have stored the string in a trie, but I can't figure out how to use it to print all the unique substrings. For example:

For the string "aab", the unique substrings are {"a", "aa", "aab", "ab", "b"}.

Here is my code for the trie:

#include <iostream>
#include <map>
#include <string>
#include <stack>

struct trie_node_t {
    typedef std::map<char, trie_node_t *> child_node_t;
    child_node_t m_childMap;
    trie_node_t() :m_childMap(std::map<char, trie_node_t*>()) {}

    // const reference, so the temporary std::string("aab") in main() can bind to it
    void insert( const std::string& word ) {
        trie_node_t *pNode = this;
        for ( std::string::const_iterator itr = word.begin(); itr != word.end(); ++itr) {
            char letter = *itr;
            if ( pNode->m_childMap.find(letter) == pNode->m_childMap.end()){
                pNode->m_childMap[letter] = new trie_node_t();
            }
            pNode = pNode->m_childMap[letter];
        }
    }

    void print() {
    }
};

int main ( int argc, char **argv ) {
    trie_node_t trie;
    trie.insert(std::string("aab"));
    trie.print();
}

How do I implement a print function that prints all the unique substrings?

I am looking for a linear-time approach.

Since I have built a trie, is there any way to iterate over it so that, whenever I visit a node, I print the string for that node as a unique substring?

Avinash
  • There is a memory leak in your code. It probably doesn't matter in *this* case, but ... – Anycorn Jan 09 '12 at 07:55
  • Strictly speaking, the empty string is also considered a substring of every string. – BlueRaja - Danny Pflughoeft Jan 09 '12 at 07:58
  • @BlueRaja-DannyPflughoeft: But then again, he needs to print it, and every program prints the empty string :) – amit Jan 09 '12 at 08:03
  • Possible duplicate: http://stackoverflow.com/questions/2560262 . A suffix tree is the structure best suited to represent all unique substrings. You can build it in linear time even though there might be Theta(n^2) unique substrings. – Rafał Dowgird Jan 09 '12 at 09:13
  • possible duplicate of [Generate all unique substrings for given string](http://stackoverflow.com/questions/2560262/generate-all-unique-substrings-for-given-string) – SoapBox Jan 09 '12 at 09:27
  • @Rafał Dowgird, a suffix tree does not give me all the substrings: http://ideone.com/AjZol . For "aab" I expect the following strings: {"a", "aa", "aab", "ab", "b"}. – Avinash Jan 09 '12 at 10:18
  • @Avinash: I'll elaborate in a proper answer. – Rafał Dowgird Jan 09 '12 at 11:57
  • @RafałDowgird A suffix tree is not essential. @Avinash, if you have already built the trie properly, all you have to do is traverse it once. Keep track of the parent string; whenever you move to a child node, append that letter and print the string up to that letter. – ElKamina Jan 09 '12 at 21:47
  • @ElKamina, I did build the trie properly, but I don't see how to keep track of the parent string. I will give it another try; if you have a code sample, please share it. – Avinash Jan 10 '12 at 03:30
  • func(node, str, n) { for (i = each_child_letter) { str[n] = i; str[n+1] = '\0'; print(str); func(node->child(i), str, n+1); } } Something like this. Call it with func(root_node, some_string, 0). – ElKamina Jan 10 '12 at 04:59
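A minimal C++ sketch of the traversal described in the comments above. It rests on two assumptions beyond the question's code: the trie holds every suffix of the string (inserting only "aab" yields just its prefixes "a", "aa", "aab"), and the hypothetical names `SuffixTrieNode`, `collect` and `uniqueSubstringsViaTrie` stand in for the question's `trie_node_t`. Because every substring is a prefix of some suffix, each non-root node then spells exactly one distinct substring, so emitting the path at every visited node lists each substring once.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical trie node for this sketch (not the question's trie_node_t).
struct SuffixTrieNode {
    std::map<char, std::unique_ptr<SuffixTrieNode>> children;
};

// Depth-first walk carrying the "parent string" in `path`.
// Every node below the root spells exactly one distinct substring.
static void collect(const SuffixTrieNode& node, std::string& path,
                    std::vector<std::string>& out) {
    for (const auto& kv : node.children) {
        path.push_back(kv.first);        // append the child's letter
        out.push_back(path);             // the path so far is a unique substring
        collect(*kv.second, path, out);  // recurse with the extended parent string
        path.pop_back();                 // backtrack before the next sibling
    }
}

// Returns all distinct non-empty substrings of `text` in lexicographic order.
std::vector<std::string> uniqueSubstringsViaTrie(const std::string& text) {
    SuffixTrieNode root;
    // Insert every suffix, not just `text` itself; otherwise the trie only
    // contains the prefixes of `text`.
    for (std::size_t start = 0; start < text.size(); ++start) {
        SuffixTrieNode* node = &root;
        for (std::size_t i = start; i < text.size(); ++i) {
            std::unique_ptr<SuffixTrieNode>& child = node->children[text[i]];
            if (!child) child = std::make_unique<SuffixTrieNode>();
            node = child.get();
        }
    }
    std::vector<std::string> out;
    std::string path;
    collect(root, path, out);
    return out;
}
```

Since `std::map` keeps children sorted, the pre-order walk produces the substrings in lexicographic order; for "aab" the result is {"a", "aa", "aab", "ab", "b"}. The trie of all suffixes can have Θ(n^2) nodes, so this is quadratic, not linear.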

3 Answers


First, build a suffix tree. This represents all suffixes of the string and can be done in linear time. Since every substring is a prefix of a suffix, now you need to enumerate the prefixes of the suffixes.

Fortunately, if two suffixes share a common prefix, the prefix will be on a single common path from the root, so there's a 1-1 mapping between paths from the root(*) in the tree and unique substrings.

Therefore it is sufficient to iterate over all paths from root in the suffix tree to produce all the unique substrings.

(*) The paths in the suffix tree are compressed, i.e. an edge might represent several characters. You need to uncompress the paths to produce all the substrings, i.e. treat compressed edges as multi-node paths.
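The answer's idea (enumerate every root path, uncompressing edges along the way) can be sketched in C++ as below. This is an assumption-heavy illustration, not a real suffix tree implementation: it builds the compressed tree naively in O(n^2) by inserting the suffixes one at a time and splitting edges, rather than with a linear-time construction such as Ukkonen's algorithm, and it omits the unique terminator symbol, so some suffixes end inside an edge instead of at a leaf. All names (`CompressedNode`, `insertSuffix`, `enumeratePaths`, `uniqueSubstringsViaSuffixTree`) are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Node of a compressed trie of suffixes; `label` holds the characters on the
// edge from the parent to this node (the root's label is empty).
struct CompressedNode {
    std::string label;
    std::map<char, std::unique_ptr<CompressedNode>> children;
};

// Naive O(n) insertion of one suffix, splitting an edge where it diverges.
static void insertSuffix(CompressedNode* node, std::string s) {
    while (!s.empty()) {
        auto it = node->children.find(s[0]);
        if (it == node->children.end()) {
            auto leaf = std::make_unique<CompressedNode>();
            leaf->label = s;  // hang the whole remainder as one edge
            node->children[s[0]] = std::move(leaf);
            return;
        }
        CompressedNode* child = it->second.get();
        std::size_t k = 0;  // common prefix length of s and the edge label
        while (k < child->label.size() && k < s.size() && child->label[k] == s[k]) ++k;
        if (k == s.size()) return;  // s is already spelled along this edge
        if (k < child->label.size()) {
            // Diverges mid-edge: split into label[0, k) and label[k, end).
            auto mid = std::make_unique<CompressedNode>();
            mid->label = child->label.substr(0, k);
            child->label = child->label.substr(k);
            char key = child->label[0];
            mid->children[key] = std::move(it->second);
            it->second = std::move(mid);
            child = it->second.get();
        }
        node = child;
        s = s.substr(k);
    }
}

// Walks every root path, uncompressing each edge one character at a time;
// each character position on a path ends exactly one distinct substring.
static void enumeratePaths(const CompressedNode& node, std::string& path,
                           std::vector<std::string>& out) {
    for (const auto& kv : node.children) {
        const CompressedNode& child = *kv.second;
        const std::size_t before = path.size();
        for (char c : child.label) {
            path.push_back(c);
            out.push_back(path);
        }
        enumeratePaths(child, path, out);
        path.resize(before);  // backtrack past the whole edge label
    }
}

std::vector<std::string> uniqueSubstringsViaSuffixTree(const std::string& text) {
    CompressedNode root;
    for (std::size_t i = 0; i < text.size(); ++i) insertSuffix(&root, text.substr(i));
    std::vector<std::string> out;
    std::string path;
    enumeratePaths(root, path, out);
    return out;
}
```

For "aab" the tree has a root edge "a" (with child edges "ab" and "b") and a root edge "b", and the walk emits "a", "aa", "aab", "ab", "b". Even with a linear-time construction, emitting the substrings costs time proportional to the output, which can be Θ(n^2) strings.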

Rafał Dowgird

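Note that every substring of myString has a length between 0 and strlen(myString). So just loop over every possible length and every possible starting position of the substring, collecting the results in a set so that a substring occurring at several positions is reported only once.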
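A direct C++ sketch of this brute-force loop. It assumes a `std::set` is used to drop repeated substrings (in "aab", "a" occurs twice) and skips the empty string by starting at length 1; the function name `uniqueSubstringsBruteForce` is hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Tries every (length, start) pair; std::set discards duplicates and keeps
// the result sorted. Length 0 (the empty string) is deliberately skipped.
std::set<std::string> uniqueSubstringsBruteForce(const std::string& s) {
    std::set<std::string> result;
    for (std::size_t len = 1; len <= s.size(); ++len) {
        for (std::size_t start = 0; start + len <= s.size(); ++start) {
            result.insert(s.substr(start, len));
        }
    }
    return result;
}
```

It is simple but slow: Θ(n^2) calls to `substr`, each copying up to n characters, put it at Θ(n^3) time or worse, far from the linear-time target in the question.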

BlueRaja - Danny Pflughoeft

A trie has "end signs": if a node holds the last char of a stored string, it is marked as a terminal.

So to print all the strings in a trie, run a DFS over it; whenever you visit a node with an end sign (i.e. a terminal), you know it is the last char of some stored string, so print the path to it.
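A minimal C++ sketch of this end-sign DFS, with hypothetical names (`MarkedTrieNode`, `insertWord`, `collectWords`). Note what it implies for this question: if only the suffixes of "aab" are inserted, only "aab", "ab" and "b" carry end signs, so the DFS reports the stored suffixes, not every substring. Listing all substrings requires treating every node as terminal, which is the traversal ElKamina describes in the comments.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical trie node with an "end sign" marking where an inserted word ends.
struct MarkedTrieNode {
    bool isEnd = false;
    std::map<char, std::unique_ptr<MarkedTrieNode>> children;
};

void insertWord(MarkedTrieNode& root, const std::string& word) {
    MarkedTrieNode* node = &root;
    for (char c : word) {
        std::unique_ptr<MarkedTrieNode>& child = node->children[c];
        if (!child) child = std::make_unique<MarkedTrieNode>();
        node = child.get();
    }
    node->isEnd = true;  // the node for the last char carries the end sign
}

// DFS that reports the current path only at nodes carrying an end sign.
void collectWords(const MarkedTrieNode& node, std::string& path,
                  std::vector<std::string>& out) {
    if (node.isEnd) out.push_back(path);
    for (const auto& kv : node.children) {
        path.push_back(kv.first);
        collectWords(*kv.second, path, out);
        path.pop_back();
    }
}
```

For example, after inserting "aab", "ab" and "b" into an empty root, `collectWords` yields "aab", "ab", "b" in that order.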

iloahz