Finding anagrams for a given word

Question

Two words are anagrams if one of them has exactly the same characters as the other.

Example: Anagram and Nagaram are anagrams (case-insensitive).

There are many questions similar to this one. A couple of approaches to check whether two strings are anagrams are:

1) Sort the strings and compare them.

2) Create a frequency map for these strings and check if they are the same or not.
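
A minimal Python sketch of the two approaches listed above, assuming a case-insensitive comparison. The function names are made up for illustration:

```python
from collections import Counter


def is_anagram_sorted(a: str, b: str) -> bool:
    """Approach 1: sort the letters of both strings and compare."""
    return sorted(a.lower()) == sorted(b.lower())


def is_anagram_counted(a: str, b: str) -> bool:
    """Approach 2: compare the letter-frequency maps of both strings."""
    return Counter(a.lower()) == Counter(b.lower())


print(is_anagram_sorted("Anagram", "Nagaram"))   # True
print(is_anagram_counted("Anagram", "Nagaram"))  # True
```

Both checks compare exactly two strings, so on their own they do not say how to search a whole dictionary.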

But in this case, we are given a word (for the sake of simplicity, let us assume a single word only, and that it has single-word anagrams only) and we need to find anagrams for it.

The solution I have in mind is to generate all permutations of the word and check which of them exist in the dictionary. But clearly, this is highly inefficient. Yes, the dictionary is available too.

So what alternatives do we have here?

I also read in a similar thread that something can be done using tries, but the person didn't explain what the algorithm was or why a trie was used in the first place; only an implementation was provided, and that in Python or Ruby. That wasn't really helpful, which is why I have created this new thread. If someone wants to share an implementation (in a language other than C, C++ or Java), then kindly explain it too.

Something to help you while looking for an answer: http://stackoverflow.com/questions/7896694/how-to-find-find-anagrams-among-words-which-are-given-in-a-file Basically, what you may do is have a hash function that yields the same value for anagrams, and then convert your dictionary to a structure that allows fetching the list of words given such a hash. – Bartosz, Sep 18 '12 at 12:57
What do you really want to do? Find all anagrams that exist in a fixed dictionary from a given set of letters? Or build an anagram relation over all the words in a fixed dictionary, i.e. given a word from that dictionary, efficiently retrieve all the valid anagrams? – Kwariz, Sep 18 '12 at 13:20
Given a dictionary with a fixed set of words and a random word (which may or may not be in the dictionary), find its anagrams (which are present in the dictionary). Makes sense? – h4ck3d, Sep 18 '12 at 14:25

score 76 · Accepted Answer · answered Sep 18 '12 at 13:25

Example algorithm:

Open dictionary
Create empty hashmap H
For each word in dictionary:
  Create a key that is the word's letters sorted alphabetically (and forced to one case)
  Add the word to the list of words accessed by the hash key in H

To check for all anagrams of a given word:

Create a key that is the letters of the word, sorted (and forced to one case)
Look up that key in H
You now have a list of all anagrams

Relatively fast to build, blazingly fast on look-up.
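
The steps above can be sketched in Python roughly as follows. This is only an illustration: the helper names (`make_key`, `build_index`, `find_anagrams`) and the tiny word list are assumptions, not part of the answer.

```python
from collections import defaultdict


def make_key(word):
    """The word's letters, forced to lower case and sorted alphabetically."""
    return "".join(sorted(word.lower()))


def build_index(dictionary):
    """Map each sorted-letter key to the list of dictionary words sharing it."""
    index = defaultdict(list)
    for word in dictionary:
        index[make_key(word)].append(word)
    return index


def find_anagrams(index, word):
    """Look up the word's key; the word itself need not be in the dictionary."""
    return index.get(make_key(word), [])


index = build_index(["dog", "god", "tool", "loot", "rose", "sore"])
print(find_anagrams(index, "Tolo"))  # ['tool', 'loot']
```

The index is built once; every later query costs one sort of the query word plus one hash lookup.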


Vatine


Alphabetical sorting of the words to produce the key is a great idea. Be careful after the lookup, though: you still need to weed out potential false positives. Just because two words have the same hash doesn't mean they are necessarily equal (although it's very likely in common languages). That still leaves some room for error. – mprivat Sep 18 '12 at 13:49
@mprivat I'd be happy if you can find two words that have the same sorted letter sequence that aren't anagrams of each other (note, we're not dropping any letters, the key for "banana" would be 'aaabnn' and any other word with exactly that key would by necessity have to be an anagram of "banana"). – Vatine Sep 18 '12 at 13:55
I'm not talking about the sorted letter sequence, I was talking about its numeric hash (which is what the hashmap will actually use as a key). But I guess depending on the language you are using, the hashmap implementation will deal with the key collision. – mprivat Sep 18 '12 at 14:04
@mprivat Ah, yes, a typical hashmap implementation should distinguish two different keys with the same hash before giving you the result. – Vatine Sep 18 '12 at 14:07
@Vatine Where have we used the TRIE ds here ? – h4ck3d Sep 18 '12 at 18:47
@sTEAK.Not at all, I don't see that using a trie is the best solution. – Vatine Sep 19 '12 at 09:14
Really fast one in python using this style of algorithm http://www.reddit.com/r/answers/comments/13o511/what_is_the_word_with_the_most_anagrams/ – phyatt Oct 13 '14 at 23:03
Instead of *"Relatively fast to build, blazingly fast on look-up"*, a standard way to express complexity (such as `O(n)`, `O(log(n))` etc.) would have been better. – Nawaz Jul 03 '16 at 04:58
@Nawaz, looks to be `O(w n log(w))` to build and `O(w log(w))` on lookup, on average, where `w` is the average word length in the dictionary and `n` is the dictionary length. – Him Dec 06 '18 at 13:13

ACV · Answer 2 · 2015-10-01T10:57:03.533

19

I came up with a new solution I guess. It uses the Fundamental Theorem of Arithmetic. So the idea is to use an array of the first 26 prime numbers. Then for each letter in the input word we get the corresponding prime number A = 2, B = 3, C = 5, D = 7 … and then we calculate the product of our input word. Next we do this for each word in the dictionary and if a word matches our input word, then we add it to the resulting list. All anagrams will have the same signature because

Any integer greater than 1 is either a prime number, or can be written as a unique product of prime numbers (ignoring the order).

Here's the code. I convert the word to UPPERCASE, and 65 is the character code of A, which corresponds to my first prime number:

private int[] PRIMES = new int[] { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31,
        37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103,
        107, 109, 113 };

This is the method:

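 private long calculateProduct(char[] letters) {
    long result = 1L;
    for (char c : letters) {
        // Only 'A'..'Z' (codes 65..90) map to a prime; reject anything else
        // so that PRIMES is never indexed out of range.
        if (c < 'A' || c > 'Z') {
            return -1;
        }
        int pos = c - 'A';
        // Note: for long words this product can overflow a long.
        result *= PRIMES[pos];
    }
    return result;
}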
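
For comparison, here is a rough Python sketch of the same prime-product signature. Python integers have arbitrary precision, so the sketch can also check when the product would still fit into a signed 64-bit value such as Java's long. The names `PRIME_OF`, `LONG_MAX` and `prime_signature` are assumptions made for this example.

```python
import string

# First 26 primes, one per letter A..Z (A = 2, B = 3, C = 5, ...).
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]
PRIME_OF = dict(zip(string.ascii_uppercase, PRIMES))
LONG_MAX = 2**63 - 1  # largest value of a signed 64-bit integer


def prime_signature(word):
    """Product of the primes of the word's letters, or None for non-letters."""
    product = 1
    for ch in word.upper():
        if ch not in PRIME_OF:
            return None
        product *= PRIME_OF[ch]
    return product


print(prime_signature("dog") == prime_signature("god"))  # True
print(prime_signature("z" * 9) <= LONG_MAX)               # True
print(prime_signature("z" * 10) <= LONG_MAX)              # False
```

With 101 as the largest prime, any word of up to nine letters fits into 64 bits (101^9 < 2^63), while ten Z's already do not. Longer words may or may not fit depending on their letters, so a fixed-width integer needs an overflow check or a big-integer type.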

edited Oct 01 '15 at 10:57

answered Mar 09 '15 at 18:13

ACV


It's essentially another way of creating a key that is unique to all words that are anagrams of each other, i.e. that have the same set of letters. It's a nice idea, but the more obvious approach is to sort the letters alphabetically (or however you want, so long as it's consistent). E.g. the key for alphabetically is aaabcehilllpty. I wonder though if your approach would produce a more compact key, and would therefore have the potential to be more computationally efficient. – redcalx Aug 05 '15 at 18:11
yours is also a good idea, indeed. Sorting is a bit more expensive than multiplication. – ACV Aug 06 '15 at 08:25
`Sorting is a bit more expensive than multiplication` good point. In which case I think I prefer your approach using primes. I'm going to look into it further. – redcalx Aug 06 '15 at 09:26
I used this method – artifex_somnia Apr 13 '16 at 16:59
Before. I was minutely bummed when I saw that someone thought of it before me lol. One experiment (that I haven't tried) would be to try really long words to see if the product could be stored in a long int. Example: Pneumonoultramicroscopicsilicovolcanoconiosis. – artifex_somnia Apr 14 '16 at 14:37
For long strings, we may not have a data type to store a big multiplication. So it may not be the most robust solution, but it is better than sorting if we know that the strings won't be very long. – Aman Gupta May 09 '17 at 11:56
this is intended for words. Java's long can support long values for words up to 40 characters long. Otherwise, BigInteger could be used. – ACV May 09 '17 at 14:21
_Sorting is a bit more expensive than multiplication._ But one might assume that the dictionary is read in once, characters in words sorted, and stored as a list under the "sorted anagram" key (and the "list of anagrams" saved permanently in a useful form). Then any words under one key are anagrams. Any input word is quickly "hashed" to the sorted anagram key, and checked for known anagrams. Looking up prime numbers by `ord(chr)` or similar is going to be at least as long as sorted characters in words. Did anyone time this in any language? – Quantum Mechanic Aug 14 '18 at 13:40
preparing the data is a separate algorithm therefore you cannot consider it as one algorithm – ACV Aug 14 '18 at 16:17

score 2 · Answer 3 · answered Sep 18 '12 at 13:20

We know that if two words don't have the same length, they are not anagrams. So you can partition your dictionary in groups of words of the same length.

Now we focus on only one of these groups and basically all words have exactly the same length in this smaller universe.

Treat each letter position as a dimension, with the value in that dimension based on the letter (say, its ASCII code). Then you can calculate the length of the word vector.

For example, say 'A'=65, 'B'=66, then length("AB") = sqrt(65*65 + 66*66). Obviously, length("AB") = length("BA").

Clearly, if two words are anagrams, then their vectors have the same length. The next question is: if two word vectors (with the same number of letters) have the same length, are the words anagrams? Intuitively, I'd say no: all vectors with that length form a sphere, and there are many of them. I'm not sure how many there actually are, since we're in the integer space in this case.

But at the very least it allows you to partition your dictionary even further. For each word in your dictionary, calculate the vector's length: for(each letter c) { distance += c*c }; distance = sqrt(distance);

Then create a map for all words of length n, keyed by the distance, where the value is a list of words of length n that yield that particular distance.

You'll create one such map for each word length.

Then your lookup becomes the following algorithm:

Use the correct dictionary map based on the length of the word
Compute the length of your word's vector
Lookup the list of words that match that length
Go through the list and pick the anagrams using a naive algorithm, as the list of candidates is now greatly reduced
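
A possible Python sketch of this bucketing idea, under two assumptions of mine: the square root is skipped (comparing squared lengths is equivalent and avoids floating point), and the per-length maps are folded into one map keyed by (word length, squared vector length). The function names are hypothetical.

```python
from collections import defaultdict


def squared_norm(word):
    """Sum of squared character codes of the upper-cased word."""
    return sum(ord(c) ** 2 for c in word.upper())


def build_buckets(dictionary):
    """Group words by (number of letters, squared vector length)."""
    buckets = defaultdict(list)
    for word in dictionary:
        buckets[(len(word), squared_norm(word))].append(word)
    return buckets


def find_anagrams(buckets, word):
    """Fetch the candidate bucket, then filter it with a naive sorted check."""
    candidates = buckets.get((len(word), squared_norm(word)), [])
    target = sorted(word.upper())
    return [w for w in candidates if sorted(w.upper()) == target]


buckets = build_buckets(["salt", "last", "slat", "tool"])
print(find_anagrams(buckets, "Alts"))  # ['salt', 'last', 'slat']
```

The final filter is still required, because two words that are not anagrams can end up in the same bucket.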

This is more like a heuristic approach? – h4ck3d Sep 18 '12 at 13:26

Thumbnail · Answer 4 · 2017-04-27T11:13:08.107

Reduce the words to - say - lower case (clojure.string/lower-case).
Classify them (group-by) by letter frequency-map (frequencies).
Drop the frequency maps,
... leaving the collections of anagrams.

(These) are the corresponding functions in the Lisp dialect Clojure.

The whole function can be expressed so:

(defn anagrams [dict]
  (->> dict
       (map clojure.string/lower-case)
       (group-by frequencies)
       vals))

For example,

(anagrams ["Salt" "last" "one" "eon" "plod"])
;(["salt" "last"] ["one" "eon"] ["plod"])

An indexing function that maps each thing to its collection is

(defn index [xss]
  (into {} (for [xs xss, x xs] [x xs])))

So that, for example,

((comp index anagrams) ["Salt" "last" "one" "eon" "plod"])
;{"salt" ["salt" "last"], "last" ["salt" "last"], "one" ["one" "eon"], "eon" ["one" "eon"], "plod" ["plod"]}

... where comp is the functional composition operator.

Daniel · Answer 5 · 2012-09-18T13:29:41.017

Well Tries would make it easier to check if the word exists. So if you put the whole dictionary in a trie:

http://en.wikipedia.org/wiki/Trie

then you can afterward take your word and do simple backtracking by taking a char and recursively checking if we can "walk" down the Trie with any combination of the rest of the chars (adding one char at a time). When all chars are used in a recursion branch and the path ends at a node that marks a complete word in the Trie, then the word exists.

The Trie helps because it gives a nice stopping condition: we can check if part of a string, e.g. "Anag", is a valid path in the trie, and if not we can break off that particular recursion branch. This means we don't have to check every single permutation of the characters.

In pseudo-code

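checkAllChars(currentPositionInTrie, currentlyUsedChars, restOfWord)
    if (restOfWord.Length == 0)
    {
         // All chars are used: report only if a dictionary word ends at this node
         // (Trie.IsWord is a hypothetical helper, like Trie.Walk)
         if (Trie.IsWord(currentPositionInTrie))
         {
             AddWord(currentlyUsedChars)
         }
    }
    else 
    {
        // Try each distinct char once, so repeated letters don't yield the same word twice
        foreach (char in restOfWord.Distinct())
        {
            nextPositionInTrie = Trie.Walk(currentPositionInTrie, char)
            if (nextPositionInTrie != Positions.NOT_POSSIBLE)
            {
                checkAllChars(nextPositionInTrie, currentlyUsedChars.With(char), restOfWord.Without(char))
            }
        }   
    }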

Obviously you need a nice Trie datastructure which allows you to progressively "walk" down the tree and check at each node if there is a path with the given char to any next node...
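
To make the description more concrete, here is a small Python sketch of such a trie walk. It is only one way to do it: the trie is a nested dict, the `END` marker key and the function names are assumptions of this sketch, and dictionary words are assumed to be purely alphabetic.

```python
END = "$"  # marker key meaning "a dictionary word ends at this node"


def build_trie(words):
    """Build a nested-dict trie; each node maps a letter to its child node."""
    root = {}
    for word in words:
        node = root
        for ch in word.lower():
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def find_anagrams(root, word):
    """Backtrack over the letters of word, following only paths that exist."""
    found = []

    def walk(node, remaining):
        if not remaining:
            if END in node:  # all letters used and a word ends here
                found.append(node[END])
            return
        for ch in set(remaining):  # each distinct letter once per level
            if ch in node:  # prune: no dictionary word continues with ch
                rest = list(remaining)
                rest.remove(ch)
                walk(node[ch], rest)

    walk(root, list(word.lower()))
    return sorted(found)


trie = build_trie(["dog", "god", "good", "tool", "loot"])
print(find_anagrams(trie, "odg"))  # ['dog', 'god']
```

For "odg", the branch starting with "o" is abandoned immediately because no dictionary word starts with "o"; this is the pruning that avoids enumerating all permutations.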

Could you give an example for this? Not really clear from your description. — h4ck3d, Sep 18 '12 at 13:19

score 1 · Answer 6 · edited May 27 '15 at 13:22

using System;

class Program
{
    static void Main(string[] args)
    {
        string str1 = "Tom Marvolo Riddle";
        string str2 = "I am Lord Voldemort";

        // Ignore spaces
        str2 = str2.Replace(" ", string.Empty);
        str1 = str1.Replace(" ", string.Empty);
        if (str1.Length != str2.Length)
            Console.WriteLine("Strings are not anagrams");
        else
        {
            // Ignore case
            str1 = str1.ToUpper();
            str2 = str2.ToUpper();
            int countStr1 = 0;
            int countStr2 = 0;
            // Sum the character codes of both strings
            for (int i = 0; i < str1.Length; i++)
            {
                countStr1 += str1[i];
                countStr2 += str2[i];
            }
            // Note: equal length and equal sums are necessary for anagrams,
            // but not sufficient ("AD" and "BC" have the same sum).
            if (countStr2 != countStr1)
                Console.WriteLine("Strings are not anagrams");
            else Console.WriteLine("Strings are anagrams");
        }
        Console.Read();
    }
}

Can you elaborate on your answer and describe how it solves the question, and also mention what it offers in addition to the already existing answers? — TZHX, May 27 '15 at 13:23

score 0 · Answer 7 · answered Sep 18 '12 at 13:05

Generating all permutations is easy; I guess you are worried that checking their existence in the dictionary is the "highly inefficient" part. But that actually depends on what data structure you use for the dictionary: of course, a list of words would be inefficient for your use case. Speaking of tries, they would probably be an ideal representation, and quite efficient, too.

Another possibility would be to do some pre-processing on your dictionary, e.g. build a hashtable where the keys are the word's letters sorted, and the values are lists of words. You can even serialize this hashtable so you can write it to a file and reload quickly later. Then to look up anagrams, you simply sort your given word and look up the corresponding entry in the hashtable.

Generating the permutations itself is O(n!) and highly inefficient. — h4ck3d, Sep 18 '12 at 13:06

score 0 · Answer 8 · answered Sep 18 '12 at 13:17

That depends on how you store your dictionary. If it is a simple array of words, no algorithm will be faster than linear.

If it is sorted, then here's an approach that may work. I've invented it just now, but I guess it's faster than the linear approach.

1. Denote your dictionary as D and the current prefix as S. Initially S is empty.
2. Create a frequency map for your word. Let's denote it by F.
3. Using binary search, find pointers to the start of each letter in the dictionary. Let's denote this array of pointers by P.
4. For each char c from A to Z, if F[c] == 0, skip it, else
- S += c;
- F[c] --;
- P <- for every character i P[i] = pointer to first word beginning with S+i.
- Recursively call step 4 till you find a match for your word or till you find that no such match exists.

This is how I would do it, anyway. There should be a more conventional approach, but this is faster than linear.
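
One way to read these steps, sketched in Python with the standard `bisect` module. Instead of maintaining the pointer array P, this sketch simply repeats a binary search for every candidate prefix, and it collects all matches rather than stopping at the first one; both are simplifications of mine, and the function names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def has_prefix(words, prefix):
    """True if some word in the sorted list starts with prefix."""
    i = bisect_left(words, prefix)
    return i < len(words) and words[i].startswith(prefix)


def find_anagrams(words, word):
    """words must be a sorted list of lower-case words."""
    found = []
    freq = Counter(word.lower())  # the frequency map F

    def extend(prefix, remaining):
        if remaining == 0:
            i = bisect_left(words, prefix)
            if i < len(words) and words[i] == prefix:
                found.append(prefix)
            return
        for ch in sorted(freq):
            # Skip used-up letters and prefixes that no dictionary word has.
            if freq[ch] > 0 and has_prefix(words, prefix + ch):
                freq[ch] -= 1
                extend(prefix + ch, remaining - 1)
                freq[ch] += 1  # restore for the next branch

    extend("", len(word))
    return found


words = sorted(["dog", "god", "good", "loot", "tool"])
print(find_anagrams(words, "Tolo"))  # ['loot', 'tool']
```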

Store the dictionary as a TRIE. – h4ck3d Sep 18 '12 at 13:18

score 0 · Answer 9 · edited Dec 22 '13 at 21:18

tried to implement the hashmap solution

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class Dictionary {

    // Builds the lookup key: the word's letters, lower-cased and sorted (e.g. dgo for dog).
    // The QuickSort helper from the original post was not shown, so Arrays.sort is used instead.
    static String sortLetters(String word) {
        char[] letters = word.toLowerCase().toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    public static void main(String[] args) {

        String[] dictionary = {"dog", "god", "tool", "loot", "rose", "sore"};

        Map<String, String> h = new HashMap<>();

        for (String entry : dictionary) {
            String key = sortLetters(entry);
            if (!h.containsKey(key)) {
                h.put(key, entry);
            } else {
                // Append to the words already stored under this key
                h.put(key, h.get(key) + " , " + entry);
            }
        }

        String word = "tolo";
        String sortedWord = sortLetters(word);

        // Keys are lower-cased, so the lookup is case-insensitive
        if (h.containsKey(sortedWord)) {
            System.out.println("anagrams from Dictionary   :  " + h.get(sortedWord));
        }
    }
}

score 0 · Answer 10 · answered Mar 08 '16 at 06:21

Compute the frequency count vector for each word in the dictionary, a vector of length of the alphabet list.
Generate a random Gaussian vector of the length of the alphabet list.
Project each dictionary word's count vector in this random direction and store the value (insert such that the array of values is sorted).
Given a new test word, project it in the same random direction used for the dictionary words.
Do a binary search to find the list of words that map to the same value.
Verify if each word obtained as above is indeed a true anagram. If not, remove it from the list.
Return the remaining elements of the list.

PS: The above procedure is a generalization of the prime number procedure which may potentially lead to large numbers (and hence computational precision issues)
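
A rough Python sketch of this procedure, assuming lower-case a..z as the alphabet and a fixed seed so that the random direction is reproducible. All names are made up for the example. Words with identical count vectors produce exactly the same floating-point projection, because the same arithmetic is performed in the same order, so an exact binary search on the value is enough here.

```python
import random
import string
from bisect import bisect_left, bisect_right

ALPHABET = string.ascii_lowercase


def count_vector(word):
    """Frequency count of each alphabet letter in the word."""
    w = word.lower()
    return [w.count(ch) for ch in ALPHABET]


def project(word, direction):
    """Dot product of the word's count vector with the random direction."""
    return sum(c * g for c, g in zip(count_vector(word), direction))


def build_projection_index(dictionary, seed=0):
    rng = random.Random(seed)
    direction = [rng.gauss(0.0, 1.0) for _ in ALPHABET]
    entries = sorted((project(w, direction), w) for w in dictionary)
    keys = [value for value, _ in entries]
    words = [w for _, w in entries]
    return direction, keys, words


def find_anagrams(index, word):
    direction, keys, words = index
    value = project(word, direction)
    lo, hi = bisect_left(keys, value), bisect_right(keys, value)
    target = count_vector(word)
    # Verify the candidates: equal projections do not prove equal counts.
    return [w for w in words[lo:hi] if count_vector(w) == target]


index = build_projection_index(["salt", "last", "one", "eon", "plod"])
print(find_anagrams(index, "Slat"))
```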

score 0 · Answer 11 · answered Jan 12 '21 at 07:10

# list of words
words = ["ROOPA","TABU","OOPAR","BUTA","BUAT" , "PAROO","Soudipta",
        "Kheyali Park", "Tollygaunge", "AROOP","Love","AOORP",
         "Protijayi","Paikpara","dipSouta","Shyambazaar",
        "jayiProti", "North Calcutta", "Sovabazaar"]

#Method 1
A = [''.join(sorted(word)) for word in words]

groups = {}  # sorted-letter key -> indices of the words sharing that key

for index, key in enumerate(A):
    groups.setdefault(key, []).append(index)

print(groups)
#{'AOOPR': [0, 2, 5, 9, 11], 'ABTU': [1, 3, 4], 'Sadioptu': [6, 14], ' KPaaehiklry': [7], 'Taeggllnouy': [8], 'Leov': [10], 'Paiijorty': [12, 16], 'Paaaikpr': [13], 'Saaaabhmryz': [15], ' CNaachlortttu': [17], 'Saaaaborvz': [18]}

for indices in groups.values():
    print([words[i] for i in indices])

The Output :

['ROOPA', 'OOPAR', 'PAROO', 'AROOP', 'AOORP']
['TABU', 'BUTA', 'BUAT']
['Soudipta', 'dipSouta']
['Kheyali Park']
['Tollygaunge']
['Love']
['Protijayi', 'jayiProti']
['Paikpara']
['Shyambazaar']
['North Calcutta']
['Sovabazaar']

Jitendra Rathor · Answer 12 · 2015-07-16T18:49:12.483

-3

One solution is to map prime numbers to the characters of the alphabet and multiply the primes of a word's letters.

For example:

    a -> 2
    b -> 3
    ......
    .......
    ......
    z -> 101

So

'ab' -> 6
'ba' -> 6
'bab' -> 18
'abba' -> 36
'baba' -> 36

Get the MUL_number for the given word, then return all the words from the dictionary which have the same MUL_number as the given word.

edited Jul 16 '15 at 18:49

answered Jul 16 '15 at 18:39

Jitendra Rathor


This is the approach posted just above. – ACV Oct 01 '15 at 12:15

score -3 · Answer 13 · answered Sep 28 '16 at 17:18

First check whether the lengths of the strings are the same. Then check whether the sums of the characters (i.e. the ASCII code sums) of both strings are the same. If so, the words are anagrams; otherwise they are not.
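
A quick Python check of this rule against a sorted-letters comparison. The function names are hypothetical. The pair "AD" and "BC" passes the length-and-sum test although the two strings are not anagrams, so the sum can serve as a cheap pre-filter but not as the final test.

```python
def same_length_and_sum(a, b):
    """The rule above: equal length and equal sum of character codes."""
    return len(a) == len(b) and sum(map(ord, a)) == sum(map(ord, b))


def is_anagram(a, b):
    """Reference check: the sorted letters must match exactly."""
    return sorted(a) == sorted(b)


print(same_length_and_sum("AD", "BC"))  # True: 65 + 68 == 66 + 67
print(is_anagram("AD", "BC"))           # False
```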

Athul
