
The first one uses Arrays.sort:

import java.util.Arrays;

public class AnagramSort {
    public static void main(String[] args) {
        String s = "schoolmaster";
        String s1 = "theclassroom";

        // Sort both character arrays, then compare them element by element
        char[] a = s.toCharArray();
        Arrays.sort(a);
        char[] a1 = s1.toCharArray();
        Arrays.sort(a1);
        System.out.println(Arrays.equals(a, a1)); // prints "true"
    }
}

and this one uses a HashMap:

import java.util.HashMap;
import java.util.Map;

public class Anagram {

    public static void main(String[] args) {
        String s = "murphee";
        String s1 = "purhmee";

        System.out.println(new Anagram().checkAnagram(s, s1)); // prints "true"
    }

    boolean checkAnagram(String s, String s1) {
        if (s.length() != s1.length())
            return false;

        // Count the occurrences of each character of the first string
        Map<Character, Integer> counts = new HashMap<Character, Integer>();
        for (char c : s.toCharArray()) {
            if (counts.containsKey(c))
                counts.put(c, counts.get(c) + 1);
            else
                counts.put(c, 1);
        }

        // Consume one occurrence per character of the second string
        for (char c : s1.toCharArray()) {
            if (counts.containsKey(c)) {
                if (counts.get(c) > 1)
                    counts.put(c, counts.get(c) - 1);
                else
                    counts.remove(c);
            }
        }

        // The map is empty exactly when every character was matched
        return counts.isEmpty();
    }
}

The HashMap implementation calls put and get several times; isn't it slower than the Arrays.sort one?

Please help me understand how the complexity works out, and whether there are better ways to find anagrams.

anshulkatta

1 Answer


The array implementation is O(n * log n) (if not worse), since the arrays are being sorted.
On the other hand, the hash map implementation is O(n), since each insert/update on the hash map is O(1) on average, and you do this once per character of each string (2 * O(n) = O(n)).

Therefore, the hash map implementation (in theory) is more efficient. Notice, however, that if you're only dealing with small arrays, this is not really the case. The complexity calculations are aimed at large inputs, not small.
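For comparison, the hash map can be replaced by a plain int array indexed by character value, which keeps the O(n) time but avoids boxing and hashing overhead entirely. This is only a sketch of that idea, not code from the question; the class and method names are mine:

```java
public class AnagramCounting {
    // Count characters of s up and characters of t down in one pass;
    // for anagrams every counter cancels out to exactly zero.
    static boolean isAnagram(String s, String t) {
        if (s.length() != t.length())
            return false;
        int[] counts = new int[Character.MAX_VALUE + 1]; // one slot per char value
        for (int i = 0; i < s.length(); i++) {
            counts[s.charAt(i)]++;
            counts[t.charAt(i)]--;
        }
        for (int c : counts)
            if (c != 0) // some character appears more often in one string
                return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("schoolmaster", "theclassroom")); // true
        System.out.println(isAnagram("murphee", "purhmee"));           // true
        System.out.println(isAnagram("hello", "world"));               // false
    }
}
```

The trade-off is the fixed 65536-entry array per call, so for many repeated checks over ASCII-only input a smaller table (e.g. `new int[128]`) would be the usual choice.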

ethanfar
  • Oh, are you sure about `O(1)` for inserting into `HashMap`? That's absolutely impossible. The correct complexity is `O(log n)`, so the final complexity is the same, but surely sorting an array is a much simpler task than inserting elements one by one into a `HashMap` – Dmitry Ginzburg Jun 29 '14 at 10:33
  • I'm sure. Look at this for more information: http://stackoverflow.com/questions/4553624/hashmap-get-put-complexity. Moreover, check online for hash table complexity (that's what HashMap is). The O(1) is the whole reason for using them. You are wrong about O(log n). Please provide some references if you're going to downvote. This is simply misleading others. – ethanfar Jun 29 '14 at 10:35
  • I should explain more clearly: in CS, no such `HashMap` can exist, because its existence would mean a sorting algorithm could run in `O(n)`, and this is proven to be false. – Dmitry Ginzburg Jun 29 '14 at 10:37
  • 2
    You're wrong again, and should really review your CS material. Here's a presentation from MIT (you're not going to try to argue with them as well, are you ?) about hash tables. http://web.mit.edu/16.070/www/lecture/lecture_18.pdf – ethanfar Jun 29 '14 at 10:39
  • Okay then, I see, there's some hack in it. But in Arrays.sort there's the same kind of hack for making it work in `O(n)`: [this code](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/DualPivotQuicksort.java#DualPivotQuicksort.sort%28char%5B%5D%2Cint%2Cint%29) (really, it's counting sort, which takes `O(n)` on such structures) is used for [sorting](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/Arrays.java#Arrays.sort%28char%5B%5D%29) `char[]` arrays. – Dmitry Ginzburg Jun 29 '14 at 10:49
  • There's no "hack in it"; that's a basic attribute of hash tables. Access operations are O(1); that's CS 101. The dual pivot quicksort that you referred me to, on the other hand, is most definitely O(n * log n), and was invented to reduce the frequency of the regular quicksort algorithm taking O(n ^ 2) on many edge cases. You should really review the complexity of data structures, and please stop misleading people by giving wrong information. – ethanfar Jun 29 '14 at 10:58
  • It's not wrong information. If you look at the exact code I gave you, you'll see that in this concrete case the array would be sorted not by the dual pivot quicksort algorithm but with counting sort (except for very small arrays, but there's no difference in complexity for them). – Dmitry Ginzburg Jun 29 '14 at 11:00
  • @DmitryGinzburg If you bothered to read the code you provided, you would learn that the threshold for counting sort to be used (3200) is far too high for these arrays to be sorted using it. Dual pivot quicksort would certainly be used in this case, probably because the constant factors for counting sort are too high for it to be worth using for smaller arrays. – awksp Jun 29 '14 at 16:34
  • @DmitryGinzburg In addition, there is no "hack" involved in making counting sort operate in `O(n)`. Counting sort is not a comparison sort, so the "barrier" of `O(n lg n)` does not apply to it. – awksp Jun 29 '14 at 16:35
  • @ethanfar so is a HashMap the only way for this problem (anagrams)? Is there any way faster than this? – anshulkatta Jun 30 '14 at 09:46
  • I don't have the numbers, but there are other data structures usually used when dealing with string operations and comparisons. If you want to dig deeper, check out the Prefix Tree (Trie) (http://en.wikipedia.org/wiki/Prefix_tree) and the Suffix Tree (http://en.wikipedia.org/wiki/Suffix_tree). I'm not sure they'll do a better job, but it's worth taking a look, even if only to broaden your knowledge of data structures :-) – ethanfar Jul 06 '14 at 04:55
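To illustrate the counting sort mentioned in the comments: because `char` values are bounded (0..65535), a `char[]` can be sorted in linear time without any comparisons, so the usual O(n log n) lower bound for comparison sorts does not apply. This is a simplified sketch of the idea, not the JDK's actual DualPivotQuicksort code:

```java
public class CharCountingSort {
    // Counting sort for char[]: O(n + 65536), i.e. linear in n for large arrays.
    static void countingSort(char[] a) {
        int[] count = new int[Character.MAX_VALUE + 1];
        for (char c : a)
            count[c]++; // tally how often each char value occurs
        int i = 0;
        for (int v = 0; v < count.length; v++)
            while (count[v]-- > 0)
                a[i++] = (char) v; // write each value back in ascending order
    }

    public static void main(String[] args) {
        char[] a = "theclassroom".toCharArray();
        countingSort(a);
        System.out.println(new String(a)); // prints "acehlmoorsst"
    }
}
```

Sorting both strings this way would make even the sort-and-compare approach O(n) overall, which matches the point made about `Arrays.sort(char[])` above.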