
I am writing my own Radix Sort method to sort the words in a String (for example, "the big black cat sat on the beautiful brown mat" would be sorted as "beautiful big black brown cat mat on sat the the"). The method takes a List (my own List interface) of the individual words and reorders the list in place.

Here is my method so far:

public static void stringRadixSort(List<String> list, int letters) {
    List<String>[] buckets = (List<String>[]) Array.newInstance(List.class, 26);

    int letterNumber = 1; //Sorts list by 1st letter of each word, then 2nd etc.
    for (int i = 0; i < letters; i++) {
        while (!list.isEmpty()) {
            String word = list.remove(list.first());
            if (word.length() > letters) throw new UnsortableException("The list contains a word that holds more letters than the given maximum number of letters."
                    + "\nMax Letters: " + letters + "\nWord: " + word);
            String letter = word.substring(letterNumber - 1, letterNumber); //EXCEPTION THROWN
            char ch = letter.charAt(0);
            int index = ch - 'a';    //gets index of each letter ('a' = buckets[0], 'z' = buckets[25])
            if (buckets[index] == null) {
                buckets[index] = new LinkedList<String>();
            }
            buckets[index].insertLast(word);
        }

        for (int j = 0; j < buckets.length; j++) {
            if (buckets[j] != null) {
                while (!buckets[j].isEmpty()) {
                    list.insertLast(buckets[j].remove(buckets[j].first()));
                }
            }
        }
        letterNumber++;
    }
}

The (only, I hope) problem with my method is that when I read each character of the word, I create a single-letter substring of the word. Because the outer for loop runs letters times (where letters is the maximum length of a word in the List), the exception is thrown once the loop reaches an iteration greater than the length of the current word, i.e. letterNumber > word.length(), so it attempts to create a substring using String indexes that are greater than the String's length.

How can I adjust my method so that it only creates substrings of each word until letterNumber == word.length(), and can still apply the sorting algorithm to these shorter words, so that "a" comes before "aa"?

KOB
  • It seems you have an **empty word** in the list. This could happen if one splits on non-word chars and they are at the beginning or end, or if one did not take into account that more than one non-word char might be between words. – Joop Eggen Apr 08 '16 at 15:59

3 Answers


Just group the elements that are shorter than the current character position in an additional bucket. You also need to sort by the least significant (least relevant) character first. The following code uses Java collections instead of whatever data structure you were using:

public static void stringRadixSort(List<String> list, int letters) {
    if (list.size() <= 1) {
        return;
    }

    List<String>[] buckets = new List[27];
    for (int i = 0; i < buckets.length; i++) {
        buckets[i] = new LinkedList<>();
    }
    int largestLength = -1;
    int secondLargestLength = 0;
    for (String s : list) {
        int length = s.length();
        if (length >= largestLength) {
            secondLargestLength = largestLength;
            largestLength = length;
        } else if (secondLargestLength < length) {
            secondLargestLength = length;
        }
    }

    if (largestLength > letters) {
        throw new IllegalArgumentException("one of the strings is too long");
    }

    for (int i = secondLargestLength == largestLength ? secondLargestLength-1 : secondLargestLength; i >= 0; i--) {
        for (String word : list) {
            int index = (word.length() <= i) ? 0 : word.charAt(i) - ('a' - 1);
            buckets[index].add(word);
        }

        list.clear();

        for (List<String> lst : buckets) {
            if (lst != null) {
                list.addAll(lst);
                lst.clear();
            }
        }
    }
}
fabian
  • I like this solution where `buckets[0]` holds the shorter words. If the list in `buckets[0]` contains more than one word, would they still be sorted? Sorry, I don't have the time to analyse your solution in full now, but I'll let you know how I get on later. – KOB Apr 07 '16 at 14:55
  • @KOB: Yes. This produces the same order that would be produced if you padded the `String`s with `('a'-1)`. Therefore it simply prefers shorter strings over longer ones if they have the same prefix... Note that the algorithm starts with the **least significant** character and uses the fact that the elements in the buckets remain in the same order they were in the list before. After each iteration of the loop the list will be sorted by the substrings starting at index `i`, where substrings for too-large indices are considered to be empty. – fabian Apr 07 '16 at 15:24
  • Unfortunately, my code uses my own List interface quite a bit and so I cannot change this class to use the Java Utils List. I have edited your solution to use my list instead - from what I can tell it doesn't change the functionality of the algorithm at all, it just changes the List methods used to edit the List. [Here is my edited version](http://pastebin.com/tzS9LphY). This is sorting `10 : the big black cat sat on the beautiful brown mat` as `8 : cat beautiful big the mat on sat the` where `10` and `8` are the size of each list, added in my `toString` method. – KOB Apr 08 '16 at 15:41
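
The padding equivalence described in the comments above can be sanity-checked with a small sketch. It is not part of the answer: the class name `PaddingOrderDemo` and the helpers `pad` and `sortByPaddedKey` are hypothetical, and it assumes lowercase ASCII words. It pads every word to the longest length with the character just below `'a'` and sorts by that padded key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class PaddingOrderDemo {
    // Pads word on the right with ('a' - 1), i.e. the backtick, up to the given width.
    static String pad(String word, int width) {
        StringBuilder sb = new StringBuilder(word);
        while (sb.length() < width) {
            sb.append((char) ('a' - 1));
        }
        return sb.toString();
    }

    // Returns a sorted copy of words, ordered by their padded keys.
    static List<String> sortByPaddedKey(List<String> words) {
        int width = 0;
        for (String word : words) {
            width = Math.max(width, word.length());
        }
        final int padWidth = width;
        List<String> result = new ArrayList<>(words);
        result.sort(Comparator.comparing((String s) -> pad(s, padWidth)));
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("bb", "b", "bc", "c", "abcd", "bcd");
        System.out.println(sortByPaddedKey(words));
    }
}
```

Because the padding character sorts below every letter, a word that is a prefix of another ends up first, which is the role `buckets[0]` plays in the bucket version.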

Why don't you replace

String letter = word.substring(letterNumber - 1, letterNumber);
char ch = letter.charAt(0);

with

char ch = word.charAt(letterNumber - 1);

which gives you the char directly? This doesn't solve the problem with the IndexOutOfBoundsException, though.

You should handle this case explicitly. One option is to create an extra bucket for it: when the word is too short for the current iteration, it is sorted into that bucket. When merging the list back together, take the elements of this bucket first.

public static void stringRadixSort(List<String> list, int letters) {
    List<String>[] buckets = (List<String>[]) Array.newInstance(List.class, 27);

    int letterNumber = 1; //Sorts list by 1st letter of each word, then 2nd etc.
    for (int i = 0; i < letters; i++) {
        while (!list.isEmpty()) {
            String word = list.remove(list.first());
            if (word.length() > letters) throw new UnsortableException("The list contains a word that holds more letters than the given maximum number of letters."
                + "\nMax Letters: " + letters + "\nWord: " + word);
            int index;
            if(word.length() >= letterNumber) {    //letterNumber is 1-based, so the last valid value equals word.length()
                char ch = word.charAt(letterNumber - 1);
                index = ch - 'a' + 1;    //gets index of each letter ('a' = buckets[1], 'z' = buckets[26], buckets[0] is for short words)
            } else {
                index = 0;
            }
            if (buckets[index] == null) {
                buckets[index] = new LinkedList<String>();
            }
            buckets[index].insertLast(word);
        }

        for (int j = 0; j < buckets.length; j++) {
            if (buckets[j] != null) {
                while (!buckets[j].isEmpty()) {
                    list.insertLast(buckets[j].remove(buckets[j].first()));
                }
            }
        }
        letterNumber++;
    }
}
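
As a minimal sketch of the bucket lookup on its own, assuming lowercase ASCII words and a 1-based `letterNumber` as in the code above: `charAt(letterNumber - 1)` is valid exactly when `word.length() >= letterNumber`. The class name `BucketIndexSketch` and the helper `bucketIndex` are hypothetical and only serve the illustration.

```java
class BucketIndexSketch {
    /**
     * Returns the bucket for the letter at the 1-based position letterNumber.
     * Bucket 0 is reserved for words too short to have that letter;
     * 'a'..'z' map to buckets 1..26. Assumes lowercase ASCII input.
     */
    static int bucketIndex(String word, int letterNumber) {
        if (word.length() >= letterNumber) {
            return word.charAt(letterNumber - 1) - 'a' + 1;
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex("a", 1));   // 'a' maps to bucket 1
        System.out.println(bucketIndex("a", 2));   // too short, bucket 0
        System.out.println(bucketIndex("cat", 3)); // 't' maps to bucket 20
    }
}
```

Pulling the check into one place makes the boundary case (a word whose length equals `letterNumber`) easy to test in isolation.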
l7r7
  • Thanks, I don't know how that didn't come to mind. The original problem still exists nonetheless – KOB Apr 07 '16 at 12:50
  • Yes, I see. I'll try to have a look into the problem – l7r7 Apr 07 '16 at 12:51
  • Using `try/catch` instead of `if` is bad practice. Since it's pretty easy to test whether a certain index would throw an `IndexOutOfBoundsException` for a given `String`, it should be done with an `if` rather than a `try/catch`. – fabian Apr 07 '16 at 13:42
  • @user187470 I like that solution. I don't have access to my code now, so I'll implement it later and let you know how it works. Thanks – KOB Apr 07 '16 at 14:51
  • @fabian I updated my answer, thank you. This is much better – l7r7 Apr 08 '16 at 08:11
  • @user187470 I think you have made an error with `if(word.length() <= letterNumber)`. Should it not be `>=`? But doing so messes with the sorting and I can't seem to figure it out. – KOB Apr 08 '16 at 15:18
  • That's also what I changed your solution to when testing it, however the sorting is not correct unfortunately. The way I am attempting to sort it is by 1st letter, then 2nd letter etc., regardless of how many letters are in each word. For example, the words `bb, b, bc, c, abcd, bcd` would be sorted as `abcd, b, bb, bc, bcd, c`. So the sentence `the big black cat sat on the beautiful brown mat` would be sorted as `beautiful big black brown cat mat on sat the the`. Your algorithm sorts it by length of word and then lexicographically as `on cat mat sat the the big black brown beautiful` – KOB Apr 08 '16 at 15:31

Throughout all my attempts, I had been sorting the words by the most significant letter first (the 1st letter of each word), then the next most significant, and so on. A bucket-based radix sort like this one, however, relies on processing the least significant digit/letter first (the last digit/letter of the number/word). So, instead of starting my outer for loop at letterNumber = 1 and incrementing it after each iteration, I now begin with letterNumber = maxWordLength and decrement it after each iteration, so that each iteration compares the next most significant letter.

@SuppressWarnings("unchecked")
public static void stringRadixSort(List<String> list) {
    List<String>[] buckets = (List<String>[]) Array.newInstance(List.class, 27);

    //Find longest word in list
    int maxWordLength = 0;
    for (String word : list) {
        if (word.length() > maxWordLength) {
            maxWordLength = word.length();
        }
    }

    //Sorts list based on least significant letter (last letter of word) to most significant
    int letterNumber = maxWordLength;
    for (int i = 0; i < maxWordLength; i++) {
        while (!list.isEmpty()) {
            String word = list.remove(list.first());
            int index = 0;
            if(word.length() >= letterNumber) {
                char ch = word.charAt(letterNumber - 1);
                index = ch - 'a' + 1;    //gets index of each letter ('a' = buckets[1], 'z' = buckets[26], buckets[0] is for words shorter than 'letterNumber')
            }
            if (buckets[index] == null) {
                buckets[index] = new LinkedList<String>();
            }
            buckets[index].insertLast(word);
        }

        for (int j = 0; j < buckets.length; j++) {
            if (buckets[j] != null) {
                while (!buckets[j].isEmpty()) {
                    list.insertLast(buckets[j].remove(buckets[j].first()));
                }
            }
        }
        letterNumber--;
    }
}
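
To see why the pass order matters, here is a rough sketch that runs the same stable bucket pass in both directions on the sentence from the question. It uses `java.util` lists rather than the custom List interface, assumes lowercase ASCII words, and all names (`PassOrderDemo`, `bucketIndex`, `pass`, `sort`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PassOrderDemo {
    // Bucket 0 holds words with no letter at the 0-based position pos; 'a'..'z' map to 1..26.
    static int bucketIndex(String word, int pos) {
        return pos < word.length() ? word.charAt(pos) - 'a' + 1 : 0;
    }

    // One stable bucket pass on position pos: words keep their relative order within a bucket.
    static List<String> pass(List<String> words, int pos) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < 27; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String word : words) {
            buckets.get(bucketIndex(word, pos)).add(word);
        }
        List<String> out = new ArrayList<>();
        for (List<String> bucket : buckets) {
            out.addAll(bucket);
        }
        return out;
    }

    // Runs one pass per letter position, either last letter first or first letter first.
    static List<String> sort(List<String> words, boolean lastLetterFirst) {
        int maxLen = 0;
        for (String word : words) {
            maxLen = Math.max(maxLen, word.length());
        }
        List<String> result = new ArrayList<>(words);
        for (int step = 0; step < maxLen; step++) {
            int pos = lastLetterFirst ? maxLen - 1 - step : step;
            result = pass(result, pos);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "the big black cat sat on the beautiful brown mat".split(" "));
        System.out.println(sort(words, true));  // last letter first: alphabetical order
        System.out.println(sort(words, false)); // first letter first: not alphabetical
    }
}
```

The final pass always dominates the result. When it runs on the first letter, earlier passes only break ties, which gives the alphabetical order; when it runs on the last position, nearly every word lands in bucket 0 and the longest word is pushed to the end.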
KOB