
I'm still learning Haskell and I wrote the following radix sort function. It seems to work correctly, but it is rather memory inefficient. When compiled with GHC, memory usage climbs well over 500 MB with an input list of only 10,000 elements.

So I want to ask: how could the following algorithm/code be improved to make it more efficient in terms of speed and memory? What is the best place to start?

import System.Random

-- radixsort for positive integers. uses 10 buckets
radixsort :: [Int] -> [Int]
radixsort [] = []
radixsort xs =
    -- given the data, get the number of passes that are required for sorting
    -- the largest integer
    let maxPos = floor ((log (fromIntegral (foldl max 0 xs)) / log 10) + 1)

        -- start sorting from digit on position 0 (lowest position) to position 'maxPos'
        radixsort' ys pos
         | pos < 0   = ys
         | otherwise = let sortedYs   = radixsort' ys (pos - 1)
                           newBuckets = radixsort'' sortedYs [[] | i <- [1..10]] pos
                       in  [element | bucket <- newBuckets, element <- bucket]

        -- given empty buckets, digit position and list, sort the values into
        -- buckets
        radixsort'' []     buckets _   = buckets
        radixsort'' (y:ys) buckets pos =
            let digit = div (mod y (10 ^ (pos + 1))) (10 ^ pos)
                (bucketsBegin, bucketsEnd) = splitAt digit buckets
                bucket = head bucketsEnd
                newBucket = bucket ++ [y]
            in radixsort'' ys (bucketsBegin ++ [newBucket] ++ (tail bucketsEnd)) pos
    in radixsort' xs maxPos

-- get an infinite list of random Ints from a seed
getRandIntArray :: Int -> [Int] 
getRandIntArray seed = (randomRs (0, div (maxBound :: Int) 2) (mkStdGen seed))

main = do
        value <- (\x -> return x ) (length (radixsort (take 10000 (getRandIntArray 0))))
        print value
Timo
  • Have you considered using arrays from the IO monad? – Gabe Mar 11 '11 at 20:43
  • Thanks, I'll definitely check out other data types as soon as I feel more comfortable with Haskell basics. – Timo Mar 11 '11 at 21:56
  • In `maxPos` you should use `foldl'` instead of `foldl`. Also, isn't `floor (x + 1)` better expressed as `ceiling x`? – Dan Burton Mar 12 '11 at 06:34
  • You might consider benchmarking against existing radix-sort on Haskell arrays: http://hackage.haskell.org/packages/archive/vector-algorithms/0.4/doc/html/Data-Vector-Algorithms-Radix.html – Don Stewart Mar 12 '11 at 17:51
  • @Gabe ST is better, since it is quasipure. – PyRulez Mar 09 '16 at 00:13

1 Answer


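The main problem is your function radixsort'': ++ is O(n) in the length of its first argument, which it copies on every call. Because you append to a bucket once per element, each pass ends up quadratic in the size of the input. The version below avoids ++ by tagging every element with its digit and building each bucket with a single filter: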

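-- Collect buckets n, n-1, ..., 0 by consing onto r', so the result is ordered
-- from bucket 0 up to the starting bucket. filter keeps the input order, so
-- each pass is stable.
pack (-1) r' _ = r'
pack n  r' relems =
    let getn = map snd . filter ((n ==) . fst)
    in pack (n - 1) (getn relems : r') relems

-- Tag every element with its digit at position pos, then build the 10 buckets.
-- Unlike the original, this version takes no bucket argument, so the call in
-- radixsort' becomes: radixsort'' sortedYs pos
radixsort'' elems pos =
    let digit y = div (mod y (10 ^ (pos + 1))) (10 ^ pos)
        relems = zip (map digit elems) elems
    in pack 9 [] relems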
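
For reference, here is a minimal, self-contained sketch of how the whole sort could look with the same filter-based bucketing. The names `digitCount`, `digitAt`, `bucketPass` and `radixSortSketch` are made up for this illustration and do not appear in the question. It assumes non-negative input, and it counts digits with integer division rather than `log`, which is my own substitution and not part of the code above.

```haskell
import Data.List (foldl')

-- Number of decimal digits of a non-negative Int, using integer arithmetic
-- only (hypothetical helper; avoids log, so 0 is handled as well).
digitCount :: Int -> Int
digitCount n
  | n < 10    = 1
  | otherwise = 1 + digitCount (n `div` 10)

-- The decimal digit of y at position pos (position 0 is the lowest digit).
digitAt :: Int -> Int -> Int
digitAt pos y = (y `div` (10 ^ pos)) `mod` 10

-- One stable bucketing pass: bucket d holds, in their original order, the
-- elements whose digit at pos is d. Every bucket is built by one traversal
-- of the tagged list, so no element is ever appended with ++.
bucketPass :: Int -> [Int] -> [Int]
bucketPass pos ys = concat [ [y | (d', y) <- tagged, d' == d] | d <- [0 .. 9] ]
  where
    tagged = [(digitAt pos y, y) | y <- ys]

-- LSD radix sort for non-negative Ints with 10 buckets: run one pass per
-- digit position, lowest position first.
radixSortSketch :: [Int] -> [Int]
radixSortSketch [] = []
radixSortSketch xs = foldl' (flip bucketPass) xs [0 .. passes - 1]
  where
    passes = digitCount (foldl' max 0 xs)
```

In GHCi, `radixSortSketch [170, 45, 75, 90, 802, 24, 2, 66]` evaluates to `[2,24,45,66,75,90,170,802]`. Each pass still walks the tagged list ten times, so this is only meant to show the shape of a linear-per-pass version; an array-based implementation would likely be faster.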
Kru
  • Thanks for pointing that out and providing your own solution. I really like how cleverly you managed to sort the values into buckets. I've got a lot to learn :) – Timo Mar 11 '11 at 22:08
  • `++` is *O(n)*, where `n` is the length of the first list. Using it can lead to *O(n^2)* algorithms where you expect *O(n)* ones. – luqui Mar 12 '11 at 05:06
  • Yes, sorry, I wrote that too quickly. In the question, however, it is used repeatedly, which leads to a quadratic algorithm. (fixed) – Kru Mar 12 '11 at 13:55
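
The cost described in the comments above can be seen in a small sketch. The two functions below are hypothetical illustrations, not code from the question: both rebuild a list unchanged, but the first appends one element at a time, as the question's bucket update does, while the second conses onto the front and reverses once.

```haskell
import Data.List (foldl')

-- Left-nested appends: every step copies the whole accumulator built so far,
-- so rebuilding a list of length n costs O(n^2) in total.
appendEach :: [a] -> [a]
appendEach = foldl (\acc x -> acc ++ [x]) []

-- Cons onto the front (O(1) per element) and reverse once at the end,
-- which costs O(n) in total.
consThenReverse :: [a] -> [a]
consThenReverse = reverse . foldl' (flip (:)) []
```

Both functions return their input unchanged; only the amount of copying differs, which is why replacing `bucket ++ [y]` with a cons-based or filter-based construction tends to help.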