
Consider an array of n numbers with at most k digits each (see Edit). Consider the radix sort program from here:

def radixsort( aList ):
  RADIX = 10
  maxLength = False
  tmp, placement = -1, 1

  while not maxLength:
    maxLength = True
    # declare and initialize buckets
    buckets = [list() for _ in range( RADIX )]

    # split aList between lists
    for i in aList:
      tmp = i // placement   # integer division; plain / would make tmp a float index under Python 3
      buckets[tmp % RADIX].append( i )
      if maxLength and tmp > 0:
        maxLength = False

    # empty lists into aList array
    a = 0
    for b in range( RADIX ):
      buck = buckets[b]
      for i in buck:
        aList[a] = i
        a += 1

    # move to next digit
    placement *= RADIX

The buckets variable is basically a 2D list holding all the numbers, yet only n values are ever added to it. How come the space complexity is O(k + n) and not O(n)? Correct me if I am wrong, but even if we count the space used to extract the digit at a particular place, that is only a single (constant) amount of memory.

Edit: I would like to explain my understanding of k. Suppose I give the input [12, 13, 65, 32, 789, 1, 3]; the algorithm in the link goes through 4 passes of the outer while loop. Here k = 4, i.e. the maximum number of digits of any element in the array plus 1. Thus k is the number of passes. This is the same k that appears in the time complexity of this algorithm, O(kn), which makes sense. What I am not able to understand is how it plays a role in the space complexity, O(k + n).
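To make that concrete, here is a small sketch (the helper name is mine, not from the linked code) that just counts how many times the outer while loop runs:

def count_passes( aList ):
  # one pass per digit of the largest element, plus the final pass
  # in which every quotient is already zero
  RADIX, passes, placement = 10, 0, 1
  while True:
    passes += 1
    if all( x // placement == 0 for x in aList ):
      return passes
    placement *= RADIX

print( count_passes( [12, 13, 65, 32, 789, 1, 3] ) )  # prints 4, which is the k I mean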

user2314737
skr_robo

3 Answers


Radix sort's space complexity is bound by the sort it uses on each digit. In the best case, that is counting sort.

Here is the pseudocode provided by CLRS for counting sort:

Counting-sort(A,B,k)
  let C[0..k] be a new array
  for i = 0 to k
      C[i] = 0
  for j = 1 to A.length
      C[A[j]] = C[A[j]] + 1
  for i = 1 to k
      C[i] = C[i] + C[i-1]
  for j = A.length down to 1
      B[C[A[j]]] = A[j]
      C[A[j]] = C[A[j]] - 1 

As you can see, counting sort creates multiple arrays: one whose size depends on k and one whose size depends on n. B is the output array, which has size n. C is an auxiliary array of size k + 1 (indices 0 through k).

Because radix sort uses counting sort, counting sort's space complexity is the lower bound of radix sort's space complexity.
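For reference, here is a minimal Python sketch of that counting sort (0-indexed; the identifiers are mine, not CLRS's) that makes the two extra arrays explicit:

def counting_sort( A, k ):
  # keys in A are assumed to lie in the range 0..k
  C = [0] * (k + 1)            # auxiliary counts, O(k) space
  B = [0] * len( A )           # output array, O(n) space
  for a in A:                  # count occurrences of each key
    C[a] += 1
  for i in range( 1, k + 1 ):  # prefix sums: C[i] = number of keys <= i
    C[i] += C[i - 1]
  for a in reversed( A ):      # place elements stably, back to front
    C[a] -= 1
    B[C[a]] = a
  return B

print( counting_sort( [2, 5, 3, 0, 2, 3, 0, 3], 5 ) )  # [0, 0, 2, 2, 3, 3, 3, 5]

Nothing else is allocated, which is where the O(n + k) bound comes from.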

Jayson Boubin
  • This is just plain wrong. [Here](https://stackoverflow.com/questions/43587853/duplicate-removal/43588349#43588349), for example, you can find my implementation of in-place radix sort whose space complexity is `O(1)`. What's more, this answer doesn't actually answer the question. – hidefromkgb Jun 10 '17 at 21:56
  • If the algorithm takes a variable size input, it can't have O(1) space complexity. Space complexity is a measure of the amount of working storage an algorithm needs. If an algorithm operates over an array, it needs that array as storage. in-place does not mean O(1). – Jayson Boubin Jun 10 '17 at 22:00
  • Also, I'm simply quoting CLRS here. If you can prove those guys wrong then I think you'll gain a little more than stack overflow rep. – Jayson Boubin Jun 10 '17 at 22:01
  • Meh. I'm wrong. *Extra* space complexity ≠ actual space complexity. I now get what you meant; +1. – hidefromkgb Jun 10 '17 at 22:07
  • Sorry for the delay in accepting answer. I understood this satisfactorily only now. – skr_robo Jul 01 '17 at 21:15

I think there is a terminological issue here. The space complexity of the question's implementation and of the implementation mentioned in Jayson Boubin's answer is O(n+k). But k is not the length of the longest word (or longest number). k is the size of an 'alphabet': the number of different digits (for numbers) or letters (for words).

buckets = [list() for _ in range( RADIX )]

This code creates an array with RADIX elements. In this particular implementation RADIX is a constant (so the space complexity is O(n)), but in general it is a variable. RADIX is the k: the number of different digits (letters of the alphabet). This k does not depend on n and can even be larger than n in some cases, so the space complexity is O(n+k) in general.

Edit: In this implementation the size of placement (or tmp) is O(k) (with your definition of k), because k is log(maxNumber) base 10 and the size of placement is log(maxNumber) base 256. But I'm not sure this is the general case.
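To illustrate the point, here is a sketch of the question's algorithm with the radix passed in as a parameter (the names are mine): the buckets list always has radix slots, independent of n, and that radix is the k in O(n + k):

def radixsort_with_radix( aList, radix ):
  placement = 1
  while True:
    buckets = [list() for _ in range( radix )]   # O(radix) buckets ...
    done = True
    for x in aList:
      tmp = x // placement
      buckets[tmp % radix].append( x )           # ... holding n values in total
      if tmp > 0:
        done = False
    aList[:] = [x for bucket in buckets for x in bucket]
    if done:
      return aList
    placement *= radix

print( radixsort_with_radix( [12, 13, 65, 32, 789, 1, 3], radix=16 ) )  # sorted with 16 buckets instead of 10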

DAle
  • Yes. The k denotes the number of digits. – skr_robo Jun 10 '17 at 23:17
  • I am not convinced that k is the same as the radix or its length. The radix would always be 10, but k will be 3 if there is at least one three-digit number and the rest are 2- or 1-digit. – skr_robo Jun 10 '17 at 23:37
  • @skr_robo, that's what I'm talking about: in `O(n+k)`, `k` is not the maximal number of digits in some number. – DAle Jun 10 '17 at 23:41
  • @DAle If we input the array [12, 13, 65, 32, 789, 1, 3], the above algorithm would run through 4 passes (because there is one 3-digit number). Now, the time complexity is O(kn) and this k is the number of passes, which is 4. My point is that O(k+n) also has the same k. I am not able to understand why the radix plays a part here. – skr_robo Jun 11 '17 at 01:13
  • I was wrong to call `k` the maximal number of digits. It's one more than that. – skr_robo Jun 11 '17 at 01:14

Radix sort uses counting sort on each digit of the numbers in the dataset. Counting sort has a space complexity of O(n+k), where k is the largest value in the dataset being counted.

Decimal digits range from 0 to 9, so if we sort 4 decimal numbers (11, 22, 88, 99) using radix sort (with counting sort used within radix sort), each digit pass creates an array of size b = 10, where b is the base.

It means that the total space used would be (number of digits) × (n + base). If the number of digits is constant, the space complexity becomes O(n+base).

Hence the space complexity of Radix Sort is O(n+b).
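As rough bookkeeping for the (11, 22, 88, 99) example (assuming the per-pass arrays are rebuilt rather than accumulated):

# n = 4 numbers, base b = 10, 2 digit passes
n, b, digit_passes = 4, 10, 2
per_pass = n + b                   # one n-sized working array plus b buckets/counters
print( per_pass )                  # 14 cells live at any one moment -> O(n + b) space
print( digit_passes * (n + b) )    # 28 cells touched over the whole run, but reused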