
Given an alphabet ["a"; "b"; "c"], I want to dump all sequences of length 25 to a file. (Letters can repeat in a sequence; it's not a permutation.) The problem is that I get `Stack overflow during evaluation (looping recursion?)` when I try the following code:

let addAlphabetToPrefix alphabet prefix =
  List.map (fun letter -> prefix ^ letter) alphabet;;

(* counter is the current word length; stop once the words reach length 25 *)
let rec generateWords alphabet counter words =
  if counter >= 25 then
    words
  else
    (* append every letter to every word: the list grows by a factor of the alphabet size *)
    let newWords = List.flatten (List.map (fun word -> addAlphabetToPrefix alphabet word) words) in
    generateWords alphabet (counter + 1) newWords;;

generateWords ["a"; "b"; "c"] 0 [""];; (* Produces a stack overflow. *)

Is there a better way of doing this? I was thinking of generating the entire list first and then dumping it to a file, but do I instead have to repeatedly generate partial lists and dump them? Would making something lazy help?

Why exactly is a stack overflow occurring? AFAICT, my generateWords function is tail-recursive. Is the problem that the words list I'm generating is getting too big to fit into memory?

grautur
  • Does OCaml optimize tail recursion? –  Mar 30 '11 at 23:38
  • @Jeff: Interesting! Really, I have no clue what OCaml is about. It's just that the languages I know don't seem to bother optimizing tail recursion :-) –  Mar 30 '11 at 23:51
  • You should `#trace` the `generateWords` function to see if it sheds some light on the subject. I think it's just that you are generating a huge list (3^25 words with this alphabet, 26^25 with a full Latin alphabet), so naturally you will run out of memory. Add to that the fact that you are generating it recursively, and there will be many intermediate results. – Jeff Mercado Mar 30 '11 at 23:53

2 Answers


Your functions are being compiled as tail calls. I confirmed this from the linearized code obtained with the -dlinear option of the native compiler, ocamlopt[.opt].

The fact of the matter is that your heap is growing exponentially, and words of length 25 are unsustainable with this method: the final list alone would hold 3^25 ≈ 8.5 × 10^11 strings. Length 11 works fine (and is the highest I could manage).
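To put numbers on the exponential growth, here is a quick sketch that computes the list size k^n exactly for alphabet size k and word length n. The helper name `word_count` is hypothetical, and it assumes a 64-bit platform (63-bit native `int`); it raises instead of silently overflowing.

```ocaml
(* Number of words of length [n] over an alphabet of [k] letters, i.e. k^n.
   Hypothetical helper; fails rather than wrapping around on int overflow. *)
let word_count k n =
  let rec go acc i =
    if i = 0 then acc
    else if acc > max_int / k then failwith "word_count: int overflow"
    else go (acc * k) (i - 1)
  in
  go 1 n

let () =
  List.iter
    (fun n -> Printf.printf "length %2d: %d words\n" n (word_count 3 n))
    [11; 25]
```

On a 64-bit platform this prints 177147 words for length 11 and 847288609443 for length 25; `word_count 26 25` fails because 26^25 does not fit in a native int at all.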

Yes, there is a better way to do this. You can generate each word directly from its index in lexicographic order (the index written in base 3, one digit per letter), or by using Gray codes. Both need storage for only one word and will never cause a segmentation fault, and the index method is trivially parallel. With the index method you might overflow a native int, though (3^25 ≈ 8.5 × 10^11 fits in OCaml's 63-bit int on 64-bit platforms, but not in the 31-bit int of 32-bit platforms); in that case you can switch to big integers at some cost in speed, or use Gray codes (which may be harder to parallelize, depending on the Gray code).
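A minimal sketch of the index method, assuming a 64-bit platform so that 3^25 fits in a native int. Word number i is simply i written in base k (k = alphabet size), so words can be streamed to a channel one at a time without ever building a list. Unlike the question's code, the alphabet here is a `char array`; the names `word_of_index`, `pow` and `dump_words` are hypothetical.

```ocaml
(* Word at position [index] in lexicographic order: write [index] in base
   [Array.length alphabet], most significant digit first, as letters. *)
let word_of_index alphabet length index =
  let k = Array.length alphabet in
  let buf = Bytes.make length alphabet.(0) in
  let rec fill pos i =
    if pos >= 0 then begin
      Bytes.set buf pos alphabet.(i mod k);
      fill (pos - 1) (i / k)
    end
  in
  fill (length - 1) index;
  Bytes.to_string buf

(* k^n, assuming it fits in a native int. *)
let rec pow k n = if n = 0 then 1 else k * pow k (n - 1)

(* Write every word of [length] letters to [oc], one per line.
   Only one word is alive at a time, so memory use stays constant. *)
let dump_words oc alphabet length =
  let total = pow (Array.length alphabet) length in
  for i = 0 to total - 1 do
    output_string oc (word_of_index alphabet length i);
    output_char oc '\n'
  done
```

For example, `dump_words stdout [|'a'; 'b'; 'c'|] 3` prints "aaa" through "ccc". Memory is no longer the limit, but disk is: at length 25 the file would be 847,288,609,443 lines of 26 bytes each, roughly 22 TB. Because each index is independent, disjoint index ranges can be handed to separate workers.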

nlucaroni

OCaml optimizes tail recursion, so your code should work, except that the standard library's List.map function is, unfortunately, not tail-recursive (nor is List.flatten). Their stack depth grows with the length of the list, so the stack overflow is most likely occurring in one of those calls once your lists get large.
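If you want to stay with the standard library, tail-recursive replacements are easy to build from `List.rev_map`, `List.rev_append` and `List.rev`, all of which are tail-recursive. This is a sketch: `map_tr`, `concat_tr` and `generate_words` are hypothetical names, and `generate_words` takes the word length as a parameter instead of hard-coding 25. It only removes the stack overflow; the list is still exponentially large on the heap.

```ocaml
(* Tail-recursive map: rev_map builds the result reversed, rev restores order. *)
let map_tr f l = List.rev (List.rev_map f l)

(* Tail-recursive, order-preserving concatenation of a list of lists. *)
let concat_tr ll =
  List.rev (List.fold_left (fun acc l -> List.rev_append l acc) [] ll)

let addAlphabetToPrefix alphabet prefix =
  map_tr (fun letter -> prefix ^ letter) alphabet

(* All words of length [n], in lexicographic order, with constant stack depth. *)
let generate_words alphabet n =
  let rec loop counter words =
    if counter >= n then words
    else loop (counter + 1) (concat_tr (map_tr (addAlphabetToPrefix alphabet) words))
  in
  loop 0 [""]
```

The extra `List.rev` passes cost time and some temporary allocation, which is why a tail-recursive map is not a free drop-in replacement.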

Batteries Included and Jane Street's Core library both provide tail-recursive versions of map. Try one of those and see if it fixes the problem.

Michael Ekstrand
  • Shit; I forgot about that. I was only looking at the declared functions for tail recursion. S/he will still have to deal with memory problems at word lengths that large, though. – nlucaroni Mar 31 '11 at 01:16
  • In this context, `List.rev_map` can also be used (instead of switching to a different `List.map` implementation). – gasche Mar 31 '11 at 07:27
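Since the order of the words does not matter when dumping them, `List.rev_map` can replace `List.map` directly; the words simply come out in a different order. A sketch, folding `List.rev_append` over the words in place of `List.flatten` (the name `generateWordsRev` and its "remaining length" parameter are hypothetical):

```ocaml
(* rev_map is tail-recursive; the letters come out in reverse order. *)
let addAlphabetToPrefix alphabet prefix =
  List.rev_map (fun letter -> prefix ^ letter) alphabet

(* Extend every word by one letter [length] times. Every helper is
   tail-recursive, so stack depth is constant; the result is unordered. *)
let rec generateWordsRev alphabet length words =
  if length = 0 then words
  else
    let newWords =
      List.fold_left
        (fun acc word -> List.rev_append (addAlphabetToPrefix alphabet word) acc)
        [] words
    in
    generateWordsRev alphabet (length - 1) newWords
```

For example, `generateWordsRev ["a"; "b"] 2 [""]` returns the four words "aa", "ab", "ba", "bb" in some order. As noted in the comment above, the memory problem at length 25 remains.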