I'm currently trying to read the contents of an XML file into a Map Int (Map Int String), and it works quite well (using HaXml). However, I'm not satisfied with the memory consumption of my program, and the problem seems to be garbage collection.
Here's the code I'm using to read the XML file:
type TextFile = Map Int (Map Int String)
buildTextFile :: String -> IO TextFile
buildTextFile filename = do content <- readFile filename
                            let doc = xmlParse filename content
                                con = docContent (posInNewCxt filename Nothing) doc
                            return $ buildTF con
My guess is that the string content is held in memory even after the return, although it doesn't need to be (of course, it could also be doc or con). I come to this conclusion because the memory consumption rises quickly with very large XML files, although the resulting TextFile is only a singleton map of a singleton map (using a special testing file; generally it's different, of course). So in the end, I have a Map of a Map Int String with only one string in it, but the memory consumption is up to 19 MB.
Using strict application ($!) or using Data.Text instead of String in TextFile doesn't change anything.
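For context on why strict application may have no visible effect here: $! evaluates its argument only to weak head normal form. For a Data.Map that forces the structure of the map and its keys but leaves the values as unevaluated thunks, and such thunks can keep other data reachable. The following is a minimal sketch of that difference, assuming the containers and deepseq packages are available; lazyMap and survives are illustrative names and not part of the program above.

```haskell
import Control.DeepSeq (force)
import Control.Exception (ErrorCall (..), evaluate, try)
import qualified Data.Map as Map

-- A map whose single value is a thunk that fails as soon as it is evaluated.
lazyMap :: Map.Map Int String
lazyMap = Map.fromList [(1, error "value was forced")]

-- Evaluate a value to weak head normal form and report whether that
-- succeeded without hitting the error hidden in the thunk.
survives :: a -> IO Bool
survives x = do
  r <- try (evaluate x)
  return $ case r of
    Left (ErrorCall _) -> False
    Right _            -> True

main :: IO ()
main = do
  -- WHNF only: the map's structure is built, the value stays a thunk.
  whnfOk <- survives lazyMap
  -- Normal form: force walks into the value and triggers the error.
  deepOk <- survives (force lazyMap)
  putStrLn ("WHNF evaluation succeeded: " ++ show whnfOk)
  putStrLn ("Deep evaluation succeeded: " ++ show deepOk)
```

If the values in TextFile are still thunks after buildTextFile returns, they may be what keeps the parsed document alive, which would be consistent with $! making no difference.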
So my question is: Is there some way to tell the compiler that the string content (or doc or con) isn't needed anymore and that it can be garbage collected?
And more generally: How can I find out where the problem really comes from without all the guessing?
Edit: As FUZxxl suggested, I tried using deepseq and changed the second line of buildTextFile like so:
let doc = content `deepseq` xmlParse filename content
Unfortunately, that didn't really change anything (or am I using it wrong?).
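A note on the deepseq attempt: applying deepseq to content fully evaluates the input string before parsing starts, so the whole file becomes resident at once and nothing is released by it. If the aim is for content, doc and con to become unreachable, the value to force is more likely the result. Below is a minimal sketch of that variant, assuming the deepseq package; parseAndBuild is a hypothetical stand-in for the xmlParse, docContent and buildTF chain, and buildTextFileStrict is an illustrative name. Whether this actually lowers the memory use of the real program has not been verified.

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)
import qualified Data.Map as Map

type TextFile = Map.Map Int (Map.Map Int String)

-- Hypothetical stand-in for the real pipeline
-- (xmlParse, then docContent, then buildTF).
parseAndBuild :: String -> TextFile
parseAndBuild content = Map.singleton 1 (Map.singleton 1 (take 10 content))

-- Fully evaluate the result inside IO before returning it, so that it
-- no longer contains thunks referring to the file contents.
buildTextFileStrict :: FilePath -> IO TextFile
buildTextFileStrict filename = do
  content <- readFile filename
  evaluate (force (parseAndBuild content))
```

Once the returned map is in normal form, the intermediate values can be collected, provided nothing else still refers to them.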