How do I implement `cat` in Haskell?

Question

I am trying to write a simple cat program in Haskell. I would like to take multiple filenames as arguments, and write each file sequentially to STDOUT, but my program only prints one file and exits.

What do I need to do to make my code print every file, not just the first one passed in?

import Control.Monad as Monad
import System.Exit
import System.IO as IO
import System.Environment as Env

main :: IO ()
main = do
    -- Get the command line arguments
    args <- Env.getArgs

    -- If we have arguments, read them as files and output them
    if (length args > 0) then catFileArray args

    -- Otherwise, output stdin to stdout
    else catHandle stdin

catFileArray :: [FilePath] -> IO ()
catFileArray files = do
    putStrLn $ "==> Number of files: " ++ (show $ length files)
    -- run `catFile` for each file passed in
    Monad.forM_ files catFile

catFile :: FilePath -> IO ()
catFile f = do
    putStrLn ("==> " ++ f)
    handle <- openFile f ReadMode
    catHandle handle

catHandle :: Handle -> IO ()
catHandle h = Monad.forever $ do
    eof <- IO.hIsEOF h
    if eof then do
        hClose h
        exitWith ExitSuccess
    else
        hGetLine h >>= putStrLn

I am running the code like this:

runghc cat.hs file1 file2

score 18 · Accepted Answer · answered Jul 13 '12 at 17:21

Your problem is that exitWith terminates the whole program, not just the loop. forever has no other way to stop, so it is the wrong tool here: you don't want to run the function "forever", just until the end of the file. You can rewrite catHandle like this:

catHandle :: Handle -> IO ()
catHandle h = do
    eof <- IO.hIsEOF h
    if eof then
        hClose h
    else do
        -- the `do` is required: both actions belong to the else branch
        hGetLine h >>= putStrLn
        catHandle h

I.e. if we haven't reached EOF, we recurse and read another line.
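For reference, here is a minimal sketch of how the corrected loop might fit back into the original program. It keeps the question's function names, but a few details are assumptions on top of the fix rather than part of it: `null args` replaces `length args > 0`, the `==>` banners are dropped so that only file contents are printed, and `withFile` closes each file handle instead of an explicit `hClose`.

```haskell
import Control.Monad (forM_)
import System.Environment (getArgs)
import System.IO

-- Copy a handle to stdout line by line, returning normally at EOF
-- instead of exiting the program.
catHandle :: Handle -> IO ()
catHandle h = do
    eof <- hIsEOF h
    if eof
        then return ()
        else do
            hGetLine h >>= putStrLn
            catHandle h

-- Open one file and copy it. withFile closes the handle afterwards,
-- even if an exception is thrown while copying.
catFile :: FilePath -> IO ()
catFile f = withFile f ReadMode catHandle

main :: IO ()
main = do
    args <- getArgs
    if null args
        then catHandle stdin      -- no arguments: copy stdin to stdout
        else forM_ args catFile   -- otherwise: copy each file in turn
```

Run it as in the question, e.g. `runghc cat.hs file1 file2`. Note that the `hGetLine`/`putStrLn` pair adds a newline after the last line even if the file did not end with one.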

However, this whole approach is overly complicated. You can write cat simply as

import Control.Monad (forM_)
import System.Environment (getArgs)

main = do
    files <- getArgs
    forM_ files $ \filename -> do
        contents <- readFile filename
        putStr contents

Because readFile uses lazy I/O, the file contents are not loaded into memory all at once. They are read in chunks as putStr consumes them, so the file is effectively streamed to stdout.

If you are comfortable with the operators from Control.Monad, the whole program can be shortened down to

main = getArgs >>= mapM_ (readFile >=> putStr)
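If `>=>` is unfamiliar: it composes two functions that each return a monadic result, so `(f >=> g) x` means the same as `f x >>= g`. The sketch below unpacks the one-liner and then uses the same operator in the `Maybe` monad. `catOne`, `parseAge`, `checkAdult` and `parseAdult` are made-up names for illustration only.

```haskell
import Control.Monad ((>=>))
import System.Environment (getArgs)
import Text.Read (readMaybe)

-- The one-liner, unpacked: read a file, then print what was read.
catOne :: FilePath -> IO ()
catOne filename = readFile filename >>= putStr

-- The same function written with Kleisli composition.
catOne' :: FilePath -> IO ()
catOne' = readFile >=> putStr

-- (>=>) works in any monad. Hypothetical helpers in Maybe:
parseAge :: String -> Maybe Int
parseAge = readMaybe

checkAdult :: Int -> Maybe Int
checkAdult n
    | n >= 18   = Just n
    | otherwise = Nothing

-- Parse, then validate; Nothing from either step short-circuits.
parseAdult :: String -> Maybe Int
parseAdult = parseAge >=> checkAdult

main :: IO ()
main = getArgs >>= mapM_ catOne'
```

Here `parseAdult` yields `Nothing` when either step fails, which is the `Maybe` analogue of chaining `readFile` into `putStr` in `IO`.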

answered Jul 13 '12 at 17:21

shang


I switched the accepted answer to yours because you fixed my bug and also explained the lazy IO streaming. – Sam Jul 13 '12 at 17:38
"kleisli composition". I don't know any better (shorter) name for it. – shang Jul 13 '12 at 17:50
http://hackage.haskell.org/packages/archive/base/latest/doc/html/Control-Monad.html#v:-62--61--62- – shang Jul 13 '12 at 17:53
thanks, good to know the actual name so I can read up on it. I will probably just keep calling it "leftfish" and "rightfish" in my head. – Sam Jul 13 '12 at 18:01
http://www.haskell.org/haskellwiki/Pronunciation doesn't have `>=>` -- considering that Sam's comment has 3 upvotes, is that enough to justify adding it to the page as "rightfish"? – MatrixFrog Jul 14 '12 at 07:02

Luis Casillas · Answer 2 · 2012-07-13T22:34:34.047

If you install the very helpful conduit package, you can do it this way:

module Main where

import Control.Monad
import Data.Conduit
import Data.Conduit.Binary
import System.Environment
import System.IO

main :: IO ()
main = do files <- getArgs
          forM_ files $ \filename -> do
            runResourceT $ sourceFile filename $$ sinkHandle stdout

This looks similar to shang's suggested simple solution, but using conduits and ByteString instead of lazy I/O and String. Both of those are good things to learn to avoid: lazy I/O frees resources at unpredictable times; String has a lot of memory overhead.

Note that ByteString is intended to represent binary data, not text. In this case we're just treating the files as uninterpreted sequences of bytes, so ByteString is fine to use. If OTOH we were processing the file as text—counting characters, parsing, etc—we'd want to use Data.Text.
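If you would rather not install anything, the same two concerns (predictable resource release, no String overhead) can also be addressed with plain strict ByteString and an explicit loop. This is only a sketch and not the conduit approach itself: `copyHandle` and the 32 KiB chunk size are my own choices, and it relies on the bytestring package that ships with GHC.

```haskell
import qualified Data.ByteString as B
import Control.Monad (forM_, unless)
import System.Environment (getArgs)
import System.IO

-- Copy everything from one handle to another in fixed-size chunks,
-- so at most one chunk is held in memory at a time.
copyHandle :: Handle -> Handle -> IO ()
copyHandle from to = loop
  where
    chunkSize = 32 * 1024            -- arbitrary buffer size (assumption)
    loop = do
        chunk <- B.hGetSome from chunkSize
        unless (B.null chunk) $ do   -- an empty chunk means EOF
            B.hPut to chunk
            loop

main :: IO ()
main = do
    files <- getArgs
    forM_ files $ \filename ->
        -- withBinaryFile closes the handle as soon as the copy finishes.
        withBinaryFile filename ReadMode $ \h -> copyHandle h stdout
```

Each file is opened, copied, and closed before the next one is touched, so the point at which resources are freed is explicit.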

EDIT: You can also write the conduit version like this:

main :: IO ()
main = getArgs >>= catFiles

type Filename = String

catFiles :: [Filename] -> IO ()
catFiles files = runResourceT $ mapM_ sourceFile files $$ sinkHandle stdout

In the original, sourceFile filename creates a Source that reads from the named file; and we use forM_ on the outside to loop over each argument and run the ResourceT computation over each filename.

However in Conduit you can use monadic >> to concatenate sources; source1 >> source2 is a source that produces the elements of source1 until it's done, then produces those of source2. So in this second example, mapM_ sourceFile files is equivalent to sourceFile file0 >> ... >> sourceFile filen—a Source that concatenates all of the sources.

EDIT 2: And following Dan Burton's suggestion in the comment to this answer:

module Main where

import Control.Monad
import Control.Monad.IO.Class
import Data.ByteString (ByteString)
import Data.Conduit
import Data.Conduit.Binary
import System.Environment
import System.IO

main :: IO ()
main = runResourceT $ sourceArgs $= readFileConduit $$ sinkHandle stdout

-- | A Source that generates the result of getArgs.
sourceArgs :: MonadIO m => Source m String
sourceArgs = do args <- liftIO getArgs
                forM_ args yield

type Filename = String          

-- | A Conduit that takes filenames as input and produces the concatenated 
-- file contents as output.
readFileConduit :: MonadResource m => Conduit Filename m ByteString
readFileConduit = awaitForever sourceFile

In English, sourceArgs $= readFileConduit is a source that produces the contents of the files named by the command line arguments.

+1 an excellent testament to the simplicity and elegance that `conduit` has achieved. I wonder if a `getArgs`-esque Source would be of use. Then you could write `runResourceT $ sourceArgs $= readFileConduit $$ sinkHandle stdout` where `sourceArgs :: MonadIO m => Source m String` and `readFileConduit :: MonadResource m => Conduit FileName m ByteString` — Dan Burton, Jul 13 '12 at 21:49
@DanBurton: I'm still learning conduits, so I decided to try my hand at it—and succeeded within 10 minutes. I'll edit the response to add that version. — Luis Casillas, Jul 13 '12 at 22:22
This is not technically the answer to my question, but it's so informative that I would consider it "required reading" for anyone with similar questions about Haskell. — Sam, Jul 13 '12 at 23:06
@sacundim, would you mind adding an explanation of `sourceArgs` in the same way that you did for `catFiles`? Why do you have to do `liftIO`? What is `yield`? What package is `awaitForever` in? I couldn't compile the **EDIT 2** code. Thanks in advance. — Sam, Jul 13 '12 at 23:26
@Sam the truth of the matter is that the `conduit` library is a mere 7 months old, and has undergone several significant revisions since inception. As such, I wouldn't consider it "required reading" for Haskell newbs quite yet (especially since it might undergo even more significant changes), although `conduit` certainly does seem to have a bright future. Regarding yield and awaitForever, see [Data.Conduit](http://hackage.haskell.org/packages/archive/conduit/latest/doc/html/Data-Conduit.html#v:awaitForever). Fair warning: the `conduit` type signatures can be quite daunting for a newcomer. — Dan Burton, Jul 14 '12 at 22:29
@DanBurton oh okay for some reason I was under the impression that Conduit was the "proper" way of doing it. Thanks. — Sam, Jul 16 '12 at 01:19

score 5 · Answer 3 · answered Jul 13 '12 at 17:16

catHandle, which is indirectly called from catFileArray, calls exitWith when it reaches the end of the first file. This terminates the program, so the remaining files are never read.

You should instead just return normally from the catHandle function when the end of the file has been reached. This probably means you shouldn't do the reading forever.

score 4 · Answer 4 · answered Jul 13 '12 at 17:29

My first idea is this:

import System.Environment
import System.IO
import Control.Monad
main = getArgs >>= mapM_ (\name -> readFile name >>= putStr)

It doesn't really fail in a unix-y way, and it handles neither stdin nor multibyte input, but it is "way more Haskell", so I just wanted to share it. Hope it helps.

On the other hand, I guess it should handle large files easily without filling up memory, since putStr can start writing the string out while the file is still being read.
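For what it's worth, here is a rough sketch of how the two gaps mentioned above (no stdin, no unix-y failure) might be closed while keeping the same readFile/putStr core. The name `catFile`, the `cat: ` message prefix, and the choice to keep going after a failed file are my assumptions, not anything from the question. Multibyte input is still left to the default locale decoding.

```haskell
import Control.Exception (IOException, try)
import Control.Monad (unless)
import System.Environment (getArgs)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

-- Try to print one file; report a failure on stderr instead of crashing.
-- Returns True on success, False if an IOException was caught.
catFile :: FilePath -> IO Bool
catFile name = do
    result <- try (readFile name >>= putStr) :: IO (Either IOException ())
    case result of
        Right () -> return True
        Left err -> do
            hPutStrLn stderr ("cat: " ++ show err)
            return False

main :: IO ()
main = do
    args <- getArgs
    if null args
        then getContents >>= putStr          -- no arguments: copy stdin
        else do
            results <- mapM catFile args     -- keep going after a failure
            unless (and results) exitFailure -- non-zero status if any file failed
```

A missing file then produces a message on stderr and a non-zero exit status, while the other files are still printed.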
