
I am sorry to ask such a question, but I am really new to Haskell. I searched the Internet for a whole day but didn't find any example.

I have a pandoc filter written in python (tikzcd.py). I want to use that filter to process my blog posts.

I guess I need to use unixFilter or pandocCompilerWithTransform, but my knowledge of Haskell is really not enough to find a solution myself.

So, could someone provide me an example?

----------- UPDATES -----------

@Michael gives a solution using pandocCompilerWithTransformM and unixFilter. It works, but there is a problem.

When using a filter from the command line, what I do is

pandoc -t json -READEROPTIONS input.markdown | ./filter.py | pandoc -f json -WRITEROPTIONS -o output.html

or equivalently

pandoc --filter ./filter.py -READEROPTIONS -WRITEROPTIONS input.markdown -o output.html

This command is shorter, but it doesn't show the intermediate steps.

But with pandocCompilerWithTransformM, it does something like

pandoc -t html -READEROPTIONS -WRITEROPTIONS input.markdown | pandoc -t json | ./filter.py | pandoc -f json -WRITEROPTIONS -o output.html

The difference is the text that gets passed to filter.py: in one case it is the content produced directly from the Markdown, while in the other it is text produced from HTML that was itself produced from the Markdown. As you know, converting something back and forth tends to produce unexpected problems, so I think there may be a better solution.

PS. I've started to learn Haskell. I hope I can solve this problem myself someday. Thank you!

Fang Hung-chien

1 Answer


In the end I think you would use both. Using https://github.com/listx/listx_blog/blob/master/blog.hs as a model, the following has the same shape as the transformer it defines. There, transformer is used on lines 69-80 for 'posts', as the third argument to pandocCompilerWithTransformM, i.e. as a (Pandoc -> Compiler Pandoc). Here you would need to supply the absolute path to your Python filter -- or just its name if it's in $PATH -- together with reader and writer options (e.g. defaultHakyllReaderOptions and defaultHakyllWriterOptions).

import Text.Pandoc
import Hakyll

type Script = String 

transformer
  :: Script         -- e.g. "/absolute/path/filter.py"
  -> ReaderOptions  -- e.g.  defaultHakyllReaderOptions
  -> WriterOptions  -- e.g.  defaultHakyllWriterOptions
  -> (Pandoc -> Compiler Pandoc)
transformer script reader_opts writer_opts pandoc = 
    do let input_json = writeJSON writer_opts pandoc
       output_json <- unixFilter script [] input_json
       return $ 
          -- either (error . show) id $  -- uncomment for newer pandoc (>= 1.14), where readJSON returns an Either
          readJSON reader_opts output_json 

Similarly, (transformer "/usr/local/bin/myfilter.py" defaultHakyllReaderOptions defaultHakyllWriterOptions) might be used where (return . pandocTransform) is used on line 125 of this example gist.
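For concreteness, here is a rough sketch of how transformer might be wired into a site's main; the "posts/*" pattern, the route, and the filter path are only placeholders:

main :: IO ()
main = hakyll $
    match "posts/*" $ do
        route $ setExtension "html"
        compile $ pandocCompilerWithTransformM
                      defaultHakyllReaderOptions
                      defaultHakyllWriterOptions
                      (transformer "/usr/local/bin/myfilter.py"
                                   defaultHakyllReaderOptions
                                   defaultHakyllWriterOptions)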


For debugging you might outsource everything to unixFilter:

transform :: Script -> String -> Compiler String
transform script md = do json0 <- unixFilter pandoc input_args md
                         json1 <- unixFilter script [] json0
                         unixFilter pandoc output_args json1
 where
   pandoc = "pandoc"
   input_args = words "-f markdown -t json" -- add others
   output_args = words "-f json -t html"    -- add others

The three lines of the do block are the equivalent of the stages of the Unix pipeline pandoc -t json | ./filter.py | pandoc -f json, with whatever additional arguments.
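Such a transform could then be hooked up in a rule without pandocCompilerWithTransformM at all, along these lines (the pattern and the filter path are again placeholders):

match "posts/*" $ do
    route $ setExtension "html"
    compile $
        getResourceString >>= withItemBody (transform "/absolute/path/filter.py")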


I think maybe you are right that there is an extra layer of pandoc back and forth here. The pandocCompilerWithTransform(M) functions are for a direct Pandoc -> Pandoc function; it is applied to the Pandoc that Hakyll comes up with. I think we should dispense with this and use the pandoc libraries directly. A use of unixFilter might look like this:

transformXLVI :: Script -> ReaderOptions -> WriterOptions -> String  -> Compiler Html
transformXLVI script ropts wopts = fmap fromJSON . unixFilter script [] . toJSON 
  where 
    toJSON   = writeJSON wopts 
    --           . either (error . show) id -- for pandoc > 1.14
               . readMarkdown ropts 
    fromJSON = writeHtml wopts
    --           . either (error . show) id
               . readJSON ropts 

I hope the principles are emerging from these variations! This should be pretty much the same as the preceding transform; we are using the pandoc library in place of calls to the pandoc executable.
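As a sketch of usage -- the helper name compileWithFilter is made up, and renderHtml (from blaze-html's Text.Blaze.Html.Renderer.String) is assumed because writeHtml produces a blaze Html value in pandoc 1.x:

import Text.Blaze.Html.Renderer.String (renderHtml)

compileWithFilter :: Script -> Compiler (Item String)
compileWithFilter script =
    getResourceString >>=
        withItemBody (fmap renderHtml
                        . transformXLVI script defaultHakyllReaderOptions
                                               defaultHakyllWriterOptions)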

Michael
  • Thank you very much for your answer, but I still have a question. If I understood correctly, your solution is doing something like `pandoc | pandoc -t json | ./filter.py | pandoc -f json`. Pandoc processes the document twice before the filter (Markdown ---> HTML ---> JSON ---> JSON ---> HTML), and judging by the pandoc native output, the result is a little bit different from that of `pandoc -t json | ./filter.py | pandoc -f json` (Markdown ---> JSON ---> JSON ---> HTML). However, I could modify my filter a bit to get my desired result. Thank you again. – Fang Hung-chien Apr 26 '15 at 09:50
  • Oh wait, above I was thinking you were using hakyll which will itself process these things using the pandoc haskell libraries, rather than using the `pandoc` executable. I was in any case imagining `md -> json -> json -> html` where python is doing the `json -> json` bit -- which is what the script does. If you are making html by hand with the `pandoc` executable then as you say just do `cat myfile.md | pandoc -t json | ./filter.py | pandoc -f json -t html -s` (using `-s` for a standalone html). – Michael Apr 26 '15 at 18:16
  • It is true that the hakyll process I was describing will convert the markdown to pandoc's 'native' representation, then go to json, then re-read the python-altered json into pandoc's native representation and then to html. However, the native haskell representation in the `Pandoc` type is barely different from the json your filter reads and writes. If you had a filter written in haskell of course it would be able to skip the json step which is for the sake of the python filter. – Michael Apr 26 '15 at 18:19
  • Sorry, I didn't understand what you said (maybe because I don't understand how Hakyll deals with the Markdown files). I am not using the `pandoc` executable; I am using Hakyll with the pandoc libraries. I thought what you described is converting Markdown to HTML (as `pandocCompiler` does). Then the `transformer` reads the HTML and writes it to JSON (`writeJSON`). After being filtered by my filter, the JSON goes to the reader (`readJSON`). Finally, `pandocCompiler` converts the JSON to HTML. – Fang Hung-chien Apr 26 '15 at 18:34
  • The reason I think so is that my filter didn't work in this workflow. I am sure I am using the correct pandoc extensions and options. – Fang Hung-chien Apr 26 '15 at 18:35
  • I see, are you doing all this on the command line? Or via a hakyll program written in haskell? – Michael Apr 26 '15 at 18:37
  • Hm hm. I am baffled why, in the Hakyll setup as I was imagining it, there would seem to be an intermediate production of HTML. It may be that the double application of `readerOptions` or `writerOptions` is messing things up. That may make it appear that HTML is produced in the middle. – Michael Apr 26 '15 at 18:40
  • I will try to figure out why my filter does not work. At least you gave me some ideas on how to solve this. Thank you! – Fang Hung-chien Apr 26 '15 at 18:57
  • I am sorry for the puzzling updates. Now I have found out what was going wrong. It was my fault that I didn't use the same test input but wrote a new one. My filter deals with LaTeX math/environments (as indicated in the description of my question, I want to draw commutative diagrams using `tikzcd`). But Markdown has some restrictions on how math may be written: `\[ \begin{tikzcd}` must be on the same line, but `\[ \sum` need not be. Because of this mistake, I didn't get what I expected. – Fang Hung-chien Apr 26 '15 at 19:16
  • Thanks for your updates. It really helps. I hope your answer will help others who have similar problems as me. – Fang Hung-chien Apr 26 '15 at 19:20