I’m working on a plugin to parse all posts and gather them into a JSON file to be consumed by a search mechanism. How can I access just the text of the post, with no markup? I’m currently accessing site.posts, then e.g. page.content in loops. This returns the content of the post, but includes newline markers (\n) and Markdown syntax.

I saw another question in which someone wanted to get Markdown processed content in a Jekyll tag plugin, but my case is different: I don't want any markup at all, just the plain text of the post, with no formatting applied.

Below is the key method from my current implementation.

def generate(site)
  target = File.open('js/searchcontent.js', 'w')
  target.truncate(target.size)
  target.puts('var tipuesearch = {"pages": [')

  all_but_last, last = site.posts[0..-2], site.posts.last

  # Process all posts but the last one
  all_but_last.each do |page|
    tp_page = TipuePage.new(
      page.data['title'],
      "#{page.data['tags']} #{page.data['categories']}",
      page.url,
      page.content
    )
    target.puts(tp_page.to_json + ',')
  end

  # Do the last post
  tp_page = TipuePage.new(
    last.data['title'],
    "#{last.data['tags']} #{last.data['categories']}",
    last.url,
    last.content
  )
  target.puts(tp_page.to_json)

  target.puts(']};')
  target.close
end
Tohuw

1 Answer

Maybe you can try this:

{{ page.content | strip_html | strip_newlines }}

Edit: obviously I misunderstood your question.

But you can use Liquid filters from Ruby with include Liquid::StandardFilters.

You can then use strip_html and strip_newlines in your plugin.

David Jacquel
  • Those are Liquid filters, which are not accessible from a plugin Ruby script. I need to do this entirely within this plugin file. – Tohuw Feb 23 '15 at 14:32