Pandoc and PDF


I’m quite a big fan of Markdown. It’s easy to write and works nicely with Git, you can preview it in OS X, and you can edit it in any program you want (no Word or OpenOffice necessary). I prefer writing Markdown over TeX: I can copy from my blog to a document and the markup is less ‘finicky’.

So it should come as no surprise that I was very happy to discover Pandoc, a ‘universal document converter’ that can convert Markdown to just about any other format (like EPUB, HTML, Word, LaTeX, PDF…) and back again.

I have already used Pandoc to create an EPUB book and several Word documents. This all works great.

If that’s all you need, you can stop reading now.

Creating PDF documents

The problem arose when I wanted to create a PDF document. Apparently this is Crazy Difficult. Pandoc supports several converters, but each has its own little problem. Here are my findings:

Using XeLaTeX

I couldn’t get the default LaTeX engine (pdflatex) to work because of UTF-8 characters. Apparently the solution is to use XeLaTeX. I used the BasicTeX package instead of downloading the full 2GB distribution. To make it work, I had to run:

sudo tlmgr update --self
sudo tlmgr install ucharcat
sudo tlmgr install lm-math

After that magic incantation, I can create a PDF reasonably painlessly:

pandoc input.md -o output.pdf --css=style.css --pdf-engine=xelatex

The problem is that the result looks TeX-y: the style.css I created gets discarded, because the LaTeX engines don’t use CSS at all.
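Since the LaTeX engines ignore --css, one way to influence the look is through the variables of Pandoc’s default LaTeX template. The following is only a sketch: the helper name md2pdf_xelatex is made up for illustration, and the font, font size and margin are placeholder values, not settings I tested. The font has to be installed on the system, and a minimal BasicTeX install may need a few extra tlmgr packages.

```shell
# Hypothetical helper: build a PDF with XeLaTeX and style it through
# template variables instead of CSS (which the LaTeX engines ignore).
# Font, size and margin below are placeholder values.
md2pdf_xelatex() {
    input=$1
    output=$2
    pandoc "$input" -o "$output" \
        --pdf-engine=xelatex \
        -V mainfont="Helvetica Neue" \
        -V fontsize=11pt \
        -V geometry:margin=2.5cm
}
```

Calling md2pdf_xelatex input.md output.pdf would then replace the command above. It won’t make the document look like the CSS version, but it does take away some of the TeX-y defaults.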

Using Calibre

Pandoc creates EPUBs easily, so I thought I could painlessly convert this EPUB to PDF using Calibre, a tool I already have installed for managing my ebooks.

pandoc input.md -o inbetween.epub -t epub --css=style.css
/Applications/calibre.app/Contents/MacOS/ebook-convert inbetween.epub output.pdf --paper-size a4

This works and the end result actually looks very nice. But there is no ‘orphan detection’, which leaves odd single sentences stranded on a page, and tables get split across pages (they simply get cut in half, with no repeated header on the second page).

Using PhantomJS

Another option I already had installed was PhantomJS, a ‘headless browser’. With a short JavaScript script you can convert HTML to PDF (using Pandoc to create the HTML and PhantomJS to create the PDF).

This just looked horrible all around.
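For reference, the two-step pipeline looks roughly like this. It is a sketch rather than the exact script I used: the helper name md2pdf_phantom, the file names and the A4 paper settings are all assumptions, and it expects the Markdown, CSS and output files to live in the same directory.

```shell
# Hypothetical helper: Markdown -> HTML (Pandoc) -> PDF (PhantomJS).
# Assumes input, CSS and output all live in the current directory.
md2pdf_phantom() {
    input=$1
    output=$2
    css=$3
    html="${output%.pdf}.html"
    script="${output%.pdf}.render.js"

    # Write a small PhantomJS script that opens the HTML file and
    # renders it as a PDF; the paper settings are placeholder values.
    cat > "$script" <<'EOF'
var system = require('system');
var page = require('webpage').create();
page.paperSize = { format: 'A4', orientation: 'portrait', margin: '2cm' };
page.open(system.args[1], function (status) {
    if (status !== 'success') {
        console.log('Unable to load ' + system.args[1]);
        phantom.exit(1);
    } else {
        page.render(system.args[2]);
        phantom.exit();
    }
});
EOF

    # -s makes Pandoc emit a standalone HTML page that links the CSS.
    pandoc "$input" -s -o "$html" --css="$css" &&
        phantomjs "$script" "$html" "$output"
}
```

Running md2pdf_phantom input.md output.pdf style.css leaves output.html and output.render.js next to the PDF, which is handy for checking whether the HTML itself already looks wrong.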

wkhtmltopdf

Another option provided by Pandoc is wkhtmltopdf. But this gave me ‘Warning: Failed to load’ errors. In short, images and CSS were not loaded, which again made the result look horrible. Perhaps worth another shot later…
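One thing that might be worth trying: newer wkhtmltopdf releases (0.12.6 and later) block access to local files by default, which can produce exactly this kind of ‘Failed to load’ warning for local images and stylesheets. Whether that was the cause here is a guess on my part. A sketch, with a made-up helper name, that passes the relevant flag through Pandoc:

```shell
# Hypothetical helper: render via wkhtmltopdf and allow it to read
# local files (images, CSS). Assumption: the "Failed to load" warnings
# come from blocked local file access, which may not be the real cause.
# Older wkhtmltopdf versions do not know this flag.
md2pdf_wkhtml() {
    input=$1
    output=$2
    css=$3
    pandoc "$input" -o "$output" \
        --pdf-engine=wkhtmltopdf \
        --pdf-engine-opt=--enable-local-file-access \
        --css="$css"
}
```

Usage would be md2pdf_wkhtml input.md output.pdf style.css. If the warnings persist, the next thing to check is whether the image and CSS paths are relative to the directory Pandoc is run from.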

weasyprint

weasyprint is a Python-based HTML->PDF renderer. It supports CSS and tables that span multiple pages (👍), but it also has no ‘orphan detection’ (👎). By default the font renders smaller than in Calibre, but this is easily fixed in CSS.
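The weasyprint route can be sketched as a two-step conversion, in the same spirit as the Calibre one. The helper name md2pdf_weasy is made up, and the sketch assumes the Markdown, CSS and output files are in the same directory so that the stylesheet link in the intermediate HTML resolves.

```shell
# Hypothetical helper: Markdown -> standalone HTML (Pandoc) -> PDF
# (weasyprint). Assumes all files live in the current directory.
md2pdf_weasy() {
    input=$1
    output=$2
    css=$3
    html="${output%.pdf}.html"
    # -s produces a complete HTML page that links the stylesheet.
    pandoc "$input" -s -o "$html" --css="$css" &&
        weasyprint "$html" "$output"
}
```

With this, md2pdf_weasy input.md output.pdf style.css writes output.html first and then the PDF. Pandoc can also call weasyprint directly via --pdf-engine=weasyprint, but keeping the intermediate HTML around makes it easier to tweak the CSS (for example the font size).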

Untested

Pandoc supports more PDF engines, but I didn’t test these:

  • lualatex and pdflatex also don’t support CSS
  • pdfroff seems to be mainly for manuals
  • prince costs money…

Conclusion

There was no clear winner for Markdown->PDF. For now it looks like I need to continue experimenting with weasyprint (or try wkhtmltopdf again).

Curious to hear if others have had more success!