Pandoc and PDF
13 Dec 2017
I’m quite a big fan of Markdown. It’s easy to write, works nicely with Git, can be previewed in OS X, and can be edited in any program you want (no Word or OpenOffice necessary). I prefer writing Markdown over TeX: I can copy from my blog to a document and the markup is less ‘finicky’.
So it should come as no surprise that I was very happy to discover Pandoc, a ‘universal document converter’ that can convert Markdown to just about any other format (like Epub, HTML, Word, LaTeX, PDF…) and back again.
I have already used Pandoc to create an EPUB book and several Word documents. This all works great.
If that’s all you need, you can stop reading now.
The problem arose when I wanted to create a PDF document. Apparently this is Crazy Difficult. Pandoc supports several converters, but each has its own little problem. Here are my findings:
I couldn’t get the default LaTeX engine (pdflatex) to work because of UTF-8 characters. Apparently the solution is to use XeLaTeX. I used the BasicTeX package instead of downloading the full 2GB distribution. To make it work, I had to:
sudo tlmgr update --self
sudo tlmgr install ucharcat
sudo tlmgr install lm-math
After that magic incantation, I can create a PDF reasonably painlessly:
pandoc input.md -o output.pdf --css=style.css --pdf-engine=xelatex
The problem is it looks TeX-y: the style.css I created gets discarded.
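Since the stylesheet has no effect on the LaTeX route, one way to influence the look is through pandoc's LaTeX template variables instead. The following is only a sketch: xelatex_pdf is a hypothetical helper name, and the font, size and margin are illustrative assumptions rather than settings I tested for this post.

```shell
# Sketch: style the XeLaTeX output through pandoc template variables,
# because --css has no effect on the LaTeX route.
# xelatex_pdf is a hypothetical helper; font, size and margin are
# illustrative assumptions, not values from the original setup.
xelatex_pdf() {
  input="$1"              # Markdown source
  output="$2"             # PDF to create
  font="${3:-Georgia}"    # must be a font installed on the system

  pandoc "$input" -o "$output" \
    --pdf-engine=xelatex \
    -V "mainfont=$font" \
    -V fontsize=11pt \
    -V geometry:margin=2.5cm
}

# Usage (needs pandoc and XeLaTeX installed):
# xelatex_pdf input.md output.pdf
```

This only covers fonts and page geometry. Anything beyond that probably means editing the LaTeX template, which is exactly the finicky markup I was trying to avoid.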
Pandoc creates EPUBs easily, so I thought I could painlessly convert such an EPUB to PDF using Calibre, a tool I already have installed for my ebook management.
pandoc input.md -o inbetween.epub -t epub --css=style.css
/Applications/calibre.app/Contents/MacOS/ebook-convert inbetween.epub output.pdf --paper-size a4
This works and the end result actually looks very nice. But there is no ‘orphan detection’, which makes for very weird single sentences on pages, and tables get split across pages (they simply get cut in half, with no new header on the second page).
Another option I already had installed was PhantomJS, a ‘headless browser’. With a simple script it can render HTML to PDF.
This just looked horrible all around.
Another option provided by Pandoc is wkhtmltopdf. But this gave me Warning: Failed to load errors. In short, images and CSS are not loaded, which again made the result look horrible. Perhaps worth another shot later…
weasyprint is a Python-based HTML->PDF renderer. It supports CSS and tables over multiple pages (👍), but it also has no ‘orphan detection’ (👎). By default the font renders smaller than in Calibre, but this is easily fixed in CSS.
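For reference, the two-step route through HTML can be sketched roughly like this. md_to_pdf is a hypothetical helper name and the intermediate .html file is an assumption of this sketch. It needs both pandoc and weasyprint on the PATH.

```shell
# Sketch: Markdown -> standalone HTML -> PDF via WeasyPrint.
# md_to_pdf is a hypothetical helper name, not part of pandoc or WeasyPrint.
md_to_pdf() {
  input="$1"    # Markdown source, e.g. input.md
  output="$2"   # PDF to create, e.g. output.pdf
  css="$3"      # stylesheet linked from the intermediate HTML
  html="${output%.pdf}.html"

  # -s writes a complete HTML document, so the stylesheet link is kept.
  pandoc "$input" -s -o "$html" --css="$css" &&
    weasyprint "$html" "$output"
}

# Usage (needs pandoc and weasyprint on the PATH):
# md_to_pdf input.md output.pdf style.css
```

Keeping the intermediate HTML around is handy: you can open it in a browser to check whether a layout problem comes from the CSS or from the PDF renderer.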
Pandoc supports more PDF engines, but I didn’t test these:
pdflatex also doesn’t support CSS
pdfroff seems to be mainly for manuals
There was no clear winner for Markdown->PDF. For now it looks like I need to keep experimenting with WeasyPrint (or try wkhtmltopdf again).
Curious to hear if others have more success!