How to make LaTeX PDF output copy-and-pasteable and searchable

Monday, Nov 16, 2015

How to make LaTeX PDF output copy-and-pasteable, searchable and diffable

Have you ever tried to diff a PDF file generated by pdflatex? Have you ever tried to copy and paste from one? If so, your face probably looked as surprised as mine after you attempted it: it doesn’t work! The characters that make it to your clipboard are gibberish, even though the PDF looks entirely normal.

This same behaviour will bite you if you try to index or search those PDFs, or if you try to diff them, for example because you manage them using git.

I can’t tell you what causes it, but I can tell you the solution.

How to generate better PDF files from LaTeX

There are two approaches. The first one worked better for me, but it apparently only works if you use T1 encoding (which is probably everybody in this day and age). The solution is to add the line

\usepackage{cmap}

before all other packages (in particular, before fontenc).

(In case you’re wondering: to activate T1, add the line \usepackage[T1]{fontenc} to your preamble.)
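Put together, the first approach might look like the minimal document below. This is only a sketch: the document class, the inputenc line and the sample text are my own assumptions for illustration, and the only parts taken from the advice above are the cmap and fontenc lines and their order.

```latex
% Minimal sketch of the cmap approach; compile with pdflatex.
\documentclass{article}      % assumed class, any class should do

\usepackage{cmap}            % first package loaded, ahead of fontenc
\usepackage[T1]{fontenc}     % activate T1 font encoding
\usepackage[utf8]{inputenc}  % assumption: the source file is saved as UTF-8

\begin{document}
% Sample text with accented letters, which tend to be the first to break.
Gr\"o\ss e, na\"ive, caf\'e: try copying this line out of the PDF.
\end{document}
```

To check the result, copy the sample line out of your PDF viewer, or run a text extractor such as pdftotext over the file, and compare what you get with the source.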

I found mention of another approach. This one didn’t work for me, but I’m including it here in case it’s more useful to you.

\input glyphtounicode

\pdfgentounicode=1
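For completeness, here is how those two lines might sit in a full document. Again, this is a sketch under assumptions: the class, the fontenc line and the sample text are illustrative, and it assumes you compile with pdflatex, since \pdfgentounicode is a pdfTeX primitive.

```latex
% Minimal sketch of the glyphtounicode approach; compile with pdflatex.
\documentclass{article}      % assumed class, any class should do

\input{glyphtounicode}       % load the glyph-name-to-Unicode mapping table
\pdfgentounicode=1           % ask pdfTeX to write ToUnicode maps into the PDF

\usepackage[T1]{fontenc}     % assumption: same T1 setup as in the first approach

\begin{document}
Gr\"o\ss e, na\"ive, caf\'e: try copying this line out of the PDF.
\end{document}
```

If one approach gives you gibberish for a particular font, it may be worth trying the other before giving up.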

Now copy and paste, diff and all other uses of plain text in PDFs generated by LaTeX should work.
