
    How to make LaTeX PDF output copy-and-pasteable, searchable and diffable

    • Monday, Nov 16, 2015

    Have you ever tried to diff a PDF file generated by pdflatex? Have you ever tried to copy and paste from one? If so, your face probably looked as surprised as mine after you attempted it: it doesn’t work! The characters that make it to your clipboard are gibberish, even though the PDF looks entirely normal.

    The same behaviour will bite you if you try to index or search those PDFs, or if you try to diff them, for example when you manage them using git.

    I can’t tell you what causes it, but I can tell you the solution.

    How to generate better PDF files from LaTeX

    There are two approaches. The first one worked better for me, but it apparently only works if you use T1 encoding (which is probably what everybody uses in this day and age). The solution is to add the line

    \usepackage{cmap}

    before all other packages.

    (In case you’re wondering: to activate T1, add the line \usepackage[T1]{fontenc})
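
    Put together, a preamble using the first approach might look like the sketch below. The document class, the inputenc line and the body text are only my assumptions for illustration; the part that matters is that cmap comes before fontenc and any font packages.

    ```latex
    % Minimal sketch, assuming pdflatex; class and body text are placeholders.
    \documentclass{article}
    \usepackage{cmap}            % first package: sets up the character maps
    \usepackage[T1]{fontenc}     % activate T1 font encoding
    \usepackage[utf8]{inputenc}  % assumption: the source file is UTF-8
    \begin{document}
    Grüße aus dem Café.  % accented text to try copying out of the PDF
    \end{document}
    ```

    After compiling, try selecting the accented words in your PDF viewer and pasting them into a text editor. If they arrive intact, searching and diffing should behave as well.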

    I found mention of another approach. This one didn’t work for me, but I’m including it here in case it’s more useful to you.

    \input glyphtounicode

    \pdfgentounicode=1
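
    For completeness, here is where those two lines would sit in a full document. This is only a sketch: both lines rely on pdfTeX primitives, so it assumes you compile with pdflatex, and the class and body text are placeholders of mine.

    ```latex
    % Minimal sketch, assuming pdflatex; class and body text are placeholders.
    \documentclass{article}
    \input{glyphtounicode}    % load the glyph-name-to-Unicode table
    \pdfgentounicode=1        % ask pdfTeX to write Unicode mappings into the PDF
    \usepackage[T1]{fontenc}  % T1 font encoding, as in the first approach
    \begin{document}
    Grüße aus dem Café.  % accented text to try copying out of the PDF
    \end{document}
    ```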

    Now copy and paste, diff and all other uses of plain text in PDFs generated by LaTeX should work.