`pypdf`’s `merge_scaled_page` leads to blank pages if content is complex

I am using a combination of reportlab and pypdf to add page numbers to a list of PDFs.

  1. First I use `pypdf`’s `PdfMerger` to concatenate many PDFs.
  2. Then reportlab makes a PDF of numbered but otherwise blank pages.
  3. I then use `pypdf`’s `merge_scaled_page` to overlay the numbers on the concatenated PDF.

This third step is not reporting errors, but it is producing bad output intermittently.

Here is the relevant part of the code:

with open(tmp, "rb") as ftmp:
    number_pdf = PdfReader(ftmp)
    # iterate pages
    for p in range(n):
        page = reader.pages[p]
        number_layer = number_pdf.pages[p]
        page.merge_scaled_page(number_layer, scale=1.0, over=True)
        writer.add_page(page)

Now, 90% of the time this works fine, but sometimes pages in the finished document come out blank. It is not random: I can usually predict which pages will fail, namely the ones with complicated content, for example a Microsoft Word document with many reviewing markups, or many detailed Excel charts combined on one page. Apart from this general feature, I cannot see any difference between the pages that work and the ones that don’t. I have tried variations on the `merge_scaled_page` call to see if this would help, but it doesn’t.

Currently my workaround is to skip the `page.merge_scaled_page` step when the page is complicated, but this is very manual and not satisfactory.
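That workaround could perhaps be automated along the lines of the sketch below. It assumes the size of a page's content stream is a usable proxy for "complicated", which I have not verified. The names `merge_numbers` and `max_content_bytes` are hypothetical, and the default threshold is an arbitrary guess. The function only relies on the page methods `get_contents()` and `merge_scaled_page()` and on the writer's `add_page()`.

```python
def merge_numbers(pages, number_pages, writer, max_content_bytes=500_000):
    """Overlay number layers, skipping pages whose content stream is large.

    Hypothetical helper: the size threshold is an unverified guess at what
    makes a page "complicated". Returns the indices of the skipped pages.
    """
    skipped = []
    for index, (page, number_layer) in enumerate(zip(pages, number_pages)):
        contents = page.get_contents()  # None when the page has no content stream
        size = len(contents.get_data()) if contents is not None else 0
        if size > max_content_bytes:
            # Assumed "complicated": leave the page unnumbered but keep it.
            skipped.append(index)
        else:
            page.merge_scaled_page(number_layer, scale=1.0, over=True)
        writer.add_page(page)
    return skipped
```

It would be called as `skipped = merge_numbers(reader.pages, number_pdf.pages, writer)`, and the returned indices show which pages were left without a number. This only hides the problem, so it is not a real fix.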

Any ideas?
 
