Why is the size of some PDF files bigger than expected?

Patent Information Marketing
Posts: 358
Joined: Fri Mar 03, 2017 1:16 pm

Why is the size of some PDF files bigger than expected?

Post by Patent Information Marketing » Tue Jun 22, 2021 10:45 am

Users of Espacenet worldwide can download "original documents" in PDF format for more than 105 countries.

The PDF file downloaded from new Espacenet is sometimes bigger than either the original document publication made available by the patent office or the file downloaded from classic Espacenet. The reason for this is explained below.

Patent document facsimiles are delivered by patent offices to the EPO in different formats (e.g. PDF, ST33, HTML-TIFF and XML-TIFF). The EPO stores these files in its prior-art facsimile repositories. The facsimiles are then converted and assembled page by page into two standard formats, which are used to display original documents in EPO patent search tools depending on compatibility, as follows:
  • a greyscale TIFF format, which is displayed by OPS, classic Espacenet and Global Patent Index (GPI). When the download feature of classic Espacenet or GPI is used, this format is converted on the fly to PDF.
  • a higher-quality PNG format, sometimes with colour, which is displayed by new Espacenet. Depending on the incoming format and quality, the file size of this version can be between 1.9 and 8 times the size of the basic black-and-white version. When the download feature of new Espacenet is used, a PDF file constructed from PNG is delivered, and it may therefore be much bigger than the original document publication provided by the patent office.
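
As a rough illustration of the range quoted above, the following Python sketch estimates how large a PNG-based PDF might be for a given black-and-white file size. Only the factors 1.9 and 8 come from this post; the function name `estimate_png_pdf_size_mb` and the 2.0 MB input are hypothetical and chosen purely for illustration.

```python
def estimate_png_pdf_size_mb(bw_size_mb, low_factor=1.9, high_factor=8.0):
    """Return a (lowest, highest) size estimate in MB for the PNG-based PDF.

    bw_size_mb is the size of the basic black-and-white version.
    The default factors are the 1.9x and 8x bounds quoted in the post;
    the actual factor depends on the incoming format and quality.
    """
    if bw_size_mb < 0:
        raise ValueError("file size cannot be negative")
    return bw_size_mb * low_factor, bw_size_mb * high_factor


# Hypothetical example: a 2.0 MB black-and-white facsimile.
lowest, highest = estimate_png_pdf_size_mb(2.0)
print(f"Estimated PNG-based PDF size: {lowest:.1f} MB to {highest:.1f} MB")
```

This is only a back-of-the-envelope estimate; the real size of a given download can only be known by downloading it.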


While the "original document" downloads in Espacenet do not correspond to the "Original File Publications" as produced by the original publishing patent offices, they are faithful representations of the original publications. They go back as far as Robert Mansell's publication of 1623 (GB162300024A), from a time when electronic formats clearly did not exist. For full legal certainty, we recommend using the official publication channels of the patent offices.

Kind regards,

Andrée Lahaye
Patent Information Marketing


jjgray
Posts: 4
Joined: Fri Oct 15, 2021 1:05 pm

Re: Why is the size of some PDF files bigger than expected?

Post by jjgray » Thu Feb 17, 2022 4:26 pm

As discussed elsewhere, this size explosion is something that should be resolved without delay.
  • Opening EP3944143A1 from EPO Register Plus, we see a modest file size of 1.5 MB (but with a meaningless filename, “document.pdf”; we should discuss that too).
  • Opening the same file directly from the Publication Server, we see the exact same file with a more meaningful filename (albeit with a mysterious “NW” inserted before the A1).
  • Opening the same document in new Espacenet, we lose the character coding and the file size explodes to 12.5 MB.
  • Opening the same document in Classic Espacenet, we lose the character coding and the file size more than doubles to 3.3 MB.
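
To put numbers on the comparison, here is a minimal Python sketch that expresses each download as a multiple of the Register Plus file. The sizes are the ones listed above; the dictionary keys and variable names are just labels made up for this illustration.

```python
# File sizes in MB for EP3944143A1, as reported in the list above.
sizes_mb = {
    "EPO Register Plus / Publication Server": 1.5,
    "Classic Espacenet": 3.3,
    "new Espacenet": 12.5,
}

# Use the Register Plus file as the baseline for the comparison.
baseline_mb = sizes_mb["EPO Register Plus / Publication Server"]

# Ratio of each download to the baseline file size.
ratios = {source: size / baseline_mb for source, size in sizes_mb.items()}

for source, ratio in ratios.items():
    print(f"{source}: {sizes_mb[source]} MB ({ratio:.1f}x the baseline)")
```

On these figures the Classic Espacenet file is roughly 2.2 times the baseline and the new Espacenet file roughly 8.3 times.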

Doubling the file size for less information content is bad enough, but I strongly object to the explosion that we see in new Espacenet.

I am sure that you have colleagues who can report the number of such documents downloaded each year. If this one document is simply downloaded and emailed from one person to another, I would guess that there are inevitably 5+ copies of it stored across their systems, local and in the Cloud, not counting transient copies. With multiple recipients in Cc, the number of copies goes up and up. Aside from the inconvenience and cost to users, not to mention the issue of bandwidth inequality between different users, does the EPO have economists who can calculate the carbon footprint of shifting and storing so much data?

IMO this is a high priority issue.


Patent Information Marketing
Posts: 358
Joined: Fri Mar 03, 2017 1:16 pm

Re: Why is the size of some PDF files bigger than expected?

Post by Patent Information Marketing » Tue Feb 22, 2022 12:38 pm

Dear user,

Thank you for your comment. The technical team is aware of the issue and is working on it.
Kind regards

Patent Information Marketing


jjgray
Posts: 4
Joined: Fri Oct 15, 2021 1:05 pm

Re: Why is the size of some PDF files bigger than expected?

Post by jjgray » Thu Feb 15, 2024 5:45 pm

I would love to know whether the EPO has made any progress in resolving this issue.

John

