Concatenating PDF pages from OPS
Posted: Thu May 18, 2017 8:19 am
We are currently evaluating downloading PDF patent documents from OPS.
The OPS user manual states that OPS provides them on a page-per-page basis, and apparently the user must concatenate the individual pages at their end into one single running PDF. The EPO website points to third-party PDF packages for doing so. The PDF file specification is, however, not very strict. Concatenation may be done e.g. as an "incremental update" (as per the PDF ISO specification); on the other hand, many PDF printers seem to produce a "consolidated" concatenated PDF. The former, "incrementally updated" concatenated PDF file will have several cross-reference sections and trailers, whereas the latter, "consolidated" concatenated PDF file will contain only one cross-reference section and only one trailer. The ordering of indirect objects in the PDF file may be arbitrary. The file also carries indications of its author, the software that produced the concatenated file, a digital ID and a modification date. In view of the variability allowed in a PDF file, any concatenated PDF file for a given patent publication built from individual PDF pages from OPS will predictably differ from the single PDF file for the same patent publication obtained from Espacenet. This might raise the suspicion that the former has been "forged" with respect to the latter.
Is there thus a "recommended" way of concatenating these individual PDF pages from OPS into a single PDF file?
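To illustrate the difference between the two file layouts described above, here is a minimal Python sketch that counts the cross-reference markers in the raw bytes of a PDF. The function names `count_xref_markers` and `looks_incrementally_updated` are hypothetical, chosen only for this example. It is a heuristic under stated assumptions, not a PDF parser: it assumes the keywords do not occur by chance inside stream data, and linearized ("fast web view") files may also show more than one cross-reference section without having been incrementally updated.

```python
import re


def count_xref_markers(data: bytes) -> dict:
    """Heuristically count cross-reference markers in raw PDF bytes.

    Assumption: the keywords do not appear by coincidence inside
    stream data; this is a byte scan, not a real PDF parser.
    """
    return {
        # One "startxref <offset>" normally closes each cross-reference
        # section, whether it is a classic table or an xref stream.
        "startxref": len(re.findall(rb"startxref\s+\d+", data)),
        # The "trailer" keyword only accompanies classic xref tables.
        "trailer": len(re.findall(rb"\btrailer\b", data)),
        # Each file revision normally ends with an end-of-file marker.
        "eof": data.count(b"%%EOF"),
    }


def looks_incrementally_updated(data: bytes) -> bool:
    """Guess whether the file holds more than one cross-reference section."""
    return count_xref_markers(data)["startxref"] > 1
```

For a real file, one could read it with `open(path, "rb").read()` and pass the bytes to `looks_incrementally_updated`. A result of `True` only suggests several cross-reference sections; it says nothing about whether the page content itself was altered.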
Many thanks.