"Old collections" of patent data in Patstat (bef. 1930)


DaHeller
Posts: 3
Joined: Mon Sep 04, 2017 5:42 pm

"Old collections" of patent data in Patstat (bef. 1930)

Post by DaHeller » Mon Sep 04, 2017 7:34 pm

Hey everyone!

I am using the Autumn 2016 version of Patstat. For a new project, I am currently considering utilizing information on "historic" patents. Therefore, I would like to only consider those patents that were filed before 1930. Also, I only focus on patents filed in Germany.
Now I compared aggregate statistics (sources: German Patent Office / DPMA and Federico (1964) "Historical Patent Statistics, 1791-1961") with data that I obtained from PATSTAT. I realized that the annual number of patents in PATSTAT for Germany between 1877 (when the German Empire's Patent Office was founded) and 1915 was less than 10% of the actual number of patents, whether filed or granted. For instance, during the period 1880-1890 the average annual number of patents granted in Germany was actually between 4,000 and 5,000, whereas only around 300-400 patents per year are recorded in PATSTAT.
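
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of the counting step. It assumes the application table is called tls201_appln and has the columns appln_auth and appln_filing_year (please check this against the data catalog of your edition). An in-memory SQLite database stands in for the real PATSTAT installation, and both the toy rows and the reference totals are invented purely for illustration; they are not real patent counts.

```python
import sqlite3


def patstat_counts_by_year(conn, auth, first_year, last_year):
    """Count applications per filing year for one application authority."""
    rows = conn.execute(
        """
        SELECT appln_filing_year, COUNT(*)
        FROM tls201_appln
        WHERE appln_auth = ?
          AND appln_filing_year BETWEEN ? AND ?
        GROUP BY appln_filing_year
        ORDER BY appln_filing_year
        """,
        (auth, first_year, last_year),
    ).fetchall()
    return dict(rows)


def coverage(counts, reference):
    """Share of each reference-year total that is present in the counts."""
    return {
        year: counts.get(year, 0) / total
        for year, total in reference.items()
        if total
    }


# Toy stand-in for PATSTAT (assumed table and column names, invented rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tls201_appln ("
    "appln_id INTEGER PRIMARY KEY, appln_auth TEXT, "
    "appln_kind TEXT, appln_filing_year INTEGER)"
)
toy_rows = [
    (1, "DE", "A", 1880),
    (2, "DE", "A", 1880),
    (3, "DE", "A", 1881),
    (4, "FR", "A", 1880),
]
conn.executemany("INSERT INTO tls201_appln VALUES (?, ?, ?, ?)", toy_rows)

counts = patstat_counts_by_year(conn, "DE", 1877, 1915)
# Invented reference totals; in practice these would come from the
# official aggregate statistics.
reference = {1880: 20, 1881: 10, 1882: 5}
shares = coverage(counts, reference)
print(shares)
```

Years that are missing from PATSTAT altogether show up with a share of 0.0 rather than being dropped, which makes gaps in the coverage easier to spot.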

Does anyone happen to know whether there is any systematic mis-recording of patenting activity in the very early part of PATSTAT? Is it generally possible to use PATSTAT as a source of historic patent information? Since I want to combine patent data with corporate data of that time, this loss of observations is generally OK, as long as there is no specific selection bias.

Also, I found it very hard to find any further information on the coverage of the late 19th and early 20th centuries. The only information that I found (in the user manual of the Autumn 2016 version) was the following: "Most technical priorities are from FR, US, GB and DE applications, where large old collections, also from before 1900, exist." Does anyone have more information on, or documentation about, these "old collections"? Or does anyone have any suggestions on where to look for further information?

Any kind of information regarding this matter is highly appreciated :)

Thank you for your time!!

DH


Geert Boedt
Posts: 176
Joined: Tue Oct 19, 2004 10:36 am
Location: Vienna

Re: "Old collections" of patent data in Patstat (bef. 1930)

Post by Geert Boedt » Tue Sep 05, 2017 6:14 pm

Hello DaHeller,

Generally speaking the EPO includes all bibliographical data made available by the national authorities into PATSTAT. (At least the main bibliographical data, some specific attributes are not captured.)

But the bibliographic information for patents before 1920 is very fragmented. For some countries it might be a bit more complete (e.g. for GB, US), but it is hard to make any recommendation.

Apart from that, we do not have information about which parts of the national collections we do not have. So we don't know exactly what we don't have.
Definitely the national offices would be a better source of information (for instance INPI France has a service to provide historical patent documents https://www.inpi.fr/fr/base-brevets-du-19eme-siecle ).
Best regards,

Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna


DaHeller
Posts: 3
Joined: Mon Sep 04, 2017 5:42 pm

Re: "Old collections" of patent data in Patstat (bef. 1930)

Post by DaHeller » Tue Sep 26, 2017 3:09 pm

Dear community and specifically dear Geert,

thank you very much for your response!

In the meantime, I have taken a closer look at the characteristics of the patents filed before 1930. The selection appears to be random with regard to several characteristics (e.g. technology field, sole inventor vs. company, fraction of granted patents, etc.).
However, one main difference seems to play a crucial role when comparing this old patent information. Up to and including 1910, virtually all patents included in PATSTAT whose application authority is Germany are of appln_kind "A". In other words, with very few exceptions (on average <2 applications per year), there are no appln_kind "D" entries during this period. In contrast, after 1910 (particularly after 1918) the fraction of "D"-kind applications jumps, while the absolute number of "A"-kind applications remains rather constant until 1930. During the period 1910-1930, more than 98 percent of applications have appln_kind "D".
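
The tabulation behind these numbers can be sketched roughly as follows. This is only an illustration under assumptions: the table name tls201_appln and the columns appln_auth, appln_kind and appln_filing_year should be checked against your edition, SQLite stands in for the real database, and the toy rows are invented so that the example runs on its own.

```python
import sqlite3


def kind_shares(conn, auth, first_year, last_year):
    """Fraction of applications per appln_kind within a filing-year range."""
    rows = conn.execute(
        """
        SELECT appln_kind, COUNT(*)
        FROM tls201_appln
        WHERE appln_auth = ?
          AND appln_filing_year BETWEEN ? AND ?
        GROUP BY appln_kind
        """,
        (auth, first_year, last_year),
    ).fetchall()
    total = sum(n for _, n in rows)
    if total == 0:
        return {}
    return {kind: n / total for kind, n in rows}


# Toy stand-in for PATSTAT (assumed table and column names, invented rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tls201_appln ("
    "appln_id INTEGER PRIMARY KEY, appln_auth TEXT, "
    "appln_kind TEXT, appln_filing_year INTEGER)"
)
toy_rows = [
    (1, "DE", "A", 1900),
    (2, "DE", "A", 1900),
    (3, "DE", "A", 1905),
    (4, "DE", "A", 1920),
    (5, "DE", "D", 1920),
    (6, "DE", "D", 1921),
    (7, "DE", "D", 1925),
]
conn.executemany("INSERT INTO tls201_appln VALUES (?, ?, ?, ?)", toy_rows)

early = kind_shares(conn, "DE", 1877, 1910)
late = kind_shares(conn, "DE", 1911, 1930)
print(early, late)
```

Splitting the years into two non-overlapping ranges (here 1877-1910 and 1911-1930) keeps the two shares directly comparable.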

Correct me if I am wrong: while "A" applications indicate patent applications, "D"-kind applications mark dummies for de-duplicating. However, I do not really understand what "de-duplicating" means in this context. Are there no "real" applications available?
If de-duplicating implies an artificial remanufacturing of patent data, the systematic differences in the fraction of patents covered during the pre-WW1 and post-WW1 periods for Germany could potentially be explained by a systematic difference in the remanufacturing, correct? Is there any information on the de-duplicating process of PATSTAT concerning older entries?

Thank you once again - I truly appreciate your support!!

Best regards,
David


Geert Boedt
Posts: 176
Joined: Tue Oct 19, 2004 10:36 am
Location: Vienna

Re: "Old collections" of patent data in Patstat (bef. 1930)

Post by Geert Boedt » Fri Sep 29, 2017 8:31 am

Hello David,
PATSTAT is not a historic database per se. The data coverage of older entries cannot be guaranteed to be complete: there are gaps in the coverage, and bibliographic data may be missing.
I am not sure whether PATSTAT is suited for what you want to do. Even when considering all of the below, the outcome will never be a completely reliable representation.

Older documents were not available in electronic form and have been OCR'ed. As a result, not all data is coded, and often the application date is not picked up. For the EPO as such, this is of no major importance; we know that the documents are old, which is sufficient for prior art searches.

Those applications (in the period before 1977) may have a date that is all zeroes.



When running queries on very old applications in general, and DE applications in particular, the following should be considered:
  • the application-date may be all zeroes
  • the application-number may not be the genuine application-number – it may be derived from the publication-number and have a letter ‘D’ in the last position
  • the application kind-code does not have to be ‘A’ – it can also be ‘B’ or ‘C’ or ‘D’ – in all of these cases the application is very likely a patent-application
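
The three points above could be turned into a small row-level check along these lines. This is a hedged sketch rather than an official recipe: the function name and the flag names are made up for illustration, and how an all-zero date is actually stored in a given PATSTAT edition is an assumption that needs to be verified (both placeholder values below are guesses).

```python
# Assumption: placeholder values that may stand for a missing application
# date. Verify which one (if any) your PATSTAT edition uses.
MISSING_DATES = {"0000-00-00", "9999-12-31"}

# Kind codes that, per the list above, very likely indicate a patent
# application in the old collections.
PATENT_LIKE_KINDS = {"A", "B", "C", "D"}


def describe_application(appln_nr, appln_kind, appln_filing_date):
    """Flag the three caveats for one application row (hypothetical helper)."""
    return {
        # The application date may be all zeroes / a placeholder.
        "date_missing": appln_filing_date in MISSING_DATES,
        # A trailing 'D' hints that the number was derived from the
        # publication number rather than being the genuine one.
        "number_possibly_derived": appln_nr.endswith("D"),
        # 'A', 'B', 'C' and 'D' are all treated as patent applications.
        "likely_patent_application": appln_kind in PATENT_LIKE_KINDS,
    }


# Invented example rows, for illustration only.
old_row = describe_application("123456D", "D", "0000-00-00")
regular_row = describe_application("1234567", "A", "1925-03-01")
other_row = describe_application("55", "U", "1925-03-01")
print(old_row, regular_row, other_row)
```

Rows with a missing date cannot be assigned to a filing year, so any year-by-year count will silently exclude them unless they are reported separately.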
Best regards,

Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna

