Post by r.vdiemen » Thu Jan 07, 2016 12:53 pm

Dear All,

I'm using the PATSTAT raw data, and am trying to discover which company has received the most patent citations for a specific CPC class. I've narrowed down the data to a list of application IDs relevant to the CPC class, so that I don't need to include that part in the query every time. I've generated a list of company names and the number of citations each received, but I could use some advice about whether this is the correct way to approach the question.

I tried a rather simple (but long-winded) approach. I adapted Query 8 from the paper by de Rassenfosse, Dernis and Boedt ("An Introduction to the Patstat Database with Example Queries"). I'm using MS Access. This generates a list of application IDs and the number of times each application has been cited.

I then linked the results to the table containing person information (tls206) through tls207, which generates a list of application IDs, the number of times they've been cited and the doc_std_name. Finally, I exported this to Excel and created a pivot table to add up all the citations received by each company/individual.
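
For readers who want to skip the Excel step, the join and the pivot can be done in one grouped query. The following is a minimal sketch only: it uses Python's built-in sqlite3 rather than MS Access, and the tables `cited_counts`, `pers_appln` and `person` are simplified, hypothetical stand-ins for the citation-count query result, tls207 and tls206, filled with made-up rows. It also assumes that `applt_seq_nr > 0` marks applicants in tls207 (so that pure inventors are left out); check this against the data catalog for your PATSTAT release.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Output of the citation-count query: one row per cited application.
    CREATE TABLE cited_counts (appln_id INTEGER PRIMARY KEY, n_citations INTEGER);
    -- Simplified stand-in for tls207 (links persons to applications).
    CREATE TABLE pers_appln (person_id INTEGER, appln_id INTEGER, applt_seq_nr INTEGER);
    -- Simplified stand-in for tls206 (person details).
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, doc_std_name TEXT);

    -- Made-up toy data: application 1 has two applicants.
    INSERT INTO cited_counts VALUES (1, 15), (2, 4), (3, 7);
    INSERT INTO pers_appln VALUES (10, 1, 1), (11, 1, 2), (10, 2, 1), (12, 3, 1), (13, 3, 0);
    INSERT INTO person VALUES
        (10, 'ALPHA CORP'), (11, 'BETA LTD'), (12, 'GAMMA INC'), (13, 'DOE JOHN');
""")

# Whole counting: every applicant gets the full citation count of each
# application it appears on. Rows with applt_seq_nr = 0 are skipped
# (assumed to be inventors who are not applicants).
rows = conn.execute("""
    SELECT p.doc_std_name, SUM(c.n_citations) AS citations
    FROM cited_counts AS c
    JOIN pers_appln AS pa ON pa.appln_id = c.appln_id
    JOIN person AS p ON p.person_id = pa.person_id
    WHERE pa.applt_seq_nr > 0
    GROUP BY p.doc_std_name
    ORDER BY citations DESC, p.doc_std_name
""").fetchall()

totals = dict(rows)
for name, citations in rows:
    print(name, citations)
```

The `SELECT ... GROUP BY` statement itself is plain SQL, so something very close to it should also work as an Access query on the real tables.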

The problem is that I think this might be double counting the data. For example, application 274358461 has been cited 15 times. This application has two applicants. The method I'm currently using will allocate 15 citations to both of the applicants. Would it make more sense to use a fractional count method?

The second method I tried was family-to-family citations, using the DOCDB family citation data provided in PATSTAT. Again, though, how do you weight the applicants to determine citations per company?

Thanks for your help in advance!
Kind regards,

Re: Citations per company

Post by nico.rasters » Sat Jan 16, 2016 12:55 pm

As you did, I would also start by finding all application IDs that belong to your specific CPC class.
For each application ID, you can assign a docdb family ID.
Then I would create a citation table, and again assign docdb family ID to the citing applications.
This will allow you to exclude self-citations and it will let you count family-family citations.
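
The steps above can be sketched roughly as follows. This is an illustration on made-up data, not a PATSTAT query: `family_of` and `citations` are hypothetical in-memory structures, and "self-citation" is read narrowly here as a citation between two applications of the same DOCDB family. Excluding citations between applications of the same applicant would additionally need the applicant names.

```python
from collections import defaultdict

# appln_id -> docdb_family_id (made-up values).
family_of = {1: 100, 2: 100, 3: 200, 4: 300, 5: 300, 6: 400}

# (citing appln_id, cited appln_id) pairs (made-up values).
citations = [
    (3, 1),  # family 200 cites family 100
    (3, 2),  # family 200 cites family 100 again (same family pair)
    (2, 1),  # family 100 cites itself: dropped
    (4, 1),  # family 300 cites family 100
    (5, 2),  # family 300 cites family 100 again (same family pair)
    (6, 3),  # family 400 cites family 200
]

# Lift each citation to family level, drop within-family citations,
# and keep each (citing family, cited family) pair only once.
family_pairs = {
    (family_of[citing], family_of[cited])
    for citing, cited in citations
    if family_of[citing] != family_of[cited]
}

# Count the distinct citing families for every cited family.
citing_families = defaultdict(set)
for citing_fam, cited_fam in family_pairs:
    citing_families[cited_fam].add(citing_fam)

family_citation_counts = {fam: len(s) for fam, s in citing_families.items()}
print(family_citation_counts)
```

In this toy data, family 100 receives five application-level citations but only two family-to-family citations, which is the kind of double counting the family view removes.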

How to identify companies? You could use doc_std_name as you are doing, or go for an external database such as the EEE-PPAT from KU Leuven, or the OECD HAN database. Keep in mind that these databases are always released a few months after PATSTAT, so they may not be available yet for the most current release.

I do not think there is a double counting issue when there are two applicants. Or in other words, a fractional count is not required here.
BUT... will you take corporate ownership into account? What if there are two applicants, one being the ultimate parent and the other a subsidiary? Corporate ownership would also affect self-citations, btw. The problem is that collecting this data is very time-consuming, so one option is to turn it into a footnote.

Re: Citations per company

Post by r.vdiemen » Wed Feb 17, 2016 6:43 pm

Thank you Nico! You make a good point about corporate ownership, I'll think about how to incorporate this in the research.

Re: Citations per company

Post by Geert Boedt » Thu Feb 18, 2016 10:39 am

Hello Renée,
I agree with Nico; fractional counting is not necessary.
When doing citation analysis, you should always keep in mind the purpose of the indicators you are creating, and the story you want to support or disprove. You also have to keep in mind what citations are. In non-patent language: citations are publications cited (by examiners, applicants or opponents) because they are considered relevant for the granting (or revocation) of the patent. They give a picture of the relevant "prior art". Researchers then go an extra step by claiming that if patent A cites patent B, then A is (or might be) "building on" technology described in patent B. And if many patents cite patent B, then patent B must be an "important, valuable, essential, ground-breaking, ..." patent. This is the philosophy behind this kind of analysis (at least one of them). A next step is then to make this data less granular by looking at the owners of patents B (or A), or at the classification codes used for A and B.
Assume that companies X and Y are co-applicants for patent B, which has been cited 100 times. Company Z is the applicant for patent C, which has also been cited 100 times. Is the "value" of patent B less than that of patent C? In fractional counting you would give X and Y 50 citations each, and company Z would have 100. I don't think this is correct. In fact, many researchers use the "number of applicants" as an indicator of value; the more, the better. With regard to "double counting", what you might have to keep in mind is the "family concepts" when doing citation analysis. For more information on this topic (and much more on patent statistics), I strongly recommend consulting the "OECD Patent Statistics Manual".
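
The X, Y and Z example can be worked through in a few lines. This is only an illustrative sketch of the two counting schemes on the numbers given above; the `patents` dictionary is a hypothetical structure, not a PATSTAT table.

```python
from collections import defaultdict

# The example from the post: patent B has co-applicants X and Y,
# patent C has the single applicant Z; both are cited 100 times.
patents = {
    "B": {"applicants": ["X", "Y"], "citations": 100},
    "C": {"applicants": ["Z"], "citations": 100},
}

whole = defaultdict(int)         # each applicant gets the full count
fractional = defaultdict(float)  # the count is split equally among applicants

for patent in patents.values():
    share = patent["citations"] / len(patent["applicants"])
    for applicant in patent["applicants"]:
        whole[applicant] += patent["citations"]
        fractional[applicant] += share

print(dict(whole))
print(dict(fractional))
```

Whole counting credits X, Y and Z with 100 citations each, so the per-company totals add up to 300 although only 200 citations exist. Fractional counting preserves the total of 200 but ranks X and Y below Z, which is the outcome argued against above.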
