Post by urbanmat » Wed Dec 11, 2013 1:31 pm

Dear all,

I am currently researching the effect of interfirm collaboration on the innovative output of individual firms. For this purpose I would like to look at the backward citations in new patent applications and check whether these can be traced back to a specific partnership.
I am looking at the pharmaceutical industry, specifically at five companies: GSK, Pfizer, Sanofi, Novartis and Novo Nordisk. For each company I have now collected a list of partnerships that I want to link to patent output. An example would be a partnership between Novo Nordisk and ZymoGenetics, where the former cited the latter in a subsequent patent application (taking into account the time lag between the start of a partnership and observable "innovation").
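As a rough illustration of the linking step, here is a minimal Python sketch. It assumes you already have, for every citation, the (harmonized) applicant of the citing and of the cited application plus the filing year of the citing application. The class names, the function `count_partner_citations` and the default 1 to 5 year lag window are hypothetical choices for illustration, not anything PATSTAT provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partnership:
    focal: str       # one of the focal firms
    partner: str     # harmonized partner name
    start_year: int  # year the partnership started


@dataclass(frozen=True)
class Citation:
    citing_firm: str  # applicant of the citing (newer) application
    cited_firm: str   # applicant of the cited (older) application
    filing_year: int  # filing year of the citing application


def count_partner_citations(partnerships, citations, min_lag=1, max_lag=5):
    """For each partnership, count citations from the focal firm to the partner
    whose citing application was filed between start_year + min_lag and
    start_year + max_lag (inclusive). The lag values are placeholders."""
    counts = {}
    for p in partnerships:
        first, last = p.start_year + min_lag, p.start_year + max_lag
        counts[p] = sum(
            1
            for c in citations
            if c.citing_firm == p.focal
            and c.cited_firm == p.partner
            and first <= c.filing_year <= last
        )
    return counts


if __name__ == "__main__":
    # Invented toy data: the start year and filing years are made up.
    deal = Partnership("Novo Nordisk", "ZymoGenetics", 2000)
    cites = [
        Citation("Novo Nordisk", "ZymoGenetics", 2000),  # too early: lag 0
        Citation("Novo Nordisk", "ZymoGenetics", 2003),  # inside the window
        Citation("Novo Nordisk", "ZymoGenetics", 2008),  # too late: lag 8
    ]
    print(count_partner_citations([deal], cites))
```

The lag window is the main design decision here; the defaults are arbitrary and would need to be justified by the research design.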

However, as I have no experience with PATSTAT, nor with coding or programming, I find it rather difficult to create a query that retrieves the required data. Would someone be so kind as to help me with this, or at least advise me whether this kind of data can be retrieved from PATSTAT in an analysable form?

All help and support is greatly appreciated!



Post by nico.rasters » Wed Dec 11, 2013 9:28 pm

Which version of PATSTAT are you using? And what software do you use for the data analysis? (Not relevant for this issue, just asking out of curiosity.)

You have five focal firms: GSK, Pfizer, etc.
Each of those focal firms has partnerships, so there is also a list of partners. The set of partners probably includes some of the focal firms as well.

You will need to retrieve patents for all firms (focals and partners). You may have additional criteria, such as a time range, geographical coverage, or "granted patents only", but the starting point is always finding your firm in PATSTAT.
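To make "finding your firm" concrete, here is a small Python sketch that runs a PATSTAT-style query against a toy in-memory SQLite database. The table and column names (`tls206_person`, `tls207_pers_appln`, `tls201_appln`, `person_name`, `applt_seq_nr`, `appln_filing_date`) are loosely modeled on the PATSTAT schema and are assumptions here; check them against the data catalog of your edition. Only a name pattern and a filing-year range are applied; other criteria would be extra WHERE conditions.

```python
import sqlite3


def build_toy_db():
    """Create a tiny in-memory database with three tables loosely modeled on
    PATSTAT (names are assumptions; the rows are invented)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tls206_person (person_id INTEGER PRIMARY KEY, person_name TEXT);
        CREATE TABLE tls207_pers_appln (person_id INTEGER, appln_id INTEGER,
                                        applt_seq_nr INTEGER, invt_seq_nr INTEGER);
        CREATE TABLE tls201_appln (appln_id INTEGER PRIMARY KEY, appln_filing_date TEXT);
        INSERT INTO tls206_person VALUES
            (1, 'PFIZER INC.'), (2, 'PFISER CORP'), (3, 'NOVARTIS AG');
        INSERT INTO tls207_pers_appln VALUES
            (1, 100, 1, 0), (2, 101, 1, 0), (3, 102, 1, 0), (1, 103, 1, 0);
        INSERT INTO tls201_appln VALUES
            (100, '2005-03-01'), (101, '2006-07-15'),
            (102, '2005-01-01'), (103, '1999-12-31');
        """
    )
    return conn


def find_firm_applications(conn, name_pattern, first_year, last_year):
    """Return the appln_ids where a person matching name_pattern (SQL LIKE) is
    an applicant and the filing year lies in [first_year, last_year]."""
    sql = """
        SELECT DISTINCT a.appln_id
        FROM tls206_person AS p
        JOIN tls207_pers_appln AS pa ON pa.person_id = p.person_id
        JOIN tls201_appln AS a ON a.appln_id = pa.appln_id
        WHERE p.person_name LIKE ?
          AND pa.applt_seq_nr > 0  -- applicants only, not inventors
          AND CAST(substr(a.appln_filing_date, 1, 4) AS INTEGER) BETWEEN ? AND ?
        ORDER BY a.appln_id
    """
    return [row[0] for row in conn.execute(sql, (name_pattern, first_year, last_year))]


if __name__ == "__main__":
    conn = build_toy_db()
    print(find_firm_applications(conn, "%PFIZER%", 2000, 2010))
```

Note that the pattern `%PFIZER%` silently misses the misspelled `PFISER CORP` row, which is exactly the name-variant problem with the raw PERSON data.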

Here you can choose between the PERSON table (which contains raw data), the DOC_STD_NMS table (standardized names), or an external database such as EEE-PPAT or OECD HAN.
In the PERSON table you may find Pfiser instead of Pfizer, but also Pfizer Inc., Pfizer Corp., Pfizer USA, etc. In DOC_STD_NMS, EEE-PPAT, and OECD HAN these names tend to be grouped together.
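If you work from the raw PERSON names yourself, a crude harmonization can be sketched in Python as below. This is not how DOC_STD_NMS, EEE-PPAT or OECD HAN are built; it simply normalizes names (uppercase, strip punctuation, drop a hypothetical list of legal-form and noise tokens) and then uses `difflib.SequenceMatcher` to catch near-misses such as Pfiser. The token list and the 0.8 threshold are assumptions you would need to tune and check by hand, since fuzzy matching also produces false positives.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of tokens ignored when comparing names; extend as needed.
NOISE_TOKENS = {"INC", "CORP", "CORPORATION", "CO", "LTD", "LLC", "AG", "SA", "PLC", "USA"}


def normalize(name):
    """Uppercase, replace punctuation with spaces and drop noise tokens."""
    tokens = re.sub(r"[^A-Z0-9]+", " ", name.upper()).split()
    return " ".join(t for t in tokens if t not in NOISE_TOKENS)


def harmonize(raw_names, target, threshold=0.8):
    """Return the raw names whose normalized form is similar enough to the
    normalized target (SequenceMatcher ratio >= threshold), sorted."""
    key = normalize(target)
    return sorted(
        n for n in raw_names
        if SequenceMatcher(None, normalize(n), key).ratio() >= threshold
    )


if __name__ == "__main__":
    names = ["Pfizer Inc.", "Pfiser", "Pfizer Corp.", "Pfizer USA", "Novartis AG"]
    print(harmonize(names, "Pfizer"))
```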

You also need to decide whether or not to take ownership structures into account. This falls outside the scope of PATSTAT but will affect your data. Partnerships can take place at any level (if you are looking at alliances, this could be the firm level, the parent level or the ultimate parent level), and so can patenting. In comparison, data from Compustat is consolidated, i.e. at the ultimate parent level. In practice, taking ownership structures into account means you also have to find the subsidiaries in PATSTAT, so there are more names to match.
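If you do consolidate, the roll-up itself is simple once you have ownership data from some external source. The Python sketch below assumes a hypothetical `parent_of` dict (child to direct parent) and re-assigns each patent to the applicant's ultimate parent. The firm names are invented, and the mapping is a static snapshot, so it ignores ownership changes over time.

```python
def ultimate_parent(entity, parent_of):
    """Follow parent_of links (child -> direct parent) up to the top entity."""
    seen = set()
    while entity in parent_of:
        if entity in seen:
            raise ValueError(f"cycle in ownership data at {entity!r}")
        seen.add(entity)
        entity = parent_of[entity]
    return entity


def consolidate(patent_owner, parent_of):
    """Re-assign every patent (patent_id -> applicant) to the applicant's
    ultimate parent, in the spirit of consolidated data such as Compustat."""
    return {pat: ultimate_parent(owner, parent_of) for pat, owner in patent_owner.items()}


if __name__ == "__main__":
    # Invented ownership chain, for illustration only.
    parent_of = {
        "PharmaCo Research Ltd": "PharmaCo Inc",
        "PharmaCo Inc": "PharmaCo Holding",
    }
    patents = {"P1": "PharmaCo Research Ltd", "P2": "OtherCo"}
    print(consolidate(patents, parent_of))
```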

Suppose you have your list of names and patents. You could now just start counting citations, but then you would be ignoring patent families. To make things more complicated, there are different types of patent families; you should look into the DOCDB family and the INPADOC family. Unfortunately I cannot give you a proper paper reference at the moment. Simplified explanation: you patent in the Netherlands, and then you patent the same invention in the USA and China. It is the same invention, but in PATSTAT you will see three applications: the Dutch one is the priority application, and the American and Chinese ones claim its priority, so all three end up in the same DOCDB family. The INPADOC family is broader than DOCDB. So you need to decide whether you want to use families and, if so, which one. IMHO this decision should be based on your theoretical framework.
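The difference between counting at the application level and at the family level can be shown with a small Python sketch. It assumes you have citation links as (citing application, cited application) pairs and a lookup from application to family ID; in PATSTAT the family IDs should be available per application (e.g. `docdb_family_id` / `inpadoc_family_id` in the application table, but check your edition), and here a plain dict with invented IDs stands in for that lookup.

```python
def count_citation_links(citations, family_of=None):
    """Count distinct citation links.

    citations: iterable of (citing_appln_id, cited_appln_id) pairs.
    family_of: optional dict appln_id -> family_id. When given, each link is
    collapsed to a (citing_family, cited_family) pair before counting, so
    several members of one family citing the same family count only once.
    """
    if family_of is None:
        return len(set(citations))
    return len({(family_of[citing], family_of[cited]) for citing, cited in citations})


if __name__ == "__main__":
    # Invented IDs: applications 1 (NL, priority), 2 (US) and 3 (CN) form
    # family "F1"; applications 10 and 11 form a later family "F2".
    family_of = {1: "F1", 2: "F1", 3: "F1", 10: "F2", 11: "F2"}
    citations = [(10, 1), (10, 2), (11, 3), (11, 1)]
    print(count_citation_links(citations))             # 4 (application level)
    print(count_citation_links(citations, family_of))  # 1 (family level)
```

Swapping a DOCDB-based `family_of` for an INPADOC-based one is then a one-line change, which makes it easy to check how sensitive the results are to the choice of family definition.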

Should you also worry about the effect of inventor mobility? Yes... but let's take it one step at a time :)

Feel free to send me an email, though it's also nice to discuss these practical questions on this forum so everyone can benefit.
