I am working with PATSTAT online Autumn 2016.
My sample contains eight companies, for which I need to know how many times their patent families were cited on a yearly basis. I do not care about the citing families. I am interested in the years since 1995 till now and will later link the cited patent families with their publication year (in Excel).
To find the patent families of each company I compiled a list of the relevant standardized name IDs (including some subsidiaries). This is the example for Bayer.
Code: Select all
SELECT cited_docdb_family_id, tls206_person.doc_std_name_id, EARLIEST_PUBLN_YEAR, COUNT(DISTINCT TLS201_APPLN.DOCDB_FAMILY_ID) FROM TLS228_Docdb_Fam_Citn INNER JOIN TLS201_Appln ON TLS228_Docdb_Fam_Citn.Docdb_Family_ID = TLS201_Appln.Docdb_Family_ID INNER JOIN tls207_pers_appln ON TLS201_Appln. Appln_ID = tls207_pers_appln.appln_ID INNER JOIN tls206_person ON tls207_pers_appln.person_id = tls206_person.person_id WHERE tls206_person.doc_std_name_id IN (1396, 541, 32483, 75745, 18722, 1433, 1131634, 1912203, 170552, 2207018, 47493, 2156297, 2177244, 1369743, 219166, 61017, 156313, 181219, 447876, 4183719, 1418004, 2059711, 22621132, 426289, 1244095, 3333009, 502257, 5323449, 2411072, 2625266, 817674, 5460547, 284413, 13005651, 19098253, 9578952, 3319441, 13745930, 4714815, 17524157, 114645, 46388, 15233, 66865, 14441, 6042724, 162409, 2024132, 426661, 19484609, 141701, 56256, 18734547, 21217983, 3166815, 436943, 71254, 11598007, 1185, 1882117, 2669092, 149838, 17588223, 5637632) AND TLS201_APPLN.APPLN_KIND LIKE 'A' AND TLS201_APPLN.earliest_publn_year>= 1995 AND TLS201_APPLN.earliest_publn_year<= 2016 GROUP BY cited_docdb_family_id, EARLIEST_PUBLN_YEAR, tls206_person.doc_std_name_id
While the std_name_id column contains the IDs that I am interested in, the cited_docdb_family_ids are not families of Bayer (The first row in the result table is actually an artificial patent). I tried to change it to a subquery, to somehow link the cited family IDs to the family IDs I get when I search for them using the standardized name IDs. But I don´t understand how to write it properly so that I either get an error in the query that, it is too complicated or I have way too many rows that still don`t have the correct cited families.
The only solution I found is to use WHERE cited_docdb_family_id IN (list with family IDs), but I need over 20 searches for Bayer because of the character limit for the query. Since most of the companies have a lot of families, it is very, very tedious and time consuming. Is there an easier way by changing the query that I have?
And on a side note, I am using publication year. Should I use filing year instead? I liked publication dates, because I thought patents that are not published will not be cited by other companies yet. However I found some patents where filing and publication years were a couple of years apart. I read somewhere that most patents will be published within 18 months of their filing.
Thanks for your help. I am happy about any comments!