Count forward citations per year

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

lola
Posts: 4
Joined: Thu Nov 03, 2016 3:05 pm

Count forward citations per year

Post by lola » Thu Nov 03, 2016 3:34 pm

Hello!

I am working with PATSTAT online Autumn 2016.
My sample contains eight companies, for which I need to know how many times their patent families were cited on a yearly basis. I do not care about the citing families. I am interested in the years since 1995 till now and will later link the cited patent families with their publication year (in Excel).

To find the patent families of each company I compiled a list of the relevant standardized name IDs (including some subsidiaries). This is the example for Bayer.

Code: Select all

SELECT cited_docdb_family_id, tls206_person.doc_std_name_id, EARLIEST_PUBLN_YEAR, 	COUNT(DISTINCT TLS201_APPLN.DOCDB_FAMILY_ID)
FROM 
TLS228_Docdb_Fam_Citn
INNER JOIN 
TLS201_Appln ON TLS228_Docdb_Fam_Citn.Docdb_Family_ID = 
TLS201_Appln.Docdb_Family_ID
INNER JOIN 
tls207_pers_appln ON TLS201_Appln. Appln_ID = tls207_pers_appln.appln_ID
INNER JOIN 
tls206_person ON tls207_pers_appln.person_id = tls206_person.person_id
WHERE
 tls206_person.doc_std_name_id IN 
 (1396, 541, 32483, 75745, 18722, 1433, 1131634, 1912203, 170552, 2207018, 47493, 2156297, 2177244, 1369743, 219166, 61017, 156313, 181219, 447876, 4183719, 1418004, 2059711, 22621132, 426289, 1244095, 3333009, 502257, 5323449, 2411072, 2625266, 817674, 5460547, 284413, 13005651, 19098253, 9578952, 3319441, 13745930, 4714815, 17524157, 114645, 46388, 15233, 66865, 14441, 6042724, 162409, 2024132, 426661, 19484609, 141701, 56256, 18734547, 21217983, 3166815, 436943, 71254, 11598007, 1185, 1882117, 2669092, 149838, 17588223, 5637632)
AND
 TLS201_APPLN.APPLN_KIND LIKE 'A'
AND 
TLS201_APPLN.earliest_publn_year>= 1995
AND 
TLS201_APPLN.earliest_publn_year<= 2016
GROUP BY cited_docdb_family_id, EARLIEST_PUBLN_YEAR, tls206_person.doc_std_name_id

While the std_name_id column contains the IDs that I am interested in, the cited_docdb_family_ids are not families of Bayer (The first row in the result table is actually an artificial patent). I tried to change it to a subquery, to somehow link the cited family IDs to the family IDs I get when I search for them using the standardized name IDs. But I don´t understand how to write it properly so that I either get an error in the query that, it is too complicated or I have way too many rows that still don`t have the correct cited families.
The only solution I found is to use WHERE cited_docdb_family_id IN (list with family IDs), but I need over 20 searches for Bayer because of the character limit for the query. Since most of the companies have a lot of families, it is very, very tedious and time consuming. Is there an easier way by changing the query that I have?

And on a side note, I am using publication year. Should I use filing year instead? I liked publication dates, because I thought patents that are not published will not be cited by other companies yet. However I found some patents where filing and publication years were a couple of years apart. I read somewhere that most patents will be published within 18 months of their filing.


Thanks for your help. I am happy about any comments!
Regards, Lola


mkracker
Posts: 120
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: Count forward citations per year

Post by mkracker » Mon Nov 07, 2016 1:57 pm

Hi ,

Thank you for the clear description of your problem. I think you mixed up 1 or 2 attributes regarding cited / citing families. As a consequence, you get a strange result.

I adapted your query (changes are marked) and hope that this is what you are looking for.

SELECT tls201_appln.docdb_family_id, tls206_person.doc_std_name_id, EARLIEST_PUBLN_YEAR,
COUNT(DISTINCT tls228_docdb_fam_citn.DOCDB_FAMILY_ID) as NbOfCitingFam
FROM
TLS228_Docdb_Fam_Citn
INNER JOIN
TLS201_Appln ON TLS228_Docdb_Fam_Citn.cited_Docdb_Family_ID =
TLS201_Appln.Docdb_Family_ID
INNER JOIN
tls207_pers_appln ON TLS201_Appln. Appln_ID = tls207_pers_appln.appln_ID
INNER JOIN
tls206_person ON tls207_pers_appln.person_id = tls206_person.person_id
WHERE
tls206_person.doc_std_name_id IN
(1396, 541, 32483, 75745, 18722, 1433, 1131634, 1912203, 170552, 2207018, 47493, 2156297, 2177244, 1369743, 219166, 61017, 156313, 181219, 447876, 4183719, 1418004, 2059711, 22621132, 426289, 1244095, 3333009, 502257, 5323449, 2411072, 2625266, 817674, 5460547, 284413, 13005651, 19098253, 9578952, 3319441, 13745930, 4714815, 17524157, 114645, 46388, 15233, 66865, 14441, 6042724, 162409, 2024132, 426661, 19484609, 141701, 56256, 18734547, 21217983, 3166815, 436943, 71254, 11598007, 1185, 1882117, 2669092, 149838, 17588223, 5637632)
AND
TLS201_APPLN.APPLN_KIND LIKE 'A'
AND
TLS201_APPLN.earliest_publn_year>= 1995
AND
TLS201_APPLN.earliest_publn_year<= 2016
GROUP BY tls201_appln.docdb_family_id, EARLIEST_PUBLN_YEAR, tls206_person.doc_std_name_id
ORDER BY tls201_appln.docdb_family_id, EARLIEST_PUBLN_YEAR, tls206_person.doc_std_name_id


In short:
  • You have to JOIN the applications of the required applicants with the cited(!) families.
  • In the COUNT you want to count the citing(!) families, which are in attribute TLS228_DODCB_FAM_CITN.DOCDB_FAMILY_ID
  • You want to group by the (cited) families of your list of applicants, so I changed the GROUP BY attribute as well. Consequently, the first attribute of the SELECT has to be changed as well
  • I added the ORDER BY clause for the sake of readability of the output
Now the result is "Only" 53 498 rows in the PATSTAT 2016 Autumn Edition.
I did not check the result very carefully. So please do a thorough test before using it.

About which date to use: You are right: Publications which have not been published cannot be cited by other parties.
I would not to use the publication date for most kind of analysis. Usually I want to have the date which is closest to the invention, so I would use the EARLIEST_FILING_DATE or the APPLN_FILING_DATE, both in TLS201_APPLN. Still, many of the new applications will not show up, because they may have not been published or, if published, not have been cited yet.

I did not understand your remark about the character limit of PATSTAT Online, which is 10 000 characters. This should be sufficient for your type of queries.

I hope I could help,
Martin / EPO PATSTAT
-------------------------------------------
Martin Kracker / EPO


lola
Posts: 4
Joined: Thu Nov 03, 2016 3:05 pm

Re: Count forward citations per year

Post by lola » Tue Nov 08, 2016 12:27 pm

Hello Martin!

Thanks for your reply!
Some of the problems are now fixed :) The family id is now of the cited family that I am interested in and no longer the citing family.
The number of citations however are the total number of citations received by the family. For example the family Id 3518917 has been cited a total of 6 times if I check using nb_citing_docdb_fam.
In the query I get three rows for that family Id with years 1999, 2000 and 2010 and each time the number of citations received is 6. So the query gives me the the earliest publication years of the cited family and the total number of citations received by that family.
However, I need the number of citations received per year instead of total. I thought that I could aggregate the citing families according to their earliest publication year and then use that as the number of citations received in that year. For example, for my research I would like to use the Current Impact Index and then I need to know the year of the cited patent as well as if it was cited in the first five years. Since I use a couple of citation indicators I thought it would be better to have one table with citations per year and then later calculate my indicators from that one table. I would like to also have total number of citations a company received each year and then compare that to more "sophisticated" indicators.

Regards, Lola


mkracker
Posts: 120
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: Count forward citations per year

Post by mkracker » Sun Nov 13, 2016 8:18 am

Hi Lola,

I thought this over again. If you want to have family-family citations by publication year (or by any other grouping), then you cannot use table TLS228_DOCDB_FAM_CITN. For normal uses this table is very convenient, but it does not contain the data to do the grouping you need, e.g. by publication year.

Consequently you have to compute your family citation yourself, using the publication data in (TLS211_PAT_PUBLN), citation information (TLS212_CITATION) and DOCDB family information (TLS201_APPLN). I recommend you create some auxiliary tables to hold temporary results. Putting all logic in one query would make it very difficult to understand and to test.

Best regards,
Martin
-------------------------------------------
Martin Kracker / EPO


lola
Posts: 4
Joined: Thu Nov 03, 2016 3:05 pm

Re: Count forward citations per year

Post by lola » Sun Nov 13, 2016 2:09 pm

Hello Martin,

in the first post when I talked about the character limit of the query, I first used a query with a list of family IDs

Code: Select all

SELECT EARLIEST_PUBLN_YEAR, COUNT(DISTINCT TLS201_APPLN.DOCDB_FAMILY_ID)
FROM TLS228_DOCDB_FAM_CITN
INNER JOIN TLS201_APPLN ON TLS228_DOCDB_FAM_CITN.DOCDB_FAMILY_ID = 
TLS201_APPLN.DOCDB_FAMILY_ID
WHERE CITED_DOCDB_FAMILY_ID = 3485203
GROUP BY EARLIEST_PUBLN_YEAR
Because this query gave me the forward citations to the family for each year, I extended it and used a list of family IDs. Because of the character limit of 10,000 I needed to split it up into about 12 queries per company, since there were a large number of family IDs.
If it is rather complicated to ask for the name IDs (like you suggesting I create auxiliary tables first) I could still use the above approach. I just thought there would be a simple way of inserting a subquery that could ask for all the family IDs of a list of name IDs and setting those as the cited family IDs. But I think I just go back to my old approach.

Thank you for taking the time and answering all my questions!
Regards, Lola


lola
Posts: 4
Joined: Thu Nov 03, 2016 3:05 pm

Re: Count forward citations per year

Post by lola » Sun Nov 13, 2016 5:12 pm

Sorry, I just thought of two more questions.
When I calculate forward family-family citations by hand using publication years, I have the same citing family multiple times with different earliest publication years. Why does this happen? And should I use the earliest of those publication years?
If I count only distinct citing families I get the number that is also in the indicator nb_citing_docdb_fam. I want to know the number of forward citations to defer the value of the company and now I am confused how I should count them.
Regards, Lola


mkracker
Posts: 120
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: Count forward citations per year

Post by mkracker » Tue Nov 15, 2016 10:31 pm

Hi Lola,

Instead of running multiple queries you can do it more elegantly by using a subquery like this:

Code: Select all

select *
from tls201_appln
where docdb_family_id IN  
	-- subquery, which returns a list of DOCDB_FAMILY_IDs
	(select docdb_family_id
	from tls201_appln a 
    join tls207_pers_appln pa on a.appln_id = pa.appln_id
    join tls206_person p on pa.person_id = p.person_id
    where p.doc_std_name_id in (1396, 541, 32483, 75745 ) -- just some persons
	)
Subqueries can be used in the SELECT, FROM and (most often) in the WHERE clause. Once you are familiar with them, they can be vey useful.

I do not understand your questions of your last post without seeing the actual query. Because it's getting very special, just email me at "patstat [at] epo . org"
-------------------------------------------
Martin Kracker / EPO


Post Reply