Count number of patent applications and citations per company

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.

EPO / PATSTAT Support
Posts: 424
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Count number of patent applications and citations per company

Post by EPO / PATSTAT Support » Wed Mar 15, 2023 8:23 am

Hello Xiaohan,
that query creates a too heavy load on the server and it is automatic cancelled because it takes too much resources on the server. (There is a build in limitation to avoid the system working on a query for more then one hour.)
I run it on an my local system and it took about 4 hours. The result generates millions of rows, and even more during the execution of the query. As you said yourself, there are not much restrictions in the query to narrow down the generated data set.
As a matter of exception, i have posted the results here (csv file).
The file will stay available for 30 days.
http://webserv.epo.org/download.nsf/0/5 ... enDocument
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


xy4u20
Posts: 9
Joined: Mon Feb 27, 2023 3:03 pm

Re: Count number of patent applications and citations per company

Post by xy4u20 » Thu Mar 16, 2023 3:19 am

Much appreciated!!

I am planning to purchase PATSTAT online database, will this large load of query code work properly after the purchase?

Best,
Xiaohan


xy4u20
Posts: 9
Joined: Mon Feb 27, 2023 3:03 pm

Re: Count number of patent applications and citations per company

Post by xy4u20 » Wed Mar 22, 2023 8:12 pm

Dear Community,

Query killed has appeared once again, and I'm not sure if this time it's due to a syntax problem or an overload.
The IPC symbol is massive and detailed and I'm sure I put all the slashes "/" in the 9th position.

Here is my query code, could you check it for me please?

Code: Select all

SELECT psn_id, psn_name, ipc_class_symbol,
COUNT(distinct(case when t1.appln_filing_year = 1980 then t1.appln_id end)) as '1980',
COUNT(distinct(case when t1.appln_filing_year = 1981 then t1.appln_id end)) as '1981',
COUNT(distinct(case when t1.appln_filing_year = 1982 then t1.appln_id end)) as '1982',
COUNT(distinct(case when t1.appln_filing_year = 1983 then t1.appln_id end)) as '1983',
COUNT(distinct(case when t1.appln_filing_year = 1984 then t1.appln_id end)) as '1984',
COUNT(distinct(case when t1.appln_filing_year = 1985 then t1.appln_id end)) as '1985',
COUNT(distinct(case when t1.appln_filing_year = 1986 then t1.appln_id end)) as '1986',
COUNT(distinct(case when t1.appln_filing_year = 1987 then t1.appln_id end)) as '1987',
COUNT(distinct(case when t1.appln_filing_year = 1988 then t1.appln_id end)) as '1988',
COUNT(distinct(case when t1.appln_filing_year = 1989 then t1.appln_id end)) as '1989',
COUNT(distinct(case when t1.appln_filing_year = 1990 then t1.appln_id end)) as '1990',
COUNT(distinct(case when t1.appln_filing_year = 1991 then t1.appln_id end)) as '1991',
COUNT(distinct(case when t1.appln_filing_year = 1992 then t1.appln_id end)) as '1992',
COUNT(distinct(case when t1.appln_filing_year = 1993 then t1.appln_id end)) as '1993',
COUNT(distinct(case when t1.appln_filing_year = 1994 then t1.appln_id end)) as '1994',
COUNT(distinct(case when t1.appln_filing_year = 1995 then t1.appln_id end)) as '1995',
COUNT(distinct(case when t1.appln_filing_year = 1996 then t1.appln_id end)) as '1996',
COUNT(distinct(case when t1.appln_filing_year = 1997 then t1.appln_id end)) as '1997',
COUNT(distinct(case when t1.appln_filing_year = 1998 then t1.appln_id end)) as '1998',
COUNT(distinct(case when t1.appln_filing_year = 1999 then t1.appln_id end)) as '1999',
COUNT(distinct(case when t1.appln_filing_year = 2000 then t1.appln_id end)) as '2000',
COUNT(distinct(case when t1.appln_filing_year = 2001 then t1.appln_id end)) as '2001',
COUNT(distinct(case when t1.appln_filing_year = 2002 then t1.appln_id end)) as '2002',
COUNT(distinct(case when t1.appln_filing_year = 2003 then t1.appln_id end)) as '2003',
COUNT(distinct(case when t1.appln_filing_year = 2004 then t1.appln_id end)) as '2004',
COUNT(distinct(case when t1.appln_filing_year = 2005 then t1.appln_id end)) as '2005',
COUNT(distinct(case when t1.appln_filing_year = 2006 then t1.appln_id end)) as '2006',
COUNT(distinct(case when t1.appln_filing_year = 2007 then t1.appln_id end)) as '2007',
COUNT(distinct(case when t1.appln_filing_year = 2008 then t1.appln_id end)) as '2008',
COUNT(distinct(case when t1.appln_filing_year = 2009 then t1.appln_id end)) as '2009',
COUNT(distinct(case when t1.appln_filing_year = 2010 then t1.appln_id end)) as '2010',
COUNT(distinct(case when t1.appln_filing_year = 2011 then t1.appln_id end)) as '2011',
COUNT(distinct(case when t1.appln_filing_year = 2012 then t1.appln_id end)) as '2012',
COUNT(distinct(case when t1.appln_filing_year = 2013 then t1.appln_id end)) as '2013',
COUNT(distinct(case when t1.appln_filing_year = 2014 then t1.appln_id end)) as '2014',
COUNT(distinct(case when t1.appln_filing_year = 2015 then t1.appln_id end)) as '2015',
COUNT(distinct(case when t1.appln_filing_year = 2016 then t1.appln_id end)) as '2016',
COUNT(distinct(case when t1.appln_filing_year = 2017 then t1.appln_id end)) as '2017',
COUNT(distinct(case when t1.appln_filing_year = 2018 then t1.appln_id end)) as '2018',
COUNT(distinct(case when t1.appln_filing_year = 2019 then t1.appln_id end)) as '2019',
COUNT(distinct(case when t1.appln_filing_year = 2020 then t1.appln_id end)) as '2020',
COUNT(distinct(case when t1.appln_filing_year = 2021 then t1.appln_id end)) as '2021',
COUNT(distinct(case when t1.appln_filing_year = 2022 then t1.appln_id end)) as '2022',
COUNT(distinct(t1.appln_id)) as "total"
FROM tls201_appln t1 JOIN tls209_appln_ipc t2
	ON t1.appln_id = t2.appln_id
Join tls207_pers_appln on t1.appln_id = tls207_pers_appln.appln_id
Join tls206_person on tls207_pers_appln.person_id = tls206_person.person_id
WHERE
t1.appln_id in (select appln_id from tls209_appln_ipc where ipc_class_symbol in ('A01G  23/00','A01G  25/00','A01N  25/00','A43B   1/12','A43B  21/14','A61L  11/00','A62D   3/00','A62D   3/02','A62D 101/00','B01D  45/00','B01D  53/00','B01D  53/02','B01D  53/04','B01D  53/047','B01D  53/14','B01D  53/22','B01D  53/24','B01D  53/62','B01D  53/92','B03B   9/06','B03C   3/00','B09B   3/00','B22F   8/00','B29B  17/00','B60K   6/00','B60K   6/10','B60K   6/20','B60K   6/28','B60K   6/30','B60K  16/00','B60L   3/00','B60L   7/10','B60L   8/00','B60L   9/00','B60L  50/30','B60L  50/50','B60W  10/26','B60W  20/00','B61D  17/02','B62D  35/00','B62D  35/02','B62D  67/00','B62M   1/00','B62M   3/00','B62M   5/00','B62M   6/00','B63B   1/34','B63B  35/00','B63B  35/32','B63H   9/00','B63H  13/00','B63H  16/00','B63H  19/02','B63H  19/04','B63H  21/18','B63J   4/00','B64G   1/44','B65G   5/00','C01B  32/50','C01B  33/02','C02F   1/00','C02F   1/14','C02F   1/16','C02F   3/00','C02F   3/28','C02F   9/00','C02F  11/04','C02F  11/14','C04B   7/24','C04B  18/04','C05F   7/00','C07C  67/00','C07C  69/00','C08J  11/00','C08J  11/04','C09K   3/22','C09K   3/32','C09K   5/00','C09K  11/01','C09K  17/00','C10B  21/18','C10B  53/00','C10B  53/02','C10G   1/10','C10J   3/02','C10J   3/46','C10J   3/86','C10L   1/00','C10L   1/02','C10L   1/14','C10L   1/19','C10L   1/182','C10L   3/00','C10L   5/00','C10L   5/40','C10L   5/42','C10L   5/44','C10L   5/46','C10L   5/48','C10L   9/00','C10L  10/02','C10L  10/06','C11B  11/00','C11B  13/00','C11C   3/10','C12M   1/107','C12N   1/13','C12N   1/15','C12N   1/21','C12N   5/10','C12N  15/00','C12N   9/24','C12P   5/02','C12P   7/06','C12P   7/649','C14C   3/32','C21B   3/04','C21B   7/22','C21B   5/06','C21C   5/38','C22B   7/00','C22B  19/30','C22B  25/06','C23C  14/14','C23C  16/24','C25C   1/00','C30B  29/06','D01F  13/00','D01G  11/00','D21B   1/08','D21B   1/32','D21C   5/02','D21C  11/00','D21F   5/20','E02B   9/00','E02B   9/08','E02B  15/04','E02D   3/00','E03C   1/12','E04B   1/62','E04B   1/74','E04B   1/88','E04B   1/90','E04B   2/00','E04B   5/00','E04B   7/00','E04B   9/00','E04C   1/40','E04C   1/41','E04C   2/284','E04D   1/28','E04D   3/35','E04D  13/00','E04D  13/16','E04D  13/18','E04F  13/08','E04F  15/18','E04H   1/00','E04H  12/00','E06B   3/263','E21B  41/00','E21B  43/16','E21F  17/16','F01K  17/00','F01K  23/04','F01K  23/06','F01K  27/00','F01N   3/00','F01N   5/00','F01N   9/00','F02B  43/00','F02B  75/10','F02C   1/05','F02C   3/28','F02C   6/18','F02G   5/00','F02M  21/02','F02M  27/02','F03B  13/12','F03B  15/00','F03D   1/04','F03D   9/00','F03D  13/00','F03D  13/20','F03G   4/00','F03G   5/00','F03G   6/00','F03G   7/04','F03G   7/05','F03G   7/08','F16H   3/00','F16H  48/00','F21K  99/00','F21L   4/00','F21L   4/02','F21S   9/03','F22B   1/00','F22B   1/02','F23B  80/02','F23B  90/00','F23C   9/00','F23G   5/00','F23G   5/027','F23G   5/46','F23G   7/00','F23G   7/06','F23G   7/10','F23J   7/00','F23J  15/00','F24D   3/00','F24D   5/00','F24D  11/00','F24D  11/02','F24D  15/04','F24D  17/00','F24D  17/02','F24D  18/00','F24D  19/00','F24F   5/00','F24F  12/00','F24H   4/00','F24H   7/00','F24S  10/10','F24S  23/00','F24S  90/00','F24T  10/00','F24V  30/00','F25B  27/00','F25B  27/02','F25B  30/00','F25B  30/06','F25J   3/02','F26B   3/00','F26B   3/28','F27B   1/18','F27B  15/12','F27D  17/00','F28D  17/00','F28D  20/00','F28D  20/02','G02B   7/183','G05F   1/67','G08B  21/12','G21C  13/10','G21F   9/00','H01G   9/20','H01G  11/00','H01J   9/50','H01J   9/52','H01L  25/00','H01L  25/03','H01L  25/16','H01L  25/18','H01L  27/30','H01L  27/142','H01L  31/00','H01L  31/042','H01L  31/0525','H01L  33/00','H01L  51/42','H01L  51/50','H01M   4/86','H01M   6/52','H01M   8/00','H01M  10/44','H01M  10/54','H01M  12/00','H01M  14/00','H01M  50/00','H02J   3/28','H02J   7/00','H02J   7/35','H02J   9/00','H02J  15/00','H02K   7/18','H02K  29/08','H02K  49/10','H02N  10/00','H02S  10/00','H02S  40/44','H05B  33/00') ) -- Add IPC symbol here
And t1.appln_filing_year between 1980 and 2022
And applt_seq_nr > 0 and invt_seq_nr = 0
And granted = 'Y'
And psn_sector = 'company'
GROUP BY psn_id, psn_name, ipc_class_symbol
ORDER BY total desc
Looking forwards to your reply!

Best,
Xiaohan


EPO / PATSTAT Support
Posts: 424
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Count number of patent applications and citations per company

Post by EPO / PATSTAT Support » Thu Mar 30, 2023 3:21 pm

Making a "count" on the attribute nb_citing_docdb_fam is not correct. nb_citing_docdb_fam already gives you the the number of forward citations, so I assume you want to make the sum. You also have to keep in mind that the nb_citing_docdb_fam is a number that is calculated at FAMILY level, but your query is looking at individual patents. So if an applicant has a family with 4 members (assuming filed in the same year), then it will make the sum 4 times, for each of the members of the families. That is wrong. And also the IPC codes are defined at application level (contrary to CPC codes which are the same for all family members.)
You need to re-think your data (aggregation) model.
I would work with an intermediate table which aggregates the application defined data at family level.
(And use a small sample with a limited amount of patents to see if the model works.)
Take as example the query below that generates all relevant data on 51 patents.

Code: Select all

SELECT psn_id, psn_name, LEFT(ipc_class_symbol,3)ipc, year(a.earliest_filing_date) earliest_year, a.docdb_family_id, a.nb_citing_docdb_fam
FROM tls201_appln a
Join tls207_pers_appln b on a.appln_id = b.appln_id
Join tls206_person c on b.person_id = c.person_id
Join tls209_appln_ipc d on a.appln_id = d.appln_id
WHERE
appln_filing_year between 1980 and 2022
And granted = 'Y'
And LEFT(ipc_class_symbol,3) in ('A01','A43','A61','A62','B01','B03','B09','B22','B60','B61','B62','B63','B64'
,'B65','C01','C02','C04','C05','C07','C08','C09','C10','C11','C12','C14','C21','C23','C25','C30','D01','D21','E02'
,'E03','E04','E06','E21','F01','F02','F03','F16','F21','F22','F23','F24','F25','F26','F27','F28','G01','G02','G05'
,'G06','G08','G21','H01','H02','H05') --add 3-digit IPC here
And applt_seq_nr > 0 and invt_seq_nr = 0
And psn_sector = 'company'
and psn_name = 'FIO CORPORATION'
group by psn_id, psn_name, LEFT(ipc_class_symbol,3), year(a.earliest_filing_date), a.docdb_family_id, a.nb_citing_docdb_fam
ORDER BY a.docdb_family_id, LEFT(ipc_class_symbol,3)
When you analyse the data, you see immediately that your intended level of aggregation of IPC codes at 3 digits will lead to a reduction of rows. (we get 99 rows with the query below, without loosing information which we need.)

Code: Select all

SELECT distinct a.*, c.*, left(ipc_class_symbol,3) ipc
FROM tls201_appln a
Join tls207_pers_appln b on a.appln_id = b.appln_id
Join tls206_person c on b.person_id = c.person_id
Join tls209_appln_ipc d on a.appln_id = d.appln_id
WHERE
appln_filing_year between 1980 and 2022
And granted = 'Y'
And LEFT(ipc_class_symbol,3) in ('A01','A43','A61','A62','B01','B03','B09','B22','B60','B61','B62','B63','B64'
,'B65','C01','C02','C04','C05','C07','C08','C09','C10','C11','C12','C14','C21','C23','C25','C30','D01','D21','E02'
,'E03','E04','E06','E21','F01','F02','F03','F16','F21','F22','F23','F24','F25','F26','F27','F28','G01','G02','G05'
,'G06','G08','G21','H01','H02','H05') --add 3-digit IPC here
And applt_seq_nr > 0 and invt_seq_nr = 0
And psn_sector = 'company'
and psn_name = 'FIO CORPORATION'
ORDER BY a.docdb_family_id, a.appln_id,LEFT(ipc_class_symbol,3)
Remember we want to avoid double counting due to family members having the same IPC or the same filing year. So we can use for example the "earliest filing year" instead of filing year. This will again further group family members that were filed in the same year. At the end we need a psn_name, filing year, and number forward citations to create the table you want.
Here is the query that creates your "source data". It further reduced the set to 36 rows (starting from 240.) You can now use this query as a sub-query where you do your "SUM-CASE-year(a.earliest_filing_date)" (not count !) of the attribute a.nb_citing_docdb_fam.

Code: Select all

SELECT psn_id, psn_name, LEFT(ipc_class_symbol,3)ipc, year(a.earliest_filing_date) earliest_year, a.docdb_family_id, a.nb_citing_docdb_fam
FROM tls201_appln a
Join tls207_pers_appln b on a.appln_id = b.appln_id
Join tls206_person c on b.person_id = c.person_id
Join tls209_appln_ipc d on a.appln_id = d.appln_id
WHERE
appln_filing_year between 1980 and 2022
And granted = 'Y'
And LEFT(ipc_class_symbol,3) in ('A01','A43','A61','A62','B01','B03','B09','B22','B60','B61','B62','B63','B64'
,'B65','C01','C02','C04','C05','C07','C08','C09','C10','C11','C12','C14','C21','C23','C25','C30','D01','D21','E02'
,'E03','E04','E06','E21','F01','F02','F03','F16','F21','F22','F23','F24','F25','F26','F27','F28','G01','G02','G05'
,'G06','G08','G21','H01','H02','H05') --add 3-digit IPC here
And applt_seq_nr > 0 and invt_seq_nr = 0
And psn_sector = 'company'
and psn_name = 'FIO CORPORATION'
group by psn_id, psn_name, LEFT(ipc_class_symbol,3), year(a.earliest_filing_date), a.docdb_family_id, a.nb_citing_docdb_fam
ORDER BY a.docdb_family_id, LEFT(ipc_class_symbol,3)
The methodology still has a small shortcoming because a patent family can have family members filed in different years. So the number of forward citations will be added to the individual years.
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post Reply