Hello Xiaohan,
that query creates a too heavy load on the server and it is automatic cancelled because it takes too much resources on the server. (There is a build in limitation to avoid the system working on a query for more then one hour.)
I run it on an my local system and it took about 4 hours. The result generates millions of rows, and even more during the execution of the query. As you said yourself, there are not much restrictions in the query to narrow down the generated data set.
As a matter of exception, i have posted the results here (csv file).
The file will stay available for 30 days.
http://webserv.epo.org/download.nsf/0/5 ... enDocument
Count number of patent applications and citations per company
-
- Posts: 440
- Joined: Thu Feb 22, 2007 5:33 pm
- Contact:
Re: Count number of patent applications and citations per company
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org
EPO - Vienna
patstat @ epo.org
Re: Count number of patent applications and citations per company
Much appreciated!!
I am planning to purchase PATSTAT online database, will this large load of query code work properly after the purchase?
Best,
Xiaohan
I am planning to purchase PATSTAT online database, will this large load of query code work properly after the purchase?
Best,
Xiaohan
Re: Count number of patent applications and citations per company
Dear Community,
Query killed has appeared once again, and I'm not sure if this time it's due to a syntax problem or an overload.
The IPC symbol is massive and detailed and I'm sure I put all the slashes "/" in the 9th position.
Here is my query code, could you check it for me please?
Looking forwards to your reply!
Best,
Xiaohan
Query killed has appeared once again, and I'm not sure if this time it's due to a syntax problem or an overload.
The IPC symbol is massive and detailed and I'm sure I put all the slashes "/" in the 9th position.
Here is my query code, could you check it for me please?
Code: Select all
SELECT psn_id, psn_name, ipc_class_symbol,
COUNT(distinct(case when t1.appln_filing_year = 1980 then t1.appln_id end)) as '1980',
COUNT(distinct(case when t1.appln_filing_year = 1981 then t1.appln_id end)) as '1981',
COUNT(distinct(case when t1.appln_filing_year = 1982 then t1.appln_id end)) as '1982',
COUNT(distinct(case when t1.appln_filing_year = 1983 then t1.appln_id end)) as '1983',
COUNT(distinct(case when t1.appln_filing_year = 1984 then t1.appln_id end)) as '1984',
COUNT(distinct(case when t1.appln_filing_year = 1985 then t1.appln_id end)) as '1985',
COUNT(distinct(case when t1.appln_filing_year = 1986 then t1.appln_id end)) as '1986',
COUNT(distinct(case when t1.appln_filing_year = 1987 then t1.appln_id end)) as '1987',
COUNT(distinct(case when t1.appln_filing_year = 1988 then t1.appln_id end)) as '1988',
COUNT(distinct(case when t1.appln_filing_year = 1989 then t1.appln_id end)) as '1989',
COUNT(distinct(case when t1.appln_filing_year = 1990 then t1.appln_id end)) as '1990',
COUNT(distinct(case when t1.appln_filing_year = 1991 then t1.appln_id end)) as '1991',
COUNT(distinct(case when t1.appln_filing_year = 1992 then t1.appln_id end)) as '1992',
COUNT(distinct(case when t1.appln_filing_year = 1993 then t1.appln_id end)) as '1993',
COUNT(distinct(case when t1.appln_filing_year = 1994 then t1.appln_id end)) as '1994',
COUNT(distinct(case when t1.appln_filing_year = 1995 then t1.appln_id end)) as '1995',
COUNT(distinct(case when t1.appln_filing_year = 1996 then t1.appln_id end)) as '1996',
COUNT(distinct(case when t1.appln_filing_year = 1997 then t1.appln_id end)) as '1997',
COUNT(distinct(case when t1.appln_filing_year = 1998 then t1.appln_id end)) as '1998',
COUNT(distinct(case when t1.appln_filing_year = 1999 then t1.appln_id end)) as '1999',
COUNT(distinct(case when t1.appln_filing_year = 2000 then t1.appln_id end)) as '2000',
COUNT(distinct(case when t1.appln_filing_year = 2001 then t1.appln_id end)) as '2001',
COUNT(distinct(case when t1.appln_filing_year = 2002 then t1.appln_id end)) as '2002',
COUNT(distinct(case when t1.appln_filing_year = 2003 then t1.appln_id end)) as '2003',
COUNT(distinct(case when t1.appln_filing_year = 2004 then t1.appln_id end)) as '2004',
COUNT(distinct(case when t1.appln_filing_year = 2005 then t1.appln_id end)) as '2005',
COUNT(distinct(case when t1.appln_filing_year = 2006 then t1.appln_id end)) as '2006',
COUNT(distinct(case when t1.appln_filing_year = 2007 then t1.appln_id end)) as '2007',
COUNT(distinct(case when t1.appln_filing_year = 2008 then t1.appln_id end)) as '2008',
COUNT(distinct(case when t1.appln_filing_year = 2009 then t1.appln_id end)) as '2009',
COUNT(distinct(case when t1.appln_filing_year = 2010 then t1.appln_id end)) as '2010',
COUNT(distinct(case when t1.appln_filing_year = 2011 then t1.appln_id end)) as '2011',
COUNT(distinct(case when t1.appln_filing_year = 2012 then t1.appln_id end)) as '2012',
COUNT(distinct(case when t1.appln_filing_year = 2013 then t1.appln_id end)) as '2013',
COUNT(distinct(case when t1.appln_filing_year = 2014 then t1.appln_id end)) as '2014',
COUNT(distinct(case when t1.appln_filing_year = 2015 then t1.appln_id end)) as '2015',
COUNT(distinct(case when t1.appln_filing_year = 2016 then t1.appln_id end)) as '2016',
COUNT(distinct(case when t1.appln_filing_year = 2017 then t1.appln_id end)) as '2017',
COUNT(distinct(case when t1.appln_filing_year = 2018 then t1.appln_id end)) as '2018',
COUNT(distinct(case when t1.appln_filing_year = 2019 then t1.appln_id end)) as '2019',
COUNT(distinct(case when t1.appln_filing_year = 2020 then t1.appln_id end)) as '2020',
COUNT(distinct(case when t1.appln_filing_year = 2021 then t1.appln_id end)) as '2021',
COUNT(distinct(case when t1.appln_filing_year = 2022 then t1.appln_id end)) as '2022',
COUNT(distinct(t1.appln_id)) as "total"
FROM tls201_appln t1 JOIN tls209_appln_ipc t2
ON t1.appln_id = t2.appln_id
Join tls207_pers_appln on t1.appln_id = tls207_pers_appln.appln_id
Join tls206_person on tls207_pers_appln.person_id = tls206_person.person_id
WHERE
t1.appln_id in (select appln_id from tls209_appln_ipc where ipc_class_symbol in ('A01G 23/00','A01G 25/00','A01N 25/00','A43B 1/12','A43B 21/14','A61L 11/00','A62D 3/00','A62D 3/02','A62D 101/00','B01D 45/00','B01D 53/00','B01D 53/02','B01D 53/04','B01D 53/047','B01D 53/14','B01D 53/22','B01D 53/24','B01D 53/62','B01D 53/92','B03B 9/06','B03C 3/00','B09B 3/00','B22F 8/00','B29B 17/00','B60K 6/00','B60K 6/10','B60K 6/20','B60K 6/28','B60K 6/30','B60K 16/00','B60L 3/00','B60L 7/10','B60L 8/00','B60L 9/00','B60L 50/30','B60L 50/50','B60W 10/26','B60W 20/00','B61D 17/02','B62D 35/00','B62D 35/02','B62D 67/00','B62M 1/00','B62M 3/00','B62M 5/00','B62M 6/00','B63B 1/34','B63B 35/00','B63B 35/32','B63H 9/00','B63H 13/00','B63H 16/00','B63H 19/02','B63H 19/04','B63H 21/18','B63J 4/00','B64G 1/44','B65G 5/00','C01B 32/50','C01B 33/02','C02F 1/00','C02F 1/14','C02F 1/16','C02F 3/00','C02F 3/28','C02F 9/00','C02F 11/04','C02F 11/14','C04B 7/24','C04B 18/04','C05F 7/00','C07C 67/00','C07C 69/00','C08J 11/00','C08J 11/04','C09K 3/22','C09K 3/32','C09K 5/00','C09K 11/01','C09K 17/00','C10B 21/18','C10B 53/00','C10B 53/02','C10G 1/10','C10J 3/02','C10J 3/46','C10J 3/86','C10L 1/00','C10L 1/02','C10L 1/14','C10L 1/19','C10L 1/182','C10L 3/00','C10L 5/00','C10L 5/40','C10L 5/42','C10L 5/44','C10L 5/46','C10L 5/48','C10L 9/00','C10L 10/02','C10L 10/06','C11B 11/00','C11B 13/00','C11C 3/10','C12M 1/107','C12N 1/13','C12N 1/15','C12N 1/21','C12N 5/10','C12N 15/00','C12N 9/24','C12P 5/02','C12P 7/06','C12P 7/649','C14C 3/32','C21B 3/04','C21B 7/22','C21B 5/06','C21C 5/38','C22B 7/00','C22B 19/30','C22B 25/06','C23C 14/14','C23C 16/24','C25C 1/00','C30B 29/06','D01F 13/00','D01G 11/00','D21B 1/08','D21B 1/32','D21C 5/02','D21C 11/00','D21F 5/20','E02B 9/00','E02B 9/08','E02B 15/04','E02D 3/00','E03C 1/12','E04B 1/62','E04B 1/74','E04B 1/88','E04B 1/90','E04B 2/00','E04B 5/00','E04B 7/00','E04B 9/00','E04C 1/40','E04C 1/41','E04C 2/284','E04D 1/28','E04D 3/35','E04D 13/00','E04D 13/16','E04D 13/18','E04F 13/08','E04F 15/18','E04H 1/00','E04H 12/00','E06B 3/263','E21B 41/00','E21B 43/16','E21F 17/16','F01K 17/00','F01K 23/04','F01K 23/06','F01K 27/00','F01N 3/00','F01N 5/00','F01N 9/00','F02B 43/00','F02B 75/10','F02C 1/05','F02C 3/28','F02C 6/18','F02G 5/00','F02M 21/02','F02M 27/02','F03B 13/12','F03B 15/00','F03D 1/04','F03D 9/00','F03D 13/00','F03D 13/20','F03G 4/00','F03G 5/00','F03G 6/00','F03G 7/04','F03G 7/05','F03G 7/08','F16H 3/00','F16H 48/00','F21K 99/00','F21L 4/00','F21L 4/02','F21S 9/03','F22B 1/00','F22B 1/02','F23B 80/02','F23B 90/00','F23C 9/00','F23G 5/00','F23G 5/027','F23G 5/46','F23G 7/00','F23G 7/06','F23G 7/10','F23J 7/00','F23J 15/00','F24D 3/00','F24D 5/00','F24D 11/00','F24D 11/02','F24D 15/04','F24D 17/00','F24D 17/02','F24D 18/00','F24D 19/00','F24F 5/00','F24F 12/00','F24H 4/00','F24H 7/00','F24S 10/10','F24S 23/00','F24S 90/00','F24T 10/00','F24V 30/00','F25B 27/00','F25B 27/02','F25B 30/00','F25B 30/06','F25J 3/02','F26B 3/00','F26B 3/28','F27B 1/18','F27B 15/12','F27D 17/00','F28D 17/00','F28D 20/00','F28D 20/02','G02B 7/183','G05F 1/67','G08B 21/12','G21C 13/10','G21F 9/00','H01G 9/20','H01G 11/00','H01J 9/50','H01J 9/52','H01L 25/00','H01L 25/03','H01L 25/16','H01L 25/18','H01L 27/30','H01L 27/142','H01L 31/00','H01L 31/042','H01L 31/0525','H01L 33/00','H01L 51/42','H01L 51/50','H01M 4/86','H01M 6/52','H01M 8/00','H01M 10/44','H01M 10/54','H01M 12/00','H01M 14/00','H01M 50/00','H02J 3/28','H02J 7/00','H02J 7/35','H02J 9/00','H02J 15/00','H02K 7/18','H02K 29/08','H02K 49/10','H02N 10/00','H02S 10/00','H02S 40/44','H05B 33/00') ) -- Add IPC symbol here
And t1.appln_filing_year between 1980 and 2022
And applt_seq_nr > 0 and invt_seq_nr = 0
And granted = 'Y'
And psn_sector = 'company'
GROUP BY psn_id, psn_name, ipc_class_symbol
ORDER BY total desc
Best,
Xiaohan
-
- Posts: 440
- Joined: Thu Feb 22, 2007 5:33 pm
- Contact:
Re: Count number of patent applications and citations per company
Making a "count" on the attribute nb_citing_docdb_fam is not correct. nb_citing_docdb_fam already gives you the the number of forward citations, so I assume you want to make the sum. You also have to keep in mind that the nb_citing_docdb_fam is a number that is calculated at FAMILY level, but your query is looking at individual patents. So if an applicant has a family with 4 members (assuming filed in the same year), then it will make the sum 4 times, for each of the members of the families. That is wrong. And also the IPC codes are defined at application level (contrary to CPC codes which are the same for all family members.)
You need to re-think your data (aggregation) model.
I would work with an intermediate table which aggregates the application defined data at family level.
(And use a small sample with a limited amount of patents to see if the model works.)
Take as example the query below that generates all relevant data on 51 patents.
When you analyse the data, you see immediately that your intended level of aggregation of IPC codes at 3 digits will lead to a reduction of rows. (we get 99 rows with the query below, without loosing information which we need.)
Remember we want to avoid double counting due to family members having the same IPC or the same filing year. So we can use for example the "earliest filing year" instead of filing year. This will again further group family members that were filed in the same year. At the end we need a psn_name, filing year, and number forward citations to create the table you want.
Here is the query that creates your "source data". It further reduced the set to 36 rows (starting from 240.) You can now use this query as a sub-query where you do your "SUM-CASE-year(a.earliest_filing_date)" (not count !) of the attribute a.nb_citing_docdb_fam.
The methodology still has a small shortcoming because a patent family can have family members filed in different years. So the number of forward citations will be added to the individual years.
You need to re-think your data (aggregation) model.
I would work with an intermediate table which aggregates the application defined data at family level.
(And use a small sample with a limited amount of patents to see if the model works.)
Take as example the query below that generates all relevant data on 51 patents.
Code: Select all
SELECT psn_id, psn_name, LEFT(ipc_class_symbol,3)ipc, year(a.earliest_filing_date) earliest_year, a.docdb_family_id, a.nb_citing_docdb_fam
FROM tls201_appln a
Join tls207_pers_appln b on a.appln_id = b.appln_id
Join tls206_person c on b.person_id = c.person_id
Join tls209_appln_ipc d on a.appln_id = d.appln_id
WHERE
appln_filing_year between 1980 and 2022
And granted = 'Y'
And LEFT(ipc_class_symbol,3) in ('A01','A43','A61','A62','B01','B03','B09','B22','B60','B61','B62','B63','B64'
,'B65','C01','C02','C04','C05','C07','C08','C09','C10','C11','C12','C14','C21','C23','C25','C30','D01','D21','E02'
,'E03','E04','E06','E21','F01','F02','F03','F16','F21','F22','F23','F24','F25','F26','F27','F28','G01','G02','G05'
,'G06','G08','G21','H01','H02','H05') --add 3-digit IPC here
And applt_seq_nr > 0 and invt_seq_nr = 0
And psn_sector = 'company'
and psn_name = 'FIO CORPORATION'
group by psn_id, psn_name, LEFT(ipc_class_symbol,3), year(a.earliest_filing_date), a.docdb_family_id, a.nb_citing_docdb_fam
ORDER BY a.docdb_family_id, LEFT(ipc_class_symbol,3)
Code: Select all
SELECT distinct a.*, c.*, left(ipc_class_symbol,3) ipc
FROM tls201_appln a
Join tls207_pers_appln b on a.appln_id = b.appln_id
Join tls206_person c on b.person_id = c.person_id
Join tls209_appln_ipc d on a.appln_id = d.appln_id
WHERE
appln_filing_year between 1980 and 2022
And granted = 'Y'
And LEFT(ipc_class_symbol,3) in ('A01','A43','A61','A62','B01','B03','B09','B22','B60','B61','B62','B63','B64'
,'B65','C01','C02','C04','C05','C07','C08','C09','C10','C11','C12','C14','C21','C23','C25','C30','D01','D21','E02'
,'E03','E04','E06','E21','F01','F02','F03','F16','F21','F22','F23','F24','F25','F26','F27','F28','G01','G02','G05'
,'G06','G08','G21','H01','H02','H05') --add 3-digit IPC here
And applt_seq_nr > 0 and invt_seq_nr = 0
And psn_sector = 'company'
and psn_name = 'FIO CORPORATION'
ORDER BY a.docdb_family_id, a.appln_id,LEFT(ipc_class_symbol,3)
Here is the query that creates your "source data". It further reduced the set to 36 rows (starting from 240.) You can now use this query as a sub-query where you do your "SUM-CASE-year(a.earliest_filing_date)" (not count !) of the attribute a.nb_citing_docdb_fam.
Code: Select all
SELECT psn_id, psn_name, LEFT(ipc_class_symbol,3)ipc, year(a.earliest_filing_date) earliest_year, a.docdb_family_id, a.nb_citing_docdb_fam
FROM tls201_appln a
Join tls207_pers_appln b on a.appln_id = b.appln_id
Join tls206_person c on b.person_id = c.person_id
Join tls209_appln_ipc d on a.appln_id = d.appln_id
WHERE
appln_filing_year between 1980 and 2022
And granted = 'Y'
And LEFT(ipc_class_symbol,3) in ('A01','A43','A61','A62','B01','B03','B09','B22','B60','B61','B62','B63','B64'
,'B65','C01','C02','C04','C05','C07','C08','C09','C10','C11','C12','C14','C21','C23','C25','C30','D01','D21','E02'
,'E03','E04','E06','E21','F01','F02','F03','F16','F21','F22','F23','F24','F25','F26','F27','F28','G01','G02','G05'
,'G06','G08','G21','H01','H02','H05') --add 3-digit IPC here
And applt_seq_nr > 0 and invt_seq_nr = 0
And psn_sector = 'company'
and psn_name = 'FIO CORPORATION'
group by psn_id, psn_name, LEFT(ipc_class_symbol,3), year(a.earliest_filing_date), a.docdb_family_id, a.nb_citing_docdb_fam
ORDER BY a.docdb_family_id, LEFT(ipc_class_symbol,3)
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org
EPO - Vienna
patstat @ epo.org