check on my query

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

Elisa_D
Posts: 13
Joined: Wed Jan 26, 2022 10:38 am

check on my query

Post by Elisa_D » Thu Apr 07, 2022 12:16 pm

Dear Community,
I am trying to figure out the trend of patent applications for a certain NACE category through the years, from 1970 to 2020.
I would like to see the number of patent applications for NACE 21.00 (pharmaceuticals), and I want to count 1 for each patent, regardless of the number of countries in which the patent is portected (member of the patent family), and, as I know that there is the wage of NACE to take into account, I planned to repeat my query per scenario, that is changing the wage in each serach.
For simplicity, I wrote the query for the time span 2000-2020, and is:

Code: Select all

SELECT  
COUNT(distinct(case when  a.appln_filing_year = 2000 then a.appln_id end)) as '2000',
COUNT(distinct(case when   a.appln_filing_year = 2001 then a.appln_id end)) as '2001',
COUNT(distinct(case when   a.appln_filing_year = 2002 then a.appln_id end)) as '2002',
COUNT(distinct(case when   a.appln_filing_year = 2003 then a.appln_id end)) as '2003',
COUNT(distinct(case when   a.appln_filing_year = 2004 then a.appln_id end)) as '2004',
COUNT(distinct(case when   a.appln_filing_year = 2005 then a.appln_id end)) as '2005',
COUNT(distinct(case when   a.appln_filing_year = 2006 then a.appln_id end)) as '2006',
COUNT(distinct(case when   a.appln_filing_year = 2007 then a.appln_id end)) as '2007',
COUNT(distinct(case when   a.appln_filing_year = 2008 then a.appln_id end)) as '2008',
COUNT(distinct(case when   a.appln_filing_year = 2009 then a.appln_id end)) as '2009', 
COUNT(distinct(case when  a.appln_filing_year = 2010 then a.appln_id end)) as '2010',
COUNT(distinct(case when  a.appln_filing_year = 2011 then a.appln_id end)) as '2011',
COUNT(distinct(case when   a.appln_filing_year = 2012 then a.appln_id end)) as '2012',
COUNT(distinct(case when   a.appln_filing_year = 2013 then a.appln_id end)) as '2013',
COUNT(distinct(case when   a.appln_filing_year = 2014 then a.appln_id end)) as '2014',
COUNT(distinct(case when   a.appln_filing_year = 2015 then a.appln_id end)) as '2015',
COUNT(distinct(case when   a.appln_filing_year = 2016 then a.appln_id end)) as '2016',
COUNT(distinct(case when   a.appln_filing_year = 2017 then a.appln_id end)) as '2017',
COUNT(distinct(case when   a.appln_filing_year = 2018 then a.appln_id end)) as '2018',
COUNT(distinct(case when   a.appln_filing_year = 2019 then a.appln_id end)) as '2019',
COUNT(distinct(case when   a.appln_filing_year = 2020 then a.appln_id end)) as '2020'
 FROM tls201_appln a 
 join tls229_appln_nace2 n on a.appln_id = n.appln_id
 WHERE
 a.appln_id < 900000000 -- exclude artificial applications
 AND n.nace2_code like '21.00'
 AND n.weight >= 0.25  --Will change for each scenario
Is the query okay? I am worried that the query will count for example 10 times a single application only beacuse the applicant asked for its protection in 10 different countries (patent family members).
Thank you so much for tour help.
Best regards,
Elisa


EPO / PATSTAT Support
Posts: 440
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: check on my query

Post by EPO / PATSTAT Support » Thu Apr 07, 2022 2:07 pm

Hello Elisa,
your worries are 100% justified. You will indeed count all patents for each single family member. With the assumption that those family members will have the same NACE classification codes; but that is to be expected for family members.
To avoid this you have the count unique docdb_family_ids. Then you will count each family only once.
Because family members can have different filing years, which would again lead to some degree of double counting, it is even advisable to use the earliest filing year instead of the filing year. Example: you have a family with a priority filed in August 2019, then a first filing claiming priority in December 2019 and a second filing in January 2020. The family will be counted for 2019 as well as 2020. But if you use the "earliest filing year" only 2019 will be considered because the filing from 2020 will have the year of the priority filing.
Further on there is a syntax error: "AND n.nace2_code like '21.00'" --> if you use like you have to have the wildcard in the value. On top of that; 21.00 does not exist, it is 21.

So in a nutshell: to be sure what NACE codes are effectively used in PATSTAT (so called domain of possible values) you can run the following query to check all possible correct values (85 codes).

Code: Select all

SELECT distinct 
nace2_code
,nace2_descr
FROM tls902_ipc_nace2
order by nace2_code
And if you intend to do this for other NACE codes, you can run them all in one go, and make it more descriptive. Something like this (it will take a a good 40 minutes to run it if you remove the "n.nace2_code like '21%' " line) :

Code: Select all

SELECT list.nace2_code, list.nace2_descr,
COUNT(distinct(case when a.earliest_filing_year = 2000 then a.docdb_family_id end)) as '2000',
COUNT(distinct(case when a.earliest_filing_year = 2001 then a.docdb_family_id end)) as '2001',
COUNT(distinct(case when a.earliest_filing_year = 2002 then a.docdb_family_id end)) as '2002',
COUNT(distinct(case when a.earliest_filing_year = 2003 then a.docdb_family_id end)) as '2003',
COUNT(distinct(case when a.earliest_filing_year = 2004 then a.docdb_family_id end)) as '2004',
COUNT(distinct(case when a.earliest_filing_year = 2005 then a.docdb_family_id end)) as '2005',
COUNT(distinct(case when a.earliest_filing_year = 2006 then a.docdb_family_id end)) as '2006',
COUNT(distinct(case when a.earliest_filing_year = 2007 then a.docdb_family_id end)) as '2007',
COUNT(distinct(case when a.earliest_filing_year = 2008 then a.docdb_family_id end)) as '2008',
COUNT(distinct(case when a.earliest_filing_year = 2009 then a.docdb_family_id end)) as '2009', 
COUNT(distinct(case when a.earliest_filing_year = 2010 then a.docdb_family_id end)) as '2010',
COUNT(distinct(case when a.earliest_filing_year = 2011 then a.docdb_family_id end)) as '2011',
COUNT(distinct(case when a.earliest_filing_year = 2012 then a.docdb_family_id end)) as '2012',
COUNT(distinct(case when a.earliest_filing_year = 2013 then a.docdb_family_id end)) as '2013',
COUNT(distinct(case when a.earliest_filing_year = 2014 then a.docdb_family_id end)) as '2014',
COUNT(distinct(case when a.earliest_filing_year = 2015 then a.docdb_family_id end)) as '2015',
COUNT(distinct(case when a.earliest_filing_year = 2016 then a.docdb_family_id end)) as '2016',
COUNT(distinct(case when a.earliest_filing_year = 2017 then a.docdb_family_id end)) as '2017',
COUNT(distinct(case when a.earliest_filing_year = 2018 then a.docdb_family_id end)) as '2018',
COUNT(distinct(case when a.earliest_filing_year = 2019 then a.docdb_family_id end)) as '2019',
COUNT(distinct(case when a.earliest_filing_year = 2020 then a.docdb_family_id end)) as '2020'
 FROM tls201_appln a 
 join tls229_appln_nace2 n on a.appln_id = n.appln_id
 join (select distinct nace2_code,nace2_descr FROM tls902_ipc_nace2) list on n.nace2_code = list.nace2_code
 WHERE
 a.appln_id < 900000000 -- exclude artificial applications
 AND n.nace2_code like '21%'
 -- if you remove above line it will run the query for all NACE codes; takes about 30 minutes
 group by list.nace2_code, list.nace2_descr
 order by list.nace2_code asc
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Elisa_D
Posts: 13
Joined: Wed Jan 26, 2022 10:38 am

Re: check on my query

Post by Elisa_D » Fri Apr 08, 2022 11:37 am

Thank you so much for your reply.
Another question, in order to simplify and speed my work. What I should do if I want different rows for different weights? Do I need to run the query every time changing the weight or can I insert a sort of "for" cycle: for example "for n.weight =0.25" and then repeat "for n.weight=0.5", "n.weight=0.75", and so on?
Thank you again so much!!
Elisa


EPO / PATSTAT Support
Posts: 440
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: check on my query

Post by EPO / PATSTAT Support » Tue Apr 26, 2022 9:44 am

Hello Elisa,
if you want to have "extra rows" indicating the respective weights assigned to the nace code 21, then you simply add it in the the SELECT clause and also in the GROUP BY clause.

Code: Select all

SELECT list.nace2_code, list.nace2_descr, n.weight,
COUNT(distinct(case when a.earliest_filing_year = 2000 then a.docdb_family_id end)) as '2000',
COUNT(distinct(case when a.earliest_filing_year = 2001 then a.docdb_family_id end)) as '2001',
COUNT(distinct(case when a.earliest_filing_year = 2002 then a.docdb_family_id end)) as '2002',
COUNT(distinct(case when a.earliest_filing_year = 2003 then a.docdb_family_id end)) as '2003',
COUNT(distinct(case when a.earliest_filing_year = 2004 then a.docdb_family_id end)) as '2004',
COUNT(distinct(case when a.earliest_filing_year = 2005 then a.docdb_family_id end)) as '2005',
COUNT(distinct(case when a.earliest_filing_year = 2006 then a.docdb_family_id end)) as '2006',
COUNT(distinct(case when a.earliest_filing_year = 2007 then a.docdb_family_id end)) as '2007',
COUNT(distinct(case when a.earliest_filing_year = 2008 then a.docdb_family_id end)) as '2008',
COUNT(distinct(case when a.earliest_filing_year = 2009 then a.docdb_family_id end)) as '2009', 
COUNT(distinct(case when a.earliest_filing_year = 2010 then a.docdb_family_id end)) as '2010',
COUNT(distinct(case when a.earliest_filing_year = 2011 then a.docdb_family_id end)) as '2011',
COUNT(distinct(case when a.earliest_filing_year = 2012 then a.docdb_family_id end)) as '2012',
COUNT(distinct(case when a.earliest_filing_year = 2013 then a.docdb_family_id end)) as '2013',
COUNT(distinct(case when a.earliest_filing_year = 2014 then a.docdb_family_id end)) as '2014',
COUNT(distinct(case when a.earliest_filing_year = 2015 then a.docdb_family_id end)) as '2015',
COUNT(distinct(case when a.earliest_filing_year = 2016 then a.docdb_family_id end)) as '2016',
COUNT(distinct(case when a.earliest_filing_year = 2017 then a.docdb_family_id end)) as '2017',
COUNT(distinct(case when a.earliest_filing_year = 2018 then a.docdb_family_id end)) as '2018',
COUNT(distinct(case when a.earliest_filing_year = 2019 then a.docdb_family_id end)) as '2019',
COUNT(distinct(case when a.earliest_filing_year = 2020 then a.docdb_family_id end)) as '2020'
 FROM tls201_appln a 
 join tls229_appln_nace2 n on a.appln_id = n.appln_id
 join (select distinct nace2_code,nace2_descr FROM tls902_ipc_nace2) list on n.nace2_code = list.nace2_code
 WHERE
 a.appln_id < 900000000 -- exclude artificial applications
 AND n.nace2_code like '21%'
 -- if you remove above line it will run the query for all NACE codes; takes about 30 minutes
 group by list.nace2_code, list.nace2_descr, n.weight
 order by list.nace2_code asc, n.weight desc
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post Reply