Understanding results of number of patent applications

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

khanhhienbi
Posts: 15
Joined: Thu Sep 03, 2020 9:49 am

Understanding results of number of patent applications

Post by khanhhienbi » Thu Jun 01, 2023 3:45 pm

Dear PATSTAT community,

I am currently working with the PATSTAT Spring 2023 edition. I would like to get the number of patent applications filed by Netherlands companies in the pharmaceutical field for the period of 12 years, from 2011 to 2022. Here are my codes:

Code: Select all

SELECT psn_name, person_ctry_code, appln_auth,
COUNT(distinct(case when t1.appln_filing_year = 2011 then t1.appln_id end)) as '2011',
COUNT(distinct(case when t1.appln_filing_year = 2012 then t1.appln_id end)) as '2012', 
COUNT(distinct(case when t1.appln_filing_year = 2013 then t1.appln_id end)) as '2013',
COUNT(distinct(case when t1.appln_filing_year = 2014 then t1.appln_id end)) as '2014',
COUNT(distinct(case when t1.appln_filing_year = 2015 then t1.appln_id end)) as '2015',
COUNT(distinct(case when t1.appln_filing_year = 2016 then t1.appln_id end)) as '2016',
COUNT(distinct(case when t1.appln_filing_year = 2017 then t1.appln_id end)) as '2017',
COUNT(distinct(case when t1.appln_filing_year = 2018 then t1.appln_id end)) as '2018',
COUNT(distinct(case when t1.appln_filing_year = 2019 then t1.appln_id end)) as '2019',
COUNT(distinct(case when t1.appln_filing_year = 2019 then t1.appln_id end)) as '2020',
COUNT(distinct(case when t1.appln_filing_year = 2019 then t1.appln_id end)) as '2021',
COUNT(distinct(case when t1.appln_filing_year = 2019 then t1.appln_id end)) as '2022'
FROM   tls201_appln t1 JOIN  tls230_appln_techn_field on t1.appln_id = tls230_appln_techn_field.appln_id
join tls207_pers_appln on t1.appln_id = tls207_pers_appln.appln_id
join tls206_person on tls207_pers_appln.person_id = tls206_person.person_id
WHERE
tls230_appln_techn_field.techn_field_nr = 16--> number used for techn_field = 'Pharmaceutical'
AND t1.appln_filing_year between 2011 and 2022
and applt_seq_nr > 0 and invt_seq_nr = 0
and person_ctry_code = 'NL'
group by psn_name, person_ctry_code, appln_auth
I'm not sure why some companies that have values of 0 for every year in the time period specified are included in the dataset?

I also want to look at the biotechnology industry. In this case, how can I deal with the duplications?

Many thanks for your support!


EPO / PATSTAT Support
Posts: 440
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Understanding results of number of patent applications

Post by EPO / PATSTAT Support » Fri Jun 02, 2023 8:55 am

Hello khanhhienbi,
there is a small error in your query, the last 3 lines in your SELECT clause have a correct "heading/alias" for the columns (2020,2021,2022), but they all show the values for 2019. So the real values are not shown. Below is a correction.

Code: Select all

SELECT psn_name, person_ctry_code, appln_auth,
COUNT(distinct(case when t1.appln_filing_year = 2011 then t1.appln_id end)) as '2011',
COUNT(distinct(case when t1.appln_filing_year = 2012 then t1.appln_id end)) as '2012', 
COUNT(distinct(case when t1.appln_filing_year = 2013 then t1.appln_id end)) as '2013',
COUNT(distinct(case when t1.appln_filing_year = 2014 then t1.appln_id end)) as '2014',
COUNT(distinct(case when t1.appln_filing_year = 2015 then t1.appln_id end)) as '2015',
COUNT(distinct(case when t1.appln_filing_year = 2016 then t1.appln_id end)) as '2016',
COUNT(distinct(case when t1.appln_filing_year = 2017 then t1.appln_id end)) as '2017',
COUNT(distinct(case when t1.appln_filing_year = 2018 then t1.appln_id end)) as '2018',
COUNT(distinct(case when t1.appln_filing_year = 2019 then t1.appln_id end)) as '2019',
COUNT(distinct(case when t1.appln_filing_year = 2020 then t1.appln_id end)) as '2020',
COUNT(distinct(case when t1.appln_filing_year = 2021 then t1.appln_id end)) as '2021',
COUNT(distinct(case when t1.appln_filing_year = 2022 then t1.appln_id end)) as '2022', 
COUNT(distinct(case when t1.appln_filing_year between 2011 and 2022 then t1.appln_id end)) as 'Total'
FROM   tls201_appln t1 JOIN  tls230_appln_techn_field on t1.appln_id = tls230_appln_techn_field.appln_id
join tls207_pers_appln on t1.appln_id = tls207_pers_appln.appln_id
join tls206_person on tls207_pers_appln.person_id = tls206_person.person_id
WHERE
tls230_appln_techn_field.techn_field_nr = 16--> number used for techn_field = 'Pharmaceutical'
AND t1.appln_filing_year between 2011 and 2022
and applt_seq_nr > 0 and invt_seq_nr = 0
and person_ctry_code = 'NL'
group by psn_name, person_ctry_code, appln_auth
order by total desc
On your question on separating Pharmaceuticals from Biotechnology; that is possible but does it make sense to do so because of the intertwining of the technological fields? From a database point of view, you can indeed take all patents that only have a Pharmaceuticals label separate from those that have a Biotechnology label, but (without checking), I expect that there will be large overlap.
In fact, also Organic fine chemistry and probably Macromolecular chemistry, polymers will probably have overlaps with Pharmaceuticals and Biotechnology.
The technical sectors are assigned based on an IPC concordance table. Because patents can be assigned multiple IPC codes, they can end up in multiple technologies. Therefore we have added a ,[weight] in table tls230_appln_techn_field, so that each patent has a total value of 1 (summing the weights). This can as such be used to analyse in depth the degree of a patent belonging to a certain technology group.
Or aggregated at company level: to label a company as rather Pharmaceutical or Biotech.(care needs to be taken to not draw wrong conclusions, as the "nature" of a company can have switched - which in itself might be an interesting area of research....)
Here is the example query that takes all Pharmaceuticals and excludes Biotechnology

Code: Select all

SELECT psn_name, person_ctry_code, appln_auth,
COUNT(distinct(case when t1.appln_filing_year = 2011 then t1.appln_id end)) as '2011',
COUNT(distinct(case when t1.appln_filing_year = 2012 then t1.appln_id end)) as '2012', 
COUNT(distinct(case when t1.appln_filing_year = 2013 then t1.appln_id end)) as '2013',
COUNT(distinct(case when t1.appln_filing_year = 2014 then t1.appln_id end)) as '2014',
COUNT(distinct(case when t1.appln_filing_year = 2015 then t1.appln_id end)) as '2015',
COUNT(distinct(case when t1.appln_filing_year = 2016 then t1.appln_id end)) as '2016',
COUNT(distinct(case when t1.appln_filing_year = 2017 then t1.appln_id end)) as '2017',
COUNT(distinct(case when t1.appln_filing_year = 2018 then t1.appln_id end)) as '2018',
COUNT(distinct(case when t1.appln_filing_year = 2019 then t1.appln_id end)) as '2019',
COUNT(distinct(case when t1.appln_filing_year = 2020 then t1.appln_id end)) as '2020',
COUNT(distinct(case when t1.appln_filing_year = 2021 then t1.appln_id end)) as '2021',
COUNT(distinct(case when t1.appln_filing_year = 2022 then t1.appln_id end)) as '2022', 
COUNT(distinct(case when t1.appln_filing_year between 2011 and 2022 then t1.appln_id end)) as 'Total'
FROM   tls201_appln t1
join tls207_pers_appln on t1.appln_id = tls207_pers_appln.appln_id
join tls206_person on tls207_pers_appln.person_id = tls206_person.person_id
WHERE
t1.appln_id in (select appln_id from tls230_appln_techn_field where tls230_appln_techn_field.techn_field_nr = 16)
--> number used to include techn_field = 'Pharmaceutical')
and t1.appln_id not in (select appln_id from tls230_appln_techn_field where tls230_appln_techn_field.techn_field_nr = 15)
--> number used to exclude techn_field = 'Biotechnology')
and t1.appln_filing_year between 2011 and 2022
and applt_seq_nr > 0 and invt_seq_nr = 0
and person_ctry_code = 'NL'
group by psn_name, person_ctry_code, appln_auth
order by total desc
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


khanhhienbi
Posts: 15
Joined: Thu Sep 03, 2020 9:49 am

Re: Understanding results of number of patent applications

Post by khanhhienbi » Sat Jun 03, 2023 9:15 am

Thank you very much for your response! All the codes provided work fine!

I'm wondering if there is any way to aggregate the number of patent applications filed by a Netherland firm both in domestic and foreign countries? As I can see from the result table after running the second code, one firm can file patent applications to different patent authorities then how can one identify if the patent applications are the same or different to correctly aggregate them?

Thank you in advance for your support!


i.dijkstra.8
Posts: 1
Joined: Fri Apr 12, 2024 2:47 pm

Re: Understanding results of number of patent applications

Post by i.dijkstra.8 » Fri Apr 12, 2024 2:58 pm

I am a student and I am writing a thesis about the influence of international experience on innovation performance. To measure the innovation performance, I want to gather data on granted patents of firms in the biopharmaceutical industry. Every query I copied and pasted from this forum, shows an Error 4102 and says: there is a syntax error in your query. But when I used queries of the forum a month ago it did not show this error all the time. Am I doing something wrong, or is it the fact that I use a free trial that I do not get any data? The data I want to gather is:
top 1000 firms, with their amount of granted patents per year between 2010 and 2020 in the biopharmaceutical industry.
I saw that this could be the right query and I copied pasted it, but still it said error 4102.
The exact same one I copied does not work, and gives me an error:

Code: Select all

SELECT psn_name, person_ctry_code, appln_auth,
COUNT(distinct(case when t1.appln_filing_year = 2011 then t1.appln_id end)) as '2011',
COUNT(distinct(case when t1.appln_filing_year = 2012 then t1.appln_id end)) as '2012', 
COUNT(distinct(case when t1.appln_filing_year = 2013 then t1.appln_id end)) as '2013',
COUNT(distinct(case when t1.appln_filing_year = 2014 then t1.appln_id end)) as '2014',
COUNT(distinct(case when t1.appln_filing_year = 2015 then t1.appln_id end)) as '2015',
COUNT(distinct(case when t1.appln_filing_year = 2016 then t1.appln_id end)) as '2016',
COUNT(distinct(case when t1.appln_filing_year = 2017 then t1.appln_id end)) as '2017',
COUNT(distinct(case when t1.appln_filing_year = 2018 then t1.appln_id end)) as '2018',
COUNT(distinct(case when t1.appln_filing_year = 2019 then t1.appln_id end)) as '2019',
COUNT(distinct(case when t1.appln_filing_year = 2020 then t1.appln_id end)) as '2020',
COUNT(distinct(case when t1.appln_filing_year = 2021 then t1.appln_id end)) as '2021',
COUNT(distinct(case when t1.appln_filing_year = 2022 then t1.appln_id end)) as '2022', 
COUNT(distinct(case when t1.appln_filing_year between 2011 and 2022 then t1.appln_id end)) as 'Total'
FROM   tls201_appln t1 JOIN  tls230_appln_techn_field on t1.appln_id = tls230_appln_techn_field.appln_id
join tls207_pers_appln on t1.appln_id = tls207_pers_appln.appln_id
join tls206_person on tls207_pers_appln.person_id = tls206_person.person_id
WHERE
tls230_appln_techn_field.techn_field_nr = 16--> number used for techn_field = 'Pharmaceutical'
AND t1.appln_filing_year between 2011 and 2022
and applt_seq_nr > 0 and invt_seq_nr = 0
and person_ctry_code = 'NL'
group by psn_name, person_ctry_code, appln_auth
order by total desc
and when I changed it slightly to fit my Thesis, it still showed no results:

Code: Select all

SELECT top 1000  psn_name, 
COUNT(distinct(case when appln_filing_year = 2010 then tls201_appln.appln_id end)) as '2010',
COUNT(distinct(case when  appln_filing_year = 2011 then tls201_appln.appln_id end)) as '2011',
COUNT(distinct(case when  appln_filing_year = 2012 then tls201_appln.appln_id end)) as '2012',
COUNT(distinct(case when  appln_filing_year = 2013 then tls201_appln.appln_id end)) as '2013',
COUNT(distinct(case when  appln_filing_year = 2014 then tls201_appln.appln_id end)) as '2014',
COUNT(distinct(case when  appln_filing_year = 2015 then tls201_appln.appln_id end)) as '2015',
COUNT(distinct(case when  appln_filing_year = 2016 then tls201_appln.appln_id end)) as '2016',
COUNT(distinct(case when  appln_filing_year = 2017 then tls201_appln.appln_id end)) as '2017',
COUNT(distinct(case when  appln_filing_year = 2018 then tls201_appln.appln_id end)) as '2018',
COUNT(distinct(case when  appln_filing_year = 2019 then tls201_appln.appln_id end)) as '2019',
COUNT(distinct(case when  appln_filing_year = 2020 then tls201_appln.appln_id end)) as '2020',
count (tls201_appln.appln_id) total
FROM   tls201_appln  join tls207_pers_appln on tls201_appln.appln_id = tls207_pers_appln.appln_id
join tls206_person on tls207_pers_appln.person_id = tls206_person.person_id
join tls801_country on tls206_person.person_ctry_code = tls801_country.ctry_code
WHERE
t1.appln_id in (select appln_id from tls230_appln_techn_field where tls230_appln_techn_field.techn_field_nr = 16)
--> number used to include techn_field = 'Pharmaceutical')
and t1.appln_id not in (select appln_id from tls230_appln_techn_field where tls230_appln_techn_field.techn_field_nr = 15)
--> number used to exclude techn_field = 'Biotechnology')
and t1.appln_filing_year between 2010 and 2020
and applt_seq_nr > 0 and invt_seq_nr = 0
group by psn_name, person_ctry_code, appln_auth
order by total desc
Could you please help me?


EPO / PATSTAT Support
Posts: 440
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Understanding results of number of patent applications

Post by EPO / PATSTAT Support » Wed Apr 17, 2024 2:17 pm

@khanhhienbi, you can count distinct docdb families, but then you will loose the granularity of knowing where the patents were filed. It is 2 different things: number of families is a proxy for "number of inventions". Family size is an indication of the geographical scope of where the invention is (intended to be) protected.
So: counting applications is counting the number of patent filings but NOT the number of inventions. Using patents to measure "innovative capacity" mostly means counting (distinct) docdb families and NOT counting patent filings.
This approach is generally used and accepted by scholars, but there is some debate whether to count docdb or inpadoc families. We at the EPO, use the docdb family concept for our patent insight reports.
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


EPO / PATSTAT Support
Posts: 440
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Understanding results of number of patent applications

Post by EPO / PATSTAT Support » Wed Apr 17, 2024 3:38 pm

@ i.dijkstra.8
There are no functional restrictions exept for the download volumes and the limitation to 1 month free use.
Your first query works fine. (no errors)
The second one had a small error due to inconsistent use of the tls201_appln alias (t1).
From a concept point of view, if you want to measure "innovation performance", I would rather count docdb families which is generally more accepted as explained to @khanhhienbi
See attached excel sheet for correction and suggestion. (SQL3)
patent_applications_Pharmaceutical.xlsx
(348.55 KiB) Downloaded 2 times
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post Reply