Count number of patent applications and citations per company

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.

tulmer
Posts: 1
Joined: Tue Aug 13, 2019 2:52 pm

Count number of patent applications and citations per company

Post by tulmer » Wed Aug 14, 2019 12:38 pm

Dear PATSTAT community,

I am currently writing a research paper on the influence of different financing methods on the innovation output of listed European companies. The number of patents per company and their citations are my dependent variable. Therefore, I would like to obtain the number of patents (year of application, but only counted if ultimately granted) and the number of forward citations of these patents for a list of around 2000 European companies in the years from 2003 to 2013. In addition to that, I would like to filter companies out, which haven’t had at least one granted patent in that timeframe.

Is it possible to write a query, which would provide me with a list of all listed European companies (or all companies, which are in the EuroStoxx 600 or the STOXX All Europe Total Market), which had at least one granted patent in that time frame and the number of their patent applications in the years 2003-2013? Afterwards I would rule out those companies manually, which aren’t on my list (of companies). Is this possible or would I have to look for the number of applications of every company individually?

If I am informed correctly the best database would be DOCDB, but since I am new to both PATSTAT and SQL every help is highly appreciated.

Best
Thomas


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Count number of patent applications and citations per company

Post by EPO / PATSTAT Support » Fri Aug 16, 2019 2:35 pm

Hello Thomas,
the only variable not available in PATSTAT is whether or not a company is in the EuroStoxx 600 or the STOXX All Europe Total Market noted. All the rest should be available.

Here is a sample query that might be useful to get you started; to limit the list, it has only the number of family citations for the granted patents. The way you specified your request would require to create intermediate data tables which can not be done via PATSTAT Online.

Code: Select all

SELECT  psn_name, person_ctry_code, appln_auth+appln_nr,  
appln_filing_date   ,nb_citing_docdb_fam
FROM
tls201_appln a
join tls207_pers_appln b on a.appln_id = b.appln_id
join tls206_person c on b.person_id = c.person_id
join tls801_country on c.person_ctry_code= tls801_country.ctry_code
WHERE
appln_auth = 'EP'
and appln_filing_year between 2003 and 2013
and granted = 'Y'
and eu_member = 'Y'
and applt_seq_nr > 0 and invt_seq_nr = 0
and nb_citing_docdb_fam > 0
and psn_sector = 'company'
ORDER BY psn_name, appln_filing_date
The above query will return about 280.000 applications which is too much for an extraction or download. But if you add further restrictions based on for example the company name, you could reduce the set considerably.
For all explanations of the used attributes, kindly consult the PATSTAT documentation. Especially with regards to the pre-aggregated attribute: nb_citing_docdb_fam; many researchers tend to use a more restrictive approach by excluding self-citations. To do more advanced citation analysis, you would need to have the raw data on a local computer system so that you have access to the citations.

The query gives you a list of all companies in the EU that had at least 1 granted patent filed between 2003 and 2013. You could use this allready to match it to your stock data in order to have an idea on the amount of data.

Code: Select all

SELECT distinct  psn_name, person_ctry_code
FROM
tls201_appln a
join tls207_pers_appln b on a.appln_id = b.appln_id
join tls206_person c on b.person_id = c.person_id
join tls801_country on c.person_ctry_code= tls801_country.ctry_code
WHERE
appln_auth = 'EP'
and appln_filing_year between 2003 and 2013
and granted = 'Y'
and eu_member = 'Y'
and applt_seq_nr > 0 and invt_seq_nr = 0
and psn_sector = 'company'
ORDER BY psn_name
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Count number of patent applications and citations per company

Post by EPO / PATSTAT Support » Tue Aug 27, 2019 5:21 pm

Hello Thomas,
here is an extension on the previous query.
It gives you all granted applications filed by 'OSRAM OPTO SEMICONDUCTORS' and the number of forward citations. This figure is family-family based citations, but we do not distinguish between the origin of the citations. (search, applicant, examination, etc...)
Most researchers might fine-tun this to exclude for example the applicant citations, and putting a maximum time frame on when the citations were given. (for example 3 years)
There is a good example in the Getting Started publication on how to do that.

Code: Select all

SELECT distinct  psn_name, person_ctry_code, appln_auth, appln_nr, appln_kind, appln_filing_date,  nb_citing_docdb_fam , publn_date as date_of_grant
FROM
tls201_appln a
join tls207_pers_appln b on a.appln_id = b.appln_id
join tls206_person c on b.person_id = c.person_id
join tls801_country on c.person_ctry_code= tls801_country.ctry_code
join tls211_pat_publn on a.appln_id = tls211_pat_publn.appln_id
WHERE
appln_auth = 'EP'
and appln_filing_year between 2003 and 2013
and granted = 'Y'
and eu_member = 'Y'
and applt_seq_nr > 0 and invt_seq_nr = 0
and psn_sector = 'company'
and publn_first_grant = 'Y'
and psn_name = 'OSRAM OPTO SEMICONDUCTORS'
ORDER BY psn_name, appln_filing_date
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


xy4u20
Posts: 9
Joined: Mon Feb 27, 2023 3:03 pm

Re: Count number of patent applications and citations per company

Post by xy4u20 » Mon Feb 27, 2023 8:01 pm

Dear PATSTAT community,

I am currently looking for patent citation data to support research in corporate innovation.
I am aware that PATSTAT cannot provide information such as ISIN or CUSIP, so the company name is the key id for me, followed by the number of citations forwarded.
As permissions are trial at this stage, I understand that some query codes are forbidden on the web version.
Having a general understanding of SQL and referring to what is shared in that forum, here is the query code I have written.

My target results are for companies worldwide, from 1980 to 2017.
I would like to use tls212_Citation, but am not sure I really need to. As I am concerned with the number of forwarded citations rather than a more detailed explanation.
I have tried to retrieve the data serially this way (COUNT(distinct(case when appln_filing_year = 1980 then tls201_appln.appln_id end)) as '1980'), but it takes a long time and does not give results for all companies.,

I would like to know what the number of citations to patents was in each of those 38 years. However the current results do not show this. Could you help me to optimise my query code please?

Looking forward to hearing from you, I would be very appreciative!

Best regards,
Xiaohan

Code: Select all

SELECT  psn_name, person_ctry_code, appln_auth, appln_nr, appln_kind, appln_filing_year, nb_citing_docdb_fam
FROM
tls201_appln a
join tls207_pers_appln b on a.appln_id = b.appln_id
join tls206_person c on b.person_id = c.person_id
join tls801_country on c.person_ctry_code= tls801_country.ctry_code
WHERE
appln_filing_year between 1980 and 2017
and granted = 'Y'
and eu_member = 'Y'
and applt_seq_nr > 0 and invt_seq_nr = 0
and nb_citing_docdb_fam > 0
and psn_sector = 'company'
ORDER BY psn_name, appln_filing_year


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Count number of patent applications and citations per company

Post by EPO / PATSTAT Support » Tue Feb 28, 2023 12:09 pm

Hello Xiaohan,
from a syntax point of view, your sql is correct, but it generates 2.8 million rows. Basically 1 row for each patent. If you want to aggregate the number of forward citations per application filing year, then you have to remove the application data from the SELECT clause. You could then for example make the sum of the forward citations per filing year. (I am doubtful whether that makes sense, but it depends on what you want to prove. Simply summing them would include bias due to family members, age of the application, etc.... You need to give this a good thought...).

A good approach to get started on any SQL query is to think in excel spreadsheet terms. What do you want in the rows, what do you want in the columns, and how will you calculate the cells relevant to what is in the row or the columns.

Here is an example of your adapted query.
(which also clearly illustrates that there is room for improvement on the harmonisation and grouping of the applicant names.)

Code: Select all

SELECT  psn_name,
sum(case when appln_filing_year = 1980 then a.nb_citing_docdb_fam end) as "1980",
sum(case when appln_filing_year = 1981 then a.nb_citing_docdb_fam end) as "1981",
sum(case when appln_filing_year = 1982 then a.nb_citing_docdb_fam end) as "1982",
sum(case when appln_filing_year = 1983 then a.nb_citing_docdb_fam end) as "1983",
sum(case when appln_filing_year = 1984 then a.nb_citing_docdb_fam end) as "1984",
sum(case when appln_filing_year = 1985 then a.nb_citing_docdb_fam end) as "1985",
sum(case when appln_filing_year = 1986 then a.nb_citing_docdb_fam end) as "1986",
sum(case when appln_filing_year = 1987 then a.nb_citing_docdb_fam end) as "1987",
sum(case when appln_filing_year = 1988 then a.nb_citing_docdb_fam end) as "1988",
sum(case when appln_filing_year = 1989 then a.nb_citing_docdb_fam end) as "1989",
sum(case when appln_filing_year = 1990 then a.nb_citing_docdb_fam end) as "1990",
sum(case when appln_filing_year = 1991 then a.nb_citing_docdb_fam end) as "1991",
sum(case when appln_filing_year = 1992 then a.nb_citing_docdb_fam end) as "1992",
sum(case when appln_filing_year = 1993 then a.nb_citing_docdb_fam end) as "1993",
sum(case when appln_filing_year = 1994 then a.nb_citing_docdb_fam end) as "1994",
sum(case when appln_filing_year = 1995 then a.nb_citing_docdb_fam end) as "1995",
sum (a.nb_citing_docdb_fam) total
FROM tls201_appln a
join tls207_pers_appln b on a.appln_id = b.appln_id
join tls206_person c on b.person_id = c.person_id
join tls801_country on c.person_ctry_code= tls801_country.ctry_code
WHERE
appln_filing_year between 1980 and 1995
and granted = 'Y'
and eu_member = 'Y'
and applt_seq_nr > 0 and invt_seq_nr = 0
and nb_citing_docdb_fam > 0
and psn_sector = 'company'
group by psn_name
having sum (a.nb_citing_docdb_fam) > 100
-- to limit to table
ORDER BY psn_name
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


xy4u20
Posts: 9
Joined: Mon Feb 27, 2023 3:03 pm

Re: Count number of patent applications and citations per company

Post by xy4u20 » Thu Mar 02, 2023 9:29 pm

Hello PATSTAT community,

Thank you for your reply, it helps me a lot!

I rewrote the code to include a query based on granted patents filed between 1980 and 1985 and encountered an error. The reason is Invalid column name 'tls201_appln'. Could you tell me what the problem is?
Secondly can the query code get both the number of patents granted and the number of prior citations to the patent?

Code: Select all

SELECT  psn_name,
sum(case when appln_filing_year = 1980 then a.tls201_appln.appln_id end) as "1980",
sum(case when appln_filing_year = 1981 then a.tls201_appln.appln_id end) as "1981",
sum(case when appln_filing_year = 1982 then a.tls201_appln.appln_id end) as "1982",
sum(case when appln_filing_year = 1983 then a.tls201_appln.appln_id end) as "1983",
sum(case when appln_filing_year = 1984 then a.tls201_appln.appln_id end) as "1984",
sum(case when appln_filing_year = 1985 then a.tls201_appln.appln_id end) as "1985",
sum (a.tls201_appln.appln_id) total
FROM tls201_appln a
join tls207_pers_appln b on a.appln_id = b.appln_id
join tls206_person c on b.person_id = c.person_id
join tls801_country on c.person_ctry_code= tls801_country.ctry_code
WHERE
appln_filing_year between 1980 and 1995
and granted = 'Y'
and oecd_member = 'Y'
and applt_seq_nr > 0 and invt_seq_nr = 0
and nb_citing_docdb_fam > 0
and psn_sector = 'company'
group by psn_name
ORDER BY psn_name
[/quote]

On the other hand I try to link the green list specified by IPC to my table, which is https://www.wipo.int/classifications/ip ... ntory/home. But when I add psn_name to the code it is pointed out as a syntax error. This seems to conflict with tls206_person, how should I fix this? As I am concerned with companies level data, the company name is very important to me. When replacing GROUP BY and ORDER BY with psn_name the code is still invalid.
(Since I don't know to how to write these three requirements into one query code, I am working separately.)

Code: Select all

SELECT psn_name, appln_filing_year, ipc_class_symbol, appln_auth, person_ctry_code
FROM tls201_appln 
JOIN tls209_appln_ipc  ON tls209_appln_ipc.appln_id = tls201_appln.appln_id
join tls207_pers_appln on tls207_pers_appln.appln_id = tls201_appln.appln_id
join tls206_person on tls206_person.person_id = tls207_pers_appln.person_id 
WHERE ipc_class_symbol LIKE 'H01M   4%' -- or other IPC class
AND appln_filing_year BETWEEN 1980 AND 1985 -- as example here
GROUP BY appln_filing_year, ipc_class_symbol , appln_auth, person_ctry_code
ORDER BY appln_filing_year, ipc_class_symbol
As I cannot create a table in the web version, what code should I use when I want to look up forward citations that are defined as green patents by the IPC? Should this add (tls201_appln.appln_id in (select appln_id from tls209_appln_ipc where left(ipc_class_symbol) = ''H01M 4%'') to the first query code or should it add forward citations to the second code?

Looking forwards to your reply, much appreciated!

Best regards,
Xiaohan


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Count number of patent applications and citations per company

Post by EPO / PATSTAT Support » Thu Mar 09, 2023 2:41 pm

On your first query, you made a mistake to rename tls201_appln to a , but then still used the original name.
Further you made a SUM of the appln_id's which should be a "count(distinct( case..... end)) as...."
In the SELECT clause you can add as many columns as you want, based of course on the available data in the joined tables. As the WHERE clause already limited the data set to granted patents, you can simply COUNT the number of appln_id's for each year, and you will then know how many of the patents filed in that year were granted.
Something like this:

Code: Select all

SELECT  psn_name,
count(distinct(case when appln_filing_year = 1994 then a.appln_id end))  as "granted_1994",
sum(case when appln_filing_year = 1994 then a.nb_citing_docdb_fam end) as "1994",
count(distinct(case when appln_filing_year = 1995 then a.appln_id end))  as "granted_1995",
sum(case when appln_filing_year = 1995 then a.nb_citing_docdb_fam end) as "1995",
sum (a.nb_citing_docdb_fam) total,
count(a.appln_id) as total_granted
FROM tls201_appln a
join tls207_pers_appln b on a.appln_id = b.appln_id
join tls206_person c on b.person_id = c.person_id
join tls801_country on c.person_ctry_code= tls801_country.ctry_code
WHERE
appln_filing_year between 1994 and 1995
and granted = 'Y'
and eu_member = 'Y'
and applt_seq_nr > 0 and invt_seq_nr = 0
and nb_citing_docdb_fam > 0
and psn_sector = 'company'
group by psn_name
having sum (a.nb_citing_docdb_fam) > 100
-- to limit to table
ORDER BY psn_name
If you want to include an IPC limitation:

Code: Select all

SELECT  psn_name,
count(distinct(case when appln_filing_year = 1994 then a.appln_id end))  as "granted_1994",
sum(case when appln_filing_year = 1994 then a.nb_citing_docdb_fam end) as "1994",
count(distinct(case when appln_filing_year = 1995 then a.appln_id end))  as "granted_1995",
sum(case when appln_filing_year = 1995 then a.nb_citing_docdb_fam end) as "1995",
sum (a.nb_citing_docdb_fam) total,
count(a.appln_id) as total_granted
FROM tls201_appln a
join tls207_pers_appln b on a.appln_id = b.appln_id
join tls206_person c on b.person_id = c.person_id
join tls801_country on c.person_ctry_code= tls801_country.ctry_code
WHERE
appln_filing_year between 1994 and 1995
and granted = 'Y'
and eu_member = 'Y'
and applt_seq_nr > 0 and invt_seq_nr = 0
and nb_citing_docdb_fam > 0
and psn_sector = 'company'
and a.appln_id in (select appln_id from tls209_appln_ipc where ipc_class_symbol  LIKE 'H01M   4%')
group by psn_name
ORDER BY psn_name
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


xy4u20
Posts: 9
Joined: Mon Feb 27, 2023 3:03 pm

Re: Count number of patent applications and citations per company

Post by xy4u20 » Mon Mar 13, 2023 10:12 pm

Dear PATSTAT community,

Thank you very much for your reply.
I add 57 IPCs to the query code and the program runs for a long time without pulling out any results. The error message is "Query killed".
Should I separate my query code?

Best,
Xiaohan


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Count number of patent applications and citations per company

Post by EPO / PATSTAT Support » Tue Mar 14, 2023 6:27 pm

difficult to say, please post your SQL.
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


xy4u20
Posts: 9
Joined: Mon Feb 27, 2023 3:03 pm

Re: Count number of patent applications and citations per company

Post by xy4u20 » Tue Mar 14, 2023 6:40 pm

Thanks for the reply, my query code is shown below.
I have tried retrieving data from 1980 to 1985, with two IPC limits added. Such an attempt gave successful results, so I expanded the query code to what it is now.
My target result is worldwide companies, so I have not added appln_auth, person_ctry_code or community limits. To get more results I removed the "nb_citing_docdb_fam > 0" limit.
Looking forwards to your reply!

Best,
Xiaohan

Code: Select all

SELECT  psn_id, psn_name, LEFT(ipc_class_symbol,3) IPC,
COUNT(distinct(case when appln_filing_year = 1980 then a.nb_citing_docdb_fam end)) as "1980",
COUNT(distinct(case when appln_filing_year = 1981 then a.nb_citing_docdb_fam end)) as "1981",
COUNT(distinct(case when appln_filing_year = 1982 then a.nb_citing_docdb_fam end)) as "1982",
COUNT(distinct(case when appln_filing_year = 1983 then a.nb_citing_docdb_fam end)) as "1983",
COUNT(distinct(case when appln_filing_year = 1984 then a.nb_citing_docdb_fam end)) as "1984",
COUNT(distinct(case when appln_filing_year = 1985 then a.nb_citing_docdb_fam end)) as "1985",
COUNT(distinct(case when appln_filing_year = 1986 then a.nb_citing_docdb_fam end)) as "1986",
COUNT(distinct(case when appln_filing_year = 1987 then a.nb_citing_docdb_fam end)) as "1987",
COUNT(distinct(case when appln_filing_year = 1988 then a.nb_citing_docdb_fam end)) as "1988",
COUNT(distinct(case when appln_filing_year = 1989 then a.nb_citing_docdb_fam end)) as "1989",
COUNT(distinct(case when appln_filing_year = 1990 then a.nb_citing_docdb_fam end)) as "1990",
COUNT(distinct(case when appln_filing_year = 1991 then a.nb_citing_docdb_fam end)) as "1991",
COUNT(distinct(case when appln_filing_year = 1992 then a.nb_citing_docdb_fam end)) as "1992",
COUNT(distinct(case when appln_filing_year = 1993 then a.nb_citing_docdb_fam end)) as "1993",
COUNT(distinct(case when appln_filing_year = 1994 then a.nb_citing_docdb_fam end)) as "1994",
COUNT(distinct(case when appln_filing_year = 1995 then a.nb_citing_docdb_fam end)) as "1995",
COUNT(distinct(case when appln_filing_year = 1996 then a.nb_citing_docdb_fam end)) as "1996",
COUNT(distinct(case when appln_filing_year = 1997 then a.nb_citing_docdb_fam end)) as "1997",
COUNT(distinct(case when appln_filing_year = 1998 then a.nb_citing_docdb_fam end)) as "1998",
COUNT(distinct(case when appln_filing_year = 1999 then a.nb_citing_docdb_fam end)) as "1999",
COUNT(distinct(case when appln_filing_year = 2000 then a.nb_citing_docdb_fam end)) as "2000",
COUNT(distinct(case when appln_filing_year = 2001 then a.nb_citing_docdb_fam end)) as "2001",
COUNT(distinct(case when appln_filing_year = 2002 then a.nb_citing_docdb_fam end)) as "2002",
COUNT(distinct(case when appln_filing_year = 2003 then a.nb_citing_docdb_fam end)) as "2003",
COUNT(distinct(case when appln_filing_year = 2004 then a.nb_citing_docdb_fam end)) as "2004",
COUNT(distinct(case when appln_filing_year = 2005 then a.nb_citing_docdb_fam end)) as "2005",
COUNT(distinct(case when appln_filing_year = 2006 then a.nb_citing_docdb_fam end)) as "2006",
COUNT(distinct(case when appln_filing_year = 2007 then a.nb_citing_docdb_fam end)) as "2007",
COUNT(distinct(case when appln_filing_year = 2008 then a.nb_citing_docdb_fam end)) as "2008",
COUNT(distinct(case when appln_filing_year = 2009 then a.nb_citing_docdb_fam end)) as "2009",
COUNT(distinct(case when appln_filing_year = 2010 then a.nb_citing_docdb_fam end)) as "2010",
COUNT(distinct(case when appln_filing_year = 2011 then a.nb_citing_docdb_fam end)) as "2011",
COUNT(distinct(case when appln_filing_year = 2012 then a.nb_citing_docdb_fam end)) as "2012",
COUNT(distinct(case when appln_filing_year = 2013 then a.nb_citing_docdb_fam end)) as "2013",
COUNT(distinct(case when appln_filing_year = 2014 then a.nb_citing_docdb_fam end)) as "2014",
COUNT(distinct(case when appln_filing_year = 2015 then a.nb_citing_docdb_fam end)) as "2015",
COUNT(distinct(case when appln_filing_year = 2016 then a.nb_citing_docdb_fam end)) as "2016",
COUNT(distinct(case when appln_filing_year = 2017 then a.nb_citing_docdb_fam end)) as "2017",
COUNT(distinct(case when appln_filing_year = 2018 then a.nb_citing_docdb_fam end)) as "2018",
COUNT(distinct(case when appln_filing_year = 2019 then a.nb_citing_docdb_fam end)) as "2019",
COUNT(distinct(case when appln_filing_year = 2020 then a.nb_citing_docdb_fam end)) as "2020",
COUNT(distinct(case when appln_filing_year = 2021 then a.nb_citing_docdb_fam end)) as "2021",
COUNT(distinct(case when appln_filing_year = 2022 then a.nb_citing_docdb_fam end)) as "2022",
COUNT(distinct(a.nb_citing_docdb_fam)) as "total"
FROM tls201_appln a
Join tls207_pers_appln b on a.appln_id = b.appln_id
Join tls206_person c on b.person_id = c.person_id
Join tls801_country on c.person_ctry_code= tls801_country.ctry_code
Join tls209_appln_ipc d on a.appln_id = d.appln_id
WHERE
appln_filing_year between 1980 and 2022
And granted = 'Y'
And LEFT(ipc_class_symbol,3) in ('A01','A43','A61','A62','B01','B03','B09','B22','B60','B61','B62','B63','B64','B65','C01','C02','C04','C05','C07','C08','C09','C10','C11','C12','C14','C21','C23','C25','C30','D01','D21','E02','E03','E04','E06','E21','F01','F02','F03','F16','F21','F22','F23','F24','F25','F26','F27','F28','G01','G02','G05','G06','G08','G21','H01','H02','H05') --add 3-digit IPC here
And applt_seq_nr > 0 and invt_seq_nr = 0
And psn_sector = 'company'
GROUP BY psn_id, psn_name, LEFT(ipc_class_symbol,3)
ORDER BY psn_id, psn_name, LEFT(ipc_class_symbol,3)


Post Reply