Dear Patstat Support,
I have a similar inquiry as Valentin. I am also currently working on my Master thesis. I am not only looking at one country but at several countries as I will use the synthetic control method later on to compare Sweden with other countries regarding the patents.
I want to get the patent family and corresponding to that what fraction of countries the inventors are from.
I am currently exploring the PATSTAT database and trying to code the correct query for the data I am looking for. However, I am not very experienced in MYSQL and have difficulties retrieving the correct data.
Would it be possible that you help me in retrieving the data? This would be of great value to me.
What I am looking for is the following:
get the columns:
- patent family by INPADOC family not sure if docdb is methodologically that good for me to use. →
ipadoc_family_id
- appln_filing_date = patent priority date; appln_filing_year
- column corresponding to Y02T subclasses
- extract all patents falling into the category Y02T and subcategories of Y02T
- patents with their earliest filing year between 1978 and 2021 and their respective family
- only include families with at least one transnational filing (EPO or PCT)
- get the inventors’ country locations and weigh them within each family
Especially the last bullet point is difficult for me to get.
I tried to execute the code that was posted as the last post in this forum but it does not execute correctly because:
I have access to PATSTAT through my university (Technical University Munich) and I am connecting to their database to get to PATSTAT so I have to use
MySQL queries. I believe the code posted was in Postgresql or something similar.
I changed it slightly. There might be a mistake as I am very new to SQL. My version is the following:
Altough it runs without an error I get an empty result (No rows only the column header as an output). Howevever, as I did not. change the specifics in the code I should get the same result, which is at least something, as the person who posted the code.
-- SQLINES LICENSE FOR EVALUATION USE ONLY
CREATE TABLE paolo_oppelt.my_sample_2
SELECT
distinct tls201_appln.appln_id,appln_auth, nb_inventors, tls201_appln.docdb_family_id,
# 1. Convert PostgreSQL to MYSQL then debug MYSQL
# (MAX) is postgresql not mysql
# use longtext instead
# problem with longtext is that CAST() does not support longtext
# use char instead
# maybe use GROUP_CONCAT instead of STRING_AGG
# this is why I have to delete this line:
# docdb_family_size ,GROUP_CONCAT(CAST(person_ctry_code AS char), ',') inventors_country
# and replace it with
#
https://database.guide/mysql-group_conc ... tring_agg/
docdb_family_size ,GROUP_CONCAT(CAST(person_ctry_code AS char)) inventors_country
# COUNT(distinct( tls201_appln.docdb_family_id ))
FROM tls201_appln join tls207_pers_appln on tls201_appln.appln_id = tls207_pers_appln.appln_id
join tls206_person on tls207_pers_appln.person_id = tls206_person.person_id
join tls224_appln_cpc on tls201_appln.appln_id = tls224_appln_cpc.appln_id
WHERE
invt_seq_nr > 0
and tls201_appln.appln_id
in (select appln_id from tls207_pers_appln join tls206_person on tls207_pers_appln.person_id = tls206_person.person_id where person_ctry_code = 'FR' and invt_seq_nr > 0)
AND appln_filing_year > 1970
and cpc_class_symbol LIKE 'Y02B%'
and tls201_appln.docdb_family_id in (select docdb_family_id from tls201_appln where docdb_family_size > 1)
and tls201_appln.docdb_family_id not in (select docdb_family_id from tls201_appln where appln_auth = 'EP')
and tls201_appln.docdb_family_id not in (select docdb_family_id from tls201_appln where nb_inventors = 1)
group by tls201_appln.appln_id, docdb_family_id, appln_auth, nb_inventors,docdb_family_size
# #
https://database.guide/mysql-group_conc ... tring_agg/
having GROUP_CONCAT(CAST(person_ctry_code AS char)) like '%[^FR,]%' # -- only French
order by docdb_family_id
I read the article
https://www.researchgate.net/publicatio ... le_Queries on PATSTAT and they explain how to get the inventor fraction but only for patent application not for the family.
I tried their code but it seems to run for ages (more than 40min) after which I abort it, thus I don't get results.
# These are the columns that are selected as output they are from different input tables
# fractional_count is one of the output columns. It is created as a new column
# t1 and t2 are not tables but subqueries
# t1 reports the count of inventors by country
# t2 reports the total number of inventors by patents
SELECT
person_ctry_code, SUM(tot_in_ctry/tot_in_patent) AS fractional_count
FROM
(SELECT
# I think t my table
t.appln_id,
# The IFNULL() function returns a specified value if the expression is NULL.
ifnull(t1.person_ctry_code, '') AS person_ctry_code,
ifnull(t1.tot_in_ctry, 1) AS tot_in_ctry,
ifnull(t2.tot_in_patent, 1) AS tot_in_patent
FROM
paolo_oppelt.our_sample t
LEFT OUTER JOIN
# --> Accounts for missing inventor references in
# tls207_pers_appln table
(SELECT
a.appln_id,
b.person_ctry_code,
COUNT(b.person_id) AS tot_in_ctry
FROM
tls207_pers_appln a
INNER JOIN tls206_person b ON a.person_id = b.person_id
WHERE
a.invt_seq_nr > 0
GROUP BY a.appln_id, person_ctry_code
# --> Compiles country-level count of inventors per patent
) t1 ON t.appln_id = t1.appln_id
LEFT OUTER JOIN
(SELECT
appln_id, MAX(invt_seq_nr) AS tot_in_patent
FROM
tls207_pers_appln
GROUP BY appln_id HAVING MAX(invt_seq_nr) > 0
# --> Compiles total count of inventors per patent
) t2 ON t.appln_id = t2.appln_id
) our_sample_with_country
GROUP BY person_ctry_code;
# ORDER BY SUM(tot_in_ctry/tot_in_patent) DESC;
CREATE VIEW new_sample AS
It would be great if you could help me with my inquiry as it is quite difficult for me to find the problem in the code for me because I am new to coding with DQL.
Thank you very much.
Kind regards,
Paolo