Page 1 of 1
PATSTAT Online Y02E SQL query
Posted: Wed Feb 26, 2020 9:09 pm
by aroh
Hello,
I am using the PATSTAT Online database 2019 Autumn edition. I want to extract patent data by linking the 4 tables - 201, 224, 207 and 206. I want to see country-wise patents related to wind energy filed for the time period 1999 - 2018 tagged using the Y02E scheme. I want information related appln_id, earliest_filing_date, person_country_code, apple_auth.
Code: Select all
SELECT DISTINCT a.appln_id, a.appln_auth, a.appln_nr, a.appln_kind, a.earliest_filing_date, b.cpc_class_symbol, c.invt_seq_nr, d.person_name, d.person_ctry_code
FROM tls201_appln a JOIN tls224_appln_cpc b ON a.appln_id = b.appln_id
LEFT OUTER JOIN tls207_pers_appln c on a.appln_id = c.appln_id
LEFT OUTER JOIN tls206_person d ON c.person_id = d.person_id
WHERE b.cpc_class_symbol = 'Y02E 10/7%' AND c.invt_seq_nr > 0
ORDER BY a.appln_auth, a.earliest_filing_date, a.appln_id, c.invt_seq_nr, d.person_ctry_code
This is the code I have used and I wanted to know the changes required to the code.
Thank you.
Re: PATSTAT Online Y02E SQL query
Posted: Thu Feb 27, 2020 2:42 pm
by EPO / PATSTAT Support
Hello Aroh,
there is a small SQL syntax error in your query.
A wildcard % comes allways in combination with LIKE. And the number of digits before the / is exaclty eight.
So:
b.cpc_class_symbol like 'Y02E 10/7%' will work.
There are 2 spaces beteen the E and the 10.
Instead of a wildcard, you could also use:
WHERE left (b.cpc_class_symbol, 10) = 'Y02E 10/7' AND c.invt_seq_nr > 0
I also added a STRING_AGG to group the CPC codes into 2 field, which groups the rows.
Code: Select all
SELECT DISTINCT a.appln_id, a.appln_auth, a.appln_nr, a.appln_kind, a.earliest_filing_date,
STRING_AGG (cast( b.cpc_class_symbol as NVARCHAR(MAX)), '; ') cpc_GROUP,
c.invt_seq_nr, d.person_name, d.person_ctry_code
FROM tls201_appln a JOIN tls224_appln_cpc b ON a.appln_id = b.appln_id
LEFT OUTER JOIN tls207_pers_appln c on a.appln_id = c.appln_id
LEFT OUTER JOIN tls206_person d ON c.person_id = d.person_id
WHERE left (b.cpc_class_symbol, 10) = 'Y02E 10/7' AND c.invt_seq_nr > 0
and appln_filing_year between 1999 and 2018
group by a.appln_id, a.appln_auth, a.appln_nr, a.appln_kind, a.earliest_filing_date,
c.invt_seq_nr, d.person_name, d.person_ctry_code
ORDER BY a.appln_auth, a.earliest_filing_date, a.appln_id, c.invt_seq_nr, d.person_ctry_code
Re: PATSTAT Online Y02E SQL query
Posted: Thu Feb 27, 2020 4:53 pm
by aroh
Hello,
Thank you for your prompt response and help.
If I use - WHERE left (b.cpc_class_symbol, 10) = 'Y02E 10/7' AND c.invt_seq_nr > 0 , will it allow me to select all the patent applications tagged with Y02E10/70-766 under the Y02E tag for wind energy?
Also, can you please explain the function of STRING_AGG in the code? I haven't seen its use in the EPO document 'Using PATSTAT with SQL for beginners'. Thank you once again.
Re: PATSTAT Online Y02E SQL query
Posted: Thu Feb 27, 2020 5:15 pm
by aroh
Hello,
This is in continuation to my earlier query. There are a lot of patent applications where the inventor country is not found. I saw the following code in the PATSTAT manual. Will the problem of not finding inventor country be solved using fractional count?
Code: Select all
SELECT d.person_ctry_code, SUM(tot_in_ctry/tot_in_patent) AS fractional_count
FROM
(SELECT
t.appln_id,
isnull (t1.person_ctry_code, ' ') AS person_ctry_code,
isnull (t1.tot_in_ctry, 1) AS tot_in_ctry,
isnull (t2.tot_in_patent, 1) AS tot_in_patent
FROM
our_sample t
LEFT OUTER JOIN
(SELECT
c.appln_id,
d.person_ctry_code,
COUNT (d.person_id) AS tot_in_ctry
FROM
tls207_pers_appln c
INNER JOIN tls206_person d ON c.person_id = d.person_id
WHERE
c.invt_seq_nr > 0
GROUP BY c.appln_id, person_ctry_code
) t1 ON t.appln_id = t1.appln_id
LEFT OUTER JOIN
(SELECT
appln_id, MAX(invt_seq_nr) AS tot_in_patent
FROM
tls207_pers_appln
GROUP BY appln_id HAVING MAX(invt_seq_nr) > 0
) t2 ON t.appln_id = t2.appln_id
) our_sample_with_country
GROUP BY person_ctry_code
ORDER BY SUM(tot_in_ctry/tot_in_patent) DESC;
Re: PATSTAT Online Y02E SQL query
Posted: Fri Feb 28, 2020 10:05 am
by EPO / PATSTAT Support
aroh wrote: ↑Thu Feb 27, 2020 4:53 pm
Hello,
Thank you for your prompt response and help.
If I use - WHERE left (b.cpc_class_symbol, 10) = 'Y02E 10/7' AND c.invt_seq_nr > 0 , will it allow me to select all the patent applications tagged with Y02E10/70-766 under the Y02E tag for wind energy?
Also, can you please explain the function of STRING_AGG in the code? I haven't seen its use in the EPO document 'Using PATSTAT with SQL for beginners'. Thank you once again.
Yes, Y02E10/766 will be included with the above WHERE clause. The "PATSTAT with SQL for beginners" only covers a fraction of the possible SQL functions. STRING_AGG is a fairly new function which comes in handy to group rows in the output. There are many good SQL-learning websites that provide sample queries. I often use
https://www.w3schools.com/sql/default.asp to look up SQL syntax.
Re: PATSTAT Online Y02E SQL query
Posted: Fri Feb 28, 2020 12:13 pm
by EPO / PATSTAT Support
aroh wrote: ↑Thu Feb 27, 2020 5:15 pm
Hello,
This is in continuation to my earlier query. There are a lot of patent applications where the inventor country is not found. I saw the following code in the PATSTAT manual. Will the problem of not finding inventor country be solved using fractional count?
Code: Select all
SELECT d.person_ctry_code, SUM(tot_in_ctry/tot_in_patent) AS fractional_count
FROM
(SELECT
t.appln_id,
isnull (t1.person_ctry_code, ' ') AS person_ctry_code,
isnull (t1.tot_in_ctry, 1) AS tot_in_ctry,
isnull (t2.tot_in_patent, 1) AS tot_in_patent
FROM
our_sample t
LEFT OUTER JOIN
(SELECT
c.appln_id,
d.person_ctry_code,
COUNT (d.person_id) AS tot_in_ctry
FROM
tls207_pers_appln c
INNER JOIN tls206_person d ON c.person_id = d.person_id
WHERE
c.invt_seq_nr > 0
GROUP BY c.appln_id, person_ctry_code
) t1 ON t.appln_id = t1.appln_id
LEFT OUTER JOIN
(SELECT
appln_id, MAX(invt_seq_nr) AS tot_in_patent
FROM
tls207_pers_appln
GROUP BY appln_id HAVING MAX(invt_seq_nr) > 0
) t2 ON t.appln_id = t2.appln_id
) our_sample_with_country
GROUP BY person_ctry_code
ORDER BY SUM(tot_in_ctry/tot_in_patent) DESC;
What is the source document for the code ?
Maybe you could restrict the analysis to persons with no missing country data ? (which can indeed yield biased estimates as the missing data is not random)
Weighting to compensate for missing data is a possible solution, what approach to use is up to researchers (and the tools they have available...)
Re: PATSTAT Online Y02E SQL query
Posted: Fri Feb 28, 2020 3:03 pm
by aroh
Hello,
Thank you for the information. I have used the paper titled "An Introduction to the Patstat Database with Example Queries", retrieved from
https://onlinelibrary.wiley.com/doi/ful ... 8462.12073
Re: PATSTAT Online Y02E SQL query
Posted: Fri Feb 28, 2020 4:35 pm
by EPO / PATSTAT Support
No, the weighing in this code is not a sollution for missing person country codes.
Rather have a look at this - chapter 3.4.