Page 1 of 1

PATSTAT Online Y02E SQL query

Posted: Wed Feb 26, 2020 9:09 pm
by aroh
Hello,
I am using the PATSTAT Online database 2019 Autumn edition. I want to extract patent data by linking the 4 tables - 201, 224, 207 and 206. I want to see country-wise patents related to wind energy filed for the time period 1999 - 2018 tagged using the Y02E scheme. I want information related appln_id, earliest_filing_date, person_country_code, apple_auth.

Code: Select all

SELECT DISTINCT a.appln_id, a.appln_auth, a.appln_nr, a.appln_kind, a.earliest_filing_date, b.cpc_class_symbol, c.invt_seq_nr, d.person_name, d.person_ctry_code
FROM tls201_appln a JOIN tls224_appln_cpc b ON a.appln_id = b.appln_id
LEFT OUTER JOIN tls207_pers_appln c on a.appln_id = c.appln_id
LEFT OUTER JOIN tls206_person d ON c.person_id = d.person_id
WHERE b.cpc_class_symbol = 'Y02E  10/7%' AND c.invt_seq_nr > 0
ORDER BY a.appln_auth, a.earliest_filing_date, a.appln_id, c.invt_seq_nr, d.person_ctry_code
This is the code I have used and I wanted to know the changes required to the code.
Thank you.

Re: PATSTAT Online Y02E SQL query

Posted: Thu Feb 27, 2020 2:42 pm
by EPO / PATSTAT Support
Hello Aroh,
there is a small SQL syntax error in your query.
A wildcard % comes allways in combination with LIKE. And the number of digits before the / is exaclty eight.
So: b.cpc_class_symbol like 'Y02E 10/7%' will work.
There are 2 spaces beteen the E and the 10.
Instead of a wildcard, you could also use:
WHERE left (b.cpc_class_symbol, 10) = 'Y02E 10/7' AND c.invt_seq_nr > 0

I also added a STRING_AGG to group the CPC codes into 2 field, which groups the rows.

Code: Select all

SELECT DISTINCT a.appln_id, a.appln_auth, a.appln_nr, a.appln_kind, a.earliest_filing_date, 
STRING_AGG (cast( b.cpc_class_symbol as NVARCHAR(MAX)), '; ') cpc_GROUP,
 c.invt_seq_nr, d.person_name, d.person_ctry_code
FROM tls201_appln a JOIN tls224_appln_cpc b ON a.appln_id = b.appln_id
LEFT OUTER JOIN tls207_pers_appln c on a.appln_id = c.appln_id
LEFT OUTER JOIN tls206_person d ON c.person_id = d.person_id
WHERE left (b.cpc_class_symbol, 10) =  'Y02E  10/7' AND c.invt_seq_nr > 0
and appln_filing_year between 1999 and 2018 
group by a.appln_id, a.appln_auth, a.appln_nr, a.appln_kind, a.earliest_filing_date, 
 c.invt_seq_nr, d.person_name, d.person_ctry_code
ORDER BY a.appln_auth, a.earliest_filing_date,  a.appln_id, c.invt_seq_nr, d.person_ctry_code

Re: PATSTAT Online Y02E SQL query

Posted: Thu Feb 27, 2020 4:53 pm
by aroh
Hello,
Thank you for your prompt response and help.
If I use - WHERE left (b.cpc_class_symbol, 10) = 'Y02E 10/7' AND c.invt_seq_nr > 0 , will it allow me to select all the patent applications tagged with Y02E10/70-766 under the Y02E tag for wind energy?
Also, can you please explain the function of STRING_AGG in the code? I haven't seen its use in the EPO document 'Using PATSTAT with SQL for beginners'. Thank you once again.

Re: PATSTAT Online Y02E SQL query

Posted: Thu Feb 27, 2020 5:15 pm
by aroh
Hello,
This is in continuation to my earlier query. There are a lot of patent applications where the inventor country is not found. I saw the following code in the PATSTAT manual. Will the problem of not finding inventor country be solved using fractional count?

Code: Select all

SELECT d.person_ctry_code, SUM(tot_in_ctry/tot_in_patent) AS fractional_count
FROM 
         (SELECT 
                      t.appln_id, 
                      isnull (t1.person_ctry_code, ' ') AS person_ctry_code,
                      isnull (t1.tot_in_ctry, 1) AS tot_in_ctry,
                      isnull (t2.tot_in_patent, 1) AS tot_in_patent
                      FROM 
                      our_sample t                   
                 LEFT OUTER JOIN
           (SELECT
                     c.appln_id,
                     d.person_ctry_code,
                     COUNT (d.person_id) AS tot_in_ctry
            FROM
                      tls207_pers_appln c
                      INNER JOIN tls206_person d ON c.person_id = d.person_id 
           WHERE 
                     c.invt_seq_nr > 0
           GROUP BY c.appln_id, person_ctry_code
           ) t1 ON t.appln_id = t1.appln_id 
                     LEFT OUTER JOIN
           (SELECT
                     appln_id, MAX(invt_seq_nr) AS tot_in_patent
            FROM 
                    tls207_pers_appln
            GROUP BY appln_id HAVING MAX(invt_seq_nr) > 0
            )  t2 ON t.appln_id = t2.appln_id
            ) our_sample_with_country        
GROUP BY person_ctry_code
ORDER BY SUM(tot_in_ctry/tot_in_patent) DESC;

Re: PATSTAT Online Y02E SQL query

Posted: Fri Feb 28, 2020 10:05 am
by EPO / PATSTAT Support
aroh wrote:
Thu Feb 27, 2020 4:53 pm
Hello,
Thank you for your prompt response and help.
If I use - WHERE left (b.cpc_class_symbol, 10) = 'Y02E 10/7' AND c.invt_seq_nr > 0 , will it allow me to select all the patent applications tagged with Y02E10/70-766 under the Y02E tag for wind energy?
Also, can you please explain the function of STRING_AGG in the code? I haven't seen its use in the EPO document 'Using PATSTAT with SQL for beginners'. Thank you once again.
Yes, Y02E10/766 will be included with the above WHERE clause. The "PATSTAT with SQL for beginners" only covers a fraction of the possible SQL functions. STRING_AGG is a fairly new function which comes in handy to group rows in the output. There are many good SQL-learning websites that provide sample queries. I often use https://www.w3schools.com/sql/default.asp to look up SQL syntax.

Re: PATSTAT Online Y02E SQL query

Posted: Fri Feb 28, 2020 12:13 pm
by EPO / PATSTAT Support
aroh wrote:
Thu Feb 27, 2020 5:15 pm
Hello,
This is in continuation to my earlier query. There are a lot of patent applications where the inventor country is not found. I saw the following code in the PATSTAT manual. Will the problem of not finding inventor country be solved using fractional count?

Code: Select all

SELECT d.person_ctry_code, SUM(tot_in_ctry/tot_in_patent) AS fractional_count
FROM 
         (SELECT 
                      t.appln_id, 
                      isnull (t1.person_ctry_code, ' ') AS person_ctry_code,
                      isnull (t1.tot_in_ctry, 1) AS tot_in_ctry,
                      isnull (t2.tot_in_patent, 1) AS tot_in_patent
                      FROM 
                      our_sample t                   
                 LEFT OUTER JOIN
           (SELECT
                     c.appln_id,
                     d.person_ctry_code,
                     COUNT (d.person_id) AS tot_in_ctry
            FROM
                      tls207_pers_appln c
                      INNER JOIN tls206_person d ON c.person_id = d.person_id 
           WHERE 
                     c.invt_seq_nr > 0
           GROUP BY c.appln_id, person_ctry_code
           ) t1 ON t.appln_id = t1.appln_id 
                     LEFT OUTER JOIN
           (SELECT
                     appln_id, MAX(invt_seq_nr) AS tot_in_patent
            FROM 
                    tls207_pers_appln
            GROUP BY appln_id HAVING MAX(invt_seq_nr) > 0
            )  t2 ON t.appln_id = t2.appln_id
            ) our_sample_with_country        
GROUP BY person_ctry_code
ORDER BY SUM(tot_in_ctry/tot_in_patent) DESC;
What is the source document for the code ?
Maybe you could restrict the analysis to persons with no missing country data ? (which can indeed yield biased estimates as the missing data is not random)
Weighting to compensate for missing data is a possible solution, what approach to use is up to researchers (and the tools they have available...)

Re: PATSTAT Online Y02E SQL query

Posted: Fri Feb 28, 2020 3:03 pm
by aroh
Hello,
Thank you for the information. I have used the paper titled "An Introduction to the Patstat Database with Example Queries", retrieved from https://onlinelibrary.wiley.com/doi/ful ... 8462.12073

Re: PATSTAT Online Y02E SQL query

Posted: Fri Feb 28, 2020 4:35 pm
by EPO / PATSTAT Support
aroh wrote:
Fri Feb 28, 2020 3:03 pm
Hello,
Thank you for the information. I have used the paper titled "An Introduction to the Patstat Database with Example Queries", retrieved from https://onlinelibrary.wiley.com/doi/ful ... 8462.12073
No, the weighing in this code is not a sollution for missing person country codes.
Rather have a look at this - chapter 3.4.