Page 1 of 1

Help with SQL query_caunt applications per Country

Posted: Wed Jan 04, 2017 4:30 pm
by ani1004
Dear community,

I'm working on a PATSTAT Online (Autumn 2016) patent query. My first aim is to replicate the patent statistics for Germany that are available here https://www.epo.org/about-us/annual-rep ... stics.html for my master thesis.

Here my query that i use for search for all patent applications from Germany in 2015 :

select *
from tls201_appln
JOIN tls207_pers_appln on tls201_appln.appln_id= tls207_pers_appln.appln_id
where appln_auth = 'DE'
and appln_filing_year = 2015

Here i get 72 435 results = applications but in the annual statistics for Germany there is only 24.820
Why do i get such a difference :?: :?

Another example is number of Applications from Siemens

Here my query :

select *
from tls201_appln
join tls207_pers_appln on tls201_appln.appln_id= tls207_pers_appln.appln_id
join tls206_person on tls207_pers_appln.person_id= tls206_person.person_id
where tls206_person.doc_std_name like 'siemens %'
and appln_filing_year = 2015
and person_ctry_code = 'De'

Here as a result i get 2054 applications but in the annual statistics for Germany there is only 1894

What am i doing wrong?
Thank you a lot in advance.

Re: Help with SQL query_caunt applications per Country

Posted: Fri Jan 13, 2017 8:43 am
by mkracker
I think there are two major issues in what you do:

Firstly, your queries do not return applications, but records as defined in your query.
In your queries you join the application table with TLS207, which has a record for every person (applicant or inventor) of an application. Consequently, if an application has e. g. 1 applicant and 3 inventors, you will receive 4 records for the same application.

Just look at the result sorted by APPLN_ID, and you will clearly see what I mean.

To limit your results to applications only, you must write the SELECT clause like this:
SELECT DISTINCT tls201_appln.*; This will return only application attributes and will eliminate all duplicates.

Secondly, you are comparing apples and pears.
PATSTAT Biblio, which you are using, contains patent applications from patent offices all over the world. In contrast, the file GERMANY.XLS (see bottom of page https://www.epo.org/about-us/annual-rep ... stics.html) you mentioned, provides the numbers of applications from German applicants at the EPO, whether being filed directly at the EPO or coming via the international (PCT) route.

It is not possible to exactly produce the EPO statistic with PATSTAT, because the source data is different, like:
  • PATSTAT contains only applications which have been published. The EPO statistics likely counts all EP applications, even if they are later withdrawn before the publication or if they are not published for other reasons. I assume this is the main reason why you will consistently get a number which is often 10%-15% lower.
  • There can be a long delay for applications between their filing and the date they enter the EP regional phase. The delay for PCT filings entering the EP regional phase, and being available as an EP filing in any of the patent databases can be up to 31 months (in the case of an earlier - national - priority filing is claimed and the subsequent PCT filing made use of the full 12 months); so the total is longer then the classical 18 months for a direct filing without priority.The timeline in the attachment illustrates it.
    As a result, you should consider only applications filed no later than 2013, because for later years the applications are not yet in the database.
  • Whether you consider only the first applicant or all applicants might make a difference when selecting certain applicants only.
To get roughly the same numbers for year 2013 as in the EPO statistics, you could use a query like this:

All EP applications (135 547 records):

Code: Select all

SELECT DISTINCT a.*
FROM tls201_appln a
WHERE appln_auth = 'EP'
AND appln_kind = 'A'  -- exclude PCT filings where the EPO only served as the Receiving Office
and appln_id < 900000000   -- exclude artificial applications (see PATSTAT Data Catalog for details)
AND appln_filing_year = 2013
EP applications filed by German applicants (22 380 records):

Code: Select all

SELECT DISTINCT a.*
FROM tls201_appln a
JOIN tls207_pers_appln pa on a.appln_id = pa.appln_id
JOIN tls206_person p on pa.person_id = p.person_id
WHERE appln_auth = 'EP'
AND appln_kind = 'A'  -- exclude PCT filings where the EPO only served as the Receiving Office
AND a.appln_id < 900000000   -- exclude artificial applications (see PATSTAT Data Catalog for details)
AND appln_filing_year = 2013
AND applt_seq_nr > 0  -- consider only applicants
AND person_ctry_code = 'DE'  -- applicant must be from Germany
EP applications filed by the German company SIEMENS (1 566 records):

Code: Select all

SELECT DISTINCT a.*
FROM tls201_appln a
JOIN tls207_pers_appln pa on a.appln_id = pa.appln_id
JOIN tls206_person p on pa.person_id = p.person_id
WHERE appln_auth = 'EP'
AND appln_kind = 'A'  -- exclude PCT filings where the EPO only served as the Receiving Office
AND a.appln_id < 900000000   -- exclude artificial applications (see PATSTAT Data Catalog for details)
AND appln_filing_year = 2013
AND applt_seq_nr > 0  -- consider only applicants
AND person_ctry_code = 'DE'  -- applicant must be from Germany
AND psn_name like 'SIEMENS%' -- do not require a blank after SIEMENS, otherwise you would miss the plain name "SIEMENS"
Best regards,
Martin, EPO / PATSTAT

Re: Help with SQL query_caunt applications per Country

Posted: Tue Dec 04, 2018 12:29 pm
by emamoncio
Hi, Martin!

Thank you very much for your answer. Though I do not have the exact question, I am asking here since my problem has something to do with applications per country.

I am using the Summer 2018 PATSTAT Database and I want to obtain information on Chinese firms who applied patents to the EPO. I will later then match this with the list of firms that I have to later do my analysis on Chinese firm collaboration.

I have tried to run the query that you have above and adapted it to what I want to get. However, the PATSTAT Database does not give any results. It runs for more than 20 hours and then nothing happens. I have also tried to limit the query to 10 results just to see if it works and this time it works. I wonder what I could to go around this problem.


Query:

SELECT DISTINCT a.*
FROM tls201_appln a
JOIN tls207_pers_appln pa on a.appln_id = pa.appln_id
JOIN tls206_person p on pa.person_id = p.person_id
WHERE appln_auth = 'EP'
AND appln_kind = 'A'
AND a.appln_id < 900000000
AND applt_seq_nr > 0
AND person_ctry_code = 'CN'

Thank you!

Re: Help with SQL query_caunt applications per Country

Posted: Tue Dec 04, 2018 1:34 pm
by EPO / PATSTAT Support
Your query is perfectly fine. I ran it on PATSTAT Online and it took just 7 seconds to retrieve all 48 901 rows.

In case you used PATSTAT Online, please try it again. Probably the system had some hick-up. Note that PATSTAT Online kills all queries automatically if they take more than 1 hour. So waiting longer than this is no use.

In case you use or own database management system (like MySQL), make sure that all attributes used in the JOINs have an index. Also at least some of the attributes used in the WHERE clause should ideally be indexed. If this does not help, you need to contact your DBA (database administrator) in your organisation.

All the best,
Martin