Page 1 of 1

Help with SQL query_caunt applications per Country

Posted: Wed Jan 04, 2017 4:30 pm
by ani1004
Dear community,

I'm working on a PATSTAT Online (Autumn 2016) patent query. My first aim is to replicate the patent statistics for Germany that are available here https://www.epo.org/about-us/annual-rep ... stics.html for my master thesis.

Here my query that i use for search for all patent applications from Germany in 2015 :

Code: Select all

select *
from tls201_appln 
JOIN tls207_pers_appln on tls201_appln.appln_id= tls207_pers_appln.appln_id
where appln_auth = 'DE' 
and appln_filing_year = 2015
Here i get 72 435 results = applications but in the annual statistics for Germany there is only 24.820
Why do i get such a difference :?: :?

Another example is number of Applications from Siemens

Here my query :

Code: Select all

select *
from tls201_appln 
join tls207_pers_appln  on tls201_appln.appln_id= tls207_pers_appln.appln_id
join tls206_person on tls207_pers_appln.person_id= tls206_person.person_id
where tls206_person.doc_std_name like 'siemens %'
and appln_filing_year = 2015
and person_ctry_code = 'De'
Here as a result i get 2054 applications but in the annual statistics for Germany there is only 1894

What am i doing wrong?
Thank you a lot in advance.

Re: Help with SQL query_caunt applications per Country

Posted: Fri Jan 13, 2017 8:43 am
by mkracker
I think there are two major issues in what you do:

Firstly, your queries do not return applications, but records as defined in your query.
In your queries you join the application table with TLS207, which has a record for every person (applicant or inventor) of an application. Consequently, if an application has e. g. 1 applicant and 3 inventors, you will receive 4 records for the same application.

Just look at the result sorted by APPLN_ID, and you will clearly see what I mean.

To limit your results to applications only, you must write the SELECT clause like this:
SELECT DISTINCT tls201_appln.*; This will return only application attributes and will eliminate all duplicates.

Secondly, you are comparing apples and pears.
PATSTAT Biblio, which you are using, contains patent applications from patent offices all over the world. In contrast, the file GERMANY.XLS (see bottom of page https://www.epo.org/about-us/annual-rep ... stics.html) you mentioned, provides the numbers of applications from German applicants at the EPO, whether being filed directly at the EPO or coming via the international (PCT) route.

It is not possible to exactly produce the EPO statistic with PATSTAT, because the source data is different, like:
  • PATSTAT contains only applications which have been published. The EPO statistics likely counts all EP applications, even if they are later withdrawn before the publication or if they are not published for other reasons. I assume this is the main reason why you will consistently get a number which is often 10%-15% lower.
  • There can be a long delay for applications between their filing and the date they enter the EP regional phase. The delay for PCT filings entering the EP regional phase, and being available as an EP filing in any of the patent databases can be up to 31 months (in the case of an earlier - national - priority filing is claimed and the subsequent PCT filing made use of the full 12 months); so the total is longer then the classical 18 months for a direct filing without priority.The timeline in the attachment illustrates it.
    As a result, you should consider only applications filed no later than 2013, because for later years the applications are not yet in the database.
  • Whether you consider only the first applicant or all applicants might make a difference when selecting certain applicants only.
To get roughly the same numbers for year 2013 as in the EPO statistics, you could use a query like this:

All EP applications (135 547 records):

Code: Select all

SELECT DISTINCT a.*
FROM tls201_appln a
WHERE appln_auth = 'EP'
AND appln_kind = 'A'  -- exclude PCT filings where the EPO only served as the Receiving Office
and appln_id < 900000000   -- exclude artificial applications (see PATSTAT Data Catalog for details)
AND appln_filing_year = 2013
EP applications filed by German applicants (22 380 records):

Code: Select all

SELECT DISTINCT a.*
FROM tls201_appln a
JOIN tls207_pers_appln pa on a.appln_id = pa.appln_id
JOIN tls206_person p on pa.person_id = p.person_id
WHERE appln_auth = 'EP'
AND appln_kind = 'A'  -- exclude PCT filings where the EPO only served as the Receiving Office
AND a.appln_id < 900000000   -- exclude artificial applications (see PATSTAT Data Catalog for details)
AND appln_filing_year = 2013
AND applt_seq_nr > 0  -- consider only applicants
AND person_ctry_code = 'DE'  -- applicant must be from Germany
EP applications filed by the German company SIEMENS (1 566 records):

Code: Select all

SELECT DISTINCT a.*
FROM tls201_appln a
JOIN tls207_pers_appln pa on a.appln_id = pa.appln_id
JOIN tls206_person p on pa.person_id = p.person_id
WHERE appln_auth = 'EP'
AND appln_kind = 'A'  -- exclude PCT filings where the EPO only served as the Receiving Office
AND a.appln_id < 900000000   -- exclude artificial applications (see PATSTAT Data Catalog for details)
AND appln_filing_year = 2013
AND applt_seq_nr > 0  -- consider only applicants
AND person_ctry_code = 'DE'  -- applicant must be from Germany
AND psn_name like 'SIEMENS%' -- do not require a blank after SIEMENS, otherwise you would miss the plain name "SIEMENS"
Best regards,
Martin, EPO / PATSTAT

Re: Help with SQL query_caunt applications per Country

Posted: Tue Dec 04, 2018 12:29 pm
by emamoncio
Hi, Martin!

Thank you very much for your answer. Though I do not have the exact question, I am asking here since my problem has something to do with applications per country.

I am using the Summer 2018 PATSTAT Database and I want to obtain information on Chinese firms who applied patents to the EPO. I will later then match this with the list of firms that I have to later do my analysis on Chinese firm collaboration.

I have tried to run the query that you have above and adapted it to what I want to get. However, the PATSTAT Database does not give any results. It runs for more than 20 hours and then nothing happens. I have also tried to limit the query to 10 results just to see if it works and this time it works. I wonder what I could to go around this problem.


Query:

Code: Select all

SELECT DISTINCT a.*
FROM tls201_appln a
JOIN tls207_pers_appln pa on a.appln_id = pa.appln_id
JOIN tls206_person p on pa.person_id = p.person_id
WHERE appln_auth = 'EP'
AND appln_kind = 'A'  
AND a.appln_id < 900000000 
AND applt_seq_nr > 0  
AND person_ctry_code = 'CN'  
Thank you!

Re: Help with SQL query_caunt applications per Country

Posted: Tue Dec 04, 2018 1:34 pm
by EPO / PATSTAT Support
Your query is perfectly fine. I ran it on PATSTAT Online and it took just 7 seconds to retrieve all 48 901 rows.

In case you used PATSTAT Online, please try it again. Probably the system had some hick-up. Note that PATSTAT Online kills all queries automatically if they take more than 1 hour. So waiting longer than this is no use.

In case you use or own database management system (like MySQL), make sure that all attributes used in the JOINs have an index. Also at least some of the attributes used in the WHERE clause should ideally be indexed. If this does not help, you need to contact your DBA (database administrator) in your organisation.

All the best,
Martin

Re: Help with SQL query_caunt applications per Country

Posted: Tue Jan 09, 2024 8:22 pm
by udsp
Hi all!

I have an issue that is very similar to ani1004’s. Unfortunately, I don’t have a whole lot of experience in writing sql queries so apologies for the very basic nature of my question.

I would like to get (actually just count) the number of unique patent filings for years 2008 and 2009 (by year) with the constraint that at least one applicant is from Germany.

Alternatively, I would like to check the same thing with the constraint that at least one inventor is from Germany.

Moreover, I would also like to run the analysis again when PCT filings are included.

I am struggling right now because to me it seems that the code above is just checking whether the first applicant is German and my attempts to fix that have failed. Moreover, I thought deleting the line

Code: Select all

AND appln_kind = ‘A’
from above would include PCT filings. However, deleting the line does not change my patent count.

I would be super grateful for your help!

Thank you!

Re: Help with SQL query_caunt applications per Country

Posted: Wed Jan 10, 2024 12:47 pm
by EPO / PATSTAT Support
The data format (and data base model) has slightly changed between when the original topic was posted and now. This means that the "AND appln_kind = 'A' (to exclude PCT's) is not needed anymore because PCT applications now have "WO" as application authority. So adding that line will not change anything (for EP applications) anymore.

Code: Select all

SELECT appln_filing_year,appln_auth, count(distinct (a.appln_id))
FROM tls201_appln a
JOIN tls207_pers_appln pa on a.appln_id = pa.appln_id
JOIN tls206_person p on pa.person_id = p.person_id
WHERE appln_auth in ('EP','WO') -- change according to needs.
AND a.appln_id < 900000000   -- exclude artificial applications (see PATSTAT Data Catalog for details)
AND appln_filing_year between 2008 and 2009
AND applt_seq_nr > 0  -- consider only applicants
-- AND invt_seq_nr > 0 -- use this to get a count that is inventor based. (and remove above line: applt_seq_nr > 0 ).
AND person_ctry_code = 'DE'  -- applicant must be from Germany
group by appln_auth,appln_filing_year
order by appln_auth,appln_filing_year
Keep in mind that you will have overlap (duplicates) between PCT and EP applications.
With other words, a German applicant can have a PCT (WO) application that enters the EP phase, for one and the same invention. So it will be added to the count for EP as well as WO.