I think there are two major issues in what you do:
Firstly, your queries do not return
applications, but records as defined in your query.
In your queries you join the application table with TLS207, which has a record for every person (applicant or inventor) of an application. Consequently, if an application has e. g. 1 applicant and 3 inventors, you will receive 4 records for the same application.
Just look at the result sorted by APPLN_ID, and you will clearly see what I mean.
To limit your results to applications only, you must write the SELECT clause like this:
SELECT DISTINCT tls201_appln.*; This will return only application attributes and will eliminate all duplicates.
Secondly, you are comparing apples and pears.
PATSTAT Biblio, which you are using, contains
patent applications from patent offices all over the world. In contrast, the file GERMANY.XLS (see bottom of page
https://www.epo.org/about-us/annual-rep ... stics.html) you mentioned, provides the numbers of applications from German applicants
at the EPO, whether being filed directly at the EPO or coming via the international (PCT) route.
It is not possible to exactly produce the EPO statistic with PATSTAT, because the source data is different, like:
- PATSTAT contains only applications which have been published. The EPO statistics likely counts all EP applications, even if they are later withdrawn before the publication or if they are not published for other reasons. I assume this is the main reason why you will consistently get a number which is often 10%-15% lower.
- There can be a long delay for applications between their filing and the date they enter the EP regional phase. The delay for PCT filings entering the EP regional phase, and being available as an EP filing in any of the patent databases can be up to 31 months (in the case of an earlier - national - priority filing is claimed and the subsequent PCT filing made use of the full 12 months); so the total is longer then the classical 18 months for a direct filing without priority.The timeline in the attachment illustrates it.
As a result, you should consider only applications filed no later than 2013, because for later years the applications are not yet in the database.
- Whether you consider only the first applicant or all applicants might make a difference when selecting certain applicants only.
To get roughly the same numbers for year 2013 as in the EPO statistics, you could use a query like this:
All EP applications (135 547 records):
Code: Select all
SELECT DISTINCT a.*
FROM tls201_appln a
WHERE appln_auth = 'EP'
AND appln_kind = 'A' -- exclude PCT filings where the EPO only served as the Receiving Office
and appln_id < 900000000 -- exclude artificial applications (see PATSTAT Data Catalog for details)
AND appln_filing_year = 2013
EP applications filed by German applicants (22 380 records):
Code: Select all
SELECT DISTINCT a.*
FROM tls201_appln a
JOIN tls207_pers_appln pa on a.appln_id = pa.appln_id
JOIN tls206_person p on pa.person_id = p.person_id
WHERE appln_auth = 'EP'
AND appln_kind = 'A' -- exclude PCT filings where the EPO only served as the Receiving Office
AND a.appln_id < 900000000 -- exclude artificial applications (see PATSTAT Data Catalog for details)
AND appln_filing_year = 2013
AND applt_seq_nr > 0 -- consider only applicants
AND person_ctry_code = 'DE' -- applicant must be from Germany
EP applications filed by the German company SIEMENS (1 566 records):
Code: Select all
SELECT DISTINCT a.*
FROM tls201_appln a
JOIN tls207_pers_appln pa on a.appln_id = pa.appln_id
JOIN tls206_person p on pa.person_id = p.person_id
WHERE appln_auth = 'EP'
AND appln_kind = 'A' -- exclude PCT filings where the EPO only served as the Receiving Office
AND a.appln_id < 900000000 -- exclude artificial applications (see PATSTAT Data Catalog for details)
AND appln_filing_year = 2013
AND applt_seq_nr > 0 -- consider only applicants
AND person_ctry_code = 'DE' -- applicant must be from Germany
AND psn_name like 'SIEMENS%' -- do not require a blank after SIEMENS, otherwise you would miss the plain name "SIEMENS"
Best regards,
Martin, EPO / PATSTAT