Hello Tim,
It is a bit more complicated.
In fact, it is always rather complicated to replicate statistics published by patent offices, because the approach (or the interpretation of the description) and the data used may differ.
Without having read the whole DPMA report, I would ask the following questions:
The report refers to "applications received at the DPMA": does that include filings that were withdrawn before publication? That is how we count at the EPO, with the result that the officially published EPO figures are about 10% higher, and they cannot be replicated from published data.
The report states that "German companies and inventors filed 47,047 patent applications at the DPMA". Strictly speaking, I would read this as applications with at least one German applicant, at least one German inventor, or both. (PATSTAT allows all these variations, but I am not 100% sure how the DPMA's published figures were produced.)
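To see how far these readings diverge, you could count them side by side. This is a minimal sketch, not how the DPMA counted: it assumes the standard PATSTAT columns APPLT_SEQ_NR and INVT_SEQ_NR in TLS207_PERS_APPLN (a value > 0 marks the applicant or inventor role), and the filing year 2010 and IPR_TYPE = 'PI' (patents only, no utility models) are my choices for illustration.

```sql
-- Per DE application filed in 2010: does it have a German applicant,
-- a German inventor, or both? Then count each reading.
SELECT
  SUM(has_de_applt) AS with_de_applicant,
  SUM(has_de_invt)  AS with_de_inventor,
  SUM(CASE WHEN has_de_applt = 1 AND has_de_invt = 1 THEN 1 ELSE 0 END) AS with_both,
  COUNT(*)          AS with_applicant_or_inventor
FROM (
  SELECT a.appln_id,
         MAX(CASE WHEN pa.applt_seq_nr > 0 THEN 1 ELSE 0 END) AS has_de_applt,
         MAX(CASE WHEN pa.invt_seq_nr  > 0 THEN 1 ELSE 0 END) AS has_de_invt
  FROM tls201_appln a
  JOIN tls207_pers_appln pa ON pa.appln_id = a.appln_id
  JOIN tls206_person p      ON p.person_id = pa.person_id
  WHERE a.appln_auth = 'DE'
    AND a.appln_filing_year = 2010
    AND a.ipr_type = 'PI'                 -- assumption: patents only
    AND p.person_ctry_code = 'DE'         -- only German persons are considered
    AND (pa.applt_seq_nr > 0 OR pa.invt_seq_nr > 0)
  GROUP BY a.appln_id
) per_appln;
```

A person who is both applicant and inventor sets both flags, so "with_both" includes single-inventor applications filed by the inventor personally.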
The report distinguishes "DPMA direct applications and DPMA PCT national phase". For 2010, does that mean the date on which the PCT application entered the national phase, or the PCT filing date? (The EPO uses the date of entry into the EP phase.) Another question is how the analyst dealt with priority filings and utility models: there are about 1,800 PCT applications filed in 2010 at the DPMA that claim a DE utility model or DE patent application as priority. Most of those priorities date from 2009 (12 months before the PCT), but if you put 2009 next to 2010 you will double count at "invention level", though not at "application level". (Patent offices normally publish data that represent "work done", i.e. applications, as opposed to innovative capacity, i.e. the "inventions" we want to measure for econometrics.)
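A sketch of how both effects could be checked, assuming the standard PATSTAT tables TLS204_APPLN_PRIOR (APPLN_ID, PRIOR_APPLN_ID) and the DOCDB_FAMILY_ID attribute of TLS201_APPLN. Counting DOCDB families is only one possible proxy for "inventions"; the year choices are mine.

```sql
-- 1) PCT applications filed in 2010 with the DPMA as receiving office
--    that claim a DE patent or utility model application as priority.
SELECT COUNT(DISTINCT pct.appln_id) AS pct_with_de_priority
FROM tls201_appln pct
JOIN tls204_appln_prior pr ON pr.appln_id = pct.appln_id
JOIN tls201_appln prio     ON prio.appln_id = pr.prior_appln_id
WHERE pct.appln_auth = 'WO'
  AND pct.receiving_office = 'DE'
  AND pct.appln_filing_year = 2010
  AND prio.appln_auth = 'DE'
  AND prio.ipr_type IN ('PI', 'UM');      -- patent or utility model priority

-- 2) Application level vs. invention level for 2009 and 2010 taken together:
--    a 2009 DE priority filing and a later DE filing claiming it usually end up
--    in the same DOCDB family, so the family count removes that double count.
--    No ipr_type filter here, so utility models are included.
SELECT COUNT(DISTINCT a.appln_id)        AS applications,
       COUNT(DISTINCT a.docdb_family_id) AS docdb_families
FROM tls201_appln a
JOIN tls207_pers_appln pa ON pa.appln_id = a.appln_id
JOIN tls206_person p      ON p.person_id = pa.person_id
WHERE a.appln_auth = 'DE'
  AND a.appln_filing_year IN (2009, 2010)
  AND p.person_ctry_code = 'DE';
```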
Regarding your query, it needs a small correction: PCT applications filed at the DPMA are now identified via the attribute RECEIVING_OFFICE, and no longer, as in the past, via kind code = "W".
```sql
-- DE applications plus PCT applications received by the DPMA (receiving office DE)
-- with at least one German person (applicant or inventor), per filing year.
-- There is no ipr_type filter, so DE utility models are counted as well.
SELECT TLS201_APPLN.APPLN_FILING_YEAR, COUNT(DISTINCT TLS201_APPLN.APPLN_ID) AS Applications
FROM TLS206_PERSON
INNER JOIN TLS207_PERS_APPLN ON TLS206_PERSON.PERSON_ID = TLS207_PERS_APPLN.PERSON_ID
INNER JOIN TLS201_APPLN ON TLS207_PERS_APPLN.APPLN_ID = TLS201_APPLN.APPLN_ID
WHERE TLS201_APPLN.APPLN_FILING_YEAR BETWEEN 2000 AND 2017
AND TLS206_PERSON.PERSON_CTRY_CODE = 'DE'
AND (TLS201_APPLN.APPLN_AUTH = 'DE' OR TLS201_APPLN.RECEIVING_OFFICE = 'DE')
GROUP BY TLS201_APPLN.APPLN_FILING_YEAR
ORDER BY TLS201_APPLN.APPLN_FILING_YEAR
```
On the other hand, PCT applications that have entered the German national phase are in PATSTAT as normal DE applications (they have INTERNAT_APPLN_ID <> 0), so I think there is no need to take the PCT applications themselves into account at all.
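Following that reasoning, the split between "direct applications" and "PCT national phase" can be taken from the DE applications alone. A minimal sketch using INTERNAT_APPLN_ID as the discriminator; the year range simply mirrors the query above. It is worth checking whether the filing year recorded for a national phase application is the international filing date or the entry date, since that is exactly the 2010 question raised earlier.

```sql
-- DE applications with at least one German person, split into direct filings
-- (internat_appln_id = 0) and PCT national phase entries (internat_appln_id <> 0).
SELECT a.appln_filing_year,
       COUNT(DISTINCT CASE WHEN a.internat_appln_id = 0  THEN a.appln_id END) AS direct_filings,
       COUNT(DISTINCT CASE WHEN a.internat_appln_id <> 0 THEN a.appln_id END) AS pct_national_phase
FROM tls201_appln a
JOIN tls207_pers_appln pa ON pa.appln_id = a.appln_id
JOIN tls206_person p      ON p.person_id = pa.person_id
WHERE a.appln_auth = 'DE'
  AND a.appln_filing_year BETWEEN 2000 AND 2017
  AND p.person_ctry_code = 'DE'
GROUP BY a.appln_filing_year
ORDER BY a.appln_filing_year;
```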
One more observation: if you look at the DE applications filed in 2010 that originate from a PCT, and compare them with the WO applications that have a legal status event indicating entry into the DE national phase, you will see that the two sets are far from equal. Experience has shown that taking the "common denominator" of the two sets gives a more complete overview.
PCT data (WO applications with a DE national phase entry event):
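```sql
-- WO applications filed in 2010 with at least one German person that carry an
-- INPADOC legal event reporting entry into the DE national phase.
SELECT DISTINCT tls201_appln.*, tls231_inpadoc_legal_event.*
FROM TLS206_PERSON
INNER JOIN TLS207_PERS_APPLN ON TLS206_PERSON.PERSON_ID = TLS207_PERS_APPLN.PERSON_ID
INNER JOIN TLS201_APPLN ON TLS207_PERS_APPLN.APPLN_ID = TLS201_APPLN.APPLN_ID
INNER JOIN tls231_inpadoc_legal_event ON tls201_appln.appln_id = tls231_inpadoc_legal_event.appln_id
WHERE TLS201_APPLN.APPLN_FILING_YEAR = 2010
AND TLS206_PERSON.PERSON_CTRY_CODE = 'DE'
AND TLS201_APPLN.APPLN_AUTH = 'WO'
AND tls231_inpadoc_legal_event.event_code = 'WWE'  -- WIPO information: entry into national phase
AND tls231_inpadoc_legal_event.ref_doc_auth = 'DE'
ORDER BY tls231_inpadoc_legal_event.ref_doc_nr
```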
DE filings with a PCT indication (the so-called national phase applications):
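```sql
-- DE applications filed in 2010 with at least one German person that stem from
-- a PCT application (internat_appln_id <> 0); patents of invention only.
SELECT DISTINCT tls201_appln.*
FROM tls201_appln
INNER JOIN TLS207_PERS_APPLN ON tls201_appln.appln_id = TLS207_PERS_APPLN.appln_id
INNER JOIN tls206_person ON tls207_pers_appln.person_id = tls206_person.person_id
WHERE tls201_appln.internat_appln_id <> 0
AND tls201_appln.appln_auth = 'DE'
AND tls201_appln.appln_filing_year = 2010
AND tls206_person.person_ctry_code = 'DE'
AND tls201_appln.ipr_type = 'PI'
```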
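To quantify how far apart the two sets are, both can be reduced to the WO application they point to and tagged by source. A sketch under the same filters as the two queries above (filing year 2010, German person, IPR_TYPE = 'PI' on the DE side); it assumes the event code is stored in upper case as 'WWE', which matters on case-sensitive databases. Common table expressions and UNION ALL are used instead of a FULL OUTER JOIN so the query also runs on databases without that join.

```sql
WITH set_a AS (   -- WO applications with a 'WWE' legal event for DE
  SELECT DISTINCT w.appln_id AS wo_appln_id
  FROM tls201_appln w
  JOIN tls207_pers_appln pa ON pa.appln_id = w.appln_id
  JOIN tls206_person p      ON p.person_id = pa.person_id
  JOIN tls231_inpadoc_legal_event le ON le.appln_id = w.appln_id
  WHERE w.appln_auth = 'WO'
    AND w.appln_filing_year = 2010
    AND p.person_ctry_code = 'DE'
    AND le.event_code = 'WWE'
    AND le.ref_doc_auth = 'DE'
),
set_b AS (        -- WO applications referenced by a DE national phase application
  SELECT DISTINCT d.internat_appln_id AS wo_appln_id
  FROM tls201_appln d
  JOIN tls207_pers_appln pa ON pa.appln_id = d.appln_id
  JOIN tls206_person p      ON p.person_id = pa.person_id
  WHERE d.appln_auth = 'DE'
    AND d.appln_filing_year = 2010
    AND d.internat_appln_id <> 0
    AND d.ipr_type = 'PI'
    AND p.person_ctry_code = 'DE'
),
tagged AS (
  SELECT wo_appln_id, 1 AS in_a, 0 AS in_b FROM set_a
  UNION ALL
  SELECT wo_appln_id, 0 AS in_a, 1 AS in_b FROM set_b
),
per_wo AS (       -- one row per WO application, flagged by the source(s) it appears in
  SELECT wo_appln_id, MAX(in_a) AS in_a, MAX(in_b) AS in_b
  FROM tagged
  GROUP BY wo_appln_id
)
SELECT SUM(CASE WHEN in_a = 1 AND in_b = 1 THEN 1 ELSE 0 END) AS in_both,
       SUM(CASE WHEN in_a = 1 AND in_b = 0 THEN 1 ELSE 0 END) AS legal_event_only,
       SUM(CASE WHEN in_a = 0 AND in_b = 1 THEN 1 ELSE 0 END) AS internat_appln_id_only,
       COUNT(*)                                                AS either_source
FROM per_wo;
```

The "legal_event_only" and "internat_appln_id_only" rows are the ones worth inspecting by hand.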
So if my presumption about the ~10% unpublished applications is correct, you would end up with 45,224 applications, and the common set could be the source of the missing ~1,800. (I leave it to you to check this in detail.)
An easier shortcut would be to ask the DPMA for the set of 47,047 DE applications, then simply check which ones are not in PATSTAT and try to find out why they are not available. The data the EPO receives from the DPMA is of such good quality that differences are mostly due to "different ways of counting", not to missing data.
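If the DPMA does provide that list, the check could look like this. It is a sketch under a loud assumption: `dpma_list(appln_nr)` is a hypothetical table you create and fill yourself, with the DPMA numbers already converted to the format PATSTAT uses in APPLN_NR; that conversion is the step most likely to need manual work.

```sql
-- Hypothetical table dpma_list(appln_nr): the DPMA's application numbers,
-- normalised to PATSTAT's APPLN_NR format (assumption, not a PATSTAT table).
-- Returns the DPMA applications for which no DE application exists in PATSTAT.
SELECT d.appln_nr
FROM dpma_list d
LEFT JOIN tls201_appln a
  ON  a.appln_auth = 'DE'
  AND a.appln_nr   = d.appln_nr
WHERE a.appln_id IS NULL
ORDER BY d.appln_nr;
```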