How to count distinct patent applications?
Posted: Fri Nov 09, 2012 10:21 pm
After years of working with the 2009 version of PATSTAT, I recently switched to the latest one. Looking at the documentation of the new version, I am unsure how to count distinct patent applications, and I see I'm not the only one with this question. In previous versions, I relied on APPLN_ID to identify distinct patent applications (after filtering out artificial patent applications due to missing priority and citation documents).
The warning on page 52 of the DATA CATALOG worries me ("Warning: Please consider that the application kind code landscape can be at times complicated, eg for German applications ...").
If I understand this correctly, does it mean we can no longer rely on APPLN_ID to uniquely identify patent applications, because different APPLN_ID’s can point to the same real world patent application?
Technically speaking, a new APPLN_ID is assigned in PATSTAT for every unique combination of patent authority, application number, and application kind. Now (again ignoring artificial patent applications), is it possible that the same real world patent application (and hence the same combination of application authority and number) is present in table APPLN with different application document kinds (and hence different APPLN_ID’s)? The warning about Germany suggests this is possible.
If we check for records in table APPLN with multiple document kinds for the same application authority/number combination, millions of records are found. I presumed these were cases where different types of patents accidentally have the same application number because the respective patent authority uses different numbering schemes/sequences for every type of patent (e.g. JP 2007001482: the same combination of application authority and number is present 3 times in table APPLN with 3 different application kinds: patent, design patent and PCT filing, hence 3 different APPLN_ID’s, makes sense).
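The check described above can be sketched as follows. This is only an illustration on a toy in-memory SQLite table: the table name `appln`, the column names `appln_id`, `appln_auth`, `appln_nr` and `appln_kind`, the APPLN_ID values, the kind codes used for the JP rows and the US row are all assumptions made for the example, not data taken from PATSTAT.

```python
import sqlite3

# Toy stand-in for table APPLN; table/column names and all rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE appln ("
    "appln_id INTEGER PRIMARY KEY, appln_auth TEXT, appln_nr TEXT, appln_kind TEXT)"
)
conn.executemany(
    "INSERT INTO appln VALUES (?, ?, ?, ?)",
    [
        (1, "JP", "2007001482", "A"),  # kind codes for the JP rows are placeholders
        (2, "JP", "2007001482", "F"),
        (3, "JP", "2007001482", "W"),
        (4, "EP", "91904833", "A"),
        (5, "EP", "91904833", "D"),
        (6, "US", "12345678", "A"),    # made-up row with a single kind
    ],
)


def multi_kind_numbers(connection):
    """Return authority/number pairs that occur with more than one application kind."""
    return connection.execute(
        """
        SELECT appln_auth, appln_nr, COUNT(DISTINCT appln_kind) AS n_kinds
        FROM appln
        GROUP BY appln_auth, appln_nr
        HAVING COUNT(DISTINCT appln_kind) > 1
        ORDER BY appln_auth, appln_nr
        """
    ).fetchall()


duplicates = multi_kind_numbers(conn)
print(duplicates)
```

On this toy data the query returns the EP and JP pairs and skips the US row. It only flags authority/number pairs with several kinds; it cannot tell whether those rows are one real world application or several.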
If we cannot rely on this, that is, if there are cases where the same authority/number combination with multiple application kinds does refer to the same real world application, how can we identify distinct patent applications?
I'm also confused by the documentation on application document kinds (see again page 52: document kind D K L M N = dummy for de-duplicating). I looked up an example: EP 91904833, 2 different application document kinds (A and D), and hence 2 different APPLN_ID’s. When I look up this patent in Espacenet, it looks like both patents are identical, but with two different patent publication numbers, and the same application number, except for a “D” in the last position of the application number (is this duplication an EPO error?).
Does this mean that every time we find a document kind D, K, L, M or N in table APPLN, we can ignore that APPLN_ID and only count the application with the same authority/number and an application document kind other than D, K, L, M or N? (Only a limited number of cases are found for EPO and USPTO, but a large number for the German patent office.)
Is Germany an exception, and if so, how should we count distinct applications for the German patent office? (The documentation refers to the forum for a sample query for German applications, but I did not find it.)
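If ignoring those kinds turns out to be valid, the count could be sketched roughly as below. This is a hedged illustration, not a confirmed method: the table name `appln`, the column names, the helper names `count_applications` and `orphan_dummies`, and every row except the EP 91904833 pair are hypothetical. The second helper lists dummy-kind rows that have no sibling with a regular kind, since simply dropping such rows would lose the application altogether.

```python
import sqlite3

# Kinds the documentation labels "dummy for de-duplicating".
DUMMY_KINDS = ("D", "K", "L", "M", "N")
_MARKS = ",".join("?" for _ in DUMMY_KINDS)

# Toy stand-in for table APPLN; names and rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE appln ("
    "appln_id INTEGER PRIMARY KEY, appln_auth TEXT, appln_nr TEXT, appln_kind TEXT)"
)
conn.executemany(
    "INSERT INTO appln VALUES (?, ?, ?, ?)",
    [
        (1, "EP", "91904833", "A"),
        (2, "EP", "91904833", "D"),
        (3, "EP", "00000001", "A"),  # made-up number
        (4, "DE", "00000002", "D"),  # made-up dummy row without a regular sibling
    ],
)


def count_applications(connection, authority):
    """Count rows for one authority, skipping the dummy application kinds."""
    sql = (
        "SELECT COUNT(*) FROM appln "
        f"WHERE appln_auth = ? AND appln_kind NOT IN ({_MARKS})"
    )
    return connection.execute(sql, (authority, *DUMMY_KINDS)).fetchone()[0]


def orphan_dummies(connection):
    """List dummy-kind rows with no non-dummy row for the same authority/number."""
    sql = f"""
        SELECT d.appln_auth, d.appln_nr, d.appln_kind
        FROM appln AS d
        WHERE d.appln_kind IN ({_MARKS})
          AND NOT EXISTS (
              SELECT 1 FROM appln AS a
              WHERE a.appln_auth = d.appln_auth
                AND a.appln_nr = d.appln_nr
                AND a.appln_kind NOT IN ({_MARKS})
          )
        ORDER BY d.appln_auth, d.appln_nr
    """
    return connection.execute(sql, (*DUMMY_KINDS, *DUMMY_KINDS)).fetchall()


ep_count = count_applications(conn, "EP")
orphans = orphan_dummies(conn)
print(ep_count, orphans)
```

On the toy rows the EP count is 2, and the made-up DE row is reported as an orphan, which is the kind of case that would need a separate rule.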
Regards,
Tom Magerman