
Dead links, wrong data?

Posted: Sun Jan 31, 2016 1:00 pm
by nico.rasters
http://www.epo.org/searching/data/data/ ... gular.html used to point to a kind code concordance list; it now returns a 404 page. This link is mentioned on http://worldwide.espacenet.com/help?loc ... =kindcodes and in Data Catalog 5.02. The kind codes can now be found at https://www.epo.org/searching-for-paten ... gular.html

Another wild goose chase was my search for the PATSTAT Online login URL. The PATSTAT Online User Manual (which I now cannot find either) and Google point to http://www.epo.org/searching-for-patent ... tstat.html. However, the actual login page is https://data.epo.org/expert-services/start.html

Also, the Data Catalogue linked on http://www.epo.org/searching-for-patent ... tstat.html still covers the Spring edition instead of the Autumn edition.

In the Autumn version I noticed a new field in tls201_appln, namely earliest_filing_id. It appears to refer to a priority application, but this is not the case. For example, docdb_family_id 45420505 has three applications (340572313, 375715005, 376610796), each of which names itself as its earliest_filing_id. Yet according to tls204_appln_prior, 376610796 is the priority. So for priorities I'd stick with http://gder.phpnet.org/rassenfosse/down ... nt_inv.sql
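A minimal sketch of how such cases could be listed systematically, using Python's built-in sqlite3 on toy copies of the two tables. Only the columns needed are created, and all IDs and rows are made up for illustration. The query returns applications that claim at least one priority in tls204_appln_prior but whose earliest_filing_id is not among those claimed priorities.

```python
import sqlite3

# Toy copies of the two tables with only the columns used here.
# All IDs and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tls201_appln (
    appln_id INTEGER PRIMARY KEY,
    docdb_family_id INTEGER,
    earliest_filing_id INTEGER
);
CREATE TABLE tls204_appln_prior (
    appln_id INTEGER,
    prior_appln_id INTEGER
);
-- Three family members that each name themselves as earliest_filing_id.
INSERT INTO tls201_appln VALUES (1, 100, 1), (2, 100, 2), (3, 100, 3);
-- Applications 1 and 2 claim application 3 as their priority.
INSERT INTO tls204_appln_prior VALUES (1, 3), (2, 3);
""")

# Applications that claim at least one priority, but whose
# earliest_filing_id is not one of the claimed priorities.
rows = conn.execute("""
SELECT a.appln_id, a.earliest_filing_id
FROM tls201_appln a
WHERE EXISTS (SELECT 1 FROM tls204_appln_prior p
              WHERE p.appln_id = a.appln_id)
  AND NOT EXISTS (SELECT 1 FROM tls204_appln_prior p
                  WHERE p.appln_id = a.appln_id
                    AND p.prior_appln_id = a.earliest_filing_id)
ORDER BY a.appln_id
""").fetchall()

print(rows)  # → [(1, 1), (2, 2)]
```

A hit is only a candidate for closer inspection, not proof of an error; application 3 is not listed because it claims no priority itself.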

Re: Dead links, wrong data?

Posted: Sun Jan 31, 2016 1:27 pm
by nico.rasters
Also found (in Autumn 2015):
SELECT tls206_person.*
FROM tls207_pers_appln
INNER JOIN tls206_person ON tls207_pers_appln.person_id = tls206_person.person_id
WHERE tls207_pers_appln.appln_id = 339979633
ORDER BY doc_std_name_id;

Note how SIMPSON STEVEN LEWIS CHARLES occurs twice as doc_std_name, once erroneously for Julian Richard Davis.
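A rough way to screen for this kind of mismatch, sketched with Python's sqlite3 on made-up rows: within one application, group persons by doc_std_name_id and report standardised names shared by more than one distinct person_name. This is an assumed heuristic, not an official check; the same person can legitimately appear under two slightly different spellings, so hits need a manual look.

```python
import sqlite3

# Toy tables with only the columns used; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tls206_person (
    person_id INTEGER PRIMARY KEY,
    person_name TEXT,
    doc_std_name_id INTEGER,
    doc_std_name TEXT
);
CREATE TABLE tls207_pers_appln (
    person_id INTEGER,
    appln_id INTEGER
);
INSERT INTO tls206_person VALUES
    (10, 'SMITH ANNA', 500, 'SMITH ANNA'),
    (11, 'JONES BOB',  500, 'SMITH ANNA'),  -- wrongly standardised
    (12, 'LEE CHRIS',  501, 'LEE CHRIS');
INSERT INTO tls207_pers_appln VALUES (10, 7), (11, 7), (12, 7);
""")

# Standardised names that cover several distinct raw names
# within the same application.
suspects = conn.execute("""
SELECT pa.appln_id, p.doc_std_name, COUNT(DISTINCT p.person_name) AS n_names
FROM tls207_pers_appln pa
JOIN tls206_person p ON p.person_id = pa.person_id
GROUP BY pa.appln_id, p.doc_std_name_id, p.doc_std_name
HAVING COUNT(DISTINCT p.person_name) > 1
ORDER BY pa.appln_id
""").fetchall()

print(suspects)  # → [(7, 'SMITH ANNA', 2)]
```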

Re: Dead links, wrong data?

Posted: Wed Feb 10, 2016 1:34 pm
by mkracker
Dear Nico,

You really had bad luck. Just two days before your post, all the patent-information pages of the EPO website were relaunched with a different structure and, of course, different URLs. Consequently, many URLs in already published documents became invalid. In the meantime, I have updated the most relevant documents (Data Catalogs, PATSTAT Online user manual) so that they now contain working links.
The new website structure also has some advantages: you will now find all PATSTAT-related data in one place: http://www.epo.org/searching-for-patent ... tstat.html. Most documents, such as the PATSTAT Online user manual, are now in the "Downloads" tab.
Sorry for the confusion during the transitional period.

Regarding the attribute EARLIEST_FILING_ID in table TLS201_APPLN: contrary to your assumption, this attribute does not necessarily refer to a priority. The Data Catalog defines it as follows:
Derived from the tables:
- TLS201_APPLN self-priority
- TLS201_APPLN PCT application
- TLS204_APPLN_PRIOR Paris Convention priority
- TLS205_TECH_REL technical relations
- TLS216_APPLN_CONTN application continuations

Comments: If multiple applications have been filed on the earliest filing date, then conceptually any of these applications can be regarded as the earliest application. In that case, preference is given to the international application. In other words: if there is a PCT application which was filed on the earliest application date, then the APPLN_ID of this PCT application is taken as the EARLIEST_FILING_ID. Otherwise, the application with the smallest APPLN_ID is taken.
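The selection rule in the comment above can be sketched in a few lines of Python. This is an illustrative reading of the rule, not the EPO's implementation: the set of related applications (collected from the five source tables listed) is taken as given, the Appln type and its fields are hypothetical, and picking the smallest APPLN_ID when several PCT applications share the earliest date is an assumption the comment does not cover.

```python
from dataclasses import dataclass
from datetime import date
from typing import Sequence


@dataclass(frozen=True)
class Appln:
    # Hypothetical record type for illustration only.
    appln_id: int
    filing_date: date
    is_pct: bool  # True for an international (PCT) application


def earliest_filing_id(related: Sequence[Appln]) -> int:
    """Pick an EARLIEST_FILING_ID from a set of related applications:
    earliest filing date first; on a tie, prefer a PCT application;
    otherwise take the smallest APPLN_ID."""
    if not related:
        raise ValueError("need at least one application")
    earliest = min(a.filing_date for a in related)
    candidates = [a for a in related if a.filing_date == earliest]
    pct = [a for a in candidates if a.is_pct]
    # Assumption: among several same-day PCT applications, smallest ID wins.
    pool = pct if pct else candidates
    return min(a.appln_id for a in pool)


example = [
    Appln(3, date(2012, 3, 1), False),
    Appln(1, date(2012, 3, 1), False),
    Appln(2, date(2013, 3, 1), True),  # PCT, but filed later
]
print(earliest_filing_id(example))  # → 1
```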
In short: there may be multiple related applications filed on the same earliest date, as in your example. Surprisingly, quite a few applications (< 1%) are filed on the same day as their priority. We will analyse whether, in these cases, we should prefer the ID of a priority application over that of a non-priority application.
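Same-day filings like these can be found by joining tls204_appln_prior to tls201_appln twice, once for the application and once for its priority, and comparing appln_filing_date. A minimal sqlite3 sketch on made-up rows; storing dates as ISO text is an assumption of the toy setup.

```python
import sqlite3

# Toy tables with only the columns used; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tls201_appln (appln_id INTEGER PRIMARY KEY, appln_filing_date TEXT);
CREATE TABLE tls204_appln_prior (appln_id INTEGER, prior_appln_id INTEGER);
INSERT INTO tls201_appln VALUES
    (1, '2012-03-01'), (2, '2012-03-01'), (3, '2013-02-28');
-- Application 2 claims 1 on the same day; application 3 claims 1 later.
INSERT INTO tls204_appln_prior VALUES (2, 1), (3, 1);
""")

# Priority claims where the application and its priority share a filing date.
same_day = conn.execute("""
SELECT p.appln_id, p.prior_appln_id, a.appln_filing_date
FROM tls204_appln_prior p
JOIN tls201_appln a  ON a.appln_id  = p.appln_id
JOIN tls201_appln pr ON pr.appln_id = p.prior_appln_id
WHERE a.appln_filing_date = pr.appln_filing_date
ORDER BY p.appln_id
""").fetchall()

print(same_day)  # → [(2, 1, '2012-03-01')]
```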

You also mentioned an example of a wrongly assigned DOC_STD_NAME. This is a known issue, which unfortunately we cannot solve. Section 8 "Known Deficiencies" of the Data Catalog states:
TLS206_PERSON / TLS906_PERSON: DOCDB standardized names:
Some DOCDB standardised names are wrongly assigned to persons of US patents, because the sequence of persons in the USPTO data source and that in DOCDB sometimes do not match. There is no known fix. When working with US patent applicants or inventors, you should avoid using the DOCDB standardised name. Instead, you might consider other harmonized names available in table TLS906_PERSON.
As a workaround, you might look at table TLS906_PERSON, which contains two additional types of harmonized names.
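A sketch of that workaround, assuming TLS906_PERSON exposes the two harmonized names as psn_name and han_name (check the exact column names in the Data Catalog for your edition) and treating appln_auth = 'US' as "US patent". The fallback order psn_name, then han_name, then the raw person_name is an arbitrary choice for illustration, not an EPO recommendation. Toy rows only.

```python
import sqlite3

# Toy tables with only the columns used; column names psn_name/han_name
# are assumed, and all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tls201_appln (appln_id INTEGER PRIMARY KEY, appln_auth TEXT);
CREATE TABLE tls207_pers_appln (person_id INTEGER, appln_id INTEGER);
CREATE TABLE tls906_person (
    person_id INTEGER PRIMARY KEY,
    person_name TEXT,
    doc_std_name TEXT,
    psn_name TEXT,
    han_name TEXT
);
INSERT INTO tls201_appln VALUES (7, 'US'), (8, 'EP');
INSERT INTO tls207_pers_appln VALUES (11, 7), (12, 8);
INSERT INTO tls906_person VALUES
    (11, 'JONES BOB', 'SMITH ANNA', 'JONES BOB', NULL),  -- wrong doc_std_name
    (12, 'LEE CHRIS', 'LEE CHRIS',  'LEE CHRIS', NULL);
""")

# For US applications skip doc_std_name and use a harmonized name instead.
names = conn.execute("""
SELECT pa.appln_id, p.person_id,
       CASE WHEN a.appln_auth = 'US'
            THEN COALESCE(p.psn_name, p.han_name, p.person_name)
            ELSE COALESCE(p.doc_std_name, p.person_name)
       END AS chosen_name
FROM tls207_pers_appln pa
JOIN tls201_appln a  ON a.appln_id  = pa.appln_id
JOIN tls906_person p ON p.person_id = pa.person_id
ORDER BY pa.appln_id
""").fetchall()

print(names)  # → [(7, 11, 'JONES BOB'), (8, 12, 'LEE CHRIS')]
```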