Inconsistent data


Eduardo Mercadante
Posts: 12
Joined: Thu Sep 17, 2020 12:04 pm

Inconsistent data

Post by Eduardo Mercadante » Fri Feb 24, 2023 4:56 pm

Hi,

On Dec 2, 2022, I extracted from the Autumn 2022 Edition a sample of applications filed in Brazil. Today, I extracted another sample from the same Edition, which should include these same applications plus their twins (using docdb_family_id) filed in other offices. To my surprise, the two sets of BR applications didn't match: 27 families with BR applications in the first set are not in the second set.

I decided to investigate, so I looked at one of these families from my first set. Originally, I had extracted appln_id 3878826 under docdb_family_id 38370342. The live version of PATSTAT Online still has this application, but it is now listed under docdb_family_id 34982347. Most of the other variables in tls201_appln also differ from the extraction two months ago, including earliest_filing_id, while inpadoc_family_id remains the same. In addition, I checked the legal event data in tls231_inpadoc_legal_event for this application, and there is now a more recent event, from Jun 7, 2022, which was not in the December extraction.

Also, I checked the family indicated in the original extraction. There is no longer any application listed in PATSTAT under that family.

I would appreciate it if someone could explain how this could have happened. Are there internal updates within the same edition of PATSTAT? How can such updates change so many variables (as far as I can tell, both bibliographic and legal event data)?

All the best,
Eduardo Mercadante
PhD Candidate - London School of Economics


EPO / PATSTAT Support
Posts: 440
Joined: Thu Feb 22, 2007 5:33 pm

Re: Inconsistent data

Post by EPO / PATSTAT Support » Fri Feb 24, 2023 6:09 pm

Hello Eduardo,
Intermediate updates to the same database are normally never done, and none happened this time.
PATSTAT Online has 2 editions available: Spring 2022 and Autumn 2022.

If you run:


select * from tls201_appln where appln_id = 3878826
on Spring 2022, you will get docdb_family_id = 38370342; on the Autumn 2022 edition, you will get 34982347.
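To see what each of the two family IDs contains in whichever edition is currently selected, a query along the following lines could be run once per edition. This is only a sketch: the two family IDs are the ones from this thread, and the column selection is just one reasonable choice from tls201_appln.

```sql
-- Run once per edition (e.g. Spring 2022, then Autumn 2022) and compare the results.
-- Lists every application currently assigned to either of the two family IDs.
SELECT docdb_family_id,
       appln_id,
       appln_auth,
       appln_filing_date,
       earliest_filing_id,
       inpadoc_family_id
FROM tls201_appln
WHERE docdb_family_id IN (38370342, 34982347)
ORDER BY docdb_family_id, appln_filing_date;
```

If a family ID returns no rows in one edition, that family has no members in that edition.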

As a general rule, the value of the DOCDB_FAMILY_ID will not change. It will be the same across editions of DOCDB and PATSTAT. However, corrections to priority numbers or changes in the priority picture (priority numbers changing from active to inactive or vice versa) might lead to a change in the family ID of a given application.
In this case, we can also observe that in the latest release a priority filing was linked to the BR application. The BR application was therefore added to the correct family, and the old family ID became obsolete (it has no data in Autumn 2022).


SELECT * FROM tls204_appln_prior WHERE appln_id = 3878826;
So I assume that your first data extraction was on the Spring version, and now you are working on the Autumn version.
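For users who keep two editions side by side in a local database, the family changes between editions can be listed in one pass. The sketch below is not possible in PATSTAT Online, where only one edition is queried at a time. It assumes a MySQL-style setup in which each edition was loaded into its own schema; the schema names patstat_2022_spring and patstat_2022_autumn are hypothetical. It also assumes that appln_id identifies the same application in both editions, as it does for 3878826 in this thread.

```sql
-- Hypothetical schema names: adjust to wherever each edition was loaded.
-- Lists BR applications whose DOCDB family ID differs between the two editions.
SELECT s.appln_id,
       s.docdb_family_id AS family_id_spring,
       a.docdb_family_id AS family_id_autumn
FROM patstat_2022_spring.tls201_appln AS s
JOIN patstat_2022_autumn.tls201_appln AS a
  ON a.appln_id = s.appln_id
WHERE s.appln_auth = 'BR'
  AND s.docdb_family_id <> a.docdb_family_id
ORDER BY s.appln_id;
```

The inner join only reports applications present in both editions; applications that exist in just one edition would need a separate outer-join check.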
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Eduardo Mercadante
Posts: 12
Joined: Thu Sep 17, 2020 12:04 pm

Re: Inconsistent data

Post by Eduardo Mercadante » Fri Feb 24, 2023 6:33 pm

Hi,

Thanks for the quick and helpful answer. Indeed, I must not have noticed that the Spring 2022 Edition was selected on the initial screen. My data stops in Dec 2021.

I did not know this much could change between editions. And I'm also glad to know that intermediate updates are rare.

All the best,
Eduardo Mercadante
PhD Candidate - London School of Economics

