Post by lucagaglia » Sun Jul 26, 2020 4:23 pm

Dear community,

is it possible to merge PATSTAT with Orbis?

I saw that the variable tls206_person.han_harmonized reports whether the applicant name in tls206_person.han_name has been harmonized to match Orbis. So I would have guessed that the datasets can be merged using tls206_person.han_name and possibly tls206_person.han_id.

However, after inspecting some examples, it looks like the names in Orbis do not match those in tls206_person.han_name exactly: there are usually at least a few letters or abbreviations that differ between the two sources.

Is there a proper/standard way to merge the datasets?

How about tls206_person.doc_std_name and tls206_person.psn_name? How do they differ from the HAN names, and could they also be used for the merge?


Re: Question: how to merge PATSTAT with Orbis

Post by mkracker » Fri Jul 31, 2020 12:52 pm

The HAN (Harmonized Applicant Name) attributes in PATSTAT are derived from a public OECD data set (see attachment for details), so all credit for HAN goes to them.

If the PATSTAT attribute HAN_HARMONIZED = 2, then the HAN_NAME should match a name in ORBIS. However, I do not have access to ORBIS, so I cannot confirm this. On the other hand, to my knowledge ORBIS contains a reference to PATSTAT's PERSON_ID attribute; I suggest you check this.
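If ORBIS does carry PATSTAT's PERSON_ID, a direct join on that key is preferable. Otherwise, a name-based match restricted to HAN_HARMONIZED = 2 could look roughly like the sketch below. It is only an illustration under assumptions: PATSTAT rows are plain dicts keyed by the TLS206_PERSON column names, the ORBIS side is just an iterable of company name strings (I have not seen the ORBIS schema), and the `normalize` helper is hypothetical, ignoring only case and extra whitespace.

```python
from typing import Dict, Iterable


def normalize(name: str) -> str:
    # Hypothetical light normalization: uppercase and collapse runs of whitespace.
    # It deliberately does NOT touch punctuation or legal-form abbreviations.
    return " ".join(name.upper().split())


def match_harmonized(persons: Iterable[dict], orbis_names: Iterable[str]) -> Dict[int, str]:
    """Map PATSTAT person_id -> ORBIS name, using only rows with han_harmonized == 2."""
    # Index the ORBIS names by their normalized form for O(1) lookups.
    orbis_index = {normalize(n): n for n in orbis_names}
    matches: Dict[int, str] = {}
    for p in persons:
        # Only HAN names flagged as harmonized against ORBIS are considered.
        if p["han_harmonized"] != 2:
            continue
        key = normalize(p["han_name"])
        if key in orbis_index:
            matches[p["person_id"]] = orbis_index[key]
    return matches
```

With made-up rows such as `{"person_id": 1, "han_name": "ACME  CORP", "han_harmonized": 2}` and ORBIS names `["Acme Corp", "Bar SA"]`, person 1 would be matched to `"Acme Corp"`, while a row with `han_harmonized` of 0 would be skipped even if its name matched.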

The HAN_ID is a more or less random number.

The PSN (PATSTAT Standardized Names) attributes and the DOC_STD_NAME on table TLS206_PERSON also contain harmonized names, but they have been created by different processes, and no matching against existing data sets is done. The PATSTAT Data Catalog contains more details about the harmonization methods.

There are unlimited ways to harmonize names, and there is no single "correct" one. You have to check which set of harmonized names is best for your purpose.
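One rough way to do that check is to measure, for each candidate name column, what share of your PATSTAT persons find an exact (lightly normalized) counterpart in your ORBIS extract. The sketch below assumes the same simplified inputs as any dict-based export: rows keyed by the TLS206_PERSON column names `han_name`, `psn_name` and `doc_std_name`, plus a plain list of ORBIS names. The helper names are hypothetical, and a higher rate is only a first indicator, since it says nothing about false positives.

```python
from typing import Dict, Iterable, List

# TLS206_PERSON columns holding differently harmonized names.
CANDIDATE_COLUMNS = ("han_name", "psn_name", "doc_std_name")


def normalize(name: str) -> str:
    # Hypothetical light normalization: uppercase and collapse whitespace.
    return " ".join(name.upper().split())


def match_rates(persons: List[dict], orbis_names: Iterable[str]) -> Dict[str, float]:
    """Share of all persons whose name in each column exactly matches an ORBIS name."""
    orbis_keys = {normalize(n) for n in orbis_names}
    total = len(persons)
    rates: Dict[str, float] = {}
    for col in CANDIDATE_COLUMNS:
        # Missing or empty names count as non-matches, so columns stay comparable.
        hits = sum(
            1 for p in persons
            if p.get(col) and normalize(p[col]) in orbis_keys
        )
        rates[col] = hits / total if total else 0.0
    return rates
```

For two invented persons named `ACME CORP` / `ACME` / `ACME CORP` and `BAR SA` / `BAR` / `BAR S.A.` (han / psn / doc_std), checked against ORBIS names `["Acme Corp", "Bar SA"]`, this would report 1.0 for `han_name`, 0.0 for `psn_name` and 0.5 for `doc_std_name`.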
Martin Kracker / EPO

