Over-merging of Chinese psn_name/psn_id?

This is your chance to raise your questions or discuss challenges, trends and hot topics around patent information from Asian countries.

Post by bbran » Fri Sep 29, 2023 5:27 pm


I have been trying to link Chinese-character names to the pinyin inventor names in my PATSTAT dataset, using 'psn_id' as my inventor identifier.

When I tried to match a unique Chinese name to the 'psn_id' of a particular inventor, I found multiple Chinese names linked to the same 'psn_id'. Checking the original patents shows that people with different Chinese names have all been harmonized into a single 'psn_id'. Strangest of all, 'psn_level' is 0 for a number of these distinct 'person_id's, which I believe means that 'psn_id' and 'person_id' should be identical.

For an example of this, look at psn_id = 19316927.
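One way to run this check over a whole extract is sketched below. It assumes the person records have already been loaded into Python with person_id, psn_id and psn_level (mirroring PATSTAT person-table columns) plus a hypothetical original_name field holding the name as printed on the patent, e.g. in Chinese characters. The rows are made-up placeholders, not PATSTAT data, and the level-0 check encodes the assumption above that psn_level 0 should mean psn_id equals person_id. The inventor_key helper is one hypothetical answer to the question of avoiding psn_id: fall back to person_id only where psn_id looks over-merged.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonRow:
    # Hypothetical record: the first three fields mirror PATSTAT person-table
    # columns; original_name is an assumed field holding the name as printed
    # on the patent (e.g. in Chinese characters).
    person_id: int
    psn_id: int
    psn_level: int
    original_name: str


def find_suspect_psn_ids(rows):
    """Map each psn_id that carries more than one distinct original name
    to the sorted list of those names."""
    names_by_psn = defaultdict(set)
    for row in rows:
        names_by_psn[row.psn_id].add(row.original_name.strip())
    return {psn: sorted(names) for psn, names in names_by_psn.items() if len(names) > 1}


def level0_mismatches(rows):
    """Rows with psn_level 0 whose psn_id differs from person_id
    (assuming level 0 should mean the two IDs are identical)."""
    return [row for row in rows if row.psn_level == 0 and row.psn_id != row.person_id]


def inventor_key(row, suspect_psn_ids):
    """Hypothetical fallback: use the raw person_id when the psn_id looks
    over-merged, otherwise keep the harmonised psn_id."""
    if row.psn_id in suspect_psn_ids:
        return ("person", row.person_id)
    return ("psn", row.psn_id)


# Made-up placeholder rows for illustration only (not PATSTAT data).
rows = [
    PersonRow(1, 1, 0, "NAME_A"),
    PersonRow(2, 1, 0, "NAME_B"),
    PersonRow(3, 3, 2, "NAME_C"),
    PersonRow(4, 3, 2, "NAME_C "),  # trailing space is stripped, so same name
]

suspect = find_suspect_psn_ids(rows)
print(suspect)                                          # {1: ['NAME_A', 'NAME_B']}
print([r.person_id for r in level0_mismatches(rows)])  # [2]
print([inventor_key(r, suspect) for r in rows])
# [('person', 1), ('person', 2), ('psn', 3), ('psn', 3)]
```

Exact string comparison will also flag harmless spelling variants of one person's name, so the flagged psn_ids are candidates for manual review rather than proof of over-merging.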

I'm curious whether this is a systemic problem across Chinese inventors or an anomaly. If it is systemic, do users generally avoid 'psn_id' when analyzing Chinese patents?

Thank you

Re: Over-merging of Chinese psn_name/psn_id?

Post by foodothers » Thu Dec 28, 2023 10:45 am

It seems you're running into a data harmonization issue in PATSTAT, particularly for Chinese inventors. The problem you describe, where different Chinese names are linked to the same 'psn_id', makes it hard to identify individual inventors reliably.
Systemic harmonization issues can arise when data from different sources are merged or standardized, producing inconsistent representations of individuals (or, as here, a single representation of several individuals). For Chinese names, variations in romanization, alternative forms of the same name, and differences in surname/given-name order all complicate inventor matching.
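To illustrate the name-order and romanization point, here is a small sketch of a hypothetical romanized-name key that ignores case, commas and token order, so that "ZHANG Wei" and "Wei Zhang" compare equal. Because pinyin drops the characters, distinct names such as 张伟 and 章伟 both romanize to "Zhang Wei"; the sketch therefore treats two records as a candidate match only when the original characters also agree (or one of them is missing). The function names are illustrative and not part of PATSTAT.

```python
from __future__ import annotations


def romanized_key(name: str) -> tuple[str, ...]:
    """Hypothetical normalisation: lowercase the tokens of a romanized name,
    treat commas as spaces and sort the tokens, so surname-first and
    given-name-first spellings produce the same key."""
    tokens = name.replace(",", " ").split()
    return tuple(sorted(token.lower() for token in tokens))


def same_person_candidate(a_roman: str, a_chars: str | None,
                          b_roman: str, b_chars: str | None) -> bool:
    """Candidate match only if the romanized keys agree AND the original
    Chinese characters agree (a missing character form is not held against it)."""
    if romanized_key(a_roman) != romanized_key(b_roman):
        return False
    if a_chars is None or b_chars is None:
        return True
    return a_chars == b_chars


print(romanized_key("ZHANG Wei"))                                      # ('wei', 'zhang')
print(same_person_candidate("ZHANG Wei", "张伟", "Wei Zhang", "张伟"))  # True
print(same_person_candidate("ZHANG Wei", "张伟", "Zhang Wei", "章伟"))  # False
```

Sorting the tokens is a trade-off: it absorbs name-order differences, but it would also equate "Zhang Wei" with a "Wei Zhang" whose surname really is Wei, which is another route to over-merging.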

Re: Over-merging of Chinese psn_name/psn_id?

Post by fleetbench » Thu Feb 15, 2024 10:44 am

It appears there is a data harmonization problem in the PATSTAT dataset, particularly for inventors from China. As you describe, when several Chinese names are associated with the same "psn_id", it becomes difficult to identify individual inventors correctly.
