2018b - 2019a person concordance table taking into account old person id's

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

Posts: 227
Joined: Thu Feb 22, 2007 5:33 pm

2018b - 2019a person concordance table taking into account old person id's

Post by EPO / PATSTAT Support » Wed May 15, 2019 3:13 pm

The new PATSTAT 2019 Spring edition is now published. In this release, we have added names in original language (in asian, cyrillic, arab characters) to the person records. There are many cases where persons with different names in original language were previously merged together because they have the same latinised name (see example “OKOMOTO, DAISUKE’ below). Now these names are split into multiple person records, each with their own person_id. The old record which does not have an original language name keeps the original PERSON_ID; new records are added with new PERSON_IDs.

32539715;64987642;OKAMOTO, Daisuke;岡元 大輔;;JP
32539715;66933187;OKAMOTO, DAISUKE;岡本 大輔;;JP
32539715;67305457;OKAMOTO, Daisuke;ИСИБАСИ Масаюки;;JP
32539715;67895261;OKAMOTO, DAISUKE;岡本 大典;;JP

In the file at this link you will find a mapping table between old records and new records.
The file is 1.3 GB large and is available untill mid-June from this link:
http://webserv.epo.org/download.nsf/0/8 ... enDocument

The file is provided as a one-off exercise for name harmonisation work in the transition between PATSTAT 2018b (Autumn) and the new PATSTAT 2019a (Spring) releases.

Actually, you can (re-)create this file with the query below:

Code: Select all

-- In 2019a some person records of 2018b are split into multiple person records if these
-- person also do have a name in original language (asian, cyrillic or arab character).
-- This query provided a mapping between the PERSON_IDs of 2018b and 2019b.
-- Note that PERSON_IDs do not change in case there is no name in original language.

-- Run time about 15 minutes on a PATSTAT notebook.
-- Indexes on the PERSON_NAMES are a must, otherwise run time is never-ending.
SELECT old.person_id as person_id_2018b, new.person_id as person_id_2019a, 
new.person_name, new.person_name_orig_lg, new.person_address, new.person_ctry_code
FROM db_old..tls206_person old
JOIN db_new..tls206_person new on old.person_name = new.person_name
WHERE old.person_address = new.person_address
AND old.person_ctry_code = new.person_ctry_code
ORDER BY person_id_2018b, person_id_2019a
-- Example: PERSON_ID 48917443 in 2018b is now split into 4 PERSON_IDs in 2019a
-- Other examples:
-- 48917246 (--> 5 records)
-- 4891735, 48916988 (--> 4 records)
-- 48917410, 48917304 (--> 3 records)
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org

Post Reply