proper character set for tls203 and tls206 of patstat 2020 Spring version

Posted: Wed Sep 16, 2020 8:03 am
by tjung
Hi all,

I have problems loading some of the files for the tls203 and tls206 tables.
I'm running MySQL 5.6 on CentOS 6.9.
The problems seem to be related to setting the proper character set.
When I set utf8mb4, only 8,133,450 rows from tls206_part02.csv were loaded.
When I set latin1, only about 154,000 rows were loaded.

I also have similar problems with tls203_part03.csv and tls203_part04.csv.

Is there anybody who can help me?

Thanks in advance.
Hanyang

Re: proper character set for tls203 and tls206 of patstat 2020 Spring version

Posted: Thu Sep 17, 2020 10:31 am
by mkracker
Hi Hanyang,

The general recommendation we give in the relevant documentation "How to load PATSTAT data" (see bottom of page https://www.epo.org/searching-for-paten ... tstat.html) is this:
When using MySQL: You must select character set utf8mb4, because MySQL's character set utf8 is restricted to 3 bytes (cf. https://dev.mysql.com/doc/refman/5.5/en ... f8mb4.html).
We don't use MySQL ourselves, so if this does not work for you, I hope another PATSTAT user who runs MySQL can help. Please keep me informed of any relevant findings.
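
To make the 3-byte limit concrete, here is a minimal Python sketch that counts the lines in a CSV file containing characters that need 4 bytes in UTF-8 (code points above U+FFFF). A 3-byte utf8 column in MySQL cannot store such characters. This is only an illustration, not part of the PATSTAT documentation: the function names are made up, and it assumes the file is UTF-8 encoded.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character in text needs 4 bytes in UTF-8.

    Code points above U+FFFF are encoded as 4 bytes in UTF-8.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


def count_4byte_lines(path: str, encoding: str = "utf-8") -> int:
    """Count the lines of a text file that contain a 4-byte UTF-8 character.

    Assumption: the file is encoded as UTF-8 (override via `encoding`).
    """
    count = 0
    with open(path, encoding=encoding) as handle:
        for line in handle:
            if needs_utf8mb4(line):
                count += 1
    return count
```

For example, count_4byte_lines("tls206_part02.csv") would report how many lines of that file fall outside what a 3-byte character set can hold. A count of zero would suggest the loading problem lies elsewhere.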

Best regards,
Martin