Proper character set for tls203 and tls206 of PATSTAT 2020 Spring version


tjung
Posts: 1
Joined: Thu Dec 21, 2017 9:01 am

Post by tjung » Wed Sep 16, 2020 8:03 am

Hi all,

I have problems loading some of the files for the tls203 and tls206 tables.
I'm running MySQL 5.6 on CentOS 6.9.
The problems seem to be related to setting the proper character set.
When I set utf8mb4, only 8,133,450 rows from tls206_part02.csv were loaded.
When I set latin1, only about 154,000 rows were loaded.

I also have similar problems with tls203_part03.csv and tls203_part04.csv.

Is there anybody who can help me?

Thanks in advance.
Hanyang


mkracker
Posts: 120
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: proper character set for tls203 and tls206 of patstat 2020 Spring version

Post by mkracker » Thu Sep 17, 2020 10:31 am

Hi Hanyang,

The general recommendation we give in the relevant documentation "How to load PATSTAT data" (see bottom of page https://www.epo.org/searching-for-paten ... tstat.html) is this:
When using MySQL: You must select character set utf8mb4, because MySQL's character set utf8 is restricted to 3 bytes (cf. https://dev.mysql.com/doc/refman/5.5/en ... f8mb4.html).
We don't use MySQL ourselves, so if this does not work for you, I hope another PATSTAT user who runs MySQL can help you. Please keep me informed of any relevant findings.
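
To check whether the 3-byte limit actually matters for a given file, one option is to look for characters that need 4 bytes in UTF-8, i.e. code points above U+FFFF. The following is only a minimal sketch in Python, not part of the PATSTAT documentation: the function names are made up for illustration, and it assumes the CSV file is UTF-8 encoded. It reports physical line numbers, which can differ from record numbers if a field contains a line break.

```python
# Sketch: find lines that contain characters needing 4 bytes in UTF-8.
# MySQL's 3-byte "utf8" cannot store such characters; utf8mb4 can.
# All function names here are hypothetical helpers for illustration.


def needs_four_bytes(text):
    """Return True if any character in text lies above U+FFFF."""
    return any(ord(ch) > 0xFFFF for ch in text)


def four_byte_line_numbers(lines):
    """Return the 1-based numbers of lines holding a 4-byte character."""
    return [n for n, line in enumerate(lines, start=1) if needs_four_bytes(line)]


def scan_file(path):
    """Scan a file, assuming (not verified) that it is UTF-8 encoded."""
    with open(path, encoding="utf-8") as handle:
        return four_byte_line_numbers(handle)


if __name__ == "__main__":
    sample = [
        "plain ASCII title",
        "caf\u00e9",          # U+00E9, 2 bytes in UTF-8
        "\u4e2d\u6587",       # CJK characters, 3 bytes each
        "\U00020000",         # U+20000, 4 bytes in UTF-8
    ]
    print(four_byte_line_numbers(sample))  # [4]
```

Calling scan_file("tls206_part02.csv") would list the affected lines in that file. An empty result would suggest that the 3-byte limit is not what stops the load and that the cause lies elsewhere.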

Best regards,
Martin
-------------------------------------------
Martin Kracker / EPO

