Wrong UTF-8 characters



Re: Wrong UTF-8 characters

Post by martien » Thu May 04, 2017 9:45 am

Hello,

The bad characters are not caused by the interface.
They seem to be inserted when the epodoc/docdb data is generated, e.g. for inventor or applicant data, where the country code is combined with the name in the same field.

Have a look at the XML after transformation into a Perl data structure:


$data->[0]->{'bibliographic-data'}->[0]->{'parties'}->[0]->{'applicants'}->[0]->{'applicant'}->[0]->{'applicant-name'}->[0]->{'name'}->[0]->{'content'} = "RAST UWE\x{2002}[DE]";
$data->[0]->{'bibliographic-data'}->[0]->{'parties'}->[0]->{'applicants'}->[0]->{'applicant'}->[0]->{'data-format'} = 'epodoc';
$data->[0]->{'bibliographic-data'}->[0]->{'parties'}->[0]->{'applicants'}->[0]->{'applicant'}->[0]->{'sequence'} = '1';
$data->[0]->{'bibliographic-data'}->[0]->{'parties'}->[0]->{'applicants'}->[0]->{'applicant'}->[1]->{'applicant-name'}->[0]->{'name'}->[0]->{'content'} = " SCHMIDT KUPPLUNG GMBH\x{2002}[DE]";
$data->[0]->{'bibliographic-data'}->[0]->{'parties'}->[0]->{'applicants'}->[0]->{'applicant'}->[1]->{'data-format'} = 'epodoc';
$data->[0]->{'bibliographic-data'}->[0]->{'parties'}->[0]->{'applicants'}->[0]->{'applicant'}->[1]->{'sequence'} = '2';
$data->[0]->{'bibliographic-data'}->[0]->{'parties'}->[0]->{'applicants'}->[0]->{'applicant'}->[2]->{'applicant-name'}->[0]->{'name'}->[0]->{'content'} = 'RAST UWE, ';
$data->[0]->{'bibliographic-data'}->[0]->{'parties'}->[0]->{'applicants'}->[0]->{'applicant'}->[2]->{'data-format'} = 'original';
$data->[0]->{'bibliographic-data'}->[0]->{'parties'}->[0]->{'applicants'}->[0]->{'applicant'}->[2]->{'sequence'} = '1';
$data->[0]->{'bibliographic-data'}->[0]->{'parties'}->[0]->{'applicants'}->[0]->{'applicant'}->[3]->{'applicant-name'}->[0]->{'name'}->[0]->{'content'} = 'SCHMIDT-KUPPLUNG GMBH';
$data->[0]->{'bibliographic-data'}->[0]->{'parties'}->[0]->{'applicants'}->[0]->{'applicant'}->[3]->{'data-format'} = 'original';
$data->[0]->{'bibliographic-data'}->[0]->{'parties'}->[0]->{'applicants'}->[0]->{'applicant'}->[3]->{'sequence'} = '2';
$data->[0]->{'bibliographic-data'}->[0]->{'parties'}->[0]->{'inventors'}->[0]->{'inventor'}->[0]->{'data-format'} = 'epodoc';
$data->[0]->{'bibliographic-data'}->[0]->{'parties'}->[0]->{'inventors'}->[0]->{'inventor'}->[0]->{'inventor-name'}->[0]->{'name'}->[0]->{'content'} = "RAST UWE\x{2002}[DE]";
$data->[0]->{'bibliographic-data'}->[0]->{'parties'}->[0]->{'inventors'}->[0]->{'inventor'}->[0]->{'sequence'} = '1';
$data->[0]->{'bibliographic-data'}->[0]->{'parties'}->[0]->{'inventors'}->[0]->{'inventor'}->[1]->{'data-format'} = 'original';
$data->[0]->{'bibliographic-data'}->[0]->{'parties'}->[0]->{'inventors'}->[0]->{'inventor'}->[1]->{'inventor-name'}->[0]->{'name'}->[0]->{'content'} = 'RAST UWE';
$data->[0]->{'bibliographic-data'}->[0]->{'parties'}->[0]->{'inventors'}->[0]->{'inventor'}->[1]->{'sequence'} = '1';


The original-format data are OK; the epodoc/docdb formats add \x{2002} (U+2002, EN SPACE) instead of a normal blank, followed by the country code in brackets.
Especially if you want to use the field to look up more information (e.g. about an inventor), you'll
have to correct the field yourself.
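A minimal sketch of such a correction in Perl, based only on the values shown in the dump above. The helper name split_epodoc_name is made up for illustration, and it assumes the country code is always two uppercase letters in square brackets at the end of the field; other shapes are left untouched apart from replacing the EN SPACE with a normal blank.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of OPS or any module): split an epodoc-style
# name such as "RAST UWE\x{2002}[DE]" into the plain name and the country code.
sub split_epodoc_name {
    my ($raw) = @_;
    my $name = $raw;
    my $country;

    # Assumption: a trailing "[XX]" (two uppercase letters), optionally
    # preceded by whitespace or U+2002, is the country code.
    if ($name =~ s/[\s\x{2002}]*\[([A-Z]{2})\]\s*\z//) {
        $country = $1;
    }

    # Replace any remaining EN SPACE with a normal blank, then trim.
    $name =~ s/\x{2002}/ /g;
    $name =~ s/^\s+|\s+$//g;

    return ($name, $country);    # $country is undef if no code was found
}

# Usage with the two epodoc values from the dump above.
for my $raw ("RAST UWE\x{2002}[DE]", " SCHMIDT KUPPLUNG GMBH\x{2002}[DE]") {
    my ($name, $country) = split_epodoc_name($raw);
    print "$name ($country)\n";
}
```

Returning the country code separately keeps the information that epodoc/docdb put into the field, while the cleaned name can be compared directly with the original-format value.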

