Wrong UTF-8 characters

AstriumServices
Posts: 6
Joined: Mon Oct 21, 2013 8:55 am

Post by AstriumServices » Tue Mar 11, 2014 11:02 am

All,

During a REST request to the OPS server on March 5th, I received an XML response containing a wrong UTF-8 character sequence " ".
Neither a standard ASCII file reader such as UltraEdit nor a piece of C# code using HttpWebRequest is able to parse this sequence properly.
Moreover, I did not find this sequence in the UTF-8 code chart.
The sequence was generated for a couple of families (for example famn=33457358) captured during my update run, but not for all families.

Could you confirm that these wrong characters originate on the OPS server side?
If so, should I expect other kinds of wrong sequences in data captured from the OPS server, and how can I work around this problem?

Thanks
Michel


EPO / OPS Support
Posts: 980
Joined: Thu Feb 22, 2007 5:32 pm

Post by EPO / OPS Support » Tue Mar 11, 2014 3:48 pm

Dear user,

I am sorry, but OPS does not offer the possibility to search by famn. Can you give me the patent number of a document that has the family members in question? Also, have you tried the same request via our Developers portal (API console): https://developers.epo.org/?

Thank you in advance,

OPS support


AstriumServices
Posts: 6
Joined: Mon Oct 21, 2013 8:55 am

Post by AstriumServices » Thu Mar 13, 2014 9:15 am

Thanks for your quick response.
Please find below the request that returns the wrong UTF-8 character sequence in my XML file.
https://ops.epo.org/3.1/rest-services/p ... n=33457358

In IE 9, switch the encoding from UTF-8 to Western (ISO), for instance, to see the sequence.
It appears for some members of the family in the epodoc applicant or the epodoc inventor field.

I didn't find this sequence in standard UTF-8 code charts, and it was not present in my previous requests, which I made weekly over a two-month period.
Any idea what's wrong with this?
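
One way to find out what such a sequence really is: decode the raw response bytes strictly as UTF-8 and list every non-ASCII character with its code point and Unicode name. The sketch below assumes the response body is already available as bytes. `describe_non_ascii` and the sample bytes are made up for illustration and are not taken from an actual OPS response.

```python
import unicodedata


def describe_non_ascii(raw: bytes) -> list:
    """Return (offset, code point, Unicode name) for each non-ASCII character.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    """
    text = raw.decode("utf-8", errors="strict")
    found = []
    for offset, ch in enumerate(text):
        if ord(ch) > 0x7F:
            name = unicodedata.name(ch, "<unnamed>")
            found.append((offset, f"U+{ord(ch):04X}", name))
    return found


# Hypothetical sample: a name field with the UTF-8 bytes E2 80 82 in it.
sample = b"SOME APPLICANT\xe2\x80\x82[FR]"
for entry in describe_non_ascii(sample):
    print(entry)
```

If the strict decode succeeds, the bytes are valid UTF-8 and the printed names show which characters are involved; if it raises, the data really is malformed.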


EPO / OPS Support
Posts: 980
Joined: Thu Feb 22, 2007 5:32 pm

Post by EPO / OPS Support » Thu Mar 13, 2014 12:49 pm

Dear user,

I see I was a bit premature in claiming that famn does not work in CQL. It seems that several new field identifiers were added in the latest documentation. Thanks for pointing that out to us :-)

As for the characters, when I display the result on my computer I don't see anything wrong with the epodoc format of the inventor or applicant, so I will ask our technical team if they have any idea what is causing your problem.

I will get back to you once I hear something from our colleagues,

Kind regards,

OPS support


EPO / OPS Support
Posts: 980
Joined: Thu Feb 22, 2007 5:32 pm

Post by EPO / OPS Support » Thu Mar 13, 2014 1:57 pm

Dear user,

We are very sorry, but no one here can reproduce your problem in our system. We all get the proper encoding, and all three are completely valid UTF-8 symbols (0xE2 0x20AC 0x201A). We can only conclude that this issue is at your end, and we cannot be of help this time.

Kind regards,

OPS support
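
For what it is worth, the three values quoted above (0xE2 0x20AC 0x201A) are what you get when the UTF-8 bytes E2 80 82 are read one byte at a time as Windows-1252 instead of being decoded as UTF-8. A minimal Python sketch of that effect, assuming the character involved is U+2002 (EN SPACE):

```python
# The UTF-8 encoding of U+2002 (EN SPACE) is the three bytes E2 80 82.
utf8_bytes = "\u2002".encode("utf-8")

# Reading those bytes as Windows-1252 (cp1252) yields three separate characters.
misread = utf8_bytes.decode("cp1252")
code_points = [f"0x{ord(ch):X}" for ch in misread]

print(utf8_bytes.hex(" "))  # e2 80 82
print(code_points)          # ['0xE2', '0x20AC', '0x201A']
```

So a viewer or tool that shows those three symbols is most likely interpreting a single valid UTF-8 character with the wrong encoding.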


AstriumServices
Posts: 6
Joined: Mon Oct 21, 2013 8:55 am

Post by AstriumServices » Mon Mar 17, 2014 5:56 pm

Dear support,

Deeper investigation has led me to identify the offending UTF-8 character as [EN SPACE], i.e. [\u2002].
I confirm that the problem comes from a downstream process, when exporting my database to a UTF-16 CSV file. The MySQL conversion of this character from UTF-8 to UTF-16 does not work while exporting the DB content to the CSV file.
Replacing the character with a standard [SPACE] during the import process solves the issue.

The misleading points were that this [EN SPACE] character appeared only in recent requests and not before, and that standard text editors such as UltraEdit or Visual Studio do not parse this character properly.
Thanks for your help
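
The workaround described above, replacing EN SPACE with a plain space at import time, could look roughly like this in Python. It is only a sketch: `normalize_spaces` is a hypothetical helper, and widening the replacement to every character in the Unicode "Zs" (space separator) category is an assumption that goes beyond the single character reported in this thread.

```python
import unicodedata


def normalize_spaces(value: str) -> str:
    """Replace every Unicode space separator (category Zs) with U+0020."""
    return "".join(
        " " if unicodedata.category(ch) == "Zs" else ch
        for ch in value
    )


# Hypothetical field value containing U+2002 (EN SPACE) before the country code.
raw_field = "SOME INVENTOR\u2002[FR]"
print(normalize_spaces(raw_field))  # SOME INVENTOR [FR]
```

Note that category Zs also covers the no-break space (U+00A0); restrict the check to `ch == "\u2002"` if only EN SPACE should be touched.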


Kerstin Thoma
Posts: 2
Joined: Thu Apr 27, 2017 7:32 pm

Post by Kerstin Thoma » Thu Apr 27, 2017 8:14 pm

3 years later and I find these characters. â€

To Reproduce:
Http://ops.epo.org/3.1/rest-services/pu ... iblio.json

Bibliographic-data / parties / inventors / inventor / [0] / inventor-name / name / $
Value: RAST UWEâ €, [DE]

Seen in: Chrome, Firefox and Edge.
You can also see these characters in https://developers.epo.org.

Some applicant-name (@ data-format: epodoc) also have these characters.

Is this a temporary problem?
And is there a way to get the country value for each inventor separately?

Thanks for your support,
Best Regards
Kerstin Thoma.
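
In case it helps anyone landing here: text that has already gone through this kind of mis-decoding can sometimes be repaired by reversing it, and the country code can then be split off the end of the name. The sketch below assumes the garbled run is the UTF-8 bytes of EN SPACE read as Windows-1252, and that epodoc-style names end in a bracketed two-letter country code. `repair_mojibake` and `split_country` are hypothetical helpers, not part of OPS, and the sample value is made up.

```python
import re
from typing import Optional, Tuple


def repair_mojibake(value: str) -> str:
    """Undo a UTF-8-read-as-cp1252 mix-up; return the input unchanged on failure."""
    try:
        return value.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return value


def split_country(name: str) -> Tuple[str, Optional[str]]:
    """Split 'NAME [CC]' into ('NAME', 'CC'); country is None if absent."""
    match = re.match(r"^(.*?)[\s,]*\[([A-Z]{2})\]\s*$", name)
    if match is None:
        return name.strip(), None
    return match.group(1), match.group(2)


# Assumed sample: EN SPACE mis-decoded into three cp1252 characters.
garbled = "SOME INVENTOR\u00e2\u20ac\u201a[DE]"
fixed = repair_mojibake(garbled)
print(split_country(fixed))  # ('SOME INVENTOR', 'DE')
```

Whether OPS exposes the country as a separate field is not answered in this thread, so the split here works purely on the name string.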


EPO / OPS Support
Posts: 980
Joined: Thu Feb 22, 2007 5:32 pm

Post by EPO / OPS Support » Fri Apr 28, 2017 9:13 am

Hi,

When I do a search using API Console I get this:

[Attachment: Capture4.PNG]

Kind regards,
OPS support


Kerstin Thoma
Posts: 2
Joined: Thu Apr 27, 2017 7:32 pm

Post by Kerstin Thoma » Fri Apr 28, 2017 1:38 pm

When I do a search using API Console I get this:

Image

I'm working on Windows 10; seen in Chrome, Firefox, Edge and Opera.
Thank you very much for your support!


EPO / OPS Support
Posts: 980
Joined: Thu Feb 22, 2007 5:32 pm

Post by EPO / OPS Support » Fri Apr 28, 2017 1:42 pm

You are using 3.1 instead of the official production environment, which is 3.2. Please check the announcements section of this forum. Version 3.1 is no longer updated.

Regards,

OPS support

