PATSTAT 2016 Autumn - TLS215_CITN_CATEG corrupt data?

Fr3dY
Posts: 26
Joined: Mon Oct 17, 2016 8:57 am

PATSTAT 2016 Autumn - TLS215_CITN_CATEG corrupt data?

Post by Fr3dY » Fri Dec 09, 2016 9:56 am

Hi,

I've just found some odd values in the TLS215_CITN_CATEG table: some CITN_CATEG values are "Ãœ" ("Ü" in Unicode), but I can't find this value in the documentation:

CITN_CATEG 6.22
Name: Category of the citation
Also Known As: n/a
Description: Category of the citation as mentioned in Search Reports
Domain: 1 character (X, I, Y, A, D, E, P, L, R, T, O)
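
A note on the "Ãœ" rendering: it is consistent with the UTF-8 bytes of "Ü" being displayed as Windows-1252. That the viewer used Windows-1252 is an assumption, not something confirmed in this thread. A minimal Python sketch of the round trip:

```python
# "Ü" is U+00DC; its UTF-8 encoding is the two bytes C3 9C.
original = "Ü"
utf8_bytes = original.encode("utf-8")

# Reading those two bytes as Windows-1252 yields two separate characters.
garbled = utf8_bytes.decode("cp1252")

# Reversing the mistake recovers the intended character.
repaired = garbled.encode("cp1252").decode("utf-8")

print(utf8_bytes.hex(), garbled, repaired)
```

So the stored value is most likely a single "Ü", which is still outside the documented domain.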

You can find the first occurrence in the file at line 4846287 (or search for PAT_PUBLN_ID=283802817 and CITN_ID=2)
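
For anyone who wants to locate such rows themselves, here is a rough Python sketch that checks every CITN_CATEG value against the documented domain. It assumes the table is a comma-separated file with a header row and columns named pat_publn_id, citn_id and citn_categ; the function name and the sample rows are made up for illustration, apart from the ID pair quoted above.

```python
import csv
import io

# Documented domain of CITN_CATEG, copied from the data catalog excerpt above.
VALID_CATEGORIES = set("XIYADEPLRTO")

def find_undocumented(rows):
    """Yield (line_number, row) for rows whose citn_categ is outside the domain."""
    for line_number, row in enumerate(rows, start=2):  # line 1 is the header
        if row["citn_categ"] not in VALID_CATEGORIES:
            yield line_number, row

# Tiny in-memory stand-in for the real file (assumed column names and layout).
sample = io.StringIO(
    "pat_publn_id,citn_id,citn_categ\n"
    "1001,1,X\n"
    "283802817,2,Ü\n"
    "1002,1,A\n"
)

hits = list(find_undocumented(csv.DictReader(sample)))
for line_number, row in hits:
    print(line_number, row["pat_publn_id"], row["citn_id"], repr(row["citn_categ"]))
```

For the real file, the `io.StringIO` object would be replaced by something like `open(path, newline="", encoding="utf-8")`, where `path` is wherever the table was unpacked.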


Regards,


mkracker
Posts: 120
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: PATSTAT 2016 Autumn - TLS215_CITN_CATEG corrupt data?

Post by mkracker » Tue Dec 13, 2016 8:37 am

Hi,

Thank you for reporting these data errors. Your observation confirms the rule that no database is completely correct, not even PATSTAT ;) .

In fact, the citations are taken from the EPO's REFI and DOCDB databases, which contain data collected from more than 80 patent offices around the world in a variety of formats. Although we put great effort into data quality, 100% data correctness will never be achieved. In this case, there are 78 citation categories out of 38 000 000 which do not make sense, with values like B, C, Q, S, U, Ü, ...
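
A tally like the one above can be reproduced on a local copy with a few lines of Python. This is only a sketch: the function name and the sample values are invented for illustration, and the real input would be the citn_categ column of the table.

```python
from collections import Counter

# Documented domain of CITN_CATEG (X, I, Y, A, D, E, P, L, R, T, O).
VALID_CATEGORIES = set("XIYADEPLRTO")

def tally_undocumented(categories):
    """Return (total number of values, Counter of values outside the domain)."""
    total = 0
    bad = Counter()
    for value in categories:
        total += 1
        if value not in VALID_CATEGORIES:
            bad[value] += 1
    return total, bad

# Made-up sample values, not real PATSTAT data.
sample = ["X", "A", "Q", "Y", "Ü", "Q", "D"]
total, bad = tally_undocumented(sample)
print(f"{sum(bad.values())} of {total} values are undocumented: {dict(bad)}")
```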

As far as I can see, these 78 cases have already been corrected in our source databases, so you will not find them e.g. in Espacenet, and they will also not show up in the next PATSTAT edition.

Best regards,
Martin / EPO
-------------------------------------------
Martin Kracker / EPO


Fr3dY
Posts: 26
Joined: Mon Oct 17, 2016 8:57 am

Re: PATSTAT 2016 Autumn - TLS215_CITN_CATEG corrupt data?

Post by Fr3dY » Tue Dec 13, 2016 11:31 am

mkracker wrote:
Hi,

As far as I can see, these 78 cases have already been corrected in our source databases, so you will not find them e.g. in Espacenet, and they will also not show up in the next PATSTAT edition.

Hi Martin,

I guess a fix should be released for the current PATSTAT edition as well... will you look into it?



Kind regards,


Fr3dY
Posts: 26
Joined: Mon Oct 17, 2016 8:57 am

Re: PATSTAT 2016 Autumn - TLS215_CITN_CATEG corrupt data?

Post by Fr3dY » Tue Dec 13, 2016 2:43 pm

I was told that this issue is not going to be fixed for the current PATSTAT version (but 2017 Spring will be fine).
So, I've manually removed some duplicate entries and set all these values to 'U', as in 'Unknown'.
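
In case it helps someone with the same edition, this is roughly what that clean-up looks like as a Python sketch. It assumes rows are (pat_publn_id, citn_id, citn_categ) tuples and that "duplicate" means rows that are identical once the category has been replaced; the function name, the `PLACEHOLDER` constant and the sample rows are illustrative only.

```python
# Documented domain of CITN_CATEG (X, I, Y, A, D, E, P, L, R, T, O).
VALID_CATEGORIES = set("XIYADEPLRTO")
PLACEHOLDER = "U"  # local convention for "Unknown", not an official PATSTAT value

def clean(rows):
    """Replace undocumented categories with PLACEHOLDER and drop exact duplicates."""
    seen = set()
    cleaned = []
    for pat_publn_id, citn_id, citn_categ in rows:
        if citn_categ not in VALID_CATEGORIES:
            citn_categ = PLACEHOLDER
        key = (pat_publn_id, citn_id, citn_categ)
        if key in seen:
            continue  # same row already kept
        seen.add(key)
        cleaned.append(key)
    return cleaned

# Made-up sample rows; only the ID pair 283802817 / 2 comes from this thread.
sample = [
    (1001, 1, "X"),
    (283802817, 2, "Ü"),
    (283802817, 2, "Q"),
    (1002, 1, "A"),
]
result = clean(sample)
print(result)
```

Since 'U' is not part of the documented domain either, any later validity check has to treat it as a known placeholder.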



Regards,

