PATSTAT 2016 Autumn - TLS215_CITN_CATEG corrupt data?

Posted: Fri Dec 09, 2016 9:56 am
by Fr3dY
Hi,

I've just found some odd values in the TLS215_CITN_CATEG table: some CITN_CATEG values are "Ãœ" ("Ü" in Unicode), but I can't find this value in the documentation:

CITN_CATEG 6.22
Name: Category of the citation
Also Known As: n/a
Description: Category of the citation as mentioned in Search Reports
Domain: 1 character (X, I, Y, A, D, E, P, L, R, T, O)

You can find the first occurrence in the file at line 4846287 (or search for PAT_PUBLN_ID=283802817 and CITN_ID=2)
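
For readers who want to run a similar check, here is a minimal Python sketch that lists rows whose CITN_CATEG falls outside the documented domain. It assumes the table is available as comma-separated UTF-8 text with a header row that uses the column names from this post; the function name and the sample rows are made up for illustration. The assertion near the top only illustrates that "Ãœ" is what the UTF-8 bytes of "Ü" look like when decoded as Windows-1252, which would explain the two spellings above.

```python
import csv
import io

# Documented domain of CITN_CATEG, as quoted from the documentation above.
VALID_CATEGORIES = set("XIYADEPLRTO")

# "Ü" encoded as UTF-8 and misread as Windows-1252 shows up as "Ãœ".
assert "Ü".encode("utf-8").decode("cp1252") == "Ãœ"


def find_invalid_categories(lines, column="CITN_CATEG"):
    """Yield (line_number, row) for rows outside the documented domain.

    `lines` is any iterable of text lines, such as an open file.
    Assumption: the first line is a header that names the columns.
    Empty values are reported too, since they are not in the domain.
    """
    reader = csv.DictReader(lines)
    for row in reader:
        if row[column] not in VALID_CATEGORIES:
            # line_num counts the physical lines read so far, header included.
            yield reader.line_num, row


# Made-up sample rows, not real PATSTAT data.
sample = io.StringIO(
    "PAT_PUBLN_ID,CITN_ID,CITN_CATEG\n"
    "1,1,X\n"
    "1,2,Ü\n"
    "2,1,Q\n"
)
invalid = list(find_invalid_categories(sample))
print(len(invalid), "rows outside the documented domain")
```

For the real file, something like `open(path, encoding="utf-8", newline="")` could be passed in place of the sample; the actual delimiter, quoting and header spelling of the export may differ and should be checked first.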


Regards,

Re: PATSTAT 2016 Autumn - TLS215_CITN_CATEG corrupt data?

Posted: Tue Dec 13, 2016 8:37 am
by mkracker
Hi,

Thank you for reporting these data errors. Your observation confirms the rule that no database is completely correct, not even PATSTAT ;) .

In fact, the citations are taken from the EPO's REFI and DOCDB databases, which contain data collected from more than 80 patent offices around the world in a variety of formats. Although we put great effort into data quality, 100% data correctness will never be achieved. In this case, there are 78 citation categories out of 38 000 000 which do not make sense, with values like B, C, Q, S, U, Ü, ...

As far as I can see, these 78 cases have already been corrected in our source databases, so you will not find them e.g. in Espacenet, and they will also not show up in the next PATSTAT edition.

Best regards,
Martin / EPO

Re: PATSTAT 2016 Autumn - TLS215_CITN_CATEG corrupt data?

Posted: Tue Dec 13, 2016 11:31 am
by Fr3dY
mkracker wrote:
Hi,

As far as I can see, these 78 cases have already been corrected in our source databases, so you will not find them e.g. in Espacenet, and they will also not show up in the next PATSTAT edition.

Hi Martin,

I guess a fix should be released for the current PATSTAT edition as well... will you look into it?



Kind regards,

Re: PATSTAT 2016 Autumn - TLS215_CITN_CATEG corrupt data?

Posted: Tue Dec 13, 2016 2:43 pm
by Fr3dY
I was told that this issue is not going to be fixed for the current PATSTAT version (but 2017 Spring will be fine).
So, I've manually removed some duplicated entries and set all these values to 'U', as in 'Unknown'.
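
A cleanup along these lines could be scripted roughly as follows. This is only a sketch under several assumptions: the data is comma-separated text with a header row using the column names from this thread, and "duplicated entries" is taken to mean rows that collapse onto the same (PAT_PUBLN_ID, CITN_ID, CITN_CATEG) triple once the odd values are mapped to 'U'. The post does not say which rows were actually removed, and 'U' is the poster's own placeholder, not a documented category. The function name and sample rows are hypothetical.

```python
import csv
import io

# Documented domain of CITN_CATEG (see the first post).
VALID_CATEGORIES = set("XIYADEPLRTO")
UNKNOWN = "U"  # the poster's placeholder, not part of the documented domain


def clean_categories(lines, out):
    """Copy rows to `out`, mapping out-of-domain categories to UNKNOWN.

    Assumption: a replaced row is dropped when an identical
    (PAT_PUBLN_ID, CITN_ID, UNKNOWN) row has already been written.
    Returns (replaced, dropped) row counts.
    """
    reader = csv.DictReader(lines)
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    seen_unknown = set()  # only replaced rows are tracked, so this stays small
    replaced = dropped = 0
    for row in reader:
        if row["CITN_CATEG"] not in VALID_CATEGORIES:
            row["CITN_CATEG"] = UNKNOWN
            replaced += 1
            key = (row["PAT_PUBLN_ID"], row["CITN_ID"])
            if key in seen_unknown:
                dropped += 1
                continue
            seen_unknown.add(key)
        writer.writerow(row)
    return replaced, dropped


# Made-up sample rows, not real PATSTAT data.
source = io.StringIO(
    "PAT_PUBLN_ID,CITN_ID,CITN_CATEG\n"
    "1,1,X\n"
    "1,2,Ü\n"
    "1,2,Q\n"
    "2,1,A\n"
)
cleaned = io.StringIO()
replaced, dropped = clean_categories(source, cleaned)
print(replaced, "values replaced,", dropped, "duplicate rows dropped")
```

Writing to a new file instead of editing in place keeps the original export available for comparison with the next PATSTAT edition.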



Regards,