
"Meta strings" in Abstract Table TLS203

Posted: Tue Jul 14, 2015 8:27 pm
by Jos.Winnink
Dear all,

After loading the latest version of PATSTAT (2015a) and inspecting the results of the loading process, I found that in the Abstract Table (TLS203...) a typical form of "meta bracket strings", #CMT# ... #/CMT#, occurs frequently within the text of the abstracts. It seems to me that during production meta strings were altered to prevent 'problems', but the altered strings seem to have survived the production process and made it into the final data. Given the number of occurrences of these strings and the size of the table, correcting this issue by removing them from TLS203 is quite a tedious task.

Is it possible to clarify the meaning of this string pair? And of course I would appreciate it if in the next version of PATSTAT this issue could be solved.

Best regards,

Jos Winnink

Re: "Meta strings" in Abstract Table TLS203

Posted: Wed Jul 15, 2015 4:37 pm
by mkracker
Dear Jos,

Due to a flaw in our source data files (DOCDB backfile of Jan 2015) the abstract text may contain one or more of these strange strings "#CMT#". They do not have any business relevance. I would simply ignore them or - if your research requires it - delete them in your database, e.g. with
REPLACE(appln_abstract, '#CMT#', ''). [Please test before applying this instruction].
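As a minimal sketch of how that REPLACE could be tried out safely before touching a real database, the Python snippet below runs it against a throwaway in-memory SQLite table. Several things here are assumptions and not part of the advice above: the table name tls203_appln_abstr and the appln_id column, the made-up sample abstracts, and the extra REPLACE for the closing marker '#/CMT#' mentioned in the original question (removing only '#CMT#' would leave '#/CMT#' in place). The text between the markers is kept, since the thread does not say whether it should be dropped. The function names are hypothetical, and the SQL may need adapting for other database systems.

```python
import sqlite3

# Made-up sample rows for illustration only; not real PATSTAT data.
SAMPLE_ROWS = [
    (1, "A device #CMT#for mixing#/CMT# fluids."),
    (2, "A method without any markers."),
]


def make_demo_db():
    """Build an in-memory table shaped like the abstract table (names assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tls203_appln_abstr ("
        "appln_id INTEGER PRIMARY KEY, appln_abstract TEXT)"
    )
    conn.executemany("INSERT INTO tls203_appln_abstr VALUES (?, ?)", SAMPLE_ROWS)
    conn.commit()
    return conn


def strip_cmt_markers(conn):
    """Delete the literal markers '#CMT#' and '#/CMT#', keeping the enclosed text.

    Returns the number of rows that were updated.
    """
    cur = conn.execute(
        """
        UPDATE tls203_appln_abstr
        SET appln_abstract =
            REPLACE(REPLACE(appln_abstract, '#/CMT#', ''), '#CMT#', '')
        WHERE appln_abstract LIKE '%CMT#%'  -- only touch rows containing a marker
        """
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    demo = make_demo_db()
    print("rows updated:", strip_cmt_markers(demo))
    for row in demo.execute("SELECT appln_id, appln_abstract FROM tls203_appln_abstr"):
        print(row)
```

The WHERE clause restricts the update to rows that actually contain a marker, which should matter on a table of this size. Removing a marker can leave a double space behind when the marker was surrounded by blanks; the sketch does not try to normalise that.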

I was told that this flaw has already been fixed, so the next 2015 Autumn version will be clean again.