Update for "EP full-text data for text analytics"

This is the place where the linked data/open data community can ask and answer questions about EPO’s open bulk data sets or share experiences with them. The moderator will use this forum to announce product-related news.

mkracker
Posts: 112
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Update for "EP full-text data for text analytics"

Post by mkracker » Fri Mar 13, 2020 10:09 am

This product is a free bulk data set consisting of XML-tagged titles, abstracts, descriptions, claims and search reports covering all EP publications, designed to facilitate natural language processing work. It can be used on its own, but works best when combined with bibliographic patent data, such as that available in PATSTAT, Global Patent Index, Open Patent Service and many other sources. It is released under the CC-BY open data license.

A new version was recently released. It contains all EP publications from 1978 (publication number 1) to week 08 of 2020.

The data can be downloaded from the Google Cloud bucket “epo-public”. After signing in to Google, you can access the bucket via https://console.cloud.google.com/storag ... po-public/ .

For downloading larger volumes, the "gsutil" CLI from Google’s Cloud SDK is recommended. Details and explanations can be found in the User Guide of the “EP full-text data for text analytics” data set, available at the bottom of its product page https://www.epo.org/searching-for-paten ... html#tab-1 .
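
As a rough illustration of the gsutil route, here is a minimal Python sketch that assembles a parallel, recursive "gsutil cp" command for the bucket. It assumes gsutil is installed and that you are signed in. The function names and the "prefix" and "dest" parameters are hypothetical, and the folder layout inside the bucket is not described here, so consult the User Guide for the actual paths.

```python
import shutil
import subprocess


def build_gsutil_cp_command(bucket, prefix="", dest="."):
    """Return the argument list for a parallel, recursive gsutil copy.

    `prefix` is a hypothetical sub-path inside the bucket; leave it empty
    to address the whole bucket.
    """
    source = f"gs://{bucket}/{prefix}".rstrip("/")
    # -m runs transfers in parallel; cp -r copies recursively.
    return ["gsutil", "-m", "cp", "-r", source, dest]


def download(bucket, prefix="", dest="."):
    """Run the copy. Requires gsutil on PATH and a signed-in account."""
    if shutil.which("gsutil") is None:
        raise RuntimeError("gsutil not found; install the Google Cloud SDK first")
    subprocess.run(build_gsutil_cp_command(bucket, prefix, dest), check=True)


if __name__ == "__main__":
    # Only print the command here; call download(...) to actually transfer data.
    print(" ".join(build_gsutil_cp_command("epo-public")))
```

Building the command separately from running it makes it easy to inspect what would be transferred before starting a large download.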
-------------------------------------------
Martin Kracker / EPO



Re: Update for "EP full-text data for text analytics"

Post by mkracker » Thu Oct 08, 2020 2:22 pm

There has been another update this week. The new release contains all EP publications from 1978 (publication number 1) to week 37 of 2020.
-------------------------------------------
Martin Kracker / EPO

