Get Title, Abstract, Citations and IPC codes for every Document


Post by Peter » Thu Sep 09, 2021 5:24 pm

Hello,
I'm trying to get all patents where the title and abstract (both in English) as well as any kind of citations and IPC codes are available.

Currently I'm trying to get this data through the linked-data SPARQL endpoint. From the available samples and other resources I constructed something like this:


prefix cpc: <http://data.epo.org/linked-data/def/cpc/>
prefix dcterms: <http://purl.org/dc/terms/>
prefix ipc: <http://data.epo.org/linked-data/def/ipc/>
prefix mads: <http://www.loc.gov/standards/mads/rdf/v1.rdf>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix patent: <http://data.epo.org/linked-data/def/patent/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix skos: <http://www.w3.org/2004/02/skos/core#>
prefix st3: <http://data.epo.org/linked-data/def/st3/>
prefix text: <http://jena.apache.org/text#>
prefix vcard: <http://www.w3.org/2006/vcard/ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT * {
  ?publication rdf:type patent:Publication ;
      patent:titleOfInvention ?title ;
      dcterms:abstract        ?abstract ;
      patent:classificationIPCInventive ?ipc;
      (patent:citesPatentPublication+ | patent:citesPatentPublication*/patent:citationNPL) ?citedDocument .
} LIMIT 10
Unfortunately I ran into a number of issues with this. If a document contains multiple fields, e.g. multiple IPC codes, I get an entry for each. If there are multiple other fields as well (e.g. title in en, fr and de), this problem gets multiplied, and with all the fields I want, there can be hundreds of entries per document.

I believe this should be solvable through GROUP BY, group_concat or the like, however if I try any of that, the request just times out. No error.
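
For reference, a grouped version of the query could be shaped roughly like the sketch below. Python is only used here to assemble the query text. This assumes the endpoint supports SPARQL 1.1 aggregates and subqueries; `build_grouped_query`, the output variable names and the " | " separator are made up for illustration, and nothing here guarantees that the web form will not still time out.

```python
# Prefixes copied from the query above (only the ones the sketch needs).
PREFIXES = """\
prefix dcterms: <http://purl.org/dc/terms/>
prefix patent: <http://data.epo.org/linked-data/def/patent/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
"""


def build_grouped_query(limit: int = 10) -> str:
    """Return SPARQL text that yields at most one row per publication.

    Hypothetical helper: the inner subquery picks `limit` publications
    first, so the aggregation only has to look at those.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return PREFIXES + f"""
SELECT ?publication
       (SAMPLE(?title) AS ?anyTitle)
       (SAMPLE(?abstract) AS ?anyAbstract)
       (GROUP_CONCAT(DISTINCT STR(?ipc); separator=" | ") AS ?ipcCodes)
       (GROUP_CONCAT(DISTINCT STR(?citedDocument); separator=" | ") AS ?citations)
WHERE {{
  # Choose the publications before joining in the multi-valued fields.
  {{ SELECT ?publication WHERE {{ ?publication rdf:type patent:Publication }} LIMIT {limit} }}
  ?publication patent:titleOfInvention ?title ;
      dcterms:abstract ?abstract ;
      patent:classificationIPCInventive ?ipc ;
      (patent:citesPatentPublication+ | patent:citesPatentPublication*/patent:citationNPL) ?citedDocument .
}}
GROUP BY ?publication
"""


if __name__ == "__main__":
    print(build_grouped_query(10))
```

Note that the inner LIMIT picks publications before checking whether they have a title, abstract, IPC code and citation, so fewer than `limit` rows may come back.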

I noticed that in the JSON view (which I prefer) the title comes with an xml:lang tag, and in the text view it has the format "text"@en, but I was unable to achieve any results with these either.


Post by cnicolae » Mon Sep 13, 2021 3:44 pm

Hello Peter,

Try to use filters such as
FILTER(langMatches(lang(?title), "en"))
FILTER(langMatches(lang(?abstract), "en"))
in order to reduce the scope of the search.
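
As background on why langMatches is used instead of comparing lang(?title) to "en" directly: it matches language ranges, so "en" also accepts tags such as "en-GB", and the comparison ignores case. A rough Python model of that behaviour (basic filtering as described in RFC 4647; `lang_matches` is a hypothetical helper, not part of any SPARQL library):

```python
def lang_matches(tag: str, lang_range: str) -> bool:
    """Approximate SPARQL langMatches(tag, lang_range).

    `tag` is what lang(?x) would return, i.e. "" for an untagged literal.
    """
    if not tag:
        # Untagged literals never match, not even the "*" wildcard.
        return False
    if lang_range == "*":
        return True
    tag, lang_range = tag.lower(), lang_range.lower()
    # Exact match, or the range is a prefix ending at a "-" subtag boundary.
    return tag == lang_range or tag.startswith(lang_range + "-")


if __name__ == "__main__":
    for candidate in ("en", "en-GB", "fr", ""):
        print(repr(candidate), lang_matches(candidate, "en"))
```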

The document http://documents.epo.org/projects/babyl ... s_v1.2.pdf
provides some sample queries, as well.

Hope this helps.

Kind regards,
Cristian


Post by Peter » Fri Sep 17, 2021 6:46 pm

Hello Cristian,

Thank you, this did indeed help reduce the languages.

Unfortunately I'm still stuck with my grouping problem. No matter what I try, the request times out as soon as the query contains anything related to grouping. To be honest, I'm about to give up, run the queries as they are and merge the results in my local code.
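
The local merge could be sketched like this in Python. It assumes the results arrive in the standard SPARQL JSON format (a list of bindings, each variable carrying a "value" key) with the variable names from my query; `merge_rows` and the example.org URIs in the sample are made up for illustration.

```python
def merge_rows(bindings):
    """Collapse SPARQL JSON result rows into one record per publication."""
    merged = {}
    for row in bindings:
        uri = row["publication"]["value"]
        # The first row seen for a publication supplies its title and abstract.
        record = merged.setdefault(uri, {
            "title": row["title"]["value"],
            "abstract": row["abstract"]["value"],
            "ipc": set(),
            "citations": set(),
        })
        # Sets absorb the duplicates produced by the cross product of fields.
        record["ipc"].add(row["ipc"]["value"])
        record["citations"].add(row["citedDocument"]["value"])
    # Convert the sets to sorted lists so the result is stable and serialisable.
    for record in merged.values():
        record["ipc"] = sorted(record["ipc"])
        record["citations"] = sorted(record["citations"])
    return merged


def _row(pub, ipc, cited):
    """Build one fake result row (sample data only, not real EPO output)."""
    return {
        "publication": {"type": "uri", "value": pub},
        "title": {"type": "literal", "xml:lang": "en", "value": "Title of " + pub},
        "abstract": {"type": "literal", "xml:lang": "en", "value": "Abstract of " + pub},
        "ipc": {"type": "uri", "value": ipc},
        "citedDocument": {"type": "uri", "value": cited},
    }


sample_rows = [
    _row("http://example.org/pub/A", "ipc-2", "http://example.org/pub/B"),
    _row("http://example.org/pub/A", "ipc-1", "http://example.org/pub/B"),
    _row("http://example.org/pub/C", "ipc-1", "http://example.org/pub/A"),
]

if __name__ == "__main__":
    for uri, record in merge_rows(sample_rows).items():
        print(uri, record["ipc"], record["citations"])
```

One caveat with this approach: LIMIT counts result rows, not publications, so the last publication in a page of results may be missing some of its IPC codes or citations.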

With the filters added, my query looks like this.


prefix cpc: <http://data.epo.org/linked-data/def/cpc/>
prefix dcterms: <http://purl.org/dc/terms/>
prefix ipc: <http://data.epo.org/linked-data/def/ipc/>
prefix mads: <http://www.loc.gov/standards/mads/rdf/v1.rdf>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix patent: <http://data.epo.org/linked-data/def/patent/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix skos: <http://www.w3.org/2004/02/skos/core#>
prefix st3: <http://data.epo.org/linked-data/def/st3/>
prefix text: <http://jena.apache.org/text#>
prefix vcard: <http://www.w3.org/2006/vcard/ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?publication ?title ?abstract ?ipc ?citedDocument WHERE{
  ?publication rdf:type patent:Publication ;
      patent:titleOfInvention ?title ;
      dcterms:abstract        ?abstract ;
      patent:classificationIPCInventive ?ipc;
      (patent:citesPatentPublication+ | patent:citesPatentPublication*/patent:citationNPL) ?citedDocument .
  FILTER(langMatches(lang(?title), "en"))
  FILTER(langMatches(lang(?abstract), "en"))
} 
LIMIT 10


Post by cnicolae » Tue Sep 21, 2021 12:31 pm

Hi Peter,

I am sorry to hear about the timeout problem.
The web interface is intended for occasional use only, and only when the amount of data is limited.
According to the guide: "[...] Complex queries may cause the form to time out, so we recommend that you run them in your own local triple store, where you can import all of the EPO's linked open data after downloading it."

The best way to debug this is simply to download the data locally and then run the query against the locally stored data.

That should at least eliminate the timeout issue and allow you to concentrate on the query itself.

Kind regards,
Cristian

