Time out error - Acceleration of query needed

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

GiulioBarth
Posts: 3
Joined: Tue Dec 15, 2015 12:55 pm

Time out error - Acceleration of query needed

Post by GiulioBarth » Tue Dec 15, 2015 12:59 pm

Hey there,

I am trying to get a table of patents relevant to synthetic biology for academic research.
Unfortunately the search for attached query does not complete but fails with time out after several hours. Any hint how to accelerate the query?
FYI: I checked with select top 100 - query works perfectly.

Many thanks!

Warm regards,
Giulio

Code: Select all

Select a.appln_id, a.appln_auth, a.appln_nr, a.appln_kind, p.person_id, p.person_name, abstr.appln_id, abstr.appln_abstract, t.appln_id, t.appln_title, i.appln_id, i.ipc_class_symbol, pub.appln_id, pub.publn_nr, pub.publn_date, p.sector
From  tls201_appln a
JOIN tls202_appln_title t ON a.appln_id = t.appln_id
JOIN tls203_appln_abstr abstr ON a.appln_id = abstr.appln_id
LEFT OUTER JOIN tls207_pers_appln pa ON a.appln_id = pa.appln_id
LEFT OUTER JOIN tls211_pat_publn pub ON a.appln_id = pub.appln_id
LEFT OUTER JOIN tls206_person p ON pa.person_id = p.person_id
JOIN tls209_appln_ipc i ON a.appln_id = i.appln_id
WHERE ((abstr.appln_abstract like '%cad%' OR 
abstr.appln_abstract like '%cam%' OR 
abstr.appln_abstract like '%microfluidics%' OR 
abstr.appln_abstract like '%design platform%' OR 
abstr.appln_abstract like '%computer aided design%' OR 
abstr.appln_abstract like '%systems biology model%' OR 
abstr.appln_abstract like '%metabolomic%' OR 
abstr.appln_abstract like '%model%' OR 
abstr.appln_abstract like '%transcriptomic% model%' OR 
abstr.appln_abstract like '%protein folding model%' OR 
abstr.appln_abstract like '%protein folding prediction%' OR 
abstr.appln_abstract like '%RNA folding model%' OR 
abstr.appln_abstract like '%RNA folding prediction%' OR 
abstr.appln_abstract like '%multiplex ligation%' OR 
abstr.appln_abstract like '%dna synthesis%' OR 
abstr.appln_abstract like '%gene synthesis%' OR 
(abstr.appln_abstract like '%multiplex%' AND abstr.appln_abstract like '%genome%') OR
(abstr.appln_abstract like '%multiplex%' AND abstr.appln_abstract like '%gene%'))
AND (i.ipc_class_symbol like '%B01%' OR i.ipc_class_symbol like '%C12N%' OR i.ipc_class_symbol like '%C12P%' OR i.ipc_class_symbol like '%C12Q%' OR i.ipc_class_symbol like '%C12S%' OR i.ipc_class_symbol like '%C40B%')
AND (publn_earliest_year='2013'))


mkracker
Posts: 114
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: Time out error - Acceleration of query needed

Post by mkracker » Wed Dec 16, 2015 11:20 am

Just my observations and hints:
  • Searching in long texts with wild cards,which you do heavily in your query, is costly in PATSTAT Online. A relational database is not the ideal tool for searching text.
  • In your query there is no need to put a wild card on the left side of the IPC symbol, like ipc_class_symbol like '%C12Q%'. Moreover, left truncation is much more costly than right truncation. IPC symbols always start with the letter-digit-digit-letter combination, as defined by WIPO standard ST.8. See details of attribute IPC_CLASS_SYMBOL in the PATSTAT Data Catalog.
  • You can further combine some comparisons by slightly reformulating the query: The filtering of IPCs can be rewritten as:

    Code: Select all

    (i.ipc_class_symbol like 'B01%'
     or left(i.ipc_class_symbol, 4) IN ('C12N', 'C12P', 'C12Q', 'C12S', 'C40B'))
These measures reduced the costs of the query by about 40%.

I assumed you are using PATSTAT Online. So I run the query in PATSTAT Online 2x. The query returned 167 853 rows. The execution times were 19 minutes and 53 minutes resp.. The difference is obviously due to the differences is the general load of the system by other users. Note that queries will automatically be aborted if they run for more than 60 minutes.

A general hint (see also PATSTAT Online User Manual on the EPO website): When optimizing a query, the "costs" of the query, as estimated by MS SQL, can be a good indication of whether a change in the query has really achieved an improvement or not. You can retrieve the "costs" of a query by generating the estimated query plan of the query by pressing <Ctrl> while clicking PATSTAT's search icon or by pressing <Ctrl>+<lower case L>. In the result table you will then see the estimated execution plan. The cost of the query is shown as a number in column "TotalSubtreeCost" in the first line of the result table.
-------------------------------------------
Martin Kracker / EPO


GiulioBarth
Posts: 3
Joined: Tue Dec 15, 2015 12:55 pm

Re: Time out error - Acceleration of query needed

Post by GiulioBarth » Wed Dec 16, 2015 5:31 pm

Great hints! Works now ways better.

Many thanks and warm regards,


Giulio


GiulioBarth
Posts: 3
Joined: Tue Dec 15, 2015 12:55 pm

Re: Time out error - Acceleration of query needed

Post by GiulioBarth » Fri Dec 18, 2015 10:42 am

Unfortunately there is now an internal download error when attempting 2015 Spring edition and downloading result table as csv for the described search (first 700000 rows). I already wrote a mail to epal@epo.org, hope that mail finds you.

Thanks and warm regards,

Giulio


EPO / EPAL Team
Posts: 74
Joined: Mon Dec 22, 2014 8:30 am

Re: Time out error - Acceleration of query needed

Post by EPO / EPAL Team » Fri Dec 18, 2015 2:40 pm

There are some technical issues that our contractor is currently working on but we are confident that they will be resolved very soon.


Post Reply