Error: insufficient disk space

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

pardeep.naik
Posts: 5
Joined: Fri Feb 16, 2018 2:20 pm

Error: insufficient disk space

Post by pardeep.naik » Wed Jul 18, 2018 12:24 pm

I am using PATSTAT 2018 spring version.
I am running the following query:
SELECT a1.appln_id, a1.appln_auth + a1.appln_nr + a1.appln_kind AS Application_Number, a1.appln_filing_date, a1.granted, appln_title, appln_abstract, pp.publn_date, person_name, person_address, person_ctry_code, ipc_class_symbol, LEFT(ipc_class_symbol, 3), cpc_class_symbol, ipcn.nace2_code, an.nace2_code, nace2_descr, pp2.publn_auth + pp2.publn_nr + pp2.publn_kind AS Cited_Publications, a2.appln_auth + a2.appln_nr + a2.appln_kind AS Cited_Applications FROM tls201_appln a1
LEFT JOIN tls202_appln_title apt ON a1.appln_id = apt.appln_id
LEFT JOIN tls203_appln_abstr aa ON a1.appln_id = aa.appln_id
LEFT JOIN tls211_pat_publn pp ON a1.appln_id = pp.appln_id
LEFT JOIN tls207_pers_appln pa ON a1.appln_id = pa.appln_id
LEFT JOIN tls206_person p ON pa.person_id = p.person_id
LEFT JOIN tls209_appln_ipc ai ON a1.appln_id = ai.appln_id
LEFT JOIN tls224_appln_cpc ac ON a1.appln_id = ac.appln_id
LEFT JOIN tls229_appln_nace2 an ON a1.appln_id = an.appln_id
LEFT JOIN tls902_ipc_nace2 ipcn ON an.nace2_code = ipcn.nace2_code
LEFT JOIN tls212_citation c ON pp.pat_publn_id = c.pat_publn_id
LEFT JOIN tls211_pat_publn pp2 ON c.cited_pat_publn_id = pp2.pat_publn_id
LEFT JOIN tls201_appln a2 ON c.cited_appln_id = a2.appln_id
WHERE a1.granted = 1 AND
(LEFT(ipc_class_symbol, 3) IN ('A01', 'A23', 'A47', 'A61', 'B01', 'B02', 'B04', 'B05', 'B26', 'B27', 'B29', 'B31', 'B81', 'B82', 'C02', 'C05', 'C07', 'CO8', 'C09', 'C10', 'C11', 'C12', 'C13', 'C14', 'E03', 'E04', 'G02', 'G03', 'G06', 'G08', 'G16', 'H12') OR LEFT(ipc_class_symbol, 1) IN ('D'))
AND ((appln_title LIKE '%bio%') OR (appln_abstract LIKE '%bio%'))

After approximate 2 hours of running time I am getting following error:
Error Code: 1101, SQL State: S00010] Could not allocate a new page for database 'TEMPDB' because of insufficient disk space in filegroup 'DEFAULT'. Create the necessary space by dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.

Is it because of lots of Joins and text search I am using in my query or any other reason? Thanks in advance.


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Error: insufficient disk space

Post by EPO / PATSTAT Support » Tue Jul 24, 2018 10:09 am

Dear Pradeep,

There are several reasons why your query has problems:

- Last week we had a full disk issue which caused the error message you reported. This is fixed. Sorry for the inconvenience this has caused.

- The other reason is that your query is – as you assumed – much too heavy, because it contains a lot of joins and you filter conditions are not very restrictive. To illustrate the JOIN issue, let’s take an example of an application which has
2 publications
5 citations per publication
4 IPC symbols
4 CPC symbols
5 persons (1 applicant, 4 inventors)
3 NACE codes.
This single application will cause 2x5x4x4x5x3 = 2400 rows to be retrieved, where each row contains a different combinations of the values listed above.

Also, the JOIN with table TLS902_IPC_NACE2 adds another huge amount of redundancy, which you do not want.

I always test my queries on a small scale first. To get a feeling of your result, try your query with

Code: Select all

WHERE a1.appln_id is in (1,2,3)
This query alone retrieves almost 1,7 million rows for these 3 applications!
The estimated query plan of MS SQL Server guesses that your result will contain about 80 billion(!) rows. Even considering that this is a rough estimation, it tells you that you have to re-design your query and / or your analysis.

Also, your WHERE clause it quite general: 1 486 624 applications fulfil the WHERE conditions.

Not only would the resulting data size be huge, but it will make any subsequent analysis or visualisation very difficult due to the many repeating values.

Instead of putting all the data you need in one table, you should split them up into several tables to avoid this combinatorial explosion. When downloaded you then may combine the tables as needed in your local environment.
A powerful feature of PATSTAT Online to help you is the Subset Download. Look it up in the PATSTAT Online user manual and give it a try. The Subset Download allows you to start with a simple query like the one below which just returns the APPLN_IDs of the relevant applications. Only when downloading you specify the extra tables you want to retrieve.

Code: Select all

SELECT DISTINCT a1.appln_id
FROM tls201_appln a1
JOIN tls202_appln_title apt ON a1.appln_id = apt.appln_id
JOIN tls203_appln_abstr aa ON a1.appln_id = aa.appln_id
JOIN tls209_appln_ipc ai ON a1.appln_id = ai.appln_id
WHERE a1.granted = 1 AND
(LEFT(ipc_class_symbol, 3) IN ('A01', 'A23', 'A47', 'A61', 'B01', 'B02', 'B04', 'B05', 'B26', 'B27', 'B29', 'B31', 'B81', 'B82', 'C02', 'C05', 'C07', 'CO8', 'C09', 'C10', 'C11', 'C12', 'C13', 'C14', 'E03', 'E04', 'G02', 'G03', 'G06', 'G08', 'G16', 'H12') OR LEFT(ipc_class_symbol, 1) IN ('D'))
AND ((appln_title LIKE '%bio%') OR (appln_abstract LIKE '%bio%'))
- And lastly, as a safeguard for our servers every query is aborted if it still has not finished after 1 hour. If your query is too complex, then you need to simplify it or subscribe t the PATSTAT data and create your own local database.
Note that searching in long strings like the abstract with leading wildcards (‘%bio...’) is very costly.

Best regards,
Martin
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post Reply