Page 1 of 1

Help with large downloads on PATSTAT online

Posted: Fri Sep 20, 2019 6:08 pm
by SugOxOriel
Hi team,
I would like to do a large download where I get all application IDs for patents filed by UK entities and their associated CPC codes over 2000-2019 (using earliest filing year for the time interval). The issue is that PATSTAT online only allows you to export 700,000 rows at a time to excel. Even when I restrict the interval to one year (e.g. 2000-2001) there are more than 700,000 rows. Does anyone have any advice on how to do batch downloads? For example, how do I tell PATSTAT to download the first 350,000 rows and then download the bottom 350,000 rows? I've seen example code for this on sql help forums but am struggling to apply it to PATSTAT online. Any advice on how to do this would be appreciated. I also don't mind advice on other ways to handle very large downloads & how to get around the 700,000 row limit.
Thank you!

Re: Help with large downloads on PATSTAT online

Posted: Mon Sep 23, 2019 9:36 am
by EPO / PATSTAT Support
Hello SugOxOriel,
the 700.000 limit has been implemented exactly to avoid mass downloads. Researchers needing that much date would rather work with a local installed PATSTAT data base. EXCEL also has a limit of just over 1 million records on 1 sheet, so anything over that will need multiple sheets or another tool to work with. (or powerpivot)

There is no way to download more records in one go, so the only way is to do multiple downloads and then kind of merge the tables with a tool.

Assuming you only need the GB applications and the CPC codes for all applications between 2000 and 2019:
limit the output via an extra WHERE condition limiting the years:

Code: Select all

select * from tls201_appln join tls224_appln_cpc on tls201_appln.appln_id = tls224_appln_cpc.appln_id
where appln_auth = 'GB' 
and earliest_filing_year between 2000 and 2009
-- then change filing_year to 2010 and  2015
-- then change filing year to 2016 and 2019
order by tls201_appln.appln_id
That will give you 3 txt files which you can open in separate EXCEL worksheets or any other tool that can deal with millions of rows.

Another option if you need more then only the CPC codes would be to download MS ACCESS or CSV extractions, but the limit on such download is 100.000 applications.
In total, all GB applications between 2000 and 2019 means 463.150 records (PATSTAT 2019a), so you you would need to redo the process minimum 5 times with different time frames and then merge the databases (files)

Code: Select all

select * from tls201_appln 
where appln_auth = 'GB' and earliest_filing_year between 2000 and 2019
order by tls201_appln.appln_id
In MS ACCESS you will hit the Microsoft maximum data base size of 2 GB, so you will need to use another tool. If you do this systematic for large data sets, your IP address / account will be blocked !

Depending on the nature of your research, would it not be possible to aggregate your data PATSTAT Online instead of downloading ?