Support for automation

Posted: Sun Dec 02, 2018 6:50 pm
by Rachie
For me, retrieving the full PATSTAT dataset is quite a chore. Finding the download page took me a full 30 minutes, and once I was there, I found no easy way to download everything. Clicking each link in a web browser is both tedious and unreliable, and the server seems to support only five or so simultaneous downloads. I'd like to script the downloads, but the actual links are wrapped in layers of JavaScript. The result is that I have to babysit the web browser.

PATSTAT authors, please, please provide a better method of obtaining the full dataset. Something like SFTP would be nice. SFTP is easily scriptable and is well suited to large unattended downloads.

The BitTorrent protocol would be even better. It is inherently well suited to multiple large files, has built-in checksum verification, and greatly reduces the load on your servers.

Also, it would be very useful if the checksums were easier to use. There is absolutely no reason to have a separate checksum file for each ZIP file. A single file containing the checksums of all ZIP files would be simpler, easier to download, and easier to verify. The ideal format is explained in a previous forum post:
patstat-autumn-2016-checksums-5524#p16720
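
To illustrate why a single checksum file is easier to work with, here is a minimal sketch of a verifier that checks every ZIP in one pass. It assumes a manifest in the common `sha256sum` layout (one `<hex digest>  <file name>` pair per line) stored next to the ZIP files. The layout, the SHA-256 default, and the names `file_digest` and `verify_manifest` are assumptions for illustration, not something PATSTAT currently provides.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so a large ZIP never sits fully in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path, algorithm="sha256"):
    """Check every '<hex digest>  <file name>' line of a manifest.

    Assumed layout: files live in the same directory as the manifest.
    Returns a dict mapping file name to "OK", "MISMATCH" or "MISSING".
    """
    manifest = Path(manifest_path)
    results = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # sha256sum marks binary mode with '*'
        target = manifest.parent / name
        if not target.is_file():
            results[name] = "MISSING"
        elif file_digest(target, algorithm) == expected.lower():
            results[name] = "OK"
        else:
            results[name] = "MISMATCH"
    return results
```

With one manifest, a single call such as `verify_manifest("checksums.txt")` (a hypothetical file name) covers the whole download directory, instead of pairing up one checksum file per ZIP.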

Thanks!

Re: Support for automation

Posted: Tue Dec 04, 2018 11:55 am
by EPO / PATSTAT Support
Hello Rachie,
The download platform is indeed not the most user-friendly one.
The link to the download page is included in the e-mail we send to announce the upload of each new PATSTAT release.
For scripting purposes, we suggest using our REST interface. The attached PDF explains in detail how to access and use that platform. I hope you find this useful.
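
Independently of how the file URLs are obtained, a script also has to respect the limit of roughly five simultaneous downloads reported above. The following is only a sketch of that one aspect: it does not use the REST interface described in the PDF, it omits authentication, and the names `fetch_to_file` and `download_all` as well as the default of four workers are assumptions chosen for illustration.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlopen


def fetch_to_file(url, dest):
    """Stream one URL to disk without loading the whole file into memory."""
    with urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)


def download_all(urls, dest_dir, max_workers=4, fetch=fetch_to_file):
    """Download every URL with at most max_workers transfers in flight.

    Returns a dict mapping each URL to None on success, or to the
    exception that was raised, so failed files can be retried later.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    def job(url):
        # Name the local file after the last path component of the URL.
        dest = dest_dir / Path(urlsplit(url).path).name
        try:
            fetch(url, dest)
            return url, None
        except Exception as exc:  # report the failure, keep the batch going
            return url, exc

    # The pool size is what caps the number of simultaneous downloads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(job, urls))
```

The `fetch` parameter is there so that the transfer function can be replaced, for example by one that adds whatever credentials the real platform requires.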

Re: Support for automation

Posted: Mon Feb 28, 2022 9:21 pm
by jarenas
Hi,

In case anyone finds this interesting, I wrote a simple Python script to download the latest available version of PATSTAT Global. It can easily be modified to download other products. You can find it here: https://github.com/IntelCompH2020/getPATSTAT

Instructions for using the script are included in the repo's README file.

Regards,
Jerónimo.