Selecting a subset of results: equivalent to MySQL's LIMIT?

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

PDD84
Posts: 2
Joined: Sun Nov 01, 2015 7:21 am

Selecting a subset of results: equivalent to MySQL's LIMIT?

Post by PDD84 » Sun Nov 01, 2015 7:36 am

Hello,

I'm trying to download the titles of all Samsung patents via PATSTAT Online. The following query works well to find them:

Code: Select all

SELECT DISTINCT a.appln_id FROM tls201_appln a join tls207_pers_appln on tls207_pers_appln.appln_id = a.appln_id join tls206_person on tls206_person.person_id = tls207_pers_appln.person_id join tls226_person_orig on tls226_person_orig.person_id = tls207_pers_appln.person_id join tls221_inpadoc_prs on tls221_inpadoc_prs.appln_id = a.appln_id where tls206_person.hrm_l2 LIKE 'SAMSUNG ELECTRONICS %'
This gives me 255 335 rows, so I'm hitting the download limit which is at 100 000 patents max.

In MySQL I would simply add something like LIMIT 0,99999 and then LIMIT 100000,199999 etc to multiple versions of the same query but this syntax doesn't exist in MS SQL. I also tried OFFSET ... FETCH ... but that also gives a syntax error (only seems to work with MS SQL 2012 and higher). Lastly, I tried the ROW_NUMBER() OVER (...) function as described here (http://stackoverflow.com/questions/1940 ... ere-clause) but that seems to require the use of a CTE which I don't get to work in PATSTAT.

What is the best way break up a long list of results into multiple queries?

Thanks a lot.


mkracker
Posts: 120
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: Selecting a subset of results: equivalent to MySQL's LIM

Post by mkracker » Sun Nov 01, 2015 10:34 pm

You are right, the easiest way to do this is with the OFFSET ... FETCH clause, which is new in version 2012. Luckily PATSTAT Online is running on MS SQL Server 2012.

An ORDER BY clause is mandatory for the OFFSET ... FETCH. So to conform to the syntax, you must append to your query these lines to skip the first 200 000 rows and fetch the next 100 000:

ORDER BY a.appln_id
OFFSET 200000 ROWS
FETCH NEXT 100000 ROWS ONLY

PATSTAT Online accepts this query, but unfortunately there is an issue during execution, because the OFFSET.. FETCH seems to be simply ignored. Thank you for making me aware of this issue; I will forward it to our developers.

PATSTAT Online can work with CTEs, but I think this workaround is too complex.

Because you need to split your result table in only 3 parts, I propose a simpler approach:
1) Order your result by APPLN_ID, which is the only column in your SELECT
2) Use TOP 100000 to get the 100 000 smallest APPLN_IDs
3) To get the second 100 000 applications, add a WHERE APPLN_ID > nnn, with nnn being the highest (= last) APPLN_ID in the first run of your query, and re-run your query.

Just another note: The limit of max. 100 000 applications applies to the PATSTAT Subset download. If you split your query in 3 parts and request a subset download for each of them, then there may be - depending on the data you download - overlaps. E. g., the same person may be retrieved during multiple downloads. So you may have to eliminate these duplicates after download.

I hope that helps,
Martin Kracker / EPO PATSTAT


PDD84
Posts: 2
Joined: Sun Nov 01, 2015 7:21 am

Re: Selecting a subset of results: equivalent to MySQL's LIM

Post by PDD84 » Mon Nov 02, 2015 3:09 pm

Thanks a lot for your detailed and quick reply. I think I didn't have an ORDER BY clause earlier. I'll try it with your suggestions.


Post Reply