Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
-
PDD84
- Posts: 2
- Joined: Sun Nov 01, 2015 7:21 am
Post
by PDD84 » Sun Nov 01, 2015 7:36 am
Hello,
I'm trying to download the titles of all Samsung patents via PATSTAT Online. The following query works well to find them:
Code: Select all
SELECT DISTINCT a.appln_id FROM tls201_appln a join tls207_pers_appln on tls207_pers_appln.appln_id = a.appln_id join tls206_person on tls206_person.person_id = tls207_pers_appln.person_id join tls226_person_orig on tls226_person_orig.person_id = tls207_pers_appln.person_id join tls221_inpadoc_prs on tls221_inpadoc_prs.appln_id = a.appln_id where tls206_person.hrm_l2 LIKE 'SAMSUNG ELECTRONICS %'
This gives me 255 335 rows, so I'm hitting the download limit which is at 100 000 patents max.
In MySQL I would simply add something like LIMIT 0,99999 and then LIMIT 100000,199999 etc to multiple versions of the same query but this syntax doesn't exist in MS SQL. I also tried OFFSET ... FETCH ... but that also gives a syntax error (only seems to work with MS SQL 2012 and higher). Lastly, I tried the ROW_NUMBER() OVER (...) function as described here (
http://stackoverflow.com/questions/1940 ... ere-clause) but that seems to require the use of a CTE which I don't get to work in PATSTAT.
What is the best way break up a long list of results into multiple queries?
Thanks a lot.
-
mkracker
- Posts: 120
- Joined: Wed Sep 04, 2013 6:17 am
- Location: Vienna
Post
by mkracker » Sun Nov 01, 2015 10:34 pm
You are right, the easiest way to do this is with the OFFSET ... FETCH clause, which is new in version 2012. Luckily PATSTAT Online is running on MS SQL Server 2012.
An ORDER BY clause is mandatory for the OFFSET ... FETCH. So to conform to the syntax, you must append to your query these lines to skip the first 200 000 rows and fetch the next 100 000:
ORDER BY a.appln_id
OFFSET 200000 ROWS
FETCH NEXT 100000 ROWS ONLY
PATSTAT Online accepts this query, but unfortunately there is an issue during execution, because the OFFSET.. FETCH seems to be simply ignored. Thank you for making me aware of this issue; I will forward it to our developers.
PATSTAT Online can work with CTEs, but I think this workaround is too complex.
Because you need to split your result table in only 3 parts, I propose a simpler approach:
1) Order your result by APPLN_ID, which is the only column in your SELECT
2) Use TOP 100000 to get the 100 000 smallest APPLN_IDs
3) To get the second 100 000 applications, add a WHERE APPLN_ID > nnn, with nnn being the highest (= last) APPLN_ID in the first run of your query, and re-run your query.
Just another note: The limit of max. 100 000 applications applies to the PATSTAT Subset download. If you split your query in 3 parts and request a subset download for each of them, then there may be - depending on the data you download - overlaps. E. g., the same person may be retrieved during multiple downloads. So you may have to eliminate these duplicates after download.
I hope that helps,
Martin Kracker / EPO PATSTAT
-
PDD84
- Posts: 2
- Joined: Sun Nov 01, 2015 7:21 am
Post
by PDD84 » Mon Nov 02, 2015 3:09 pm
Thanks a lot for your detailed and quick reply. I think I didn't have an ORDER BY clause earlier. I'll try it with your suggestions.
-
sunniegoldie
- Posts: 1
- Joined: Fri Aug 02, 2024 12:03 pm
Post
by sunniegoldie » Fri Aug 02, 2024 12:11 pm
Hello,
To handle the large number of results you're getting from your query on PATSTAT Online, you can indeed use the `ROW_NUMBER()` function along with Common Table Expressions (CTEs) to paginate your results. While you mentioned having trouble with CTEs, they are a powerful tool in MS SQL for breaking down large datasets into manageable chunks. Here's an example of how you can implement this approach:
sql
Code: Select all
WITH CTE AS (
SELECT a.appln_id, ROW_NUMBER() OVER (ORDER BY a.appln_id) AS row_num
FROM tls201_appln a
JOIN tls207_pers_appln ON tls207_pers_appln.appln_id = a.appln_id
JOIN tls206_person ON tls206_person.person_id = tls207_pers_appln.person_id
JOIN tls226_person_orig ON tls226_person_orig.person_id = tls207_pers_appln.person_id
JOIN tls221_inpadoc_prs ON tls221_inpadoc_prs.appln_id = a.appln_id
WHERE tls206_person.hrm_l2 LIKE 'SAMSUNG ELECTRONICS %'
)
SELECT appln_id
FROM CTE
WHERE row_num BETWEEN 1 AND 100000;
-- For the next batch
SELECT appln_id
FROM CTE
WHERE row_num BETWEEN 100001 AND 200000;
This way, you can continue to adjust the `BETWEEN` clause for each subsequent batch of results.
You also might consider how an
ERP could automate such data retrieval and management processes, ensuring efficiency and accuracy across your organization's operations.
I hope this helps!
Best regards,
Sunny