Page 1 of 1
Get all patents but not duplicated
Posted: Sat May 28, 2022 5:58 pm
by JohnVoyore
How can i choose all publications without distinguishing the country, besides how to avoid that it is a duplicate patent? How to use the distinct and should I use some auth? if yes, which one?
For example:
SELECT *
FROM tls201_appln
JOIN tls202_appln_title ON tls201_appln.appln_id = tls202_appln_title.appln_id
WHERE
NOT CONTAINS (appln_title, '"machine learning" ')
AND appln_filing_year between 2015 and 2020
how choose all countries with this code and that patents are not duplicated
Re: Get all patents but not duplicated
Posted: Mon May 30, 2022 9:28 am
by EPO / PATSTAT Support
Hello JohnVoyore,
Your query will not generate any duplicate applications as such, because each application has only 1 record in tls201_appln and can only have 1 title, so you will not have any duplicates.
You might of course have patents filed in different patent offices for the same invention. Those can be identified via the docdb_family_id. If you want to limit to patents filed at a specific patent offices, then you can add an extra condition in the WHERE clause, something like
appln_auth = 'US'
Then there is also the fact that your WHERE clause condition will generate an enormous amount of results. (millions of patents) I wonder if you did not meant to retrieve patents that have "machine learning" in the title, instead of
NOT having "machine learning" in the table. Below is an exemple.
Code: Select all
SELECT distinct tls201_appln.docdb_family_id
,tls201_appln.appln_id
,[appln_auth]
,[appln_nr]
,[appln_kind]
,[appln_filing_date]
,[receiving_office]
,[earliest_publn_date]
,[granted]
,[nb_citing_docdb_fam]
,[nb_applicants]
,[nb_inventors]
,tls202_appln_title.appln_title
FROM tls201_appln
JOIN tls202_appln_title ON tls201_appln.appln_id = tls202_appln_title.appln_id
WHERE CONTAINS (appln_title, '"machine learning" ')
AND appln_filing_year between 2015 and 2020
order by docdb_family_id, appln_filing_date asc
Re: Get all patents but not duplicated
Posted: Fri Jun 03, 2022 5:45 pm
by JohnVoyore
EPO / PATSTAT Support wrote: ↑Mon May 30, 2022 9:28 am
Hello JohnVoyore,
Your query will not generate any duplicate applications as such, because each application has only 1 record in tls201_appln and can only have 1 title, so you will not have any duplicates.
You might of course have patents filed in different patent offices for the same invention. Those can be identified via the docdb_family_id. If you want to limit to patents filed at a specific patent offices, then you can add an extra condition in the WHERE clause, something like
appln_auth = 'US'
Then there is also the fact that your WHERE clause condition will generate an enormous amount of results. (millions of patents) I wonder if you did not meant to retrieve patents that have "machine learning" in the title, instead of
NOT having "machine learning" in the table. Below is an exemple.
Code: Select all
SELECT distinct tls201_appln.docdb_family_id
,tls201_appln.appln_id
,[appln_auth]
,[appln_nr]
,[appln_kind]
,[appln_filing_date]
,[receiving_office]
,[earliest_publn_date]
,[granted]
,[nb_citing_docdb_fam]
,[nb_applicants]
,[nb_inventors]
,tls202_appln_title.appln_title
FROM tls201_appln
JOIN tls202_appln_title ON tls201_appln.appln_id = tls202_appln_title.appln_id
WHERE CONTAINS (appln_title, '"machine learning" ')
AND appln_filing_year between 2015 and 2020
order by docdb_family_id, appln_filing_date asc
No, no, I really want it not to contain machine learning, but as you said there are millions. What I would like is to get the same amount of patents for machine learning and for non-machine learning. Assuming machine learning has 5000 patents, then I would like to take 5000 patents that were not identified as machine learning. Is there a way to randomly take the 5000 patents from those not identified as machine learning?
Re: Get all patents but not duplicated
Posted: Wed Jun 15, 2022 4:59 pm
by EPO / PATSTAT Support
You could use any of the methods to generate a "random sample" (which is a strongly debated concept in SQL; how random is random...)
But SQL questions are outside the scope of PATSTAT support. Nevertheless, here is an example.
Code: Select all
SELECT top 5000 tls201_appln.docdb_family_id
,tls201_appln.appln_id
,appln_auth
,appln_nr
,appln_kind
,appln_filing_date
,receiving_office
,earliest_publn_date
,granted
,nb_citing_docdb_fam
,nb_applicants
,nb_inventors
,tls202_appln_title.appln_title
FROM tls201_appln
JOIN tls202_appln_title ON tls201_appln.appln_id = tls202_appln_title.appln_id
WHERE not CONTAINS (appln_title, '"machine learning" ')
AND appln_filing_year between 2015 and 2020
group by tls201_appln.docdb_family_id,tls201_appln.appln_id,appln_auth
,appln_nr,appln_kind,appln_filing_date,receiving_office
,earliest_publn_date,granted,nb_citing_docdb_fam
,nb_applicants,nb_inventors,tls202_appln_title.appln_title
ORDER BY NEWID()