How can i choose all publications without distinguishing the country, besides how to avoid that it is a duplicate patent? How to use the distinct and should I use some auth? if yes, which one?
For example:
SELECT *
FROM tls201_appln
JOIN tls202_appln_title ON tls201_appln.appln_id = tls202_appln_title.appln_id
WHERE
NOT CONTAINS (appln_title, '"machine learning" ')
AND appln_filing_year between 2015 and 2020
how choose all countries with this code and that patents are not duplicated
Get all patents but not duplicated
-
- Posts: 440
- Joined: Thu Feb 22, 2007 5:33 pm
- Contact:
Re: Get all patents but not duplicated
Hello JohnVoyore,
Your query will not generate any duplicate applications as such, because each application has only 1 record in tls201_appln and can only have 1 title, so you will not have any duplicates.
You might of course have patents filed in different patent offices for the same invention. Those can be identified via the docdb_family_id. If you want to limit to patents filed at a specific patent offices, then you can add an extra condition in the WHERE clause, something like appln_auth = 'US'
Then there is also the fact that your WHERE clause condition will generate an enormous amount of results. (millions of patents) I wonder if you did not meant to retrieve patents that have "machine learning" in the title, instead of NOT having "machine learning" in the table. Below is an exemple.
Your query will not generate any duplicate applications as such, because each application has only 1 record in tls201_appln and can only have 1 title, so you will not have any duplicates.
You might of course have patents filed in different patent offices for the same invention. Those can be identified via the docdb_family_id. If you want to limit to patents filed at a specific patent offices, then you can add an extra condition in the WHERE clause, something like appln_auth = 'US'
Then there is also the fact that your WHERE clause condition will generate an enormous amount of results. (millions of patents) I wonder if you did not meant to retrieve patents that have "machine learning" in the title, instead of NOT having "machine learning" in the table. Below is an exemple.
Code: Select all
SELECT distinct tls201_appln.docdb_family_id
,tls201_appln.appln_id
,[appln_auth]
,[appln_nr]
,[appln_kind]
,[appln_filing_date]
,[receiving_office]
,[earliest_publn_date]
,[granted]
,[nb_citing_docdb_fam]
,[nb_applicants]
,[nb_inventors]
,tls202_appln_title.appln_title
FROM tls201_appln
JOIN tls202_appln_title ON tls201_appln.appln_id = tls202_appln_title.appln_id
WHERE CONTAINS (appln_title, '"machine learning" ')
AND appln_filing_year between 2015 and 2020
order by docdb_family_id, appln_filing_date asc
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org
EPO - Vienna
patstat @ epo.org
-
- Posts: 4
- Joined: Fri May 20, 2022 5:40 pm
Re: Get all patents but not duplicated
No, no, I really want it not to contain machine learning, but as you said there are millions. What I would like is to get the same amount of patents for machine learning and for non-machine learning. Assuming machine learning has 5000 patents, then I would like to take 5000 patents that were not identified as machine learning. Is there a way to randomly take the 5000 patents from those not identified as machine learning?EPO / PATSTAT Support wrote: ↑Mon May 30, 2022 9:28 amHello JohnVoyore,
Your query will not generate any duplicate applications as such, because each application has only 1 record in tls201_appln and can only have 1 title, so you will not have any duplicates.
You might of course have patents filed in different patent offices for the same invention. Those can be identified via the docdb_family_id. If you want to limit to patents filed at a specific patent offices, then you can add an extra condition in the WHERE clause, something like appln_auth = 'US'
Then there is also the fact that your WHERE clause condition will generate an enormous amount of results. (millions of patents) I wonder if you did not meant to retrieve patents that have "machine learning" in the title, instead of NOT having "machine learning" in the table. Below is an exemple.
Code: Select all
SELECT distinct tls201_appln.docdb_family_id ,tls201_appln.appln_id ,[appln_auth] ,[appln_nr] ,[appln_kind] ,[appln_filing_date] ,[receiving_office] ,[earliest_publn_date] ,[granted] ,[nb_citing_docdb_fam] ,[nb_applicants] ,[nb_inventors] ,tls202_appln_title.appln_title FROM tls201_appln JOIN tls202_appln_title ON tls201_appln.appln_id = tls202_appln_title.appln_id WHERE CONTAINS (appln_title, '"machine learning" ') AND appln_filing_year between 2015 and 2020 order by docdb_family_id, appln_filing_date asc
-
- Posts: 440
- Joined: Thu Feb 22, 2007 5:33 pm
- Contact:
Re: Get all patents but not duplicated
You could use any of the methods to generate a "random sample" (which is a strongly debated concept in SQL; how random is random...)
But SQL questions are outside the scope of PATSTAT support. Nevertheless, here is an example.
But SQL questions are outside the scope of PATSTAT support. Nevertheless, here is an example.
Code: Select all
SELECT top 5000 tls201_appln.docdb_family_id
,tls201_appln.appln_id
,appln_auth
,appln_nr
,appln_kind
,appln_filing_date
,receiving_office
,earliest_publn_date
,granted
,nb_citing_docdb_fam
,nb_applicants
,nb_inventors
,tls202_appln_title.appln_title
FROM tls201_appln
JOIN tls202_appln_title ON tls201_appln.appln_id = tls202_appln_title.appln_id
WHERE not CONTAINS (appln_title, '"machine learning" ')
AND appln_filing_year between 2015 and 2020
group by tls201_appln.docdb_family_id,tls201_appln.appln_id,appln_auth
,appln_nr,appln_kind,appln_filing_date,receiving_office
,earliest_publn_date,granted,nb_citing_docdb_fam
,nb_applicants,nb_inventors,tls202_appln_title.appln_title
ORDER BY NEWID()
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org
EPO - Vienna
patstat @ epo.org