Get all patents but not duplicated

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

JohnVoyore
Posts: 4
Joined: Fri May 20, 2022 5:40 pm

Get all patents but not duplicated

Post by JohnVoyore » Sat May 28, 2022 5:58 pm

How can i choose all publications without distinguishing the country, besides how to avoid that it is a duplicate patent? How to use the distinct and should I use some auth? if yes, which one?
For example:
SELECT *
FROM tls201_appln
JOIN tls202_appln_title ON tls201_appln.appln_id = tls202_appln_title.appln_id
WHERE
NOT CONTAINS (appln_title, '"machine learning" ')
AND appln_filing_year between 2015 and 2020


how choose all countries with this code and that patents are not duplicated


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Get all patents but not duplicated

Post by EPO / PATSTAT Support » Mon May 30, 2022 9:28 am

Hello JohnVoyore,
Your query will not generate any duplicate applications as such, because each application has only 1 record in tls201_appln and can only have 1 title, so you will not have any duplicates.
You might of course have patents filed in different patent offices for the same invention. Those can be identified via the docdb_family_id. If you want to limit to patents filed at a specific patent offices, then you can add an extra condition in the WHERE clause, something like appln_auth = 'US'
Then there is also the fact that your WHERE clause condition will generate an enormous amount of results. (millions of patents) I wonder if you did not meant to retrieve patents that have "machine learning" in the title, instead of NOT having "machine learning" in the table. Below is an exemple.

Code: Select all

SELECT distinct tls201_appln.docdb_family_id
	  ,tls201_appln.appln_id
      ,[appln_auth]
      ,[appln_nr]
      ,[appln_kind]
      ,[appln_filing_date]
      ,[receiving_office]
      ,[earliest_publn_date]
      ,[granted]
      ,[nb_citing_docdb_fam]
      ,[nb_applicants]
      ,[nb_inventors]
	  ,tls202_appln_title.appln_title
FROM tls201_appln
JOIN tls202_appln_title ON tls201_appln.appln_id = tls202_appln_title.appln_id
WHERE CONTAINS (appln_title, '"machine learning" ')
AND appln_filing_year between 2015 and 2020
order by docdb_family_id, appln_filing_date asc
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


JohnVoyore
Posts: 4
Joined: Fri May 20, 2022 5:40 pm

Re: Get all patents but not duplicated

Post by JohnVoyore » Fri Jun 03, 2022 5:45 pm

EPO / PATSTAT Support wrote:
Mon May 30, 2022 9:28 am
Hello JohnVoyore,
Your query will not generate any duplicate applications as such, because each application has only 1 record in tls201_appln and can only have 1 title, so you will not have any duplicates.
You might of course have patents filed in different patent offices for the same invention. Those can be identified via the docdb_family_id. If you want to limit to patents filed at a specific patent offices, then you can add an extra condition in the WHERE clause, something like appln_auth = 'US'
Then there is also the fact that your WHERE clause condition will generate an enormous amount of results. (millions of patents) I wonder if you did not meant to retrieve patents that have "machine learning" in the title, instead of NOT having "machine learning" in the table. Below is an exemple.

Code: Select all

SELECT distinct tls201_appln.docdb_family_id
	  ,tls201_appln.appln_id
      ,[appln_auth]
      ,[appln_nr]
      ,[appln_kind]
      ,[appln_filing_date]
      ,[receiving_office]
      ,[earliest_publn_date]
      ,[granted]
      ,[nb_citing_docdb_fam]
      ,[nb_applicants]
      ,[nb_inventors]
	  ,tls202_appln_title.appln_title
FROM tls201_appln
JOIN tls202_appln_title ON tls201_appln.appln_id = tls202_appln_title.appln_id
WHERE CONTAINS (appln_title, '"machine learning" ')
AND appln_filing_year between 2015 and 2020
order by docdb_family_id, appln_filing_date asc
No, no, I really want it not to contain machine learning, but as you said there are millions. What I would like is to get the same amount of patents for machine learning and for non-machine learning. Assuming machine learning has 5000 patents, then I would like to take 5000 patents that were not identified as machine learning. Is there a way to randomly take the 5000 patents from those not identified as machine learning?


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Get all patents but not duplicated

Post by EPO / PATSTAT Support » Wed Jun 15, 2022 4:59 pm

You could use any of the methods to generate a "random sample" (which is a strongly debated concept in SQL; how random is random...)
But SQL questions are outside the scope of PATSTAT support. Nevertheless, here is an example.

Code: Select all

SELECT top 5000 tls201_appln.docdb_family_id
	  ,tls201_appln.appln_id
      ,appln_auth
      ,appln_nr
      ,appln_kind
      ,appln_filing_date
      ,receiving_office
      ,earliest_publn_date
      ,granted
      ,nb_citing_docdb_fam
      ,nb_applicants
      ,nb_inventors
	  ,tls202_appln_title.appln_title
FROM tls201_appln
JOIN tls202_appln_title ON tls201_appln.appln_id = tls202_appln_title.appln_id
WHERE not CONTAINS (appln_title, '"machine learning" ')
AND appln_filing_year between 2015 and 2020
group by tls201_appln.docdb_family_id,tls201_appln.appln_id,appln_auth
      ,appln_nr,appln_kind,appln_filing_date,receiving_office
      ,earliest_publn_date,granted,nb_citing_docdb_fam
      ,nb_applicants,nb_inventors,tls202_appln_title.appln_title
ORDER BY NEWID()
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post Reply