Page 1 of 1

Get all patents but not duplicated

Posted: Sat May 28, 2022 5:58 pm
by JohnVoyore
How can i choose all publications without distinguishing the country, besides how to avoid that it is a duplicate patent? How to use the distinct and should I use some auth? if yes, which one?
For example:
SELECT *
FROM tls201_appln
JOIN tls202_appln_title ON tls201_appln.appln_id = tls202_appln_title.appln_id
WHERE
NOT CONTAINS (appln_title, '"machine learning" ')
AND appln_filing_year between 2015 and 2020


how choose all countries with this code and that patents are not duplicated

Re: Get all patents but not duplicated

Posted: Mon May 30, 2022 9:28 am
by EPO / PATSTAT Support
Hello JohnVoyore,
Your query will not generate any duplicate applications as such, because each application has only 1 record in tls201_appln and can only have 1 title, so you will not have any duplicates.
You might of course have patents filed in different patent offices for the same invention. Those can be identified via the docdb_family_id. If you want to limit to patents filed at a specific patent offices, then you can add an extra condition in the WHERE clause, something like appln_auth = 'US'
Then there is also the fact that your WHERE clause condition will generate an enormous amount of results. (millions of patents) I wonder if you did not meant to retrieve patents that have "machine learning" in the title, instead of NOT having "machine learning" in the table. Below is an exemple.

Code: Select all

SELECT distinct tls201_appln.docdb_family_id
	  ,tls201_appln.appln_id
      ,[appln_auth]
      ,[appln_nr]
      ,[appln_kind]
      ,[appln_filing_date]
      ,[receiving_office]
      ,[earliest_publn_date]
      ,[granted]
      ,[nb_citing_docdb_fam]
      ,[nb_applicants]
      ,[nb_inventors]
	  ,tls202_appln_title.appln_title
FROM tls201_appln
JOIN tls202_appln_title ON tls201_appln.appln_id = tls202_appln_title.appln_id
WHERE CONTAINS (appln_title, '"machine learning" ')
AND appln_filing_year between 2015 and 2020
order by docdb_family_id, appln_filing_date asc

Re: Get all patents but not duplicated

Posted: Fri Jun 03, 2022 5:45 pm
by JohnVoyore
EPO / PATSTAT Support wrote:
Mon May 30, 2022 9:28 am
Hello JohnVoyore,
Your query will not generate any duplicate applications as such, because each application has only 1 record in tls201_appln and can only have 1 title, so you will not have any duplicates.
You might of course have patents filed in different patent offices for the same invention. Those can be identified via the docdb_family_id. If you want to limit to patents filed at a specific patent offices, then you can add an extra condition in the WHERE clause, something like appln_auth = 'US'
Then there is also the fact that your WHERE clause condition will generate an enormous amount of results. (millions of patents) I wonder if you did not meant to retrieve patents that have "machine learning" in the title, instead of NOT having "machine learning" in the table. Below is an exemple.

Code: Select all

SELECT distinct tls201_appln.docdb_family_id
	  ,tls201_appln.appln_id
      ,[appln_auth]
      ,[appln_nr]
      ,[appln_kind]
      ,[appln_filing_date]
      ,[receiving_office]
      ,[earliest_publn_date]
      ,[granted]
      ,[nb_citing_docdb_fam]
      ,[nb_applicants]
      ,[nb_inventors]
	  ,tls202_appln_title.appln_title
FROM tls201_appln
JOIN tls202_appln_title ON tls201_appln.appln_id = tls202_appln_title.appln_id
WHERE CONTAINS (appln_title, '"machine learning" ')
AND appln_filing_year between 2015 and 2020
order by docdb_family_id, appln_filing_date asc
No, no, I really want it not to contain machine learning, but as you said there are millions. What I would like is to get the same amount of patents for machine learning and for non-machine learning. Assuming machine learning has 5000 patents, then I would like to take 5000 patents that were not identified as machine learning. Is there a way to randomly take the 5000 patents from those not identified as machine learning?

Re: Get all patents but not duplicated

Posted: Wed Jun 15, 2022 4:59 pm
by EPO / PATSTAT Support
You could use any of the methods to generate a "random sample" (which is a strongly debated concept in SQL; how random is random...)
But SQL questions are outside the scope of PATSTAT support. Nevertheless, here is an example.

Code: Select all

SELECT top 5000 tls201_appln.docdb_family_id
	  ,tls201_appln.appln_id
      ,appln_auth
      ,appln_nr
      ,appln_kind
      ,appln_filing_date
      ,receiving_office
      ,earliest_publn_date
      ,granted
      ,nb_citing_docdb_fam
      ,nb_applicants
      ,nb_inventors
	  ,tls202_appln_title.appln_title
FROM tls201_appln
JOIN tls202_appln_title ON tls201_appln.appln_id = tls202_appln_title.appln_id
WHERE not CONTAINS (appln_title, '"machine learning" ')
AND appln_filing_year between 2015 and 2020
group by tls201_appln.docdb_family_id,tls201_appln.appln_id,appln_auth
      ,appln_nr,appln_kind,appln_filing_date,receiving_office
      ,earliest_publn_date,granted,nb_citing_docdb_fam
      ,nb_applicants,nb_inventors,tls202_appln_title.appln_title
ORDER BY NEWID()