SQL queries to create and exclude some keywords

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

CarlosPereiraT
Posts: 1
Joined: Tue Nov 29, 2022 10:07 pm

SQL queries to create and exclude some keywords

Post by CarlosPereiraT » Tue Nov 29, 2022 10:12 pm

Hi, im new here and was trying to run a query where requires the keyword “software” “computer”, or “program” to be present, and none of the keywords “chip”, “semiconductor”, “bus”, “circuit” or “circuit” to be present. i did This:

Code: Select all

SELECT distinct 
  tls201_appln.docdb_family_id
, tls201_appln.appln_id
, [appln_auth]
, [appln_nr]
, [appln_kind]
, [appln_filing_date]
, [receiving_office]
, [earliest_publn_date]
, [granted]
, [nb_citing_docdb_fam]
, [nb_applicants]
, [nb_inventors]
, tls202_appln_title.appln_title

FROM tls201_appln
  INNER JOIN tls202_appln_title ON tls201_appln.appln_id = tls202_appln_title.appln_id 
  INNER JOIN tls203_appln_abstr ON tls201_appln.appln_id = tls203_appln_abstr.appln_id 


WHERE (appln_title IN 'software', 'computer', 'program' or appln_abstract IN 'software', “computer”, 'program')
AND( appln_title NOT IN 'chip', 'semiconductor' , 'circuity' , 'circuitry' , 'bus' OR appln_abstract NOT IN 'chip', 'semiconductor' , 'circuity' , 'circuitry' , 'bus')
AND appln_filing_year between 2003 and 2017
AND granted = 'Y'
order by docdb_family_id, appln_filing_date asc

Do you guys have another way to do it?


EPO / PATSTAT Support
Posts: 424
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: SQL queries to create and exclude some keywords

Post by EPO / PATSTAT Support » Thu Dec 01, 2022 3:13 pm

Hello Carlos,
full text searching is probably the best way to go. Your example SQL query will look for applications having specific single words as title or abstract which is not the same as a certain word to be present in the title or the abstract.

Have a look at the "contains" function in SQL. https://learn.microsoft.com/en-us/sql/t ... rver-ver16 This can only be used if your text fields (title and abstract) have been full text indexed by the data base administrator. (this is the case on PATSTAT Online :) )
One also has to keep in mind that the combinations using 2 fields with conditions of explicit containing or not containing certain words can give strange results as the conditions are executed independent from each other. If the Title contains "software" and from the same application the abstract contains "chip", the application will be included. Therefore one has to add a specific condition to exclude those applications (if that is what you want). It is always a good practice to test your queries on a very limited data set and manually check if the results are what is expected.
Here is an exemple based on your question:

Code: Select all

select tls201_appln.appln_id  ,appln_auth+appln_nr application ,appln_title , appln_abstract
from 
tls201_appln 
left join tls202_appln_title on tls201_appln.appln_id = tls202_appln_title.appln_id
left join tls203_appln_abstr on tls201_appln.appln_id = tls203_appln_abstr.appln_id

where 
tls201_appln.ipr_type = 'PI'
and appln_filing_date between '2000-01-01' and '2000-01-31' 
and appln_auth = 'EP'
and
(
tls201_appln.appln_id in (select appln_id from tls202_appln_title where (contains (appln_title,   '  ("software" or "computer" or "program") and not ("chip" or "semiconductor" or "circuit*"  or "bus") ')))
or 
tls201_appln.appln_id in (select appln_id from tls203_appln_abstr where (contains (appln_abstract, '  ("software" or "computer" or "program") and not ("chip" or "semiconductor" or "circuit*" or "bus")  ')))
)
and 
( tls201_appln.appln_id not in (select appln_id from tls203_appln_abstr where (contains (appln_abstract, ' ("chip" or "semiconductor" or "circuit*" or "bus")  '))))
and 
(tls201_appln.appln_id not in (select appln_id from tls202_appln_title where (contains (appln_title,   ' ("chip" or "semiconductor" or "circuit*" or "bus") '))))
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post Reply