problem with keywords and CPC query

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

rockreid
Posts: 2
Joined: Tue Jan 26, 2016 2:45 am

problem with keywords and CPC query

Post by rockreid » Tue Jan 26, 2016 3:30 am

Hello PATSTAT community,

I am currently working for my PhD thesis and collecting all the Japan smart grid technologies.
But I am currently stuck with the query.
I am using the CPC classification +keywords to collect the patent data.
The detailed query is

SELECT distinct(a.appln_id)
FROM tls201_appln a
LEFT OUTER join tls224_appln_cpc cpc ON a.appln_id=cpc.appln_id
LEFT OUTER JOIN tls202_appln_title t ON a.appln_id = t.appln_id
LEFT OUTER JOIN tls203_appln_abstr abstr ON a.appln_id = abstr.appln_id


WHERE appln_auth = 'JP'
and (cpc.cpc_class_symbol like'Y04S%'
or t.appln_title LIKE '%Smart grid%'
or t.appln_title LIKE '%Smart metering%'
or t.appln_title LIKE '%Energy demand management%'
or t.appln_title LIKE '%Demand side management%'
or t.appln_title LIKE '%Distributed generation%'
or t.appln_title LIKE '%Substation automation%'
or t.appln_title LIKE '%Supervisory control and data acquisition%'
or t.appln_title LIKE '%Phasor measurement unit%'
or t.appln_title LIKE '%Demand response%'
or t.appln_title LIKE '%Dynamic demand%'
OR abstr. appln_abstract LIKE '%Smart grid%'
or abstr. appln_abstract LIKE '%Smart metering%'
or abstr. appln_abstract LIKE '%Energy demand management%'
or abstr. appln_abstract LIKE '%Demand side management%'
or abstr. appln_abstract LIKE '%Distributed generation%'
or abstr. appln_abstract LIKE '%Substation automation%'
or abstr. appln_abstract LIKE '%Supervisory control and data acquisition%'
or abstr. appln_abstract LIKE '%Phasor measurement unit%'
or abstr. appln_abstract LIKE '%Demand response%'
or abstr. appln_abstract LIKE '%Dynamic demand%')

The problem is the query process is quit time consuming .
It has already taken more than 10 hours and the searching process is still going on. I don't know when it will be finished.
It would be greatly appreciated that if someone can help with issues? Or should I change to new search query?


Geert Boedt
Posts: 176
Joined: Tue Oct 19, 2004 10:36 am
Location: Vienna

Re: problem with keywords and CPC query

Post by Geert Boedt » Tue Jan 26, 2016 4:46 pm

Hello Rockreid,
Although your SQL query is syntax-wise correct, it is too complex from a computational point of view for PATSTAT Online. Queries that take more than 1 hour will normally time out. From a “logical” point of view, it is also rather un-usual to search for patents combining CPC (or IPC) classifications with a keyword search linked with an “OR” function, and as such widening the scope of retrieved patent applications. Most (re)searchers would use keyword searches to reduce the scope; looking for very specific (new –combinations of-) technological areas that are not yet covered with the available classification schemes. Keyword searches often introduce noise in the data.
Solution: I had a closer look at your query. It basically combines 3 different “sets”, based on CPC, keywords in title and keywords in abstract.
If you split the query in 3 separate ones you will get about 6000 (JP) patents based on the CPC condition, 45 based on the keywords in the title and 100 based on keywords in the abstract. Most of the patents retrieved via the keywords do have a Y02S classification. So from a statistical point of view, you could simply search using only the Y02S condition and forget about the keywords.
If you are not satisfied with this, then you could run the 3 query-parts separately, export the result lists (into Excel,...), and then de-duplicate the appln_id’s. The keywords will introduce some none-smart grid patents (water and gas distribution networks...)

SELECT distinct(tls201_appln.appln_id)
FROM tls201_appln join tls224_appln_cpc ON tls201_appln.appln_id= tls224_appln_cpc.appln_id
where left(cpc_class_symbol,4) = 'Y04S' and appln_auth = 'JP'

(result: 6282 applications)

SELECT distinct(jp.appln_id)
FROM (select appln_id from tls201_appln where appln_auth = 'JP') JP
join tls203_appln_abstr abstr ON jp.appln_id=abstr.appln_id
where (abstr. appln_abstract LIKE '%Smart grid%'
or abstr. appln_abstract LIKE '%Smart metering%'
or abstr. appln_abstract LIKE '%Energy demand management%'
or .....
(result: 98 applications)


SELECT distinct(jp.appln_id)
FROM (select appln_id from tls201_appln where appln_auth = 'JP') JP
join tls202_appln_title t ON jp.appln_id=t.appln_id
where (t.appln_title LIKE '%Smart grid%'
or t.appln_title LIKE '%Smart metering%'
or t.appln_title LIKE '%Energy demand management%'
or t.appln_title LIKE '%Demand side management%'
or .....
or t.appln_title LIKE '%Dynamic demand%')
(result:45 applications)

I hope this answers your question.
Best regards,

Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna


rockreid
Posts: 2
Joined: Tue Jan 26, 2016 2:45 am

Re: problem with keywords and CPC query

Post by rockreid » Tue Jan 26, 2016 5:21 pm

Dear Geert Boedt,

Thank you so much for you help and you really solved my problems. :D :D :D
Best wishes,
aleric


Post Reply