problems with %-operator / retrieve green inventory IPCs

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

Thomas Oelker
Posts: 3
Joined: Fri Sep 20, 2019 10:50 am

problems with %-operator / retrieve green inventory IPCs

Post by Thomas Oelker » Fri Oct 11, 2019 9:21 am

Dear PATSTAT Community,

in broader terms, I want to get information about green patents as specified by the IPC green inventory (https://www.wipo.int/classifications/ip ... inventory/).

so I applied the following code:

Code: Select all

SELECT appln_filing_year, ipc_class_symbol, appln_auth, person_ctry_code
FROM tls201_appln 
JOIN tls209_appln_ipc  ON tls209_appln_ipc.appln_id = tls201_appln.appln_id
join tls207_pers_appln on tls207_pers_appln.appln_id = tls201_appln.appln_id
join tls206_person on tls206_person.person_id = tls207_pers_appln.person_id 
WHERE ipc_class_symbol LIKE 'H01M 4%' -- Change IPC symbol here
AND appln_filing_year BETWEEN 2000 AND 2000 -- Define year range here
GROUP BY appln_filing_year, ipc_class_symbol , appln_auth, person_ctry_code
ORDER BY appln_filing_year, ipc_class_symbol
This query yields 0 rows. But, when looking for

Code: Select all

...WHERE ipc_class_symbol LIKE 'H01M %'...
I can clearly find patents with the ipc code H01M 4/86 that is classified as green.

What might be the problem?

Thanks a lot in advance!
Best, Thomas


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: problems with %-operator / retrieve green inventory IPCs

Post by EPO / PATSTAT Support » Fri Oct 11, 2019 3:21 pm

Hello Thomas,
just a small syntax error: IPC and CPC codes where the maingroup level is used need to be written with 8 digits.
LIKE 'H01M 4%' is not the same as LIKE 'H01M4%' You need to replenish with leading blanks.
The "/" should always be on the 9 th position.

Code: Select all

SELECT appln_filing_year, ipc_class_symbol, appln_auth, person_ctry_code
FROM tls201_appln 
JOIN tls209_appln_ipc  ON tls209_appln_ipc.appln_id = tls201_appln.appln_id
join tls207_pers_appln on tls207_pers_appln.appln_id = tls201_appln.appln_id
join tls206_person on tls206_person.person_id = tls207_pers_appln.person_id 
WHERE ipc_class_symbol LIKE 'H01M   4%' -- or 'H01M   4/%'
AND appln_filing_year BETWEEN 2000 AND 2000 -- Define year range here
GROUP BY appln_filing_year, ipc_class_symbol , appln_auth, person_ctry_code
ORDER BY appln_filing_year, ipc_class_symbol
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Thomas Oelker
Posts: 3
Joined: Fri Sep 20, 2019 10:50 am

Re: problems with %-operator / retrieve green inventory IPCs

Post by Thomas Oelker » Mon Oct 14, 2019 8:41 am

perfect, thank you very much!

Best, Thomas


xy4u20
Posts: 9
Joined: Mon Feb 27, 2023 3:03 pm

Re: problems with %-operator / retrieve green inventory IPCs

Post by xy4u20 » Wed Mar 08, 2023 4:48 pm

Dear PATSTAT community,

Can I search multiple IPCs at the same time? I am currently using the COUNT command to search for IPC patents, but I cannot add more than one IPC to the search code. How should I solve this problem? And I want the IPC to be displayed in the results, how do I edit the code?

One last question, how would I download the full patent dataset (given my attempts so far, I can only get the results companies that match the query code rather than all of them)?

Looking forward to your reply.

Best,
Xiaohan

Code: Select all

SELECT psn_name, 
COUNT(distinct(case when t1.appln_filing_year = 2010 then t1.appln_id end)) as '2010',
COUNT(distinct(case when t1.appln_filing_year = 2011 then t1.appln_id end)) as '2011',
COUNT(distinct(case when t1.appln_filing_year = 2012 then t1.appln_id end)) as '2012',
COUNT(distinct(case when t1.appln_filing_year = 2013 then t1.appln_id end)) as '2013',
COUNT(distinct(case when t1.appln_filing_year = 2014 then t1.appln_id end)) as '2014',
COUNT(distinct(case when t1.appln_filing_year = 2015 then t1.appln_id end)) as '2015',
COUNT(distinct(case when t1.appln_filing_year = 2016 then t1.appln_id end)) as '2016',
COUNT(distinct(case when t1.appln_filing_year = 2017 then t1.appln_id end)) as '2017',
COUNT(distinct(case when t1.appln_filing_year = 2018 then t1.appln_id end)) as '2018',
COUNT(distinct(case when t1.appln_filing_year = 2019 then t1.appln_id end)) as '2019',
count(t1.appln_id) total
FROM   tls201_appln t1 JOIN  tls209_appln_ipc t2 
	ON t1.appln_id = t2.appln_id
join tls207_pers_appln on t1.appln_id = tls207_pers_appln.appln_id
join tls206_person on tls207_pers_appln.person_id = tls206_person.person_id
WHERE
t2.ipc_class_symbol LIKE 'C10L 3/00' and 'F02C 1/05' --Does only one IPC work at a time?
AND t1.appln_filing_year between 2010 and 2019
and applt_seq_nr > 0 and invt_seq_nr = 0
and granted = 'y'
group by psn_name
order by total desc


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: problems with %-operator / retrieve green inventory IPCs

Post by EPO / PATSTAT Support » Thu Mar 09, 2023 11:50 am

Hello Xiaohan,
The reason why you don't find any result is because of the way your query is build using t2.ipc_class_symbol LIKE 'C10L 3/00' AND 'F02C 1/05'. This is correct SQL syntax, but it does not work for what you intend to do.
SQL and relational data bases have what we call normalised data, so each attribute (mostly) contains only 1 single value. With other words ipc_class_symbol can not contain multiple values in the same "field".
Example: If you run the query below you will find 1 patent having 4 IPC codes.

Code: Select all

Select appln_auth, appln_nr, appln_kind, ipc_class_symbol from 
tls201_appln 
join tls209_appln_ipc on tls201_appln.appln_id = tls209_appln_ipc.appln_id
where 
tls201_appln.appln_auth = 'EP' and tls201_appln.appln_nr= '07808942'
If you would change this now to :

Code: Select all

Select appln_auth, appln_nr, appln_kind, ipc_class_symbol from 
tls201_appln 
join tls209_appln_ipc on tls201_appln.appln_id = tls209_appln_ipc.appln_id
where 
tls201_appln.appln_auth = 'EP' and tls201_appln.appln_nr= '07808942'
and ipc_class_symbol = 'C10L   3/00'
and ipc_class_symbol = 'F02C   1/06'
You will get 0 results, even though we know that both codes are correct IPC codes for that patent. This is because the data base looks for both values in the same record.
A correct query that does what you need is:

Code: Select all

Select appln_id,appln_auth, appln_nr, appln_kind
FROM  tls201_appln 
WHERE
tls201_appln.appln_id in (select appln_id from tls209_appln_ipc where ipc_class_symbol = 'C10L   3/00')
and 
tls201_appln.appln_id in (select appln_id from tls209_appln_ipc where ipc_class_symbol = 'F02C   1/06')
2 "small" observations:
a) using LIKE with % wildcard is in most cases sufficient to also cover the lower groups of the classification, but not always. To be 100% sure, double check in ESPACENET.
See: https://worldwide.espacenet.com/patent/ ... C=F02C1/05, and you will observe that "LIKE 'F02C 1/05%' would not cover 'F02C 1/06', which I assume you want to also include to obtain all the patents in that technical field. Here you can use the IN clause which is the similar to using OR.
b) for most data extractions, you would want to cover as well IPC as CPC to retrieve all relevant patents.
So here is a solid query that gives you all patents that have both the IPC codes or both the CPC codes. (The classfication codes are in an AND relation, the IPC and CPC conditions in an OR !)

Code: Select all

Select appln_id,appln_auth, appln_nr, appln_kind
FROM  tls201_appln 
WHERE
(
tls201_appln.appln_id in (select appln_id from tls209_appln_ipc where ipc_class_symbol like 'C10L   3%')
and 
tls201_appln.appln_id in (select appln_id from tls209_appln_ipc where ipc_class_symbol in ('F02C   1/05','F02C   1/06' ) )
)
or
(
tls201_appln.appln_id in (select appln_id from tls224_appln_cpc where cpc_class_symbol like 'C10L   3%')
and 
tls201_appln.appln_id in (select appln_id from tls224_appln_cpc where cpc_class_symbol in ('F02C   1/05','F02C   1/06' ) )
);
If you want to extend this to even more IPC or CPC codes, then you will need to extend the WHERE close with more conditions via ORs.
This query only retrieves 11 patents, so not much counting can be done over applicants and years. So I am doubtful if your combination of 'C10L 3%' and ('F02C 1/05','F02C 1/06' ) is really what you want.
To download a data set, you just have to be sure that the SELECT clause contains the appln_id. So not your query with the CASE and COUNTS.
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


xy4u20
Posts: 9
Joined: Mon Feb 27, 2023 3:03 pm

Re: problems with %-operator / retrieve green inventory IPCs

Post by xy4u20 » Thu Mar 16, 2023 2:57 am

Thank you for your reply!

Unfortunately, the query code will not meet my needs if it only ensures that the SELECT clause contains appln_id. I would expect to get the number of patents successfully filed by the company, not appln_id or appln_auth, appln_nr.
I am happy with the results table showing the company name, IPC, and year count.
When WHERE contains "tls201_appln.appln_id in (select appln_id from tls209_appln_ipc where ipc_class_symbol in ('F02C 1/00','E04H 1/00' ) )", psn_name and appln_id cannot appear in SELECT at the same time regardless of whether I query CASE and COUNTS.
How should I design the query code to get the results for column 1: psn_name; column 2: IPC; column 3: year2010; column 4: number of patents count total?

What should I do if psn_name and appln_id are in conflict here, but I still need time series results like the above? Because psn_name is my research variable not appln_id.

Looking forward to your reply.

Best,
Xiaohan


xy4u20
Posts: 9
Joined: Mon Feb 27, 2023 3:03 pm

Re: problems with %-operator / retrieve green inventory IPCs

Post by xy4u20 » Tue Mar 21, 2023 12:55 am

Dear PATSTAT Community,

Please ignore my response posted on 16 March as I have designed new code to implement the requirements. Here is the query code for using COUNT to get the IPC specified by psn_name, and my request for help.

Here are my new queries:
First: because the slash "/" in ipc_class_symbol needs to always be in the 9th position, this will result in some additional results being grabbed. For example, the results table includes not only the two IPCs for the query code but also 'H02J 15/00', 'C03B 5/235', 'F01K 27/02' etc. Although these extended results are engine related, I suspect that this reduces the accuracy of the results. Should I consider keeping these extended results?
Code 1

Code: Select all

SELECT psn_id, psn_name, ipc_class_symbol,
COUNT(distinct(case when t1.appln_filing_year = 2010 then t1.appln_id end)) as '2010',
COUNT(distinct(case when t1.appln_filing_year = 2011 then t1.appln_id end)) as '2011',
COUNT(distinct(case when t1.appln_filing_year = 2012 then t1.appln_id end)) as '2012',
COUNT(distinct(case when t1.appln_filing_year = 2013 then t1.appln_id end)) as '2013',
COUNT(distinct(case when t1.appln_filing_year = 2014 then t1.appln_id end)) as '2014',
COUNT(distinct(case when t1.appln_filing_year = 2015 then t1.appln_id end)) as '2015',
COUNT(distinct(case when t1.appln_filing_year = 2016 then t1.appln_id end)) as '2016',
COUNT(distinct(case when t1.appln_filing_year = 2017 then t1.appln_id end)) as '2017',
COUNT(distinct(case when t1.appln_filing_year = 2018 then t1.appln_id end)) as '2018',
COUNT(distinct(case when t1.appln_filing_year = 2019 then t1.appln_id end)) as '2019',
count(t1.appln_id) total
FROM   tls201_appln t1 JOIN  tls209_appln_ipc t2 
ON t1.appln_id = t2.appln_id
join tls207_pers_appln on t1.appln_id = tls207_pers_appln.appln_id
join tls206_person on tls207_pers_appln.person_id = tls206_person.person_id
WHERE
t1.appln_id in (select appln_id from tls209_appln_ipc where ipc_class_symbol in ('F02C   1/05','F02C   1/06' ) )
AND t1.appln_filing_year between 2010 and 2019
and applt_seq_nr > 0 and invt_seq_nr = 0
and granted = 'Y'
group by psn_id, psn_name, ipc_class_symbol
order by total desc
Secondly, Code 2 is the result of querying the identified 4-digit IPC. Again there are many other extended IPCs in the results table. In this code, it works fine when there is only one 4-digit IPC, but it doesn't work when I use AND or ',' add other 4-IPCs. Messages show either "An expression of non-boolean type specified in a context where a condition is expected, near ')' ". or "Incorrect syntax near ',' ".
Code 2

Code: Select all

SELECT psn_id, psn_name, ipc_class_symbol,
COUNT(distinct(case when t1.appln_filing_year = 2010 then t1.appln_id end)) as '2010',
COUNT(distinct(case when t1.appln_filing_year = 2011 then t1.appln_id end)) as '2011',
COUNT(distinct(case when t1.appln_filing_year = 2012 then t1.appln_id end)) as '2012',
COUNT(distinct(case when t1.appln_filing_year = 2013 then t1.appln_id end)) as '2013',
COUNT(distinct(case when t1.appln_filing_year = 2014 then t1.appln_id end)) as '2014',
COUNT(distinct(case when t1.appln_filing_year = 2015 then t1.appln_id end)) as '2015',
COUNT(distinct(case when t1.appln_filing_year = 2016 then t1.appln_id end)) as '2016',
COUNT(distinct(case when t1.appln_filing_year = 2017 then t1.appln_id end)) as '2017',
COUNT(distinct(case when t1.appln_filing_year = 2018 then t1.appln_id end)) as '2018',
COUNT(distinct(case when t1.appln_filing_year = 2019 then t1.appln_id end)) as '2019',
count(t1.appln_id) total
FROM   tls201_appln t1 JOIN  tls209_appln_ipc t2 
ON t1.appln_id = t2.appln_id
join tls207_pers_appln on t1.appln_id = tls207_pers_appln.appln_id
join tls206_person on tls207_pers_appln.person_id = tls206_person.person_id
WHERE
t1.appln_id in (select appln_id from tls209_appln_ipc where ipc_class_symbol like 'A01H%') -- Change IPC symbol here
AND t1.appln_filing_year between 2010 and 2019
and applt_seq_nr > 0 and invt_seq_nr = 0
and granted = 'y'
group by psn_id, psn_name, ipc_class_symbol
order by total desc
Finally, Code 1 and Code 2 run successfully and get results when they are separated. However, when codes 1 and 2 are combined together, it shows that the data fetched 0 rows. How should I improve these codes?

Looking forward to your reply.

Best regards,
Xiaohan


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: problems with %-operator / retrieve green inventory IPCs

Post by EPO / PATSTAT Support » Thu Mar 30, 2023 3:39 pm

On code 1)
Using t1.appln_id in (select appln_id from tls209_appln_ipc where ipc_class_symbol in ('F02C 1/05','F02C 1/06' ) ) will retrieve all the data (and all the IPC codes for all the applications) for wich there is an IPC code = F02C 1/05' or 'F02C 1/06'.
If you only want those values to be in the result list then you should write

Code: Select all

WHERE
ipc_class_symbol in ('F02C   1/05','F02C   1/06' ) 
AND t1.appln_filing_year between 2010 and 2019
and applt_seq_nr > 0 and invt_seq_nr = 0
and granted = 'Y'
group by psn_id, psn_name, ipc_class_symbol
order by total desc
(much less rows in this table).
Both approaches are correct, it simply depends on what data you need for you analysis.
On code 2) , same observation. If you only want the rows that have A01H, then your WHERE clause should be:

Code: Select all

SELECT psn_id, psn_name, ipc_class_symbol,
COUNT(distinct(case when t1.appln_filing_year = 2010 then t1.appln_id end)) as '2010',
COUNT(distinct(case when t1.appln_filing_year = 2011 then t1.appln_id end)) as '2011',
COUNT(distinct(case when t1.appln_filing_year = 2012 then t1.appln_id end)) as '2012',
COUNT(distinct(case when t1.appln_filing_year = 2013 then t1.appln_id end)) as '2013',
COUNT(distinct(case when t1.appln_filing_year = 2014 then t1.appln_id end)) as '2014',
COUNT(distinct(case when t1.appln_filing_year = 2015 then t1.appln_id end)) as '2015',
COUNT(distinct(case when t1.appln_filing_year = 2016 then t1.appln_id end)) as '2016',
COUNT(distinct(case when t1.appln_filing_year = 2017 then t1.appln_id end)) as '2017',
COUNT(distinct(case when t1.appln_filing_year = 2018 then t1.appln_id end)) as '2018',
COUNT(distinct(case when t1.appln_filing_year = 2019 then t1.appln_id end)) as '2019',
count(t1.appln_id) total
FROM   tls201_appln t1 JOIN  tls209_appln_ipc t2 
ON t1.appln_id = t2.appln_id
join tls207_pers_appln on t1.appln_id = tls207_pers_appln.appln_id
join tls206_person on tls207_pers_appln.person_id = tls206_person.person_id
WHERE
ipc_class_symbol like 'A01H%' -- Change IPC symbol here
AND t1.appln_filing_year between 2010 and 2019
and applt_seq_nr > 0 and invt_seq_nr = 0
and granted = 'y'
group by psn_id, psn_name, ipc_class_symbol
order by total desc
Combining both can be easily done with a Boolean OR.

Code: Select all

SELECT psn_id, psn_name, ipc_class_symbol,
COUNT(distinct(case when t1.appln_filing_year = 2010 then t1.appln_id end)) as '2010',
COUNT(distinct(case when t1.appln_filing_year = 2011 then t1.appln_id end)) as '2011',
COUNT(distinct(case when t1.appln_filing_year = 2012 then t1.appln_id end)) as '2012',
COUNT(distinct(case when t1.appln_filing_year = 2013 then t1.appln_id end)) as '2013',
COUNT(distinct(case when t1.appln_filing_year = 2014 then t1.appln_id end)) as '2014',
COUNT(distinct(case when t1.appln_filing_year = 2015 then t1.appln_id end)) as '2015',
COUNT(distinct(case when t1.appln_filing_year = 2016 then t1.appln_id end)) as '2016',
COUNT(distinct(case when t1.appln_filing_year = 2017 then t1.appln_id end)) as '2017',
COUNT(distinct(case when t1.appln_filing_year = 2018 then t1.appln_id end)) as '2018',
COUNT(distinct(case when t1.appln_filing_year = 2019 then t1.appln_id end)) as '2019',
count(t1.appln_id) total
FROM   tls201_appln t1 JOIN  tls209_appln_ipc t2 
ON t1.appln_id = t2.appln_id
join tls207_pers_appln on t1.appln_id = tls207_pers_appln.appln_id
join tls206_person on tls207_pers_appln.person_id = tls206_person.person_id
WHERE
(ipc_class_symbol like 'A01H%' 
OR
ipc_class_symbol in ('F02C   1/05','F02C   1/06' ))
AND t1.appln_filing_year between 2010 and 2019
and applt_seq_nr > 0 and invt_seq_nr = 0
and granted = 'Y'
group by psn_id, psn_name, ipc_class_symbol
order by total desc
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post Reply