Why different counts for B82Y subclass

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

knikonor
Posts: 19
Joined: Fri Nov 01, 2013 1:50 pm

Why different counts for B82Y subclass

Post by knikonor » Fri Nov 01, 2013 2:24 pm

Dear colleagues,

I am new in using PATSTAT for patent seraching so a few questions:
when I try to search how many total patents exist for B82Y subclass (nanotech) I
use the following query:

use patstatapril2013
select count(distinct appln_id) from tls224_appln_cpc
where cpc_class_symbol like 'B82Y%'

it returns 165637 - seems close to truth. I use April 2013 version.

But when I try to issue more detailed request:

use patstatapril2013
SELECT distinct tls224_appln_cpc.cpc_class_symbol,
tls211_pat_publn.publn_nr, tls201_appln.appln_nr,
tls201_appln.appln_filing_date, tls211_pat_publn.publn_date,
tls206_person.person_name, tls203_appln_abstr.appln_abstract,
tls201_appln.appln_auth

FROM tls224_appln_cpc INNER JOIN
tls203_appln_abstr ON tls203_appln_abstr.appln_id = tls224_appln_cpc.appln_id INNER JOIN
tls201_appln ON tls201_appln.appln_id = tls224_appln_cpc.appln_id INNER JOIN
tls207_pers_appln ON tls207_pers_appln.appln_id = tls224_appln_cpc.appln_id INNER JOIN
tls206_person ON tls206_person.person_id = tls224_appln_cpc.appln_id INNER JOIN
tls211_pat_publn ON tls211_pat_publn.appln_id = tls224_appln_cpc.appln_id

WHERE tls224_appln_cpc.cpc_class_symbol LIKE 'B82Y%'

order by tls211_pat_publn.publn_date asc

I have very different number: about 88000 records. Also filling and publication dates start from 30-th but
actual search of this subclass by year shows more deep past coverage.
Could you please point me on my mistake?

Regards,
Kirill.


mkracker
Posts: 120
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: Why different counts for B82Y subclass

Post by mkracker » Tue Dec 10, 2013 1:26 pm

Dear Kirill,

Some helpful hints and comments:

I tried your query
use patstatapril2013
select count(distinct appln_id) from tls224_appln_cpc
where cpc_class_symbol like 'B82Y%'

but got more rows: 184.259 – please check

Regarding your detailed request:
• In line “tls206_person ON tls206_person.person_id = tls224_appln_cpc.appln_id INNER JOIN” you compared APPLN_ID with PERSON ID, which is definitely not want you want.
• In General: Have a look at the Logical Model in the Data Catalog. The diagram gives an overview which tables should be joined via which attributes. Look out for FK (Foreign Keys).
It is at irritating (but still may be correct if done properly) to join tables which are not connected in the diagram.
• (INNER) JOINS return nothing if the corresponding record in one of the tables is missing.
Because you are joining the abstract table with the applications table, but there are a lot of applications without abstract, you will miss all these applications.
The same (to a considerably smaller degree) happens to all applications which do not have any applicant / inventor.
I suggest you add abstracts etc. in a second query or use subqueries, or OUTER JOINs.

I rephrased your query and excluded abstracts to prove my point. I used table aliases to make queries shorter and more readable. I reduced the SELECT list because it is not relevant here:

SELECT count(distinct a.appln_id)

FROM tls224_appln_cpc c INNER JOIN
-- tls203_appln_abstr ON tls203_appln_abstr.appln_id = tls224_appln_cpc.appln_id INNER JOIN
tls201_appln a ON c.appln_id = a.appln_id INNER JOIN
tls207_pers_appln pa ON a.appln_id = pa.appln_id INNER JOIN
tls206_person p ON pa.person_id = p.person_id INNER JOIN
tls211_pat_publn pub ON a.appln_id = pub.appln_id

WHERE c.cpc_class_symbol LIKE 'B82Y%'


This query returned 174 882 records

Best regards,
Martin Kracker / EPO - PATSTAT
-------------------------------------------
Martin Kracker / EPO


knikonor
Posts: 19
Joined: Fri Nov 01, 2013 1:50 pm

Re: Why different counts for B82Y subclass

Post by knikonor » Thu Dec 12, 2013 11:04 am

Hello Martin!

Thanks much for your reply. I have re-imported tls224 table and now it looks fine.


knikonor
Posts: 19
Joined: Fri Nov 01, 2013 1:50 pm

Re: Why different counts for B82Y subclass

Post by knikonor » Thu Dec 12, 2013 12:12 pm

Hello,

besides tables consistency fix I also tried these queries:

--- cut ---
use patstatapril2013
SELECT distinct tls211_pat_publn.*
FROM tls211_pat_publn LEFT JOIN tls209_appln_ipc ON tls211_pat_publn.appln_id = tls209_appln_ipc.appln_id
LEFT JOIN tls224_appln_cpc ON tls211_pat_publn.appln_id = tls224_appln_cpc.appln_id

where (tls224_appln_cpc.cpc_class_symbol like 'B82Y%' or tls209_appln_ipc.ipc_class_symbol like 'B82y%')
and year (tls211_pat_publn.publn_date) between 1800 and 1930

order by tls211_pat_publn.publn_date
--- cut ---

and

--- cut ---
use patstatapril2013
SELECT year(tls211_pat_publn.publn_date) as Year, count(DISTINCT(TLS211_PAT_PUBLN.APPLN_ID)) as Patents
FROM tls211_pat_publn LEFT JOIN tls209_appln_ipc ON tls211_pat_publn.appln_id = tls209_appln_ipc.appln_id
LEFT JOIN tls224_appln_cpc ON tls211_pat_publn.appln_id = tls224_appln_cpc.appln_id

where (tls224_appln_cpc.cpc_class_symbol like 'B82Y%' or tls209_appln_ipc.ipc_class_symbol like 'B82y%')
and year (tls211_pat_publn.publn_date) between 1800 and 1930
group by year(tls211_pat_publn.publn_date)
order by year(tls211_pat_publn.publn_date) asc
--- cut ---

but got only 3 rows. Seems that is not correct, so where is my mistake?

Thanks in advance,
Kirill.


mkracker
Posts: 120
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: Why different counts for B82Y subclass

Post by mkracker » Wed Dec 18, 2013 5:04 pm

Dear Kirill,

your queries are fine. Using the April 2013 PATSTAT edition, I also retrieve only 3 publications. However, if I use the October 2013 edition, I retrieve 180 rows. This is plausible when compared to GPI (Global Patent Index), which returned 179 publication.

PATSTAT is based on a snapshot of DOCDB, EPO's master DB for bibliographic data.
PATSTAT April 2013 edition is based on a DOCDB snapshot of January 2013.
PATSTAT Oct 2013 edition is based on a DOCDB snapshot of July 2013.
So I assume there has been some reclassification regarding B82Y in the first half of 2013.

I suppose you have to use the October 2013 edition for your analysis. I do not know what you need to find out, but I assume the data quality for documents this old might be quite poor.

As a side comment, I am suprised that there are (US-)patents classified as nano-technology before 1930. The oldest publication I retrieved is from 1860 "Improvement in the purification of coal and oers".

Martin Kracker
EPO - PATSTAT
-------------------------------------------
Martin Kracker / EPO


Post Reply