Correctness of applications and publications counting

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

knikonor
Posts: 19
Joined: Fri Nov 01, 2013 1:50 pm

Correctness of applications and publications counting

Post by knikonor » Wed Feb 01, 2017 12:30 pm

Hello to All!

Can someone help me and point where I am not correct in querying patent numbers?
I am using Patastat April 2013 version of database offline currently.
What I need to get is total number of granted patents (publication) vs total number of
filled applications by period, say 1930-2013 and count them by number per year.

For publications I use the following query:

SELECT distinct year(tls211_pat_publn.publn_date) as Year,
count(tls211_pat_publn.pat_publn_id) as Patents

FROM tls201_appln
full outer join tls224_appln_cpc on tls201_appln.appln_id = tls224_appln_cpc.appln_id
full outer join tls211_pat_publn on tls201_appln.appln_id = tls211_pat_publn.appln_id

where tls224_appln_cpc.cpc_class_symbol like 'B82Y%'
and year(tls211_pat_publn.publn_date) between '1930' and '2013'

group by year(tls211_pat_publn.publn_date)
order by year(tls211_pat_publn.publn_date) asc

and get results close to Espacenet search engine using B82Y filter for CPC classification.

But for total amount of fillings I use the following query:

SELECT distinct year(tls201_appln.appln_filing_date) as Year,
count(tls201_appln.appln_id) as Filings

FROM tls201_appln

full outer JOIN tls224_appln_cpc on tls201_appln.appln_id = tls224_appln_cpc.appln_id

where tls224_appln_cpc.cpc_class_symbol like 'B82Y%'
-- and tls201_appln.appln_id <= 900000000
and year(tls201_appln.appln_filing_date) between '1930' and '2013'

group by year(tls201_appln.appln_filing_date)
order by year(tls201_appln.appln_filing_date) asc

and it shows not correct result - number of applications per year are smaller than publications.

Can you please help me to find out the right query for that?

Many thanks in advance,
Kirill.


Geert Boedt
Posts: 176
Joined: Tue Oct 19, 2004 10:36 am
Location: Vienna

Re: Correctness of applications and publications counting

Post by Geert Boedt » Wed Feb 01, 2017 4:58 pm

Hello Kirill,
your question is a bit more difficult then one might first think.
A single patent application mostly has multiple "dates": priority date(s) , filing date, publication date(s), date of grant -if applicable-. So if you want to do a count over a specific period, one has to keep in mind those different dates applicable to the application. One also has to keep in mind that those dates are in most cases spread over a number of years. Depending on what you want to analyse, you will need to develop your approach using the date you deem to be most applicable.

Here is a query that will count the number of applications based on the first publication, and giving you the number of those applications that have been granted. So the column "Granted" refers to the applications counted in "Total", but they will mostly be granted years after the date of the first publication.
I have also added the results of this query, but based on the PATSTAT2016B edition. If you compare that with the same data from PATSTAT2013A, then you will observe the "delay of grant" for those patents. If you count granted patents, it is good practice to specify appln_kind ='A' to avoid distortion. PCT applications (kind 'W') can not get granted, but will still be counted as an application if you don't exclude them.

Code: Select all

SELECT  year(first_pub.mindate) YYYY , count(distinct (tls201_appln.appln_id)) total,
COUNT(distinct(case when publn_first_grant = 1 then tls211_pat_publn.pat_publn_id end)) as 'granted'
FROM tls201_appln join tls224_appln_cpc on tls201_appln.appln_id = tls224_appln_cpc.appln_id 
left join tls211_pat_publn on tls201_appln.appln_id = tls211_pat_publn.appln_id and publn_first_grant = 1
left join (select appln_id , min(publn_date) mindate from tls211_pat_publn group by appln_id) as first_pub
 on tls201_appln.appln_id = first_pub.appln_id
where first_pub.mindate between '1995' and '2016'
and tls224_appln_cpc.cpc_class_symbol like 'B82Y%'
and appln_kind ='A'
group by year(first_pub.mindate) 
Gives you all the patents:

Code: Select all

SELECT tls201_appln.appln_id, tls201_appln.appln_auth, tls201_appln.appln_nr, tls201_appln.appln_kind,
 publn_auth, publn_nr, publn_kind,publn_date, publn_first_grant,
first_pub.appln_id, first_pub.mindate
FROM tls201_appln join tls224_appln_cpc on tls201_appln.appln_id = tls224_appln_cpc.appln_id
left join tls211_pat_publn on tls201_appln.appln_id = tls211_pat_publn.appln_id and publn_first_grant = 1
left join (select appln_id , min(publn_date) mindate from tls211_pat_publn group by appln_id) as first_pub
 on tls201_appln.appln_id = first_pub.appln_id
where tls224_appln_cpc.cpc_class_symbol like 'B82Y%'
and first_pub.mindate between '1995' and '2016'
and appln_kind = 'A'
group by tls201_appln.appln_auth, tls201_appln.appln_nr, tls201_appln.appln_kind, publn_auth, publn_nr,
publn_kind,publn_date, publn_first_grant, first_pub.appln_id, first_pub.mindate, tls201_appln.appln_id
order by tls201_appln.appln_id
The data based on PATSTAT2016b:
B82Y.zip
sample data
(4.64 MiB) Downloaded 220 times
Best regards,

Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna


knikonor
Posts: 19
Joined: Fri Nov 01, 2013 1:50 pm

Re: Correctness of applications and publications counting

Post by knikonor » Thu Feb 02, 2017 4:28 pm

Hello Geert!

Thanks so much for your explanations. Seems first example query does what I need to.
Can you please clarify if I am right when using also IPC class query to handle B82Y applications
like:

or tls209_appln_ipc.ipc_class_symbol like 'B82Y%'

does it have any meaning?

Also for the second example of yours I get a lot of NULL values for Patstat 2013A database. Is that ok?

Best regards,
Kirill.


Geert Boedt
Posts: 176
Joined: Tue Oct 19, 2004 10:36 am
Location: Vienna

Re: Correctness of applications and publications counting

Post by Geert Boedt » Thu Feb 02, 2017 5:49 pm

Hello Kirill,
if you want your final count to include patents that have as well IPC as CPC subclass B82Y, then you need to make a left join with the 2 respective tables. This will give you give you a higher count then only using one of the classification schemes.

Code: Select all

SELECT  year(first_pub.mindate) YYYY , count(distinct (tls201_appln.appln_id)) total,
COUNT(distinct(case when publn_first_grant = 1 then tls211_pat_publn.pat_publn_id end)) as 'granted'
FROM tls201_appln left join tls224_appln_cpc on tls201_appln.appln_id = tls224_appln_cpc.appln_id 
left join tls209_appln_ipc on tls201_appln.appln_id = tls209_appln_ipc.appln_id
left join tls211_pat_publn on tls201_appln.appln_id = tls211_pat_publn.appln_id and publn_first_grant = 1
left join (select appln_id , min(publn_date) mindate from tls211_pat_publn group by appln_id) as first_pub
 on tls201_appln.appln_id = first_pub.appln_id
where first_pub.mindate between '1995' and '2016'
and (tls224_appln_cpc.cpc_class_symbol like 'B82Y%'  or tls209_appln_ipc.ipc_class_symbol like 'B82Y%')
and appln_kind ='A'
group by year(first_pub.mindate)
Here is a link to the file with the data (valid until 02/03/2017):
http://webserv.epo.org/download.nsf/0/2 ... enDocument
The Null values are patents that have not been granted. That SQL query allows me to count granted and non granted in one query, because PATSTAT Online does not allow to store temp or intermediate tables.
Best regards,

Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna


knikonor
Posts: 19
Joined: Fri Nov 01, 2013 1:50 pm

Re: Correctness of applications and publications counting

Post by knikonor » Fri Feb 03, 2017 5:07 pm

Hi Geert,

thanks a lot again. I have tried your query e.g. SQL1 (from table) and got different results for
my 2013 April database. Is that correct and not all data was included in this version?
Is 2016 version more rich with data?

E.g. your data:

2005 11230 6351
2006 11346 6169
2007 11811 6451
2008 12021 6509
2009 11945 6668
2010 12329 7195

My data:

2005 10725 5860
2006 10891 5549
2007 11124 5328
2008 11250 4692
2009 10869 4125
2010 11242 3654

Maybe this is because not all patents were published at 2013 DB version time? But total should be similar...

Best regards,
Kirill.


Geert Boedt
Posts: 176
Joined: Tue Oct 19, 2004 10:36 am
Location: Vienna

Re: Correctness of applications and publications counting

Post by Geert Boedt » Mon Feb 06, 2017 3:10 pm

Hello Kirill,
it is of course possible that some publications were not yet available for the PATSTAT2013 April production when the back file was extracted.
However, a more plausible reason is the re-classification of patents, and with regard to CPC, the fact that CPC codes are mostly available at a moment later then the date of publication (CPC codes are not on the patent publication). So adding the CPC B82Y codes to your query, increases the numbers in the count; reclassifications within IPC and CPC as well.
Attached is an excel file that shows a sample of applications that had a B82Y CPC-classification code in PATSTAT2016B, but not in PATSTAT2013A.
b82y_2013_2016.zip
(967.41 KiB) Downloaded 227 times
Best regards,

Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna


knikonor
Posts: 19
Joined: Fri Nov 01, 2013 1:50 pm

Re: Correctness of applications and publications counting

Post by knikonor » Sat Feb 11, 2017 5:56 pm

Hello Geert!

I have analyzed my results with the great help of your scripts and could you please clarify the following facts:

1. I see that mindate here goes to first publication date but not prority date as it goes from Espacenet? E.g. I have the following patent:

appln_id appln_auth appln_nr appln_kind publn_auth publn_nr publn_kind publn_date publn_first_grant appln_id mindate
62 EP 04090068 A EP 1457191 B1 2009-02-25 1 62 2004-09-15

from here priority date is 2004-09-15 but from Espacenet I have 2003-03-13.

So how is it correct to find out exact priority date (s)?

2. Also I have statistical data that proves that total amount of filed applications and granted patents is about 100% to 70%. I mean that about 70% of applications should be granted in some period of time. Till year 2001-2002 this proportion seems to be correct but starting from 2003 it changes to 50%-50%. Here is my data from Patstat 2013A:

2003 9222 5359
2004 10057 5821
2005 10725 5860
2006 10891 5549
2007 11124 5328
2008 11250 4692
2009 10869 4125
2010 11242 3654
2011 11919 2074
2012 12480 834

Is that correct for this version of DB? Or maybe I have a mistake somewhere?

Best regards and many thanks beforehand.
Kirill.


knikonor
Posts: 19
Joined: Fri Nov 01, 2013 1:50 pm

Re: Correctness of applications and publications counting

Post by knikonor » Sat Feb 11, 2017 6:46 pm

Maybe I was not correct with the 1-st question. What I really need to count is not the date of first publication but priority date as in Espacenet for every application. And then I need to count them per year.

Is that possible?

Best regards,
Kirill.


Post Reply