Issue with Collecting Patent Data

jay2018
Posts: 10
Joined: Sun Nov 11, 2018 12:52 am

Post by jay2018 » Thu Jan 10, 2019 3:54 am

Hi,

I am trying to use the following query to collect patents in a certain classification granted by the USPTO between 2005 and 2017.

select a.appln_id, a.appln_filing_date, p.publn_auth, p.publn_nr, p.publn_date, a.nb_inventors,
p.publn_claims,i.ipc_class_symbol, count(distinct (c.pat_publn_id)) as fcitation, f.person_ctry_code, g.psn_sector
from tls201_appln a
join tls209_appln_ipc i on a.appln_id = i.appln_id
join tls211_pat_publn p on p.appln_id = a.appln_id
join tls212_citation c on p.pat_publn_id = c.cited_pat_publn_id
join tls207_pers_appln d on a.appln_id=d.appln_id
join tls226_person_orig f on d.person_id=f.person_id
join tls206_person g on f.person_id=g.person_id
where i.ipc_class_symbol like 'H02M%'
and p.publn_auth = 'US'
and p.publn_first_grant = '1'
and a.granted = '1'
and i.ipc_position = 'F'
AND a.earliest_publn_year BETWEEN 2005 AND 2017
and c.cited_pat_publn_id <> 0
group by a.appln_id, a.appln_filing_date, a.nb_inventors, p.publn_nr, p.publn_date,
p.publn_claims, p.publn_auth , i.ipc_class_symbol, f.person_ctry_code, g.psn_sector
order by p.publn_nr

However, the data I collect show that the number of unique granted patents peaks around 2013 and 2014. For more recent years such as 2016 or 2017, the number of granted patents is quite small. Could you help me find which part of the query is incorrect? Thank you.

Best regards,
Wei


EPO / PATSTAT Support
Posts: 87
Joined: Thu Feb 22, 2007 5:33 pm

Re: Issue with Collecting Patent Data

Post by EPO / PATSTAT Support » Thu Jan 10, 2019 5:21 pm

Hello Wei,
I see two main reasons that could explain your observation:
Patents that have a recent earliest_publn_year (limited to BETWEEN 2005 AND 2017) have had less opportunity to be cited than older patents.
If a patent has earliest_publn_year = 2016 (or 2017), it will probably be cited by patents filed after that date (as novelty-destroying prior art), and those citing patents will probably not have been published yet and will therefore not be available in PATSTAT either.
Geert Boedt
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post by jay2018 » Thu Jan 10, 2019 6:37 pm

Hi Geert,

Thank you for your reply. If I understand correctly, the main problem is that I limit the sample to patents with more than 0 citations. If I delete the line "and c.cited_pat_publn_id <> 0", I should have no problem getting all patents issued between 2005 and 2017. Is that correct? Thank you.

Best regards,
Wei


Post by EPO / PATSTAT Support » Fri Jan 11, 2019 10:11 am

Hello Wei,
this is not really a "problem" that can be solved.
Patents that were published in 2017 will most probably not have been cited yet. And if they have been cited, the citing publication will probably not be available to the public yet (there are 18 months between filing and publication).
So there is simply no data and therefore no records.
If you ran the same query on the PATSTAT 2015b release, you would see the same picture for the publication years 2013 and 2014.
Geert Boedt
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post by jay2018 » Fri Jan 11, 2019 6:55 pm

Hi Geert,

Thank you for the reply. However, I am getting a little confused about the data availability. As you mentioned, some patents granted by the USPTO in 2017 are simply unavailable. But I have checked the USPTO website, and it has more observations than I get here. I am curious what causes the data to be unavailable.
To clarify, my goal is simply to collect all patents granted between 2005 and 2017, both cited and uncited. Could you explain in more detail why whether a patent has been cited affects whether there is a record? Thank you.

Best regards,
Wei


Post by EPO / PATSTAT Support » Mon Jan 14, 2019 2:48 pm

Hello Wei,
kindly provide some specific examples of granted US patents (2017) that are not available in PATSTAT.

Whether a patent was cited or not does not affect its inclusion in PATSTAT.
But a patent cannot be cited if it is not in the public domain yet, meaning published. (Exceptions are applicants citing their own applications that have not been published yet.)
When the EPO receives data showing that a certain publication/application has been cited without that publication/application being in our database, the EPO creates a so-called "dummy" publication and application in order to guarantee the integrity of the database. These applications have appln_id values > 900000000.
Kindly check the PATSTAT data catalog for more information on this topic.
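
As an illustration of the appln_id range mentioned above, here is a minimal sketch that separates regular from "dummy" applications. It runs against an in-memory SQLite database with invented toy rows, not against PATSTAT; only the table name, the appln_id column and the 900000000 threshold come from this thread, and DUMMY_THRESHOLD is a name made up for the example.

```python
import sqlite3

# Toy stand-in for tls201_appln; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tls201_appln (appln_id INTEGER PRIMARY KEY, appln_auth TEXT);
    INSERT INTO tls201_appln VALUES
        (12345, 'US'), (67890, 'US'), (900000001, 'US'), (930000042, 'JP');
""")

# Hypothetical constant: per the post above, dummy applications have appln_id > 900000000.
DUMMY_THRESHOLD = 900000000

regular = conn.execute(
    "SELECT appln_id FROM tls201_appln WHERE appln_id <= ? ORDER BY appln_id",
    (DUMMY_THRESHOLD,),
).fetchall()
dummy = conn.execute(
    "SELECT appln_id FROM tls201_appln WHERE appln_id > ? ORDER BY appln_id",
    (DUMMY_THRESHOLD,),
).fetchall()

print("regular:", regular)
print("dummy:", dummy)
```

The same WHERE condition could be added to a PATSTAT query to keep such placeholder records out of application counts, assuming the range described in the post applies to the edition in use.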
Geert BOEDT
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post by jay2018 » Wed Jan 16, 2019 6:18 am

Hi Geert,

For example, granted patent #9848623. The publication date is 12/26/2017. But using the query from my first post (with the IPC code replaced by C12N), this patent is not in the dataset I get. Could you help me figure out why some patents that were granted within my sample period are not available here? Thank you.

Best regards,
Wei


Post by EPO / PATSTAT Support » Wed Jan 16, 2019 2:02 pm

Hello Wei,
the US patent 9848623 (published 2017-12-26) is available in PATSTAT.
Here is the SQL query to retrieve it.

Code:

SELECT tls201_appln.appln_id
     , appln_auth
     , appln_nr
     , appln_kind
     , appln_filing_date
     , appln_nr_epodoc
     , earliest_filing_date
     , earliest_publn_date
     , granted
     , tls211_pat_publn.publn_nr_original
     , publn_date
     , publn_first_grant
     , (CASE WHEN tls211_pat_publn.pat_publn_id IN (SELECT cited_pat_publn_id FROM tls212_citation)
             THEN 'Y' ELSE 'N' END) AS "cited?"
  FROM tls201_appln
  JOIN tls211_pat_publn ON tls201_appln.appln_id = tls211_pat_publn.appln_id
 WHERE appln_nr_epodoc = 'US201213342623' -- this is the application number
 ORDER BY tls201_appln.appln_id

In the above query I added a column that shows whether the publications belonging to that application were cited. They were not.

In fact, when I run your query, I do not get any results at all. Looking a bit deeper, the condition "and p.publn_first_grant = '1'" in your WHERE clause forces the query to look only at citations of the grant publication. Examiners will normally not cite that publication, but rather the first publication of the application.
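
The effect described above can be reproduced on toy data. The sketch below is an illustration, not a PATSTAT query: it uses an in-memory SQLite database with one invented application that has a first publication (A1) and a grant publication (B2), and two invented citations that point at the A1 document. The table and column names follow the thread; the 'Y'/'N' flag values, the publn_kind values and all row values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tls211_pat_publn (
        pat_publn_id INTEGER PRIMARY KEY,
        appln_id INTEGER,
        publn_kind TEXT,
        publn_first_grant TEXT
    );
    CREATE TABLE tls212_citation (pat_publn_id INTEGER, cited_pat_publn_id INTEGER);
    -- One application (101): first publication (id 1) and grant publication (id 2).
    INSERT INTO tls211_pat_publn VALUES (1, 101, 'A1', 'N'), (2, 101, 'B2', 'Y');
    -- Two later publications (ids 50 and 51) cite the first publication, not the grant.
    INSERT INTO tls212_citation VALUES (50, 1), (51, 1);
""")

# Restricting to the grant publication before joining citations: no rows survive.
grant_only = conn.execute("""
    SELECT p.appln_id, COUNT(DISTINCT c.pat_publn_id) AS fcitation
    FROM tls211_pat_publn p
    JOIN tls212_citation c ON p.pat_publn_id = c.cited_pat_publn_id
    WHERE p.publn_first_grant = 'Y'
    GROUP BY p.appln_id
""").fetchall()

# Counting citations over all publications of the application instead.
per_application = conn.execute("""
    SELECT p.appln_id, COUNT(DISTINCT c.pat_publn_id) AS fcitation
    FROM tls211_pat_publn p
    LEFT JOIN tls212_citation c ON p.pat_publn_id = c.cited_pat_publn_id
    GROUP BY p.appln_id
""").fetchall()

print("grant publication only:", grant_only)
print("all publications of the application:", per_application)
```

With the grant filter the application disappears from the result, while counting over all of its publications attributes both citations to it. Whether application-level counting is appropriate depends on the research question.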
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post by jay2018 » Wed Jan 16, 2019 7:22 pm

Hi,

Thank you for the clarification. Here is the updated query I use:

select a.appln_id, a.appln_filing_date, p.publn_auth, p.publn_nr, p.publn_date, a.nb_inventors,
p.publn_claims,i.ipc_class_symbol, count(c.cited_pat_publn_id) as bcitation, f.person_ctry_code, g.psn_sector
from tls201_appln a
join tls209_appln_ipc i on a.appln_id = i.appln_id
join tls211_pat_publn p on i.appln_id = p.appln_id
join tls212_citation c on p.pat_publn_id = c.cited_pat_publn_id
join tls207_pers_appln d on a.appln_id=d.appln_id
join tls226_person_orig f on d.person_id=f.person_id
join tls206_person g on f.person_id=g.person_id
where i.ipc_class_symbol like 'C12N%'
and p.publn_auth = 'US'
and a.granted = 'Y'
and i.ipc_position = 'F'
AND a.earliest_publn_year BETWEEN 2000 AND 2017
group by a.appln_id, a.appln_filing_date, a.nb_inventors, p.publn_nr, p.publn_date,
p.publn_claims, p.publn_auth , i.ipc_class_symbol, f.person_ctry_code, g.psn_sector
order by p.publn_nr

I get a larger sample now after removing the filter "and p.publn_first_grant = '1'". But there is still a huge decline in the number of granted patents in 2015, 2016, and 2017. The number of unique patents granted is 1617 in 2013, 1493 in 2014, 1082 in 2015, 589 in 2016, and 176 in 2017.
I still don't understand why the numbers for the most recent 3 years are so small. I would expect them to be at least similar to 2013 or 2014. I don't think whether a patent is cited explains the decline, because I don't add any condition on whether a patent is cited or not, so the query should return every patent granted by the USPTO between 2000 and 2017 with C12N as its first IPC class.

Best regards,
Wei


Post by EPO / PATSTAT Support » Thu Jan 17, 2019 5:58 pm

Hello Wei,
the fact that you JOIN your data with the tls212_citation table implies that only cited publications are retained. Many publications have not been cited, so they will not be included in your data sample.
This is the nature of relational databases. You can avoid this by using a LEFT JOIN, which will then include all the applications from the first table, regardless of whether there is a citation.

The same goes for the IPC table. Joining tls201 with tls209 will result in a data set of applications that have an IPC classification. Applications without an IPC are excluded. You then further narrow it down with the condition C12N..., which is OK.

If you want to have a closer look at the number of applications, remove all tables and joins that you do not need to narrow down the sample.
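
To illustrate the difference between JOIN and LEFT JOIN described above, here is a minimal sketch on toy data. It uses an in-memory SQLite database rather than PATSTAT; the table and column names follow the thread, while the three publications and two citations are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tls211_pat_publn (
        pat_publn_id INTEGER PRIMARY KEY,
        appln_id INTEGER,
        publn_date TEXT
    );
    CREATE TABLE tls212_citation (pat_publn_id INTEGER, cited_pat_publn_id INTEGER);
    -- Three toy publications: one older, two recent.
    INSERT INTO tls211_pat_publn VALUES
        (1, 101, '2013-05-01'), (2, 102, '2016-08-01'), (3, 103, '2017-12-26');
    -- Only the older publication has been cited (by publications 10 and 11).
    INSERT INTO tls212_citation VALUES (10, 1), (11, 1);
""")

# Inner join: publications without any citation row are dropped entirely.
inner_rows = conn.execute("""
    SELECT p.pat_publn_id, COUNT(DISTINCT c.pat_publn_id) AS fcitation
    FROM tls211_pat_publn p
    JOIN tls212_citation c ON p.pat_publn_id = c.cited_pat_publn_id
    GROUP BY p.pat_publn_id
    ORDER BY p.pat_publn_id
""").fetchall()

# Left join: every publication is kept; uncited ones get a count of 0,
# because COUNT ignores the NULLs produced for unmatched rows.
left_rows = conn.execute("""
    SELECT p.pat_publn_id, COUNT(DISTINCT c.pat_publn_id) AS fcitation
    FROM tls211_pat_publn p
    LEFT JOIN tls212_citation c ON p.pat_publn_id = c.cited_pat_publn_id
    GROUP BY p.pat_publn_id
    ORDER BY p.pat_publn_id
""").fetchall()

print("JOIN:     ", inner_rows)
print("LEFT JOIN:", left_rows)
```

In this toy data the two recent publications vanish under the inner join and reappear with a citation count of 0 under the left join, which matches the kind of drop in recent years reported earlier in the thread.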
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org

