Accessing Data from PATSTAT Online

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

LucasM
Posts: 5
Joined: Fri Feb 21, 2020 2:33 pm

Accessing Data from PATSTAT Online

Post by LucasM » Fri Feb 21, 2020 2:41 pm

Hi! I am currently working on my MSc thesis but I am having a hard time retrieving the data from PATSTAT online. I have a list of 800,000 patents sorted on appln_id which I would like to get the following information for:

1. Application and grant year
2. Patent class (or some other industry identifier)
3. Number of backward and forward citations on a year by year basis

Furthermore, for 10,000 of the 800,000 patents I am trying to get the following information:

4. Application year and grant year for each of the citing patents
5. Patent class for the cited patent

Are there any good resources which show me how to access this information? I am not very proficient in SQL so that could also be why I'm struggling


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Accessing Data from PATSTAT Online

Post by EPO / PATSTAT Support » Mon Feb 24, 2020 11:15 am

Hello Lucas,
it is impossible to do such complex data aggregations and working with such volumes on PATSTAT Online.
The maximum download list is 700.000 results (with a paying subscription). (100.000 if you want to extract a database subset.)
The first 3 points are more or less covered by a query as this one. (not the backward citations)

Code: Select all

SELECT TOP (700000) tls201_appln. appln_id 
      , appln_auth + appln_nr + appln_kind 
      , appln_filing_date 
      , granted 
	  ,pub_grant.publn_date as grant_date
      , nb_citing_docdb_fam 
	  ,STRING_AGG (ipc.IPC_4, ', ') IPC_group
FROM  tls201_appln 
  left join tls211_pat_publn pub_grant on tls201_appln.appln_id = pub_grant.appln_id and publn_first_grant ='Y'
  left join (select distinct appln_id, left(ipc_class_symbol,4) IPC_4 from  tls209_appln_ipc ) IPC  on tls201_appln.appln_id = IPC.appln_id
group by tls201_appln. appln_id 
      , appln_auth + appln_nr + appln_kind 
      , appln_filing_date 
      , granted 
	  ,pub_grant.publn_date 
      , nb_citing_docdb_fam 
order by tls201_appln.appln_id
Number of forward citations is pre-calculated on a family-family basis according to the data catalog.
Backward citations are not pre-calculated. Backward citation come in many "flavours" and can be linked to various publication instances. They are numbered sequentially according to the citations origin.
This query gives yo the "highest" of the number of for citations given during the "Search and examiniation" or "internationl search report". For other flavours, kindly check the data catalog.

Code: Select all

SELECT TOP (1000) tls201_appln. appln_id 
      , appln_auth + appln_nr + appln_kind 
      , appln_filing_date 
      , granted 
	  ,pub_grant.publn_date as grant_date
      , nb_citing_docdb_fam
	  ,citn_origin
	 ,max(tls212_citation.pat_citn_seq_nr) backward_citations
FROM  tls201_appln 
  left join tls211_pat_publn pub_grant on tls201_appln.appln_id = pub_grant.appln_id and publn_first_grant ='Y'
  left join tls211_pat_publn pub_citing on tls201_appln.appln_id = pub_citing.appln_id
  join tls212_citation on pub_citing.pat_publn_id = tls212_citation.pat_publn_id and citn_origin in ('SEA','ISR')
group by tls201_appln. appln_id 
      , appln_auth + appln_nr + appln_kind 
      , appln_filing_date 
      , granted 
	  ,pub_grant.publn_date 
      , nb_citing_docdb_fam 
	  ,citn_origin
order by tls201_appln.appln_id
Looking at citations per year requires more complex aggregation models that would require the storage of intermediate data sets.
For point 4 and 5; it will be difficult to aggregate such data sets in PATSTAT Online. Is there is some kind of common denominator in those 10.000 patents (a specific set of companies ?), I would create a data base extraction of those 10.000 patents, include the family members and the citing patents (and the IPC / CPC table), and use MS ACCESS or any other tool to do further analysis outside PATSTAT Online.
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


LucasM
Posts: 5
Joined: Fri Feb 21, 2020 2:33 pm

Re: Accessing Data from PATSTAT Online

Post by LucasM » Mon Feb 24, 2020 1:17 pm

Thanks for the reply!

I think the code above works but I would be interested to know how to incorporate the following features.

(1) I would like to be able to extrtact the data you just posted using an appln_id, I believe the code would look something like:

where appln_id in (16906140,16906140,16906002)

where only the data for those three particular patents would be extracted?

(2) Regarding the difficulty of extracting the forward citations on a per year basis, would it be possible to extract the foraward citation count up to a certain year on a cumulative basis? For instance, Grant Date + 3 years.

(3) Lastly, if it isn't possible to extract the patent class for each individual citation, would it be possible to extract the number of forward and backward citations on a class by class basis? For instance, Backward citation per IPC section would be: (A) Human necessities: 2 (B) Peforming operation; Transportation: 0 etc. with similar data for the forward citations.

The reason for (3) is because I want to be able to measure generality/originality for the patents to estimate their economic value.


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Accessing Data from PATSTAT Online

Post by EPO / PATSTAT Support » Tue Feb 25, 2020 10:35 am

Hello Lucas,
on 1) yes of course. The query below selects the 2 application id's.

Code: Select all

SELECT appln_id
FROM tls201_appln
WHERE appln_id IN (16906140,16906002)
You can then download a MS ACCESS subset, and the 2 applications will then result in data base containing al the PATSTAT tables for 29 patent applications - being all the cited and citing applications from the 2 original applications.
example_download.zip
(245.47 KiB) Downloaded 167 times
On 2, yes that is possible on a local PATSTAT installation or on a downloaded set as explained above. Keep in mind that the "date of grant" has no importance on forward citations as this date is very much patent granting procedure dependant. The first publication is the date from which a patent can be cited. You can of course only consider granted patents that have been cited. (via the "GRANTED" flag in tls201)
Have a look at this publication for some example code.
Getting started with Patstat.docx
(411.79 KiB) Downloaded 211 times
3) the classification codes for the citations are absolutely included in the subset data extraction. But you can only do this for a starting set of 100.000 applications. But writing a query that lists 100.000 application ID will exceed the maximum length of an SQL query.
A query such as

Code: Select all

SELECT appln_id
FROM tls201_appln
WHERE appln_id IN (16906140,16906002,.....)
-- and 100.000 more numbers
will be too long for PATSTAT Online.

Once you have the subset, you can do any kind of analysis to see what technology is cited or citing, or how they relate to each other. I assume you use the OECD generality/originality index; the OECD publication does provide the methodology and SQL code but we can not support individual researchers adapting it to their needs.
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


LucasM
Posts: 5
Joined: Fri Feb 21, 2020 2:33 pm

Re: Accessing Data from PATSTAT Online

Post by LucasM » Tue Feb 25, 2020 8:59 pm

Thanks again for your quick reply!

I was also wondering why the number of forward citations calculated from nb_citing_docdb_fam is defined as the number of distinct DOCDB simple families citing at least one of the publications or applications of the DOCDB simple family of the current application instead of just the number of distinct patents citing the current application, i.e. why does EPO give me forward citation data on the DOCDB simple families rather than the distinct patents?


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Citation analysis: family-family concept

Post by EPO / PATSTAT Support » Wed Feb 26, 2020 11:40 am

Hello Lucas,
the concept of doing citation analysis on families instead of patent applications (or in fact publications) is to avoid under- and over counting. Citations analysis being used as a proxy for "value", reflect the value of "the invention". The protection for "the invention" is represented by a patent family (filings for the same invention but in different countries/patent authorities), so a citation to any of the family members linked to "the invention" should be considered as "adding value" to the invention.

Visually:
citation.jpg
Family A contains a US and WO application for an invention A.
The US-A applications is cited by application EP-X and the WO-A application - being a Japanese language filing- is cited by an application JP-Y.
In a citations count we would say: US-A and WO-A have both being cited 1 time.
In a family-family count we would say that Family A has been cited by 2 families, namely the family from EP-X and the family JP-Y.
Variation: assume now that EP-X and JP-Y would belong to the same family B, then a family - family citations count would result in family-A being cited by family-B.

In PATSTAT, the pre-aggregated citations count is family-family citations count. But all the citations data is available in the tls212 table, so nothing should stop researchers from defining their own (non-family based ?) methodology. You could also for example exclude self-citations, or citations given by the applicant.
More on this topic:
Patent citation indicators _ one size fits all.pdf
(526.05 KiB) Downloaded 181 times
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


LucasM
Posts: 5
Joined: Fri Feb 21, 2020 2:33 pm

Re: Accessing Data from PATSTAT Online

Post by LucasM » Wed Feb 26, 2020 5:13 pm

I see, that makes sense! I think using simple families will be the best solution for my application. However, after running the code I still have two puzzling questions:

(1)
I have tried to extract the data on my patents using the following code:

SELECT tls201_appln. appln_id
, appln_auth + appln_nr + appln_kind
, appln_filing_date
,pub_grant.publn_date as grant_date
, nb_citing_docdb_fam
,max(tls212_citation.pat_citn_seq_nr) backward_citations
FROM tls201_appln
left join tls211_pat_publn pub_grant on tls201_appln.appln_id = pub_grant.appln_id and publn_first_grant ='Y'
left join tls211_pat_publn pub_citing on tls201_appln.appln_id = pub_citing.appln_id
join tls212_citation on pub_citing.pat_publn_id = tls212_citation.pat_publn_id and citn_origin in ('SEA','ISR')
WHERE tls201_appln.appln_id IN (
501933391,
505223791)
group by tls201_appln. appln_id
, appln_auth + appln_nr + appln_kind
, appln_filing_date
,pub_grant.publn_date
, nb_citing_docdb_fam

However, this code yields no results. I assume this is because the appln_id numbers can not be found in tls201_appln? However, when I search for the appln_nr_epodoc number for the two patents (SE20170050339 and CA20173027619) on Espacenet I am able to find both of them. Is there any way I could extract the data above for the patents I can't find in tls201_appln using some other database?

(2)
I find that some of the patents have a number of forward citations even though they do not have been granted, how come that patents become cited even though they are not granted?


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Accessing Data from PATSTAT Online

Post by EPO / PATSTAT Support » Wed Feb 26, 2020 5:45 pm

Hello Lucas,
all existing application id's are available in TLS201_appln.
Your query does not yield any results because SE20170050339 and CA20173027619 are not having any citations. In fact, for Sweden and Canada, the EPO does not have (receive) citation information.
REFI Statistics July_2019.pdf
(33.44 KiB) Downloaded 156 times
So when linking to tls212, there is nothing to link with.
Here is an example with 3 EP applications for which there are backward citations

Code: Select all

SELECT tls201_appln.appln_id
, appln_auth + appln_nr + appln_kind
, appln_filing_date
,pub_grant.publn_date as grant_date
, nb_citing_docdb_fam
,max(tls212_citation.pat_citn_seq_nr) backward_citations
FROM tls201_appln
left join tls211_pat_publn pub_grant on tls201_appln.appln_id = pub_grant.appln_id and publn_first_grant ='Y'
left join tls211_pat_publn pub_citing on tls201_appln.appln_id = pub_citing.appln_id
join tls212_citation on pub_citing.pat_publn_id = tls212_citation.pat_publn_id and citn_origin in ('SEA','ISR')
WHERE tls201_appln.appln_id IN (3,5,6)
group by tls201_appln. appln_id
, appln_auth + appln_nr + appln_kind
, appln_filing_date
,pub_grant.publn_date
, nb_citing_docdb_fam
on (2),whether a patent has been granted or not does not affect it being cited in search reports. Examiner also cite for example internet webpages, scientific publications, product manualsetc...
Genrally spoken, anything that has been published and is available to the public can be considered prior art and can therefore be cited.
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


LucasM
Posts: 5
Joined: Fri Feb 21, 2020 2:33 pm

Re: Accessing Data from PATSTAT Online

Post by LucasM » Wed Feb 26, 2020 9:22 pm

Thank you for the prompt reply! Why would a company apply for an SE patent vs. a EP/WO one, is it for instance the case that an SE patent only protects the innovation on the Swedish market whereas an EP protects the innovation on a european market and a WO one on a global market? Also, could a single innovation have an overlap between SE/EP/WO patents?


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Accessing Data from PATSTAT Online

Post by EPO / PATSTAT Support » Thu Feb 27, 2020 9:46 am

Hello Lucas,
your question is not really PATSTAT related and outside the scope of this forum.
This publication gives you a good general introduction on patents. In an nutshell -and very much generalised-:
Companies apply for patents in those countries where they want to have their invention protected. From a business point of view: those countries where there is a market for the invention (or a danger that a local company would make money with the invention by copying the technology).
If a company wants to have its' invention protected, it will need to have a valid patent for each of those countries.
The European Patent provide easier means to apply and obtain a patent for all the EPO member states. Therefore an applicant does not need to go through an application procedure in each country seperately. Once the European patent is granted, the applicant has then the possibility to have the patent "validated" in the EPO member states without having to go through a national patent granting procedure anymore.
This saves costs and resources for the applicant. Applying for a European patent is an option, applicants can equally apply directly to the national patent offices to obtain a patent instead of going through a European patent (which will in most cases be more expensive).
A pattent applied for in Sweden will indeed only give protection in Sweden.
A PCT application is different; PCT applications do not get granted as PCT. It simply provides the applicant an easy means to apply for a patent in PCT member states (and also EP) without the need of having to file the patent in each separate country. Thereafter the applicant will still need to go through the national (or regional/EP) granting procedure.
In principle: an invention can not be protected in 1 country through 2 co-existing patents. So a patent filed and granted in Sweden can not co-exist with a granted EP application that has been validated in Sweden. A granted EP patent is often re-published in the language of the member state where the patent is validated. (they will be part of the same docdb patent family). This might give the impression that there are "co-existing" patents, but they are in fact the same.
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post Reply