PATSTAT vs. WIPO

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

Thomas Wilken
Posts: 3
Joined: Mon Sep 30, 2019 2:01 pm

PATSTAT vs. WIPO

Post by Thomas Wilken » Mon Sep 30, 2019 2:08 pm

Dear PATSTAT'ers

Can anyone refer to a good website, report etc. that describes the exact difference between the data in PATSTAT and the data used in WIPO's IP Statistics Data Center?

The WIPO IP Statistics Data Center can be found here: https://www3.wipo.int/ipstats/index.htm?tab=patent

I have already bought access to the PATSTAT database. But I am not sure which database will be the best to show the aggregated amount of patent applications, patent families, etc. in fx Denmark.


Best,
Thomas


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: PATSTAT vs. WIPO

Post by EPO / PATSTAT Support » Tue Oct 01, 2019 11:44 am

Hello Thomas,
One of the data sources used in WIPO's IP Statistics Data Center is PATSTAT.
This is documented at this link: https://www.wipo.int/ipstats/en/help/
So there is for sure an overlap on the data; we also know that addresses for PCT applications in PATSTAT are often missing, so I assume that WIPO's PCT data is more complete then the PCT data in PATSTAT. WIPO also states that the data of PCT FILINGS is up to date to about 3 months. I would conclude that WIPO publishes statistics on un-published applications which we don't have available in PATSTAT. (EPO only publishes aggregated information on un-published applications via the annual reports and facts and figures.)
This probably also explains why no graphs on the WIPO tool show applicant names. (But kindly contact WIPO if you need more background information on their platform.)

The advantage of PATSTAT is that you can drill down as deep as you want in the data. For each single application you can examine the very details if your research requires it. As fare as I can see (10 minutes playing around with it), this is not possible on WIPO's IP Statistics Data Center.
A disadvantage on working with PATSTAT is that you need to know SQL, you need to understand the data coverage and possible pitfalls and you need to know the basics of the patent granting procedure for each authority that is included in your data.

So my conclusion would be: ... it depends on your use case....

WIPO's IP Statistics Data Center is fine if you are happy with some quick statistics and graphs, without the need of having to provide details or the risk of being questioned or challenged. You are an undergraduate and need some data to back up your research coming from an established source... fine.
But if you are doing an in-depth patent analysis; you might be challenged and you will have to defend your results, possibly even document possible risks in your analysis : PATSTAT would be my preferred data set.

But for straightforward un-complex data sets, I assume there should not be day and night differences in the results between the tools.

I just was pointed to your tweet on the same topic. Can you share the SQL query you used in PATSTAT through this post ?
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Thomas Wilken
Posts: 3
Joined: Mon Sep 30, 2019 2:01 pm

Re: PATSTAT vs. WIPO

Post by Thomas Wilken » Thu Oct 03, 2019 2:40 pm

Yes!

I have used the "CODE1" and afterwards I have removed duplicates on the INPADOC family to isolate the amount if unique danish patent families each year.

I have tried with another code, "CODE2", and here I find another amount of unique patent families than in "CODE1".

So basically my question has to be, how would you isolate the amount of unique danish patent families in a given year from 1990 to 2016?


***CODE1***

Code: Select all

SELECT DISTINCT
[tls206_person].[person_ctry_code],
[tls206_person].[person_name],
[tls206_person].[han_name],
[tls206_person].[person_address],
[tls201_appln].[granted],
[tls201_appln].[inpadoc_family_id],
[tls211_pat_publn].[publn_auth],
[tls211_pat_publn].[publn_nr],
[tls211_pat_publn].[publn_kind],
[tls211_pat_publn].[publn_date],
[tls209_appln_ipc].[appln_id],
[tls201_appln].[appln_filing_year],
[tls201_appln].[appln_filing_date]       
                  
FROM [tls206_person],
[tls207_pers_appln],
[tls211_pat_publn],
[tls209_appln_ipc],
[tls201_appln],
[tls224_appln_cpc]     
WHERE [tls206_person].[person_id] = [tls207_pers_appln].[person_id]
AND   [tls207_pers_appln].[appln_id] = [tls211_pat_publn].[appln_id]
AND   [tls211_pat_publn].[appln_id] = [tls209_appln_ipc].[appln_id]
AND    [tls201_appln].[appln_id]=[tls211_pat_publn].[appln_id]
AND    [tls201_appln].[appln_id]=[tls224_appln_cpc].[appln_id]
AND    [tls206_person].[person_ctry_code] ='DK'

AND   [tls207_pers_appln].[invt_seq_nr] = '0'
AND   [tls211_pat_publn].[publn_date] between '1990-01-01' and '2016-12-31'

***CODE2***

Code: Select all

SELECT DISTINCT
[tls206_person].[person_ctry_code],
[tls201_appln].[inpadoc_family_id],
[tls201_appln].[appln_id],
[tls211_pat_publn].[publn_date],
[tls201_appln].[appln_filing_year]       
                  
FROM 
[tls206_person],
[tls207_pers_appln],
[tls211_pat_publn],
[tls201_appln] 
      
WHERE  
[tls201_appln].[appln_id] = [tls207_pers_appln].[appln_id]
and [tls206_person].[person_id] = [tls207_pers_appln].[person_id]
and [tls201_appln].[appln_id]=[tls211_pat_publn].[appln_id]
and [tls206_person].[person_ctry_code] ='DK'
and [tls211_pat_publn].[publn_date] between '1990-01-01' and '2016-12-31'
Best,
Thomas


Thomas Wilken
Posts: 3
Joined: Mon Sep 30, 2019 2:01 pm

Re: PATSTAT vs. WIPO

Post by Thomas Wilken » Thu Oct 03, 2019 3:13 pm

Hey again

My main question is, how will you find the amount of unique danish patent families in a given year from 1990 to 2016?

Below I have posted "CODE1" and "CODE2".

I have originally used "CODE1" and afterwards removed duplicates from the INPADOC patent families to obtain the unique patent families.

But, when I use "CODE2" the amount of unique patent families increases compared to "CODE1". And I am not sure why.

So it would be nice with some help to obtain the number of unique danish INPADOC families in a given year.

***CODE1***
SELECT DISTINCT
[tls206_person].[person_ctry_code],
[tls206_person].[person_name],
[tls206_person].[han_name],
[tls206_person].[person_address],
[tls201_appln].[granted],
[tls201_appln].[inpadoc_family_id],
[tls211_pat_publn].[publn_auth],
[tls211_pat_publn].[publn_nr],
[tls211_pat_publn].[publn_kind],
[tls211_pat_publn].[publn_date],
[tls209_appln_ipc].[appln_id],
[tls201_appln].[appln_filing_year],
[tls201_appln].[appln_filing_date]

FROM [tls206_person],
[tls207_pers_appln],
[tls211_pat_publn],
[tls209_appln_ipc],
[tls201_appln],
[tls224_appln_cpc]


WHERE [tls206_person].[person_id] = [tls207_pers_appln].[person_id]
AND [tls207_pers_appln].[appln_id] = [tls211_pat_publn].[appln_id]
AND [tls211_pat_publn].[appln_id] = [tls209_appln_ipc].[appln_id]
AND [tls201_appln].[appln_id]=[tls211_pat_publn].[appln_id]
AND [tls201_appln].[appln_id]=[tls224_appln_cpc].[appln_id]
AND [tls206_person].[person_ctry_code] ='DK'

AND [tls207_pers_appln].[invt_seq_nr] = '0'
AND [tls211_pat_publn].[publn_date] between '1990-01-01' and '2016-12-31'


***CODE2***
SELECT DISTINCT
[tls206_person].[person_ctry_code],
[tls201_appln].[inpadoc_family_id],
[tls201_appln].[appln_id],
[tls211_pat_publn].[publn_date],
[tls201_appln].[appln_filing_year]

FROM
[tls206_person],
[tls207_pers_appln],
[tls211_pat_publn],
[tls201_appln]

WHERE
[tls201_appln].[appln_id] = [tls207_pers_appln].[appln_id]
and [tls206_person].[person_id] = [tls207_pers_appln].[person_id]
and [tls201_appln].[appln_id]=[tls211_pat_publn].[appln_id]
and [tls206_person].[person_ctry_code] ='DK'
and [tls211_pat_publn].[publn_date] between '1990-01-01' and '2016-12-31'


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: PATSTAT vs. WIPO

Post by EPO / PATSTAT Support » Wed Oct 09, 2019 4:04 pm

Hello Thomas,
from a syntax point of view, your queries are running.
From a logical point of view: it does not do do what you want: providing a count per publication year of families that have Danish inventors.
You used the wording "unique danish INPADOC families in a given year"; looking at your sample query you defined it as being "all families that have at least 1 Danish INVENTOR".

The main problem on your specification is on how to assign "a single year" to a family that contains multiple applications filed in different application years and even more publication years.

Here is a query that does the trick:

Code: Select all

SELECT YEAR(cte.publn_date),cte.person_ctry_code,COUNT (DISTINCT(cte.docdb_family_id)) AS DocDB_Patent_Family_Count
FROM (	
      SELECT a.publn_date, c.person_ctry_code, d.docdb_family_id, 
       DENSE_RANK() OVER (PARTITION BY d.docdb_family_id ORDER BY CONCAT(d.docdb_family_id, a.publn_date)) AS row_no
      FROM tls211_pat_publn a
      JOIN tls227_pers_publn b ON a.pat_publn_id=b.pat_publn_id
      JOIN tls206_person c ON b.person_id=c.person_id
      JOIN tls201_appln d ON a.appln_id=d.appln_id
      WHERE YEAR(a.publn_date)>=1990 AND YEAR(a.publn_date)<=2016 AND d.IPR_TYPE='PI' AND b.invt_seq_nr > 0 AND c.person_ctry_code = 'DK'
      ) cte
WHERE cte.row_no=1
GROUP BY YEAR(cte.publn_date), cte.person_ctry_code
ORDER BY YEAR(cte.publn_date), cte.person_ctry_code;
Analyse it, and adapt if necessary. (using applicants instead of inventors ?)
You also had the CPC table in your query, but it was not used,neither in the SELECT nor the WHERE clause of your query, so I removed it.

Geert BOEDT
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post Reply