Same PATSTAT version & query, different values?

Posted: Wed May 18, 2016 8:37 am
by MiG25
Hi everyone,

A colleague and I would like to know how many patent families were started under certain CPC codes in recent years. In our understanding, patent families contain several applications for the same innovation, so we'd be interested in counting only the first application within each family. Secondary goals include very roughly evaluating the "quality" of patent families through the family size and family citations. We're quite new to this and don't require much more.

We have noticed discrepancies, sometimes very large, between what I downloaded last year (from the Spring 2015 version) and what he has obtained this year (from both the Spring and Autumn 2015 versions).

Last year I carried out a very simple yearly patent family count for three different CPC codes on the Spring 2015 version of PATSTAT:

Y02E 10/1 Geothermal energy
Y02E 10/4 Solar thermal energy
Y02C 10 Carbon capture and storage

None of them has had any changes (that we can see here: http://www.cooperativepatentclassificat ... anges.html)

Queries are very simple, like this:

Code:

SELECT DISTINCT t.prior_earliest_year, t.nb_citing_docdb_fam, t.docdb_family_size, t.docdb_family_id
FROM tls201_appln t
JOIN tls224_appln_cpc cpc ON cpc.appln_id = t.appln_id
WHERE cpc.cpc_class_symbol LIKE 'Y02C  10%'

To update it with Autumn 2015 data, we used earliest_filing_year instead of prior_earliest_year (which seems to have disappeared). After noticing differences, we downloaded the Spring 2015 data again to check, and were surprised that these differences persisted, so they were not caused solely by the change in column name. The size of the discrepancies also varies considerably between CPC codes.

Y02C 10 (CCS) shows minimal discrepancies for both Spring (just 8 patent families below total figure for 2005-2012 as downloaded last year) and Autumn versions (only 2 patent families below total figure but slightly larger differences each year)

Y02E 10/1 (Geothermal) shows far greater discrepancies. Spring 2015 data that I downloaded last year gave me a total patent family count of 1313 for 2005-2012. Spring 2015 data that we downloaded recently gave a total of 2417. Autumn 2015 data was 2401.

Y02E 10/4 (Solar thermal) shows similarly large discrepancies.

Would anyone have any clues as to what could be happening here? Any advice on what table/column to use for the year of earliest filing of an application within a given patent family?

Any help would be much appreciated!! :)

Thank you.

Best regards,

Alfonso

Re: Same PATSTAT version & query, different values?

Posted: Fri May 20, 2016 8:56 pm
by mkracker
Hi Alfonso,

I suppose you are using PATSTAT Online, so we are all working on the same data. I tried to reproduce your example of Y02E 10/1 Geothermal energy, because there you gave specific numbers:

I used this query for the 2015 Autumn Edition:

Code:

SELECT DISTINCT t.earliest_filing_year, t.nb_citing_docdb_fam, t.docdb_family_id
FROM tls201_appln t
JOIN tls224_appln_cpc cpc ON cpc.appln_id = t.appln_id
WHERE cpc.cpc_class_symbol LIKE 'Y02E  10/1%'
  AND t.earliest_filing_year BETWEEN 2005 AND 2012

Note: In earlier editions the attribute EARLIEST_FILING_YEAR was named PRIOR_EARLIEST_YEAR. Only the name has changed, because the old one was slightly misleading. The logic has not changed. For details please see the Data Catalog of the respective editions.

Consequently, for the 2015 Spring Edition I used this query, which differs only in the name of that attribute:

Code:

SELECT DISTINCT t.prior_earliest_year, t.nb_citing_docdb_fam, t.docdb_family_id
FROM tls201_appln t
JOIN tls224_appln_cpc cpc ON cpc.appln_id = t.appln_id
WHERE cpc.cpc_class_symbol LIKE 'Y02E  10/1%'
  AND t.prior_earliest_year BETWEEN 2005 AND 2012

Let's compare the results:

2015 Autumn version
2401 rows with your query
2401 rows with my query - so no difference

2015 Spring Version, executed recently
2417 rows with your query
2399 rows with my query

The numbers are very similar to those of the 2015 Autumn version. Differences between editions are always to be expected because the data is alive: corrections and reclassifications take place, families change due to new or corrected applications, and so on.
Surprisingly, your query and my query did not return the same result. So you are evidently using a different query, which makes comparisons difficult.
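
One way to make such queries easier to compare is to count families directly rather than counting SELECT DISTINCT rows. SELECT DISTINCT returns one row per distinct combination of the selected columns, and the earliest filing year is stored per application in tls201_appln, so a family whose applications carry different years can appear more than once. The sketch below shows the effect on a toy in-memory SQLite table. All rows are invented, and using the minimum year per docdb_family_id as the family's start year is an assumption made for illustration, not an official PATSTAT definition. Whether this accounts for any of the differences reported in this thread is not established here.

```python
import sqlite3

# Toy stand-in for tls201_appln; every row below is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tls201_appln (
        appln_id INTEGER PRIMARY KEY,
        docdb_family_id INTEGER,
        earliest_filing_year INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO tls201_appln VALUES (?, ?, ?)",
    [
        (1, 100, 2005),
        (2, 100, 2006),  # same family as appln 1, but a different year
        (3, 200, 2006),
        (4, 200, 2006),
        (5, 300, 2013),  # outside the 2005-2012 window
    ],
)

# Row count in the style of the queries in this thread.
distinct_rows = conn.execute(
    """
    SELECT DISTINCT earliest_filing_year, docdb_family_id
    FROM tls201_appln
    WHERE earliest_filing_year BETWEEN 2005 AND 2012
    """
).fetchall()

# One row per family: assume the family starts in its minimum year.
families_per_year = conn.execute(
    """
    SELECT first_year, COUNT(*) AS nb_families
    FROM (
        SELECT docdb_family_id, MIN(earliest_filing_year) AS first_year
        FROM tls201_appln
        GROUP BY docdb_family_id
    ) AS f
    WHERE first_year BETWEEN 2005 AND 2012
    GROUP BY first_year
    ORDER BY first_year
    """
).fetchall()

print(len(distinct_rows))   # family 100 is counted twice here
print(families_per_year)    # each family is counted once
```

In the toy data the DISTINCT query returns three rows for two families, while the grouped query yields one family for 2005 and one for 2006.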

2015 Spring Edition, executed last year:
You retrieved only 1313 rows, which of course I cannot reproduce now. I can confirm, however, that we did not update the PATSTAT Online 2015 Spring data after it was released. I strongly suspect that you (inadvertently) modified your query.

The striking difference is between the result of the query you ran last year and the one you executed now, both on the 2015 Spring edition. To analyse further, you need to look at specific examples which appear in one result but not the other, and vice versa.
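
Such a comparison can be done offline once both result sets have been exported. The following is a minimal sketch, assuming each result was saved as CSV with a docdb_family_id column; the function names and the sample data are hypothetical.

```python
import csv
import io


def family_ids(csv_text):
    """Return the set of docdb_family_id values found in a CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["docdb_family_id"]) for row in reader}


def compare_results(old_csv, new_csv):
    """Return (IDs only in the old result, IDs only in the new result)."""
    old_ids = family_ids(old_csv)
    new_ids = family_ids(new_csv)
    return sorted(old_ids - new_ids), sorted(new_ids - old_ids)


# Invented exports standing in for last year's and this year's downloads.
old_export = "prior_earliest_year,docdb_family_id\n2005,100\n2006,200\n"
new_export = (
    "prior_earliest_year,docdb_family_id\n2005,100\n2006,300\n2007,400\n"
)

only_old, only_new = compare_results(old_export, new_export)
print(only_old)
print(only_new)
```

With real files, the text of each export can be read from disk and passed to compare_results. Looking up a few of the resulting family IDs in PATSTAT should show which condition of the query includes or excludes them.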

I hope this helps.