DOCDB Family fractional count by country of inventors

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

Valentin
Posts: 3
Joined: Fri Jun 18, 2021 11:52 am

DOCDB Family fractional count by country of inventors

Post by Valentin » Fri Jun 18, 2021 12:25 pm

Hello everyone,

I would like some help and advices with a query aiming to execute a fractional count of DOCDB families from CPC Y02B by country of inventors (here France) on Patstat Online.

Here what I'm doing (kind of resourcefulness but it works fine) :

Code: Select all

SELECT DISTINCT T2.earliest_filing_year, SUM(T2.CASTFRACOUNT)
FROM
(SELECT T.earliest_filing_year, T.docdb_family_id, COUNT(T.REPLACED) AS COUNTALL, SUM(T.SIFR) AS COUNTFR, (CAST(SUM(T.SIFR) AS DECIMAL(3,1)) / CAST(COUNT(T.REPLACED) AS DECIMAL(3,1))) AS CASTFRACOUNT
    FROM
        (SELECT DISTINCT table1.earliest_filing_year, REPLACE(table4.person_ctry_code, '  ','NULL') AS REPLACED, table2.docdb_family_id,
        CASE 
        WHEN REPLACE(table4.person_ctry_code, '  ','NULL')='FR' THEN 1
        ELSE 0
        END SIFR
        FROM tls201_appln table1
        INNER JOIN tls225_docdb_fam_cpc table2 ON table1.docdb_family_id=table2.docdb_family_id
        INNER JOIN tls207_pers_appln table3 ON table1.appln_id=table3.appln_id
        INNER JOIN tls206_person table4 ON table4.person_id=table3.person_id
        AND table2.cpc_class_symbol LIKE 'Y02B%'
        AND table1.earliest_filing_year >=1970
        AND table1.earliest_filing_year<=2018
        AND ipr_type='PI'
        AND granted='Y'
        AND REPLACE(table4.person_ctry_code, '  ','NULL') LIKE '__'
        GROUP BY earliest_filing_year,REPLACE(table4.person_ctry_code, '  ','NULL'), table2.docdb_family_id) T 
GROUP BY T.earliest_filing_year, T.docdb_family_id) T2
GROUP BY T2.earliest_filing_year
ORDER BY T2.earliest_filing_year
1) I know something is approximative in the querry as I just delete the lines docdb_family_id with no results in person_ctry_code. Thereby, I miss some families that could be associates with a french inventor and I miss also an exact fractional count of families with a french among inventors and with an indeterminated inventors. Are these no results in person_ctry_code due to a lack of informations about the only, second, third or... inventors ? If so, how to remediate to this lack of information in my case ? I know the paper of (de Rassenfosse, 2012).

2) Is there any other mistakes that I could miss ? SQL always give you a result and we can easily don't see a mistake.

3) I know the earliest_filing_date is quite unconventional but I would like the date as near as possible from the date of invention.


Thanks a lot for your help,
Valentin


EPO / PATSTAT Support
Posts: 256
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: DOCDB Family fractional count by country of inventors

Post by EPO / PATSTAT Support » Mon Jun 21, 2021 1:03 pm

Hello Valentin,
looking at your query; it is a bit difficult to understand exactly what you would like to do with regards to the fractional counting. Can you explain in a couple of sentences, what you would like to do and how you expect the results to look like ?

If your methodology is to be used on patents having a French inventor in the 'Y02B%' area, it might be easier to simply extract that data set and work of-line.
That gives you more freedom to manipulate the data using whatever tools you prefer.
Using a query like below will give you about 10.000 patent applications with French inventors which is within the download limits for an extraction download.

Code: Select all

SELECT distinct  tls201_appln.appln_id
FROM   tls201_appln  join tls207_pers_appln on tls201_appln.appln_id = tls207_pers_appln.appln_id
join tls206_person on tls207_pers_appln.person_id = tls206_person.person_id
join tls224_appln_cpc on tls201_appln.appln_id = tls224_appln_cpc.appln_id
WHERE
invt_seq_nr > 0
and person_ctry_code = 'FR'
AND appln_filing_year > 1970
and cpc_class_symbol LIKE 'Y02B%'
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Valentin
Posts: 3
Joined: Fri Jun 18, 2021 11:52 am

Re: DOCDB Family fractional count by country of inventors

Post by Valentin » Mon Jun 21, 2021 2:19 pm

Hello,

Thank you for your reply.

Sorry, I could have been more specific. For my thesis, I am working on the sources of innovation in the building sector to fight against climate change. My objective is to do some semantic analysis and topics modeling on patents in order to propose intrinsic technological classes. But first, I would like to do a descriptive analysis on the data and here I would like to study the cross-countries flow of knowledge. To do this, I focus on families rather than the applications themselves to avoid double counting inventions. Then, in order to obtain the origin of inventions, I focus on the countries of the inventors rather than on the application authorities (OECD, 2009). For these families with several inventors, I want to perform a fractional count (normalize 1 FR and 2 US = 1/3 FR and 2/3 US) to obtain an exact count by country.

The query of my first post gives a first approximation of this fractional number. As I am really new to Patstat Online and SQL language, my query is not very clean and some inventors are missing. I would like to know if a query, simpler perhaps, existed to carry out such a count. And I would also like to know if there is a way to retrieve the information concerning these missing inventors.

Thank you again,
Valentin


EPO / PATSTAT Support
Posts: 256
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: DOCDB Family fractional count by country of inventors

Post by EPO / PATSTAT Support » Tue Jun 22, 2021 12:25 pm

Hello Valentin,
it will not be easy; and I would say that this is not possible to do in 1 query- 1 go.
You will need to extract the data sets.

In the case of patents that have (at least) 1 FR inventor, there are 8582 patent applications.
They are grouped in 3359 docdb patent families. So for each one of these is families you will need to calculate the fractions.

Patent families can have different levels of completeness on inventor countries, they can have different filing dates and different earliest filing dates. So you will need to build a model that covers it all (or nearly all).

I would do the following (not saying that there might not be better ways):
from the 3359 docdb patent families families, 1059 have only 1 family member --> so no need to look further
further 1731 families have an EP patent in the family --> good source for inventor country codes without extra work.
further 490 families only have 1 inventor --> easy to calculate fraction (is 1...)
and from the remainder 593 patent families, 459 families only have FR inventors --> fraction is 1
leaves 134 families not covered (check/calculate manually ?)

This the query I used to analyse the data:

Code: Select all

SELECT 
distinct  tls201_appln.appln_id,appln_auth, nb_inventors, tls201_appln.docdb_family_id,  
 docdb_family_size ,STRING_AGG(CAST(person_ctry_code AS NVARCHAR(MAX)), ',')  inventors_country
--COUNT(distinct(  tls201_appln.docdb_family_id ))

FROM   tls201_appln  join tls207_pers_appln on tls201_appln.appln_id = tls207_pers_appln.appln_id
join tls206_person on tls207_pers_appln.person_id = tls206_person.person_id
join tls224_appln_cpc on tls201_appln.appln_id = tls224_appln_cpc.appln_id
WHERE
invt_seq_nr > 0
and tls201_appln.appln_id
in (select appln_id from tls207_pers_appln join tls206_person on tls207_pers_appln.person_id = tls206_person.person_id where person_ctry_code = 'FR' and invt_seq_nr > 0)
AND appln_filing_year > 1970
and cpc_class_symbol LIKE 'Y02B%'
and tls201_appln.docdb_family_id in (select docdb_family_id from tls201_appln where docdb_family_size > 1)
and tls201_appln.docdb_family_id  not in (select docdb_family_id from tls201_appln where appln_auth = 'EP')
and tls201_appln.docdb_family_id not in (select docdb_family_id from tls201_appln where nb_inventors = 1)

group by tls201_appln.appln_id, docdb_family_id, appln_auth, nb_inventors,docdb_family_size
having  STRING_AGG(CAST(person_ctry_code AS NVARCHAR(MAX)), ',')  like  '%[^FR,]%' --only French
order by docdb_family_id
Any "replenishment" you whish to do will need to be done before, but I am doubtful whether it would bring much extra for families having a FR inventor. (this would be different for CN).
The rest is for you.
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Valentin
Posts: 3
Joined: Fri Jun 18, 2021 11:52 am

Re: DOCDB Family fractional count by country of inventors

Post by Valentin » Wed Jun 23, 2021 9:24 am

Hello,

Thank you very much for the reply, really helpfull !


Best regards,
Valentin


Post Reply