Best way to count DOCDB_FAMILY_ID

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

mboch
Posts: 3
Joined: Wed Mar 13, 2019 11:10 am

Best way to count DOCDB_FAMILY_ID

Post by mboch » Wed May 15, 2019 2:24 pm

Hi,
I want to count all patent families from German inventors, specific CPC classes and years grouped in months.

Code: Select all

select 
DATEPART(month, earliest_publn_date) as month,
year(earliest_publn_date) as year, 
count(distinct a.appln_id) as NumberOfPatentFamilies 
from tls201_appln a
where 
a.appln_id in (
	select
	min(a2.appln_id)  -- Only first application of family counts
	from tls201_appln a2
	join tls207_pers_appln pa2 on a2.appln_id = pa2.appln_id
	join tls206_person p2 on pa2.person_id = p2.person_id
    	join tls224_appln_cpc cpc on a2.appln_id = cpc.appln_id
	where 
		year(earliest_publn_date) between 1999 and 2018	   
		and invt_seq_nr > 0 -- Inventors
    	and 
	(
      		cpc_class_symbol like 'Y02C  10/%' or
      		cpc_class_symbol like 'Y02C  20/%' or
      		cpc_class_symbol like 'Y02C  30/%'
	) 
		and person_ctry_code = 'DE'    
	group by docdb_family_id)
group by DATEPART(month, earliest_publn_date), year(earliest_publn_date)
order by year(earliest_publn_date),  DATEPART(month, earliest_publn_date);
Is this a correct approach and is there an easier way to count family ids?
BR
Mike


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Best way to count DOCDB_FAMILY_ID

Post by EPO / PATSTAT Support » Thu May 16, 2019 10:27 am

Hello Mike,
there are a couple of flaws in the approach: the min(a2.appln_id) does not guarantee that you have the first application of the family, you are doing an aggregation using publications and if I understood correctly, you want to have a "family - level" final table.
The main challenge is to aggregate data which is stored at application and publication level to family level. Keeping in mind that 1 family can have many members (with different application dates) and each application different publication (dates), it becomes difficult to keep track of what conditions need to be applied to get good data to create your final table. So I changed it a bit around and made a start table that gives for each family a sorted list of all family members and their application date. As we are only interested in the earliest application, I need to have a way to identify the earliest one. To do this, I used ROW NUMBER over PARTITION which I can use as a selection criteria.
Via the WHERE clause I made the conditions a bit more general to : each family has at least 1 German inventor, each family has at least 1 application that has a 'Y02C 10' , 'Y02C 20' or 'Y02C 30' classification.
This was done to avoid missing out on applications because a country code or classification is missing in the earliest application of the family. If you agree with that, this is the query to make that table:

Code: Select all

SELECT a.docdb_family_id, a.appln_id, a.appln_filing_date
	 ,ROW_NUMBER() OVER(partition by a.docdb_family_id ORDER BY a.docdb_family_id ASC,
		a.appln_filing_date,a.appln_id )  AS Row
FROM tls201_appln a
where
docdb_family_id in (select distinct  docdb_family_id from tls201_appln join tls207_pers_appln 
						on tls201_appln.appln_id = tls207_pers_appln.appln_id
						join tls206_person on tls207_pers_appln.person_id = tls206_person.person_id 
						where invt_seq_nr > 0 and person_ctry_code = 'DE')
--at least 1 German inventor in one of the family members

AND docdb_family_id in (select distinct  docdb_family_id from tls201_appln join tls224_appln_cpc 
						on tls201_appln.appln_id = tls224_appln_cpc.appln_id
				 where left(cpc_class_symbol, 8) in ('Y02C  10','Y02C  20','Y02C  30'))							
--at least 1 application having one of the CPC codes
group by  a.docdb_family_id
     ,a.appln_id
     ,a.appln_filing_date
order by  docdb_family_id asc, row asc
If one would be working on a local PATSTAT installation, one could store the above result in a temporary table or view to do the next step. This is not possible in PATSTAT Online, so one needs to use the above query as a sub query to create the final table. (Or you could export it into EXCEL and take it from there.) Using the same query as above in sub query to do the counts, no forgetting to add criteria to only take the first "ROW" and limiting via the application filing year:

Code: Select all

Select
FORMAT(temp.appln_filing_date,'yyyy-MM') yyyy_mm,
--DATEPART(month, temp.appln_filing_date) as month,
--year(temp.appln_filing_date) as year, 
count(distinct temp.docdb_family_id)as NumberOfPatentFamilies 
from 
(SELECT a.docdb_family_id, a.appln_id, a.appln_filing_date
	 ,ROW_NUMBER() OVER(partition by a.docdb_family_id ORDER BY a.docdb_family_id ASC,
		a.appln_filing_date,a.appln_id )  AS Row
FROM tls201_appln a
where
docdb_family_id in (select distinct  docdb_family_id from tls201_appln join tls207_pers_appln 
						on tls201_appln.appln_id = tls207_pers_appln.appln_id
						join tls206_person on tls207_pers_appln.person_id = tls206_person.person_id 
						where invt_seq_nr > 0 and person_ctry_code = 'DE')
--at least 1 German inventor in one of the family members

AND docdb_family_id in (select distinct  docdb_family_id from tls201_appln join tls224_appln_cpc 
						on tls201_appln.appln_id = tls224_appln_cpc.appln_id
				 where left(cpc_class_symbol, 8) in ('Y02C  10','Y02C  20','Y02C  30'))							
--at least 1 application having one of the CPC codes
group by  a.docdb_family_id
     ,a.appln_id
     ,a.appln_filing_date) temp
where temp.Row = 1 and year(temp.appln_filing_date) between 1999 and 2018
group by FORMAT(temp.appln_filing_date,'yyyy-MM')
--DATEPART(month, temp.appln_filing_date), year(temp.appln_filing_date)
order by  yyyy_mm
--year, month
Another approach is to simply extract the data into a MS ACCESS where you have the full freedom to create extra intermediate tables (and graphs).

Code: Select all

select distinct appln_id from tls224_appln_cpc
where left(cpc_class_symbol, 8) in ('Y02C  10','Y02C  20','Y02C  30')
Take the option GLOBAL, and then include the DOCDB family members. This will give you the worldwide selection which would allow you to start comparing DE inventor data with other countries.
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post Reply