Patent family landscape

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

Eduardo Mercadante
Posts: 12
Joined: Thu Sep 17, 2020 12:04 pm

Patent family landscape

Post by Eduardo Mercadante » Tue Mar 16, 2021 3:36 pm

Hello,
I want to find all of the pharmaceutical patent applications (technical field 16) filed between 2005 and 2015 that belong to DOC families that have at least one twin application in each of these seven offices: in seven offices: Argentina, Brazil, EPO, India, Japan, Korea and the US.

I tried to do it with the following code. However, the results include several families that do not have at least one twin in each office. How could I fix that problem?

Code: Select all

SELECT DISTINCT a.appln_id, a.appln_auth, a.appln_filing_year, a.granted, a.docdb_family_id, a.docdb_family_size
FROM tls201_appln a
JOIN tls230_appln_techn_field j ON a.appln_id = j.appln_id
WHERE a.docdb_family_id IN
	(SELECT a.docdb_family_id 
    	FROM tls201_appln a
	JOIN tls230_appln_techn_field j ON a.appln_id = j.appln_id
	WHERE j.techn_field_nr = 16
    	AND a.appln_auth = 'US')
AND a.docdb_family_id IN
	(SELECT a.docdb_family_id 
    	FROM tls201_appln a
	JOIN tls230_appln_techn_field j ON a.appln_id = j.appln_id
	WHERE j.techn_field_nr = 16
    	AND a.appln_auth = 'EP')
AND a.docdb_family_id IN
	(SELECT a.docdb_family_id 
    	FROM tls201_appln a
	JOIN tls230_appln_techn_field j ON a.appln_id = j.appln_id
	WHERE j.techn_field_nr = 16
    	AND a.appln_auth = 'JP')
AND a.docdb_family_id IN
	(SELECT a.docdb_family_id 
    	FROM tls201_appln a
	JOIN tls230_appln_techn_field j ON a.appln_id = j.appln_id
	WHERE j.techn_field_nr = 16
    	AND a.appln_auth = 'KR')
AND a.docdb_family_id IN
	(SELECT a.docdb_family_id 
    	FROM tls201_appln a
	JOIN tls230_appln_techn_field j ON a.appln_id = j.appln_id
	WHERE j.techn_field_nr = 16
    	AND a.appln_auth = 'BR')
AND a.docdb_family_id IN
	(SELECT a.docdb_family_id 
    	FROM tls201_appln a
	JOIN tls230_appln_techn_field j ON a.appln_id = j.appln_id
	WHERE j.techn_field_nr = 16
    	AND a.appln_auth = 'AR')
AND a.docdb_family_id IN
	(SELECT a.docdb_family_id 
    	FROM tls201_appln a
	JOIN tls230_appln_techn_field j ON a.appln_id = j.appln_id
	WHERE j.techn_field_nr = 16
    	AND a.appln_auth = 'IN')
AND j.techn_field_nr = 16
AND a.appln_filing_year BETWEEN 2000 AND 2015
-- added line below to illustrate the data resstriction issue.
order by docdb_family_id
All the best,
Eduardo Mercadante
PhD Student (LSE)


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Patent family landscape

Post by EPO / PATSTAT Support » Wed Mar 17, 2021 2:26 pm

Hello Eduardo,
they do have family members in your list of countries, but you don't see them in your result list because your criteria of "AND a.appln_filing_year BETWEEN 2000 AND 2015" will exclude the family members from the list that have filing dates outside that range. This is very often the case with US applications where "continuations" often have application filing dates much later then the other family members. You can see that clearly when you add an "order by docdb_family_id" to your query, and then use the first family to look up all family members in a seperate query.
So the problem is rather a methodology issue then a syntax issue. You can adapt your query by moving the a.appln_filing_year BETWEEN 2000 AND 2015 in a separate subquery; that will then allow all families that have at least 1 family member in that range. If you find this restriction too wide, then you will need to move the condition to each of the separate subqueries as you did for the patent authorities. (Similar to what you have done for the techn_field_nr = 16 condition.) This will make the query much more restrictive --> les results.
Observation: you absolutely need to check the coverage of the countries you added into your query. We have very little data coming from India, so simply removing India from your restrictions increases the results from about 5000 to 135.000.
See: mapping-data-completeness-of-patstat-gl ... ls231-7984
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Eduardo Mercadante
Posts: 12
Joined: Thu Sep 17, 2020 12:04 pm

Re: Patent family landscape

Post by Eduardo Mercadante » Wed Mar 17, 2021 9:29 pm

Thank you very much for the quick and helpful response.

I am aware of the coverage issue, especially with India. The purpose of including it was exactly to do the comparison you presented.

I was thinking of doing another post on this but I'll ask here. I don't understand exactly what determines this level of coverage. Can I assume that the selection of applications that are indeed covered is random? In that sense, could I accept the result of this query as an unbiased sample?

Eduardo Mercadante
PhD Student (LSE)


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Patent family landscape

Post by EPO / PATSTAT Support » Fri Mar 19, 2021 8:06 pm

Hello Eduardo,
the sample is absolutely not random. An SQL query will give you 100% of the data available in the data base, and based on the criteria in your criteria. The problem is that your criteria is using the very incomplete data on patents filed in India. So that drastically reduces your sample.
That is why it is extremely important to check data coverage and completeness for all data attributes used in your query.
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post Reply