Avoiding Duplicates Question

DBCerigo
Posts: 4
Joined: Wed Jan 20, 2016 11:52 pm

Avoiding Duplicates Question

Post by DBCerigo » Mon Feb 22, 2016 6:12 pm

We aim to get a set of non-duplicate granted patents. To do so we use the query:

Code: Select all

SELECT tls211_pat_publn.publn_nr, tls201_appln.appln_auth, tls211_pat_publn.publn_date, tls201_appln.appln_id
FROM tls201_appln
INNER JOIN tls211_pat_publn ON tls211_pat_publn.appln_id = tls201_appln.appln_id
WHERE tls201_appln.appln_id NOT IN 
	(
    SELECT appln_id
    FROM tls204_appln_prior
    )
AND tls211_pat_publn.publn_first_grant = 1
Our worry is that we are missing many granted patents by filtering in this way. Example: application A was never granted, but application B, which claims A as a priority, was granted. B would not be included in the query results; correct?

Suggestions for the best way to obtain a set of non-duplicate granted patents are appreciated.


Geert Boedt
Posts: 178
Joined: Tue Oct 19, 2004 10:36 am
Location: Vienna

Re: Avoiding Duplicates Question

Post by Geert Boedt » Tue Feb 23, 2016 8:52 am

What are your criteria for "duplicate patents"?

Your query retrieves granted applications that do not claim priority. I am not sure why you combine the application authority with the publication number in the select.
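To make the effect of that filter concrete, here is a minimal sketch on made-up data, assuming an in-memory SQLite database with only the columns the query touches. The three applications (A = 1, B = 2, C = 3) and their publication numbers are hypothetical; only the table and column names come from the query above.

```python
import sqlite3

# Toy stand-ins for the three PATSTAT tables used in the query (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tls201_appln (appln_id INTEGER PRIMARY KEY, appln_auth TEXT);
CREATE TABLE tls204_appln_prior (appln_id INTEGER, prior_appln_id INTEGER);
CREATE TABLE tls211_pat_publn (
    pat_publn_id INTEGER PRIMARY KEY,
    appln_id INTEGER,
    publn_nr TEXT,
    publn_date TEXT,
    publn_first_grant INTEGER
);
-- A (1): never granted. B (2): claims A as priority and is granted.
-- C (3): granted and claims no priority.
INSERT INTO tls201_appln VALUES (1, 'US'), (2, 'EP'), (3, 'DE');
INSERT INTO tls204_appln_prior VALUES (2, 1);
INSERT INTO tls211_pat_publn VALUES
    (10, 1, 'A-PUB', '2010-01-01', 0),
    (20, 2, 'B-PUB', '2012-01-01', 1),
    (30, 3, 'C-PUB', '2011-01-01', 1);
""")

# Same filter as the posted query, reduced to appln_id for readability.
ORIGINAL_FILTER = """
SELECT tls201_appln.appln_id
FROM tls201_appln
INNER JOIN tls211_pat_publn ON tls211_pat_publn.appln_id = tls201_appln.appln_id
WHERE tls201_appln.appln_id NOT IN (SELECT appln_id FROM tls204_appln_prior)
AND tls211_pat_publn.publn_first_grant = 1
ORDER BY tls201_appln.appln_id
"""

kept = [row[0] for row in conn.execute(ORIGINAL_FILTER)]
print(kept)
```

On this toy data only C survives: A is dropped because it has no grant publication, and B is dropped because it appears in tls204_appln_prior, even though B is the only granted representative of that invention.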
Best regards,

Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna


DBCerigo
Posts: 4
Joined: Wed Jan 20, 2016 11:52 pm

Re: Avoiding Duplicates Question

Post by DBCerigo » Tue Feb 23, 2016 6:21 pm

I suppose the higher-level aim is a set of patents that we can take to represent distinct inventions, with no invention appearing more than once.

If the sets of patents connected by priority claims constitute a family (simple family or DOCDB family?), and each family can represent an invention, then we should consider all families that contain at least one granted patent. My worry is that by considering only granted applications that do not claim a priority, we will miss the families whose granted patents sit further down their 'priority chains', and thus miss (representations of) inventions.

Thanks for your responses and thoughts, they are much appreciated.


Geert Boedt
Posts: 178
Joined: Tue Oct 19, 2004 10:36 am
Location: Vienna

Re: Avoiding Duplicates Question

Post by Geert Boedt » Wed Feb 24, 2016 5:13 pm

Treating a DOCDB family as the representation of an invention is a valid approach. A family can have 1 or more family members (patent applications), and each member follows its own granting procedure at its respective patent authority. That means that every family can have any number of granted family members. Every patent application in PATSTAT belongs to exactly 1 DOCDB family and exactly 1 INPADOC family. Carrying the concept of "granted" over to a DOCDB (or INPADOC) family requires a business rule. For example: "the family contains at least 1 granted application", "the family contains only granted applications" or "the family contains no granted applications". The query below retrieves the DOCDB_FAMILY_ID values that have at least 1 granted family member:

Code: Select all

SELECT TOP 50 docdb_family_id, granted
FROM tls201_appln
WHERE granted = 1
GROUP BY docdb_family_id, granted
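
The three business rules mentioned above can be sketched with GROUP BY and HAVING. This is only an illustration on made-up rows in an in-memory SQLite database; it assumes granted is stored as an integer 0/1, as in the query above (adjust the conditions if your edition stores it differently). The family ids 100, 200 and 300 and the helper `families_matching` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tls201_appln (
    appln_id INTEGER PRIMARY KEY,
    docdb_family_id INTEGER,
    granted INTEGER  -- assumed to be 0 or 1
);
-- Family 100: mixed. Family 200: all granted. Family 300: none granted.
INSERT INTO tls201_appln VALUES
    (1, 100, 0), (2, 100, 1),
    (3, 200, 1), (4, 200, 1),
    (5, 300, 0);
""")

# One HAVING condition per business rule, evaluated per family.
RULES = {
    "at_least_one_granted": "MAX(granted) = 1",
    "only_granted": "MIN(granted) = 1",
    "no_granted": "MAX(granted) = 0",
}

def families_matching(rule_name):
    """Return the docdb_family_id values that satisfy the named rule."""
    sql = f"""
        SELECT docdb_family_id
        FROM tls201_appln
        GROUP BY docdb_family_id
        HAVING {RULES[rule_name]}
        ORDER BY docdb_family_id
    """
    return [row[0] for row in conn.execute(sql)]

results = {name: families_matching(name) for name in RULES}
print(results)
```

Grouping by the family id alone and testing the aggregate keeps one row per family, whichever rule is chosen.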
On families and priorities: a priority can be a priority in multiple DOCDB families. A priority can form a DOCDB family on its own, which results in what you call "double counting", but it can just as well belong to the family of the patents that claim it.
If you don't want this, then you should use the INPADOC family concept. The INPADOC family groups all applications and priorities into 1 big family. The disadvantage is that different but related inventions are then sometimes grouped in 1 count.
The internal EPO process of building the DOCDB families is so extensive that it is impossible to find a "1 system fits all" approach to deal with the double counting. Some researchers (and professional patent information providers) develop their own "family concept", but every family concept will involve some kind of compromise. (Sometimes only priority filings or first filings are taken; see also the OECD triadic family concept.) The question is whether the compromise is of statistical importance for your research, a discussion I leave to the academics.
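
To see how the choice of family concept changes a count, here is a small sketch on made-up rows, again assuming an in-memory SQLite table and an integer 0/1 granted column. Application 1 plays the role of a priority that forms a DOCDB family of its own and is claimed by applications 2 and 3; all ids and the helper `count_granted_families` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tls201_appln (
    appln_id INTEGER PRIMARY KEY,
    docdb_family_id INTEGER,
    inpadoc_family_id INTEGER,
    granted INTEGER  -- assumed to be 0 or 1
);
-- Applications 1-3 are related through a shared priority: three DOCDB
-- families, one INPADOC family. Application 4 is unrelated.
INSERT INTO tls201_appln VALUES
    (1, 10, 1000, 0),
    (2, 20, 1000, 1),
    (3, 30, 1000, 1),
    (4, 40, 2000, 1);
""")

def count_granted_families(family_column):
    """Count families (by the given id column) with at least 1 granted member."""
    if family_column not in ("docdb_family_id", "inpadoc_family_id"):
        raise ValueError("unexpected family column")
    sql = f"SELECT COUNT(DISTINCT {family_column}) FROM tls201_appln WHERE granted = 1"
    return conn.execute(sql).fetchone()[0]

docdb_count = count_granted_families("docdb_family_id")
inpadoc_count = count_granted_families("inpadoc_family_id")
print(docdb_count, inpadoc_count)
```

On this toy data the DOCDB count is higher than the INPADOC count, because the INPADOC family merges the related applications into a single unit. Which of the two numbers is closer to "distinct inventions" is exactly the compromise described above.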
Best regards,

Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna

