Docdb family citation

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.

PEVJ
Posts: 4
Joined: Fri Aug 11, 2017 12:14 pm

Docdb family citation

Post by PEVJ » Thu Aug 31, 2017 9:20 pm

My research dataset contains about 100,000 DOCDB family IDs (PATSTAT TLS218), and I would like to match them with their cited DOCDB families (PATSTAT TLS228). However, when I merged my dataset with the cited DOCDB family table, about 30,000 DOCDB families had no match in the citation table (that is, around 30% of my families have no corresponding cited DOCDB family). Should I interpret the missing cases as meaning that those DOCDB families have not cited any patent?
Thanks!


Geert Boedt
Posts: 176
Joined: Tue Oct 19, 2004 10:36 am
Location: Vienna

Re: Docdb family citation

Post by Geert Boedt » Fri Sep 01, 2017 4:11 pm

Can you illustrate this with a couple of examples and the query you used to link the families with their cited families?
That would allow us to have a closer look at your observation.
Separate from the above, we do not have citations for all patent authorities.
Have a look at the citations coverage here (Overview of citation data in the EPO's citation database): http://www.epo.org/searching-for-paten ... gular.html
Best regards,

Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna


PEVJ
Posts: 4
Joined: Fri Aug 11, 2017 12:14 pm

Re: Docdb family citation

Post by PEVJ » Sat Sep 02, 2017 1:33 pm

Thank you very much for your answer.
I am using EPO PATSTAT (raw data), 2015 Spring Edition, with Stata. These are the steps I followed to build the population of about 100,000 DOCDB family IDs:
1. I used tls206_PERSON to select all person_id with person_ctry_code == "BR".
2. I matched the person_id sample from step 1 with tls207_PERSON_APPLN and kept only rows with applt_seq_nr > 0. I then manually dropped some applicants that do not fit my research objective (I used the HRM name for that selection). This gave me the pool of appln_id that I intend to study.
3. I matched those appln_id (the pool from step 2) with TLS218_DOCDB_FAM, which gave me the DOCDB families needed for the analysis (about 100,000 DOCDB_FAMILY_ID).
4. The next step was to find the patent citations. I matched those DOCDB_FAMILY_ID with TLS228_DOCDB_FAM_CITN and found that about 30,000 DOCDB families are missing from the citation table. I selected a random sample of 10 DOCDB_FAMILY_ID for which CITED_DOCDB_FAMILY_ID is missing and checked them on Espacenet: 9 had neither cited nor citing patents, but 1 had citing patents. So I wonder whether I should interpret the missing cases as meaning that the DOCDB family has not cited any patent.
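Steps 3 and 4 amount to a left join followed by a check for unmatched rows. The following is a minimal sketch in Python with an in-memory SQLite database, not the actual Stata workflow. The table and column names are assumed to follow the PATSTAT tables mentioned above, the rows are made-up toy values, and `families_without_citations` is a hypothetical helper name.

```python
import sqlite3


def families_without_citations(conn):
    """Return DOCDB family IDs that have no row in tls228_docdb_fam_citn."""
    rows = conn.execute(
        """
        SELECT DISTINCT f.docdb_family_id
        FROM tls218_docdb_fam AS f
        LEFT JOIN tls228_docdb_fam_citn AS c
               ON c.docdb_family_id = f.docdb_family_id
        WHERE c.docdb_family_id IS NULL   -- keep only unmatched families
        ORDER BY f.docdb_family_id
        """
    ).fetchall()
    return [r[0] for r in rows]


# Toy data (assumption): three families, only family 100 has citation rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tls218_docdb_fam (appln_id INTEGER, docdb_family_id INTEGER);
    CREATE TABLE tls228_docdb_fam_citn (
        docdb_family_id INTEGER, cited_docdb_family_id INTEGER);
    INSERT INTO tls218_docdb_fam VALUES (1, 100), (2, 100), (3, 200), (4, 300);
    INSERT INTO tls228_docdb_fam_citn VALUES (100, 900), (100, 901);
    """
)

missing = families_without_citations(conn)
print(missing)
```

A family appearing in `missing` only shows that the citation table holds no row for it; whether that reflects a family with no citations or a gap in coverage is a separate question.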
Besides that, I am not sure I understood what you mean by "the citations coverage". I looked at the "Overview of citation data in the EPO's citation database". Does the list of countries mean that a patent citation is only recorded when it refers to a patent filed at one of the patent authorities listed in the citation coverage? Since my research refers specifically to applicants "from" Brazil and the Brazilian patent authority is not listed in that citation coverage, could that explain the missing cases?
[EDIT: To check this, I tabulated the 33,639 appln_id related to the DOCDB_FAMILY_ID with missing CITED_DOCDB_FAMILY_ID and found that 90% have APPLN_AUTH == BR (the patent office is Brazilian), while the other 10% come from other patent authorities such as US, ES, EP, JP... What am I missing in interpreting these data?]
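The tabulation described in the edit could be sketched as follows, again in Python with an in-memory SQLite database rather than Stata. It assumes that `appln_auth` is read from a `tls201_appln` table keyed by `appln_id`; the rows are made-up toy values, so the resulting counts say nothing about the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tls201_appln (appln_id INTEGER, appln_auth TEXT);
    CREATE TABLE tls218_docdb_fam (appln_id INTEGER, docdb_family_id INTEGER);
    CREATE TABLE tls228_docdb_fam_citn (
        docdb_family_id INTEGER, cited_docdb_family_id INTEGER);
    -- Toy data (assumption): only family 100 has a citation row.
    INSERT INTO tls201_appln VALUES
        (1, 'BR'), (2, 'US'), (3, 'BR'), (4, 'BR'), (5, 'EP');
    INSERT INTO tls218_docdb_fam VALUES
        (1, 100), (2, 100), (3, 200), (4, 300), (5, 300);
    INSERT INTO tls228_docdb_fam_citn VALUES (100, 900);
    """
)

# Count applications per filing authority, restricted to families
# that have no row at all in the citation table.
counts = conn.execute(
    """
    SELECT a.appln_auth, COUNT(*) AS n
    FROM tls218_docdb_fam AS f
    JOIN tls201_appln AS a ON a.appln_id = f.appln_id
    WHERE NOT EXISTS (
        SELECT 1 FROM tls228_docdb_fam_citn AS c
        WHERE c.docdb_family_id = f.docdb_family_id)
    GROUP BY a.appln_auth
    ORDER BY n DESC, a.appln_auth
    """
).fetchall()
print(counts)
```

Comparing these counts with the same breakdown for families that do have citation rows would show whether the missing cases are concentrated in authorities outside the citation coverage.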
Thank you for your help!
Best Regards,
Paula

