self citation need some help


ugkim
Posts: 6
Joined: Tue Apr 26, 2016 11:32 am

self citation need some help

Post by ugkim » Tue May 03, 2016 6:42 am

I am collecting the docdb family groups of patents classified under CPC Y02E (including its subcategories),
and I am trying to count the backward and forward citations of those patents (as far as I can; I mainly use tls228).

But I really do not know how to count self-citations without supplying specific person or company information (same doc_std_name_id in the patents and in their citations?).
There are so many tables in PATSTAT, so is there some kind of indicator for self-citations, and if so, in which table?

Thank you in advance!


ugkim
Posts: 6
Joined: Tue Apr 26, 2016 11:32 am

nico, geert or anybody

Post by ugkim » Mon May 16, 2016 9:34 am

Nico, Geert!

Is the approach below the right way to count forward citations within a 3-year time window?
We will ignore the error from inventions (docdb family id) with more than 2 earliest_filing_years.
Also, we want to avoid using the publication table if possible; we only care about the invention level (docdb family).

--------------------------------------
select distinct cited_docdb_family_id, count(distinct tls228_docdb_fam_citn.docdb_family_id)
from tls228_docdb_fam_citn
left join tls201_appln on tls201_appln.docdb_family_id = tls228_docdb_fam_citn.cited_docdb_family_id
left join tls201_appln t1 on t1.docdb_family_id = tls228_docdb_fam_citn.docdb_family_id

where cited_docdb_family_id in
(select distinct docdb_family_id
from tls201_appln
left join tls224_appln_cpc on tls224_appln_cpc.appln_id = tls201_appln.appln_id
where appln_kind = 'a'
and earliest_filing_year between 1970 and 2011
and cpc_class_symbol like 'Y02E 10/1%' )

and datediff(year, tls201_appln.earliest_filing_date, t1.earliest_filing_date ) <= 3

group by cited_docdb_family_id
------------------------------------


nico.rasters
Posts: 140
Joined: Wed Jul 08, 2009 5:51 pm

Re: self citation need some help

Post by nico.rasters » Mon May 16, 2016 4:16 pm

I don't have PATSTAT Online, but the query seems ok.
You are doing a count for the cited families, which is the right way to do it for forward citations.
You are also using a COUNT(DISTINCT which takes care of any double count that will occur because you are joining with TLS201.

You can leave out the DISTINCT in select distinct cited_docdb_family_id, because you are already using a GROUP BY at the end.
In MySQL, the datediff function just returns the number of days, so the result would have to be <= 3*365. But maybe MSSQL accepts that extra "year" parameter that you are using.
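To make the difference between the two datediff flavours concrete, here is a small sketch in Python. It only emulates the documented behaviour: T-SQL's DATEDIFF(year, ...) counts year boundaries crossed, while MySQL's DATEDIFF returns days. The helper names `datediff_year` and `datediff_days` and the two dates are made up for illustration.

```python
from datetime import date


def datediff_year(start: date, end: date) -> int:
    """Emulates T-SQL DATEDIFF(year, start, end): year boundaries crossed."""
    return end.year - start.year


def datediff_days(start: date, end: date) -> int:
    """Emulates MySQL DATEDIFF(end, start): whole days between the dates."""
    return (end - start).days


# Hypothetical earliest filing dates of a cited and a citing family.
cited = date(2010, 1, 1)
citing = date(2013, 12, 31)

year_boundaries = datediff_year(cited, citing)
days = datediff_days(cited, citing)

# The pair passes the year-boundary test although almost four years elapsed.
print(year_boundaries, year_boundaries <= 3)
print(days, days <= 3 * 365)
```

So with the "year" parameter the window is effectively "filed in one of the next three calendar years", which may or may not be what you want.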

You can/should always test your query with just a few families. You can enter some IDs instead of the subquery. It helps to do a SELECT *, so you can see what your query is doing exactly.

As I do not like complexity, my approach would be to first store the docdb families belonging to the Y02E patents in a table.
From this table I would create a subset of TLS228.
The final step would be to add the earliest filing dates for both the citing and cited families to this subset.
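The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: it runs on an in-memory SQLite database with a few invented rows instead of a real PATSTAT installation, the helper tables (`y02e_families`, `y02e_citn`, `family_date`, `y02e_citn_dated`) are hypothetical names, and the window is expressed in days with a lower bound of 0, which is my own addition to skip citing families filed before the cited one.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Toy stand-ins for the PATSTAT tables (only the columns used here).
CREATE TABLE tls201_appln (appln_id INTEGER, docdb_family_id INTEGER,
                           earliest_filing_date TEXT);
CREATE TABLE tls224_appln_cpc (appln_id INTEGER, cpc_class_symbol TEXT);
CREATE TABLE tls228_docdb_fam_citn (docdb_family_id INTEGER,
                                    cited_docdb_family_id INTEGER);

INSERT INTO tls201_appln VALUES
  (1, 100, '2005-03-01'),   -- the Y02E family
  (2, 200, '2006-06-15'),   -- cites 100 inside the window
  (3, 300, '2010-01-10'),   -- cites 100 outside the window
  (4, 400, '2007-02-01');   -- cites 100 inside the window
INSERT INTO tls224_appln_cpc VALUES (1, 'Y02E 10/10'), (2, 'H01L 31/00');
INSERT INTO tls228_docdb_fam_citn VALUES (200, 100), (300, 100), (400, 100);

-- Step 1: store the docdb families of the Y02E patents.
CREATE TABLE y02e_families AS
SELECT DISTINCT a.docdb_family_id
FROM tls201_appln a
JOIN tls224_appln_cpc c ON c.appln_id = a.appln_id
WHERE c.cpc_class_symbol LIKE 'Y02E%';

-- Step 2: subset of TLS228 where the cited family is a Y02E family.
CREATE TABLE y02e_citn AS
SELECT f.docdb_family_id AS citing_family,
       f.cited_docdb_family_id AS cited_family
FROM tls228_docdb_fam_citn f
JOIN y02e_families y ON y.docdb_family_id = f.cited_docdb_family_id;

-- Step 3: add one earliest filing date per family to both sides.
CREATE TABLE family_date AS
SELECT docdb_family_id, MIN(earliest_filing_date) AS first_date
FROM tls201_appln
GROUP BY docdb_family_id;

CREATE TABLE y02e_citn_dated AS
SELECT c.citing_family, c.cited_family,
       d1.first_date AS citing_date, d2.first_date AS cited_date
FROM y02e_citn c
JOIN family_date d1 ON d1.docdb_family_id = c.citing_family
JOIN family_date d2 ON d2.docdb_family_id = c.cited_family;
""")

# Forward citations within 3 * 365 days of the cited family's first filing.
rows = con.execute("""
SELECT cited_family, COUNT(DISTINCT citing_family)
FROM y02e_citn_dated
WHERE julianday(citing_date) - julianday(cited_date) BETWEEN 0 AND 3 * 365
GROUP BY cited_family
""").fetchall()
print(rows)  # [(100, 2)] for the toy rows above
```

Because every step leaves a stored table behind, you can inspect the intermediate results with a plain SELECT * before trusting the final count.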

And to answer your first question: as far as I know there is no "self-citation" indicator available in PATSTAT.
________________________________________
Nico Doranov
Data Manager

Daigu Academic Services & Data Stewardship
http://www.daigu.nl/


Geert Boedt
Posts: 178
Joined: Tue Oct 19, 2004 10:36 am
Location: Vienna

Re: self citation need some help

Post by Geert Boedt » Thu May 19, 2016 5:18 pm

There is no "self citation" indicator in PATSTAT; so you would need to create that yourself, based on your understanding of what the concept means. (And doing this in PATSTAT Online would probably not be feasible, because you need intermediate stored tables to make this work in a more or less normal setting.)
There are some good articles/publications written on this subject which you can easily find.
Google for "Does it matter where patent citations come from? Inventor vs. examiner citations in European patents" by Paola Criscuolo & Bart Verspagen. But there is much more out there.

The bottom line is: you need to have a clear idea of why you want to exclude self citations, and of the data sample you are working with. It may be worth simulating your results with and without self citations, and looking at the differences.
If you only look at US patents, then the playing field is level, but if your research involves EP as well as US or other patent filing authorities, then you might want to exclude self citations to avoid bias. Looking only at the "names" may not be sufficient, because examiners can also cite applications from the applicant itself (and do so very often). Maybe it would make more sense to only include citations given by the examiners during search and examination. (Those can be identified in PATSTAT via the CITN_ORIGIN attribute.)
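As a rough illustration of filtering on CITN_ORIGIN, the sketch below counts citing families per cited family using only search and examination citations. It rests on several assumptions: it runs on an in-memory SQLite database with invented rows, it assumes CITN_ORIGIN lives at publication level in TLS212_CITATION (so the publication table TLS211 is needed to get back to families), it assumes rows that do not cite a patent publication carry 0 in cited_pat_publn_id, and the origin codes 'SEA', 'EXA' and 'ISR' are my reading of the data catalog, so please check them against the catalog of your edition.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Toy stand-ins for the PATSTAT tables (only the columns used here).
CREATE TABLE tls201_appln (appln_id INTEGER, docdb_family_id INTEGER);
CREATE TABLE tls211_pat_publn (pat_publn_id INTEGER, appln_id INTEGER);
CREATE TABLE tls212_citation (pat_publn_id INTEGER, citn_id INTEGER,
                              citn_origin TEXT, cited_pat_publn_id INTEGER);

INSERT INTO tls201_appln VALUES (1, 100), (2, 200), (3, 300);
INSERT INTO tls211_pat_publn VALUES (11, 1), (21, 2), (31, 3);
INSERT INTO tls212_citation VALUES
  (21, 1, 'SEA', 11),   -- search report citation: family 200 cites 100
  (31, 1, 'APP', 11),   -- applicant citation:     family 300 cites 100
  (31, 2, 'APP', 21);   -- applicant citation:     family 300 cites 200
""")

# Assumed codes for citations introduced during search and examination.
EXAMINER_ORIGINS = ("SEA", "EXA", "ISR")
placeholders = ",".join("?" for _ in EXAMINER_ORIGINS)

rows = con.execute(f"""
SELECT cited_a.docdb_family_id,
       COUNT(DISTINCT citing_a.docdb_family_id)
FROM tls212_citation c
JOIN tls211_pat_publn citing_p ON citing_p.pat_publn_id = c.pat_publn_id
JOIN tls201_appln citing_a     ON citing_a.appln_id = citing_p.appln_id
JOIN tls211_pat_publn cited_p  ON cited_p.pat_publn_id = c.cited_pat_publn_id
JOIN tls201_appln cited_a      ON cited_a.appln_id = cited_p.appln_id
WHERE c.citn_origin IN ({placeholders})
  AND c.cited_pat_publn_id > 0
  AND citing_a.docdb_family_id <> cited_a.docdb_family_id
GROUP BY cited_a.docdb_family_id
ORDER BY cited_a.docdb_family_id
""", EXAMINER_ORIGINS).fetchall()
print(rows)  # [(100, 1)] for the toy rows above
```

Swapping the tuple for ("APP",) gives the applicant-introduced citations instead, which makes it easy to compare results with and without them.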
There is no silver bullet; it all depends on your data sample and on what exactly you want to research.
Best regards,

Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna


ugkim
Posts: 6
Joined: Tue Apr 26, 2016 11:32 am

Re: self citation need some help

Post by ugkim » Tue May 31, 2016 6:14 am

Thanks Geert,

We are doing research on technology spillovers, so it looks like we must exclude self citations; whose citations to count is still an open question.
Before I start the manual labor on the forward citations of about 200 000 observations (actually fewer than 100 000 have forward citations):
if you have any idea, good or bad, please give me a hint!
The total number of applications is definitely over 100 000, so the data download would have to be split somehow
(how do we recombine the parts? copy and paste?).

Patent office and invention year (and also triadic, family size, granted) will be controlled for.
We wanted to know the inventors' country of origin, but there are too many missing data, so we gave up.

Our work is somewhat similar to Antoine Dechezlepretre (CEP Discussion Paper No 1300, Sep 2014).


Geert Boedt
Posts: 178
Joined: Tue Oct 19, 2004 10:36 am
Location: Vienna

Re: self citation need some help

Post by Geert Boedt » Tue May 31, 2016 10:47 am

It is important that you define the term "self citation". If you start from the definition that self-citations are those where the citing and cited applicant are the same, the only way to exclude them is to look at the respective applicants. (The harmonised names will be useful to do this.) There are however a number of different definitions on what self citations could be.
If you want to exclude self citations because of the bias factor introduced by the applicant, then it might be sufficient to exclude the citations introduced by the applicant. (Or alternatively only take citations from the search and examination procedure.) The attribute citn_origin can be used to do this.
Simply comparing the names of the applicants will also exclude the self citations introduced by the patent examiner.
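For the first definition (citing and cited applicant are the same), a minimal sketch of the name comparison could look like this. It is an illustration only: it runs on an in-memory SQLite database with invented rows, uses doc_std_name_id as the harmonised applicant identifier (one of several candidates), treats applt_seq_nr > 0 as "this person is an applicant", and `family_applicant` is a hypothetical helper table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Toy stand-ins for the PATSTAT tables (only the columns used here).
CREATE TABLE tls201_appln (appln_id INTEGER, docdb_family_id INTEGER);
CREATE TABLE tls206_person (person_id INTEGER, doc_std_name_id INTEGER);
CREATE TABLE tls207_pers_appln (person_id INTEGER, appln_id INTEGER,
                                applt_seq_nr INTEGER);
CREATE TABLE tls228_docdb_fam_citn (docdb_family_id INTEGER,
                                    cited_docdb_family_id INTEGER);

INSERT INTO tls201_appln VALUES (1, 100), (2, 200), (3, 300);
-- Persons 51 and 52 share one standardised name, person 53 has another.
INSERT INTO tls206_person VALUES (51, 9001), (52, 9001), (53, 9002);
INSERT INTO tls207_pers_appln VALUES (51, 1, 1), (52, 2, 1), (53, 3, 1);
INSERT INTO tls228_docdb_fam_citn VALUES (200, 100), (300, 100);

-- Intermediate table: every applicant name id attached to each family.
CREATE TABLE family_applicant AS
SELECT DISTINCT a.docdb_family_id, p.doc_std_name_id
FROM tls201_appln a
JOIN tls207_pers_appln pa ON pa.appln_id = a.appln_id
                         AND pa.applt_seq_nr > 0
JOIN tls206_person p ON p.person_id = pa.person_id;
""")

# Flag a family citation as "self" when both families share an applicant id.
rows = con.execute("""
SELECT f.cited_docdb_family_id,
       f.docdb_family_id,
       EXISTS (SELECT 1
               FROM family_applicant x
               JOIN family_applicant y
                 ON y.doc_std_name_id = x.doc_std_name_id
               WHERE x.docdb_family_id = f.docdb_family_id
                 AND y.docdb_family_id = f.cited_docdb_family_id
              ) AS is_self_citation
FROM tls228_docdb_fam_citn f
ORDER BY f.docdb_family_id
""").fetchall()
print(rows)  # [(100, 200, 1), (100, 300, 0)] for the toy rows above
```

Keeping the flag as a column, rather than deleting the rows, lets you run your counts both with and without self citations.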
With regard to the combine/paste/copy question, the most obvious solution is that you purchase and install the PATSTAT database on a local computer. Downloading "sets" via PATSTAT Online to recreate a larger database goes against the philosophy of PATSTAT Online, and might lead to unexpected results because of duplicated records.
Best regards,

Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna


ugkim
Posts: 6
Joined: Tue Apr 26, 2016 11:32 am

Re: self citation need some help

Post by ugkim » Tue May 31, 2016 11:16 am

Thank you again.
We are seriously considering ordering the DISC!

