Counting backward citations and data coverage

Post by Esther researcher » Mon Jan 15, 2018 11:56 am

Dear Geert,
I am interested in counting backward citations of Spanish companies in PATSTAT 2017b for the period 2000-2015. The objective is to analyse their evolution over time. My questions are:
1) I have to decide on the application authority (EP only, or also national offices). How good is the coverage of backward citations for national offices? I have read the 2012 entry in Gianluca's blog, "backward citation: analysis for PCT and national offices", which apparently says that only citations in EP and WO patents are reliably reflected in PATSTAT.
2) I have to decide on the unit of analysis: applications or families? What is the best approach in your opinion?
If I wanted to use families, I could retrieve the appln_id of applications filed by Spanish companies at the EPO (or at all offices, depending on your answer to question 1) with an earliest priority year between 2000 and 2015, then join to tls212 and group by docdb_family_id. Would that be correct?
3) Eventually, I may also be interested in distinguishing between examiner and applicant citations. I have read that I can do this using the citation origin or the citation category. But this is only available for EP applications, isn't it?
4) How good is the classification of person_types in the tls206 table? I have seen many companies under the "unknown" category.

Thank you in advance,
Esther


Post by Geert Boedt » Mon Jan 15, 2018 3:52 pm

Hello Esther,
that's a lot of questions in one post.
I will tackle them one by one:
1) The coverage of citations in PATSTAT depends on the data the EPO receives from the national patent offices that carry out the searches. In addition, the EPO itself carries out searches for a number of countries (Italy, Belgium, Luxembourg, ...). You can easily observe this by analysing the differences between publn_auth and citn_gener_auth.

Code:

-- Distinct cited publications per citing authority and per authority that generated the citation
SELECT citing.publn_auth,
       c.citn_gener_auth,
       COUNT(DISTINCT cited.pat_publn_id) AS cited_publications
  FROM tls211_pat_publn citing
  JOIN tls212_citation c ON citing.pat_publn_id = c.pat_publn_id
  JOIN tls211_pat_publn cited ON c.cited_pat_publn_id = cited.pat_publn_id
 WHERE c.citn_gener_auth <> ''
 GROUP BY citing.publn_auth, c.citn_gener_auth
 ORDER BY citing.publn_auth, c.citn_gener_auth;

The coverage itself is described in the document "Overview of citation data in the EPO's citation database (REFI)", which you can find at this link: http://www.epo.org/searching-for-patent ... gular.html Keep in mind, however, that this document does not tell you where the coverage gaps are. There is no information on this, so if you need to know it in detail you will have to carry out your own analysis. (You could use the above query and look at time frames and numbers of citations, normalised over the number of published search reports.)
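
A rough sketch of such a time-frame analysis might look like the query below. It is only an illustration, not an official coverage check: the authorities ('EP', 'ES') and the date range are assumptions chosen for this thread, and it uses all publications as the denominator instead of search report publications only, which is a simplification you would probably want to refine.

```sql
-- Sketch only: per citing authority and publication year, compare the number
-- of publications with the number of publications that carry at least one
-- citation row in tls212_citation.
-- Assumptions: the authority list and the date range are illustrative;
-- YEAR() is available in MS SQL Server and MySQL.
SELECT citing.publn_auth,
       YEAR(citing.publn_date)             AS publn_year,
       COUNT(DISTINCT citing.pat_publn_id) AS publications,
       COUNT(DISTINCT c.pat_publn_id)      AS publications_with_citations,
       COUNT(c.pat_publn_id)               AS citation_rows
  FROM tls211_pat_publn citing
  LEFT JOIN tls212_citation c
         ON c.pat_publn_id = citing.pat_publn_id
 WHERE citing.publn_auth IN ('EP', 'ES')
   AND citing.publn_date BETWEEN '2000-01-01' AND '2015-12-31'
 GROUP BY citing.publn_auth, YEAR(citing.publn_date)
 ORDER BY citing.publn_auth, publn_year;
```

A sudden drop in publications_with_citations relative to publications for a given authority and year would be a hint of a possible coverage gap, which then needs a closer look.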
2) (For backward citations?) That depends entirely on the nature of your analysis. Anything that falls under the heading of "technology flows" would probably use family-based citation analysis. I have also seen researchers analyse whether some patent authorities are biased towards citations from their "national" applicants or from patents filed at their own patent office. "Family-based" data would make no sense when looking at the specific citing behaviour of patent searchers.
If you want to use the concept of "earliest priority" with "family", then you will need to aggregate that data yourself at family level. Some researchers also consider a time window (3 years, ...) following the publication date in their analysis of cited documents, to avoid a bias from "older" technology that keeps on being cited over and over again. Researchers have to judge for themselves whether such an approach fits their model. Earliest priority year = 2015 will probably not yield much data yet: +12 months for the second filing, plus the publication delay (even more for PCT).
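
A minimal sketch of such a family-level aggregation is shown below. Several things in it are assumptions and not a recommended method: it uses earliest_filing_year from tls201_appln as a proxy for the earliest priority year, it takes the minimum only over family members that actually have citation rows, it assumes that cited_pat_publn_id is 0 when no patent publication is cited, and it selects applicants by person_ctry_code = 'ES' without checking whether they are companies.

```sql
-- Sketch only: number of distinct cited DOCDB families per citing DOCDB family.
-- Assumptions: earliest_filing_year as a proxy for earliest priority year;
-- applicants identified by country code 'ES' only (no sector filter);
-- cited_pat_publn_id = 0 taken to mean "no cited patent publication".
SELECT citing_appln.docdb_family_id                AS citing_family,
       MIN(citing_appln.earliest_filing_year)      AS family_earliest_year,
       COUNT(DISTINCT cited_appln.docdb_family_id) AS cited_families
  FROM tls201_appln citing_appln
  JOIN tls211_pat_publn citing_publn
    ON citing_publn.appln_id = citing_appln.appln_id
  JOIN tls212_citation c
    ON c.pat_publn_id = citing_publn.pat_publn_id
  JOIN tls211_pat_publn cited_publn
    ON cited_publn.pat_publn_id = c.cited_pat_publn_id
  JOIN tls201_appln cited_appln
    ON cited_appln.appln_id = cited_publn.appln_id
 WHERE c.cited_pat_publn_id > 0
   AND EXISTS (SELECT 1
                 FROM tls207_pers_appln pa
                 JOIN tls206_person p ON p.person_id = pa.person_id
                WHERE pa.appln_id = citing_appln.appln_id
                  AND pa.applt_seq_nr > 0        -- applicants only, not inventors
                  AND p.person_ctry_code = 'ES')
 GROUP BY citing_appln.docdb_family_id
HAVING MIN(citing_appln.earliest_filing_year) BETWEEN 2000 AND 2015;
```

Counting distinct cited families instead of cited publications avoids counting the same invention several times when different family members cite different publications of it.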
3) Correct, the attribute "citn_origin" allows you to exclude, for example, the citations made by the applicant. The document referred to above gives an overview of what kind of citations we have for specific patent authorities. You might want to analyse this in more detail to see what kind of citations we have in PATSTAT.
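
One possible way to do that analysis, offered as a sketch only, is to list the citation origins that actually occur for each citing authority before deciding on a filter. The query makes no assumption about which citn_origin codes exist; please check the value list in the PATSTAT Data Catalog before excluding any of them.

```sql
-- Sketch only: distribution of citation origins per citing publication authority.
SELECT citing.publn_auth,
       c.citn_origin,
       COUNT(*) AS citation_rows
  FROM tls211_pat_publn citing
  JOIN tls212_citation c
    ON c.pat_publn_id = citing.pat_publn_id
 GROUP BY citing.publn_auth, c.citn_origin
 ORDER BY citing.publn_auth, c.citn_origin;
```

Once the relevant codes are known, a condition on c.citn_origin can be added to the WHERE clause of any of the citation counts.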
4) I have to refer you to the methodology described in the paper. If it turns out that your sample contains too many "unknown" applicants, you could always improve the data yourself by adding the missing category. (Or you can post it here, and I will forward it to the ECOOM team so that it can be added to the data for the next release.)
Best regards,

Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna


Post by Esther researcher » Tue Jan 23, 2018 10:06 am

Dear Geert,
Thank you very much for your helpful answer!

Best,
Esther

