Forward and Backward Citation

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

jay2018
Posts: 10
Joined: Sun Nov 11, 2018 12:52 am

Forward and Backward Citation

Post by jay2018 » Sun Nov 11, 2018 5:10 am

Hi,

I have some question with regard to get the forward and backward citation count.
I try the following code:

Code: Select all

select distinct a.appln_id, a.appln_filing_date, i.ipc_class_symbol, count(distinct(c.citn_id))as fcitation, count(distinct(c.cited_pat_publn_id)) as bcitation, p.publn_nr, p.publn_date, p.publn_claims, p.publn_auth
from tls201_appln a 
left join tls209_appln_ipc i on a.appln_id = i.appln_id
left join tls211_pat_publn p on p.appln_id = i.appln_id
left join tls212_citation c on c.pat_publn_id = p.pat_publn_id
where i.ipc_class_symbol like 'H02M%'
and p.publn_auth = 'US'
and a.granted = '1'
and i.ipc_position = 'F'
AND a.earliest_publn_year BETWEEN 2005 AND 2017
and p.publn_first_grant = '1'
group by a.appln_id, i.ipc_class_symbol, p.publn_claims, p.publn_nr, a.appln_filing_date, p.publn_date, p.publn_auth
But I'm not sure if I use the correct data to get the count of forward citation (fcitation) and backward citation (bcitation). Could you help me with that?
Also, when I change the p.publn_auth to "CN" or "JP", the values for both fcitation and bcitation become zero. Is that because I used the wrong variables or the data is missing for other patent offices?

Best regards,
Wei


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Forward and Backward Citation

Post by EPO / PATSTAT Support » Thu Nov 22, 2018 11:15 am

Hello Wei,
A couple of observations:
In your query, you tried to count at the same time the forward and the backward citations. If one would want to do this, the query would need to have 2 joins via tls212, one to the forward citations and one to the backward citations. I feel this is needlessly complicating the data aggregation.
I would separate forward and backward citations and then use further analysis to bring the result together if there would be any need for that.
You used LEFT JOINS, which I personally try to avoid if there is no specific need for it. Assuming that you only want to look at applications that effectively have citations and an IPC code, this is not needed.
Your condition "and i.ipc_position = 'F'" will only have an effect for US applications, which in your query is indeed the case through the "and p.publn_auth = 'US'" conditions. But keep in mind that you will exclude all applications that have a H02M classifications NOT in the first position. (this amounts to about 30.000 out of 40.000 applications - is this really what you want ?)
And a last remark: the query specifies " count(distinct(c.citn_id)) ". Counting citn_id's will include NPL for the backward citations, but there are no forward citations to NPL. Unless you have a specific need to do so, I would only count citations to patent publications by forcing c.cited_pat_publn_id <> 0
Keeping the above in mind; here is a query for a backward citations count :

Code: Select all

select a.appln_id, a.appln_filing_date, p.publn_auth, p.publn_nr, p.publn_date,
 p.publn_claims,i.ipc_class_symbol, count(c.cited_pat_publn_id) as bcitation
from tls201_appln a 
  join tls209_appln_ipc i on a.appln_id = i.appln_id
  join tls211_pat_publn p on p.appln_id = a.appln_id
  join tls212_citation c on  p.pat_publn_id = c.pat_publn_id
where i.ipc_class_symbol like 'H02M%'
 and p.publn_auth = 'US'
 and p.publn_first_grant = 'Y'
 and a.granted = 'Y'
 and i.ipc_position = 'F'
 AND a.earliest_publn_year BETWEEN 2005 AND 2017
 and c.cited_pat_publn_id <> 0
group by a.appln_id, a.appln_filing_date, p.publn_nr, p.publn_date, 
	p.publn_claims, p.publn_auth , i.ipc_class_symbol
 order by a.appln_id
Here is a query that gives the individual backward citations (101 in total) for the first publication of the above query (US7881084):

Code: Select all

SELECT  citing.appln_id, citing.publn_auth, citing.publn_nr, citing.publn_kind, citing.publn_date, tls212_citation.*,
cited.publn_auth, cited.publn_nr, cited.publn_kind, cited.publn_date,tls215_citn_categ.citn_categ, npl_biblio
  FROM tls211_pat_publn citing join tls212_citation on citing.pat_publn_id = tls212_citation.pat_publn_id
  join tls211_pat_publn cited on tls212_citation.cited_pat_publn_id = cited.pat_publn_id
  left join tls215_citn_categ on tls212_citation.pat_publn_id = tls215_citn_categ.pat_publn_id and tls212_citation.citn_id = tls215_citn_categ.citn_id
  --left join needed here otherwise only citations from the search report are included in the result, and not for example citations during opposition.
  join tls214_npl_publn on tls212_citation.cited_npl_publn_id = tls214_npl_publn.npl_publn_id 
  where citing.appln_id in (463368) and citing.publn_kind = 'B2' and cited_pat_publn_id <> 0
    order by citing.publn_date desc, citing.appln_id
And here is the query that looks at the forward citations: (observe the change in the way the tls212 table is joined to the publication, and that we now count pat_publn_id's !)

Code: Select all

select a.appln_id, a.appln_filing_date, p.publn_auth, p.publn_nr, p.publn_date,
 p.publn_claims,i.ipc_class_symbol, count(distinct (c.pat_publn_id)) as fcitation
from tls201_appln a 
  join tls209_appln_ipc i on a.appln_id = i.appln_id
  join tls211_pat_publn p on p.appln_id = a.appln_id
  join tls212_citation c on  p.pat_publn_id = c.cited_pat_publn_id
where i.ipc_class_symbol like 'H02M%'
 and p.publn_auth = 'US'
 and p.publn_first_grant = 'Y'
 and a.granted = 'Y'
 and i.ipc_position = 'F'
 AND a.earliest_publn_year BETWEEN 2005 AND 2017
 and c.cited_pat_publn_id <> 0
group by a.appln_id, a.appln_filing_date, p.publn_nr, p.publn_date, 
	p.publn_claims, p.publn_auth , i.ipc_class_symbol
 order by a.appln_id
And here is a query that shows all the citing publications (9) for the first application resulted from the previous query (US7881084):

Code: Select all

select a.appln_id, a.appln_filing_date, p.publn_auth, p.publn_nr, p.publn_date,
 p.publn_claims,i.ipc_class_symbol, citing.*
from tls201_appln a 
  join tls209_appln_ipc i on a.appln_id = i.appln_id
  join tls211_pat_publn p on p.appln_id = a.appln_id
  join tls212_citation c on  p.pat_publn_id = c.cited_pat_publn_id
  join tls211_pat_publn citing on c.pat_publn_id = citing.pat_publn_id
where i.ipc_class_symbol like 'H02M%'
 and p.publn_auth = 'US'
 and p.publn_first_grant = 'Y'
 and a.granted = 'Y'
 and i.ipc_position = 'F'
 AND a.earliest_publn_year BETWEEN 2005 AND 2017
 and c.cited_pat_publn_id <> 0
 and a.appln_id = 463368
 order by a.appln_id
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


jay2018
Posts: 10
Joined: Sun Nov 11, 2018 12:52 am

Re: Forward and Backward Citation

Post by jay2018 » Tue Nov 27, 2018 6:54 am

Hi,

Thank you for your help with the code and I appreciate that.
And I have two more questions. If I want to get the country of origin data, is it available in the data set? Also, I find two variables about the inventor country. Is the tls206_person.person_ctry_code the nationality of inventors, and tls226_person_orig.residence_ctry_code the country of residence?

Best regards,
Wei


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Forward and Backward Citation

Post by EPO / PATSTAT Support » Tue Nov 27, 2018 4:22 pm

Hello Wei,
the person_ctry_code is the country code used for correspondence, we also call it the "place of business". It has no link to nationality or citizenship.

About the tls226 table, the residence code is only used in US data and only about 1200 inventors have a residence_ctry_code <> person_ctry_code.
So I think it is statistically irrelevant.

Here is some code I used to analyse it_:

Code: Select all

SELECT distinct [person_orig_id] ,[person_id] ,[source] ,[last_name]  ,[first_name] 
,[street],[city],[zip_code],[state],[person_ctry_code] ,[residence_ctry_code] ,[role]
  FROM [tls226_person_orig]
  where residence_ctry_code <> '' and residence_ctry_code <>  person_ctry_code
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


jay2018
Posts: 10
Joined: Sun Nov 11, 2018 12:52 am

Re: Forward and Backward Citation

Post by jay2018 » Sun Dec 02, 2018 5:19 pm

Hi,

Thank you for your comments. And I have two more questions.
1. What selection criteria should I add if I want to limit my search to only granted patent?
2. How should I change to code to have all applications that have an H02M classification, not only those that have H02M in the first position?

Best regards,
Wei


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Forward and Backward Citation

Post by EPO / PATSTAT Support » Mon Dec 03, 2018 4:48 pm

Hello Wei,
continuing on the same example; and assuming you mean "granted of the cited documents" you can simply "JOIN" a tls201_appln table to the "Cited" table (alias), and then add the attribute "GRANTED" in your SELECT clause. That will give for each citation a value "Y" or "N", depending on whether the application was granted or not (and a NULL value for NPL thanks to the LEFT JOIN).
You can then use it for selecting only the granted applications, or you could also specify "GRANTED = 'Y' " in your WHERE clause.

Code: Select all

SELECT  citing.appln_id, citing.publn_auth, citing.publn_nr, citing.publn_kind, citing.publn_date, tls212_citation.*,
cited.publn_auth, cited.publn_nr, cited.publn_kind, cited.publn_date,tls215_citn_categ.citn_categ, cited_ap.granted, npl_biblio
  FROM tls211_pat_publn citing join tls212_citation on citing.pat_publn_id = tls212_citation.pat_publn_id
  join tls211_pat_publn cited on tls212_citation.cited_pat_publn_id = cited.pat_publn_id
  left join tls215_citn_categ on tls212_citation.pat_publn_id = tls215_citn_categ.pat_publn_id and tls212_citation.citn_id = tls215_citn_categ.citn_id
  --left join needed here otherwise only citations from the search report are included in the result, and not for example citations during opposition.
  join tls214_npl_publn on tls212_citation.cited_npl_publn_id = tls214_npl_publn.npl_publn_id 
  left join tls201_appln cited_ap on cited.appln_id = cited_ap.appln_id
  where citing.appln_id in (463368) and citing.publn_kind = 'B2' and cited_pat_publn_id <> 0
    order by citing.publn_date desc, citing.appln_id
About "not only those that have H02M in the first position?"
In the classification code position; simply remove " and i.ipc_position = 'F' " from the WHERE clause.
Your question is rather SQL syntax related then PATSTAT related. I would advice you to have a look at the sample queries from the documentation, run and analyse those, and maybe change the conditions in the JOIN and WHERE clauses to get acquinted with the data structure and how SQL queries work on PATSTAT.
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post Reply