Hi everyone!
I'm currently doing PhD in Economics, and trying to construct the technological spillovers measure at industry-level. I'm just wondering is it by any chance that I can collect the number of backward citations for a period (2005-2013) for all manufacturing industries (like how many backward citations from industry A to industry B and vice versa)? If not, is it possible to do it for different IPC?
I am new to SQL and would be really grateful for any support I can get regarding to this.
I'm trying PATSTAT online autumn version.
Kind regards!
Andy Luong
Help with backward citation at industry-level
-
- Posts: 3
- Joined: Wed Mar 22, 2017 5:00 pm
-
- Posts: 176
- Joined: Tue Oct 19, 2004 10:36 am
- Location: Vienna
Re: Help with backward citation at industry-level
Hello Andy,
There is no ready aggregated data available that gives industry-industry citations. But the principle is the same as all citation analysis; you only have to aggregate it at a different level. The PATSTAT website contains a number of self-learning publications that have sample queries. In particular, have a look at : "An introduction to the PATSTAT database with example queries" ; it gives an example on how to calculate the number of received citations. (forward and backward citations are in principle the same, but seen from a different position.) You simply have to bring this to "industry level" via the IPC classification codes. (Assuming you use the NACE classification scheme to define the "industries".)
Also check Google Scholar for "patent citations technology flows", you will find plenty of relevant publications.
The PATSTAT database does contain a ready made family-family citation table (tls228). But this table includes all the different categories of citations, and many researchers want to exclude applicant citations (or only take citations from search and examination - so called examiner citations). So if you want to exclude the "self citation bias", you will need to do it from scratch via the tls212_citation tables. You will also have to make some decisions what to do about patents that belong to more then 1 "industry" because of the assigned IPCs. Weighing might be appropriate, but that depends on the level of analysis you want to do. If you compare industries, weighing will bias negatively applications that cover multiple technical fields.
Doing this kind of analysis on PATSTAT Online will probably not be possible as you will need to work with intermediate tables to keep it manageable. If your data sample is limited (<100.000 applications), you could make a MS ACCESS extraction, and then continue working offline. The data extraction tool offers an option to include cited, citing and family members, so your final sample will be more then 100.000 applications.
I hope this information is helpful.
There is no ready aggregated data available that gives industry-industry citations. But the principle is the same as all citation analysis; you only have to aggregate it at a different level. The PATSTAT website contains a number of self-learning publications that have sample queries. In particular, have a look at : "An introduction to the PATSTAT database with example queries" ; it gives an example on how to calculate the number of received citations. (forward and backward citations are in principle the same, but seen from a different position.) You simply have to bring this to "industry level" via the IPC classification codes. (Assuming you use the NACE classification scheme to define the "industries".)
Also check Google Scholar for "patent citations technology flows", you will find plenty of relevant publications.
The PATSTAT database does contain a ready made family-family citation table (tls228). But this table includes all the different categories of citations, and many researchers want to exclude applicant citations (or only take citations from search and examination - so called examiner citations). So if you want to exclude the "self citation bias", you will need to do it from scratch via the tls212_citation tables. You will also have to make some decisions what to do about patents that belong to more then 1 "industry" because of the assigned IPCs. Weighing might be appropriate, but that depends on the level of analysis you want to do. If you compare industries, weighing will bias negatively applications that cover multiple technical fields.
Doing this kind of analysis on PATSTAT Online will probably not be possible as you will need to work with intermediate tables to keep it manageable. If your data sample is limited (<100.000 applications), you could make a MS ACCESS extraction, and then continue working offline. The data extraction tool offers an option to include cited, citing and family members, so your final sample will be more then 100.000 applications.
I hope this information is helpful.
Best regards,
Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna
Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna
-
- Posts: 3
- Joined: Wed Mar 22, 2017 5:00 pm
Re: Help with backward citation at industry-level
Hi Geert!
This is Andy. I hope you are well.
Thank you very much for your reply Geert. I've followed your advice and learned about the SQL from several materials.
However, I'm really struggling with my own query right now. Can you please take a look and let me know if I did something wrong here. I did try to correct it for several days but could not have a solution. Basically what I want is to have a table of 4 variables: the citing_appln_id, citing_ipc, cited_appln_id, cited_ipc, then I can add the count for the number of patent citation later. For the first part, I try to make a sample "t5" for the citing_appln_id and citing_ipc, and for the second part I do it for the cited application, by going backward from tls212.
I hope my question is clear enough for you Geert. If not, please feel free to ask, I will get back to you as soon as I can. And I'm really looking forward to hearing from you Geert.
My query:
SELECT distinct t5.appln_id, t1.appln_filing_year, t5.citing_ipc, t10.cited_id, t10.cited_ipc
FROM (select distinct t1.appln_id, t1.apln_filing_year, t1.appln_kind, t2.ipc_class_symbol as citing_ipc
From (((tls201_appln as t1
INNER JOIN tls209_appln_ipc as t2 on t1.appln_id = t2.appln_id)
INNER JOIN tls211_pat_publn as t3 ON t1.appln_id = t3.appln_id)
INNER JOIN tls212_citation as t4 on t4.pat_publn_id = t3.pat_publn_id
WHERE ((t1.appln_filing_year) >= 2005
AND (t1.appln_filing_year) <= 2013
AND (t1.appln_kind = 'A'))) as t5
INNER JOIN (select distinct t9.ipc_class_symbol as cited_ipc, t9.appln_id as cited_id
From (((tls212_citation as t6
INNER JOIN tls211_pat_publn as t7 ON t6.cited_pat_publn_id = t7.pat_publn_id)
INNER JOIN tls201_appln as t8 on t7.appln_id = t8.appln_id)
INNER JOIN tls209_appln_ipc as t9 on t8.appln_id = t9.appln_id
WHERE ((citing_ipc LIKE 'D03B%'
OR citing_ipc LIKE 'A14G%'
OR citing_ipc LIKE 'D02M%')
AND (cited_ipc LIKE 'A47R%'
OR cited_ipc LIKE 'B68G%'))) as t10 on t10.cited_id = t5.appln_id
GROUP BY t5.appln_id
Problem: 3:51:20 PM [SELECT - 0 row(s), 0 secs] [Error Code: 156, SQL State: S0001] Incorrect syntax near the keyword 'WHERE'.
This is Andy. I hope you are well.
Thank you very much for your reply Geert. I've followed your advice and learned about the SQL from several materials.
However, I'm really struggling with my own query right now. Can you please take a look and let me know if I did something wrong here. I did try to correct it for several days but could not have a solution. Basically what I want is to have a table of 4 variables: the citing_appln_id, citing_ipc, cited_appln_id, cited_ipc, then I can add the count for the number of patent citation later. For the first part, I try to make a sample "t5" for the citing_appln_id and citing_ipc, and for the second part I do it for the cited application, by going backward from tls212.
I hope my question is clear enough for you Geert. If not, please feel free to ask, I will get back to you as soon as I can. And I'm really looking forward to hearing from you Geert.
My query:
SELECT distinct t5.appln_id, t1.appln_filing_year, t5.citing_ipc, t10.cited_id, t10.cited_ipc
FROM (select distinct t1.appln_id, t1.apln_filing_year, t1.appln_kind, t2.ipc_class_symbol as citing_ipc
From (((tls201_appln as t1
INNER JOIN tls209_appln_ipc as t2 on t1.appln_id = t2.appln_id)
INNER JOIN tls211_pat_publn as t3 ON t1.appln_id = t3.appln_id)
INNER JOIN tls212_citation as t4 on t4.pat_publn_id = t3.pat_publn_id
WHERE ((t1.appln_filing_year) >= 2005
AND (t1.appln_filing_year) <= 2013
AND (t1.appln_kind = 'A'))) as t5
INNER JOIN (select distinct t9.ipc_class_symbol as cited_ipc, t9.appln_id as cited_id
From (((tls212_citation as t6
INNER JOIN tls211_pat_publn as t7 ON t6.cited_pat_publn_id = t7.pat_publn_id)
INNER JOIN tls201_appln as t8 on t7.appln_id = t8.appln_id)
INNER JOIN tls209_appln_ipc as t9 on t8.appln_id = t9.appln_id
WHERE ((citing_ipc LIKE 'D03B%'
OR citing_ipc LIKE 'A14G%'
OR citing_ipc LIKE 'D02M%')
AND (cited_ipc LIKE 'A47R%'
OR cited_ipc LIKE 'B68G%'))) as t10 on t10.cited_id = t5.appln_id
GROUP BY t5.appln_id
Problem: 3:51:20 PM [SELECT - 0 row(s), 0 secs] [Error Code: 156, SQL State: S0001] Incorrect syntax near the keyword 'WHERE'.
-
- Posts: 176
- Joined: Tue Oct 19, 2004 10:36 am
- Location: Vienna
Re: Help with backward citation at industry-level
Hello Andy,
I had some problems with the re-naming and aliases in your query, so I therefore decided to make a query from scratch which I think will serve your quest.
You can adapt it and add some "counts" if you need numbers.
I assume that you will want to look at family/family citations, so I have added the docdb_family_id. You can then remove all the attributes at application level (ID, number,...and IPC codes) to get a very strict family/family citation table.
At family /family level (then you loose the granularity for each of the 20 -4x5- possible IPC pairs):
I had some problems with the re-naming and aliases in your query, so I therefore decided to make a query from scratch which I think will serve your quest.
You can adapt it and add some "counts" if you need numbers.
I assume that you will want to look at family/family citations, so I have added the docdb_family_id. You can then remove all the attributes at application level (ID, number,...and IPC codes) to get a very strict family/family citation table.
Code: Select all
Select distinct citing_ap.appln_id, citing_ap.docdb_family_id,
citing_ap.appln_nr_epodoc, citing_ipc.ipc_class_symbol,
cited_ap.appln_id, cited_ap.docdb_family_id, cited_ap.appln_nr_epodoc,
cited_ipc.ipc_class_symbol
from
tls209_appln_ipc as citing_ipc join tls201_appln as citing_ap on citing_ipc.appln_id = citing_ap.appln_id
join tls211_pat_publn as citing_pub on citing_ap.appln_id = citing_pub.appln_id
join tls212_citation on citing_pub.pat_publn_id = tls212_citation.pat_publn_id
join tls211_pat_publn as cited_pub on tls212_citation.cited_pat_publn_id = cited_pub.pat_publn_id
join tls201_appln as cited_ap on cited_pub.appln_id = cited_ap.appln_id
join tls209_appln_ipc as cited_ipc on cited_ap.appln_id = cited_ipc.appln_id
where citing_ap.appln_filing_year between 2005 and 2013
and cited_ap. appln_kind = 'A'
and left (citing_ipc.ipc_class_symbol,4) in ('A47B','A47C%','A47D','A47F')
and left (cited_ipc.ipc_class_symbol,4) in ('B29B','B29D','B60C', 'B67D','C08C')
order by citing_ap.appln_id
Code: Select all
Select distinct citing_ap.docdb_family_id, cited_ap.docdb_family_id
from
tls209_appln_ipc as citing_ipc join tls201_appln as citing_ap on citing_ipc.appln_id = citing_ap.appln_id
join tls211_pat_publn as citing_pub on citing_ap.appln_id = citing_pub.appln_id
join tls212_citation on citing_pub.pat_publn_id = tls212_citation.pat_publn_id
join tls211_pat_publn as cited_pub on tls212_citation.cited_pat_publn_id = cited_pub.pat_publn_id
join tls201_appln as cited_ap on cited_pub.appln_id = cited_ap.appln_id
join tls209_appln_ipc as cited_ipc on cited_ap.appln_id = cited_ipc.appln_id
where citing_ap.appln_filing_year between 2005 and 2013 and cited_ap. appln_kind = 'A'
and left (citing_ipc.ipc_class_symbol,4) in ('A47B','A47C%','A47D','A47F')
and left (cited_ipc.ipc_class_symbol,4) in ('B29B','B29D','B60C', 'B67D','C08C')
order by citing_ap.docdb_family_id
Best regards,
Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna
Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna
-
- Posts: 3
- Joined: Wed Mar 22, 2017 5:00 pm
Re: Help with backward citation at industry-level
Thank you very much for these queries Geert. I really appreciate your help.
Have a nice weekend!
Best,
Andy Luong
Have a nice weekend!
Best,
Andy Luong