I am a master student currently working on his semester thesis and was interested in using PATSTAT online (version of autumn 2015) to trace back the learning loci in the Lithium-ion battery (LIB) industry. More precisely, I want to perform a forward-citation analysis on a dataset of patents judged relevant:
I am a bit stuck at some point with the use of PATSTAT to finish up the data collection part. Do you think you could help me with that? If this is the case, in order to help you understand my questions, here a summary of my current situation:
I managed without problem to retrieve the patents of interest from the EPO database, but I have some issues performing the citation analysis. More especially, as the on-line version is less potent than the off-line version (especially in terms of intermediate creation of tables), I meet the following issues:
- (1) I adapted the query from (de Rassenfosse, Dernis, and Boedt 2014) in order to perform a forward citation-analysis on the selected patents (see my query) with a time window of 5 years. My problem now is to further adapt this query to do a family-family citations (as at the moment I am comparing patent applications and expose myself to double counting if I understood well). For this, I had two ideas:
- (A) Directly extracting the dataset of the relevant patents in terms of LIBs and then using MS access to manually flag the family.
- (B) Or applying a sub-query directly into the SQL query to restrain the result to intra-family citations.
- (2) Once I have my final dataset with the relevant patents from the LIBs industry and the counting of forward citations for each of them, I would have like to enhance this database by:
- •Collecting in a new database all the family_id of the backward AND forward citations of my initial dataset and apply the same key-word search string on them (in order to decrease the number of false negative I have due to the constraint on the technology code). Do you think such a manipulation is possible with PATSTAT online? If yes, is it possible you have some advice concerning such a query?
Cheers,
David
Attached: my query so far:
SELECT
DISTINCT t0.inpadoc_family_id, COUNT(DISTINCT t7.pat_publn_id), t0.appln_id, t6.earliest_date, t8.pat_publn_id, t8.publn_date
FROM
tls201_appln t0
INNER JOIN
(SELECT
t1.appln_id
FROM
tls201_appln t1
INNER JOIN
tls209_appln_ipc t2 ON t1.appln_id = t2.appln_id
INNER JOIN
tls224_appln_cpc t3 ON t2.appln_id = t3.appln_id
INNER JOIN
tls202_appln_title t4 ON t3.appln_id = t4.appln_id
INNER JOIN
tls203_appln_abstr t5 ON t4.appln_id = t5.appln_id
WHERE
YEAR(t1.appln_filing_date) BETWEEN 1990 AND 2010
AND (t1.appln_kind = 'A' OR t1.appln_kind = 'W')
AND (t2.ipc_class_symbol LIKE 'H01M 10/052')
-- other search strings which takes time to be computed
GROUP BY t1.appln_id) ta ON t0.appln_id = ta.appln_id
INNER JOIN
(SELECT
appln_id, MIN(publn_date) AS earliest_date
FROM
tls211_pat_publn
GROUP BY appln_id) t6 ON t0.appln_id = t6.appln_id
INNER JOIN
tls211_pat_publn t6b ON t6b.appln_id = t6.appln_id
INNER JOIN
tls212_citation t7 ON t6b.pat_publn_id = t7.cited_pat_publn_id
INNER JOIN
tls211_pat_publn t8 ON t7.pat_publn_id = t8.pat_publn_id
WHERE
YEAR(t6.earliest_date) != 9999
AND YEAR(t8.publn_date) != 9999
AND DATEDIFF(YEAR, t6.earliest_date, t8.publn_date) <= 5
GROUP BY t0.inpadoc_family_id, t0.appln_id, t6.earliest_date, t8.pat_publn_id, t8.publn_date
ORDER BY COUNT(DISTINCT t7.pat_publn_id) DESC, t0.appln_id ASC;