Hello Muyang,
as far as I can see, your query is correct, apart from a mistake in one of the ipc_class_symbol and a bracket that is not needed. Observe that I removed the embedded wild card in the classification codes;it can lead to errors because of the way the IPC classification hierarchy is formed. The number of characters before the / should always be 8, and therefore blanks should be added as illustrated below.
When doing this kind of analysis, one can always write a query to double check one specific application in detail (or publication in your case), and see if the numbers add up.
Here is an example of your query, but just looking at 1 publication:
Code: Select all
select count (distinct(a212.pat_publn_id)) as number , year(a211.publn_date) AS date
from patstat2021a.dbo.tls201_appln a1
join patstat2021a.dbo.tls209_appln_ipc a209 on a1.appln_id = a209.appln_id
join patstat2021a.dbo.tls207_pers_appln a207 on a1.appln_id = a207.appln_id
join patstat2021a.dbo.tls206_person a206 on a207.person_id = a206.person_id
join patstat2021a.dbo.tls211_pat_publn a211 on a211.appln_id = a1.appln_id
join patstat2021a.dbo.tls212_citation a212 on a211.pat_publn_id = a212.cited_pat_publn_id
where (a209.ipc_class_symbol like 'G06F 17/20%'
or a209.ipc_class_symbol like 'G06F 17/21%'
or a209.ipc_class_symbol like 'G06F 17/22%'
or a209.ipc_class_symbol like 'G06F 17/23%'
or a209.ipc_class_symbol like 'G06F 17/24%'
or a209.ipc_class_symbol like 'G06F 17/25%'
or a209.ipc_class_symbol like 'G06F 17/26%'
or a209.ipc_class_symbol like 'G06F 17/27%'
or a209.ipc_class_symbol like 'G06F 17/28%'
or a209.ipc_class_symbol like 'G06K 9%'
or a209.ipc_class_symbol like 'G06T 7%'
or a209.ipc_class_symbol like 'G10L 13/027%'
or a209.ipc_class_symbol like 'G10L 15%'
or a209.ipc_class_symbol like 'G10L 17%'
or a209.ipc_class_symbol like 'G10L 25/66%'
or a209.ipc_class_symbol like 'G06F 15/18%')
and year(a211.publn_date) between 2000 and 2020
and a1.granted= 'Y'
and a211.publn_first_grant= 'Y'
and a1.appln_auth='US'
and a206.person_ctry_code ='US'
and a1.ipr_type = 'PI'
and a212.cited_pat_publn_id <> 0
and publn_nr = '7437338' and publn_auth = 'US' and publn_kind = 'B1'
group by year(a211.publn_date)
order by date
The result is 30, meaning that the US publication (B1) was cited 30 times.
If you now double check in ESPACENET, you will see that there are only 20 forward citations.
https://worldwide.espacenet.com/patent/ ... S7437338B1
But ESPACENET only shows 1 record per family, so 20 citing families.
We can then double check this by querying a list with all the citing publications and add the citing (docdb) family.
Code: Select all
SELECT DISTINCT
cited.pat_publn_id
,cited.publn_auth +cited.publn_nr+ ' '+cited.publn_kind cited_pub
,cited.publn_date pubilcation_date_cited
,citing.publn_auth + citing.publn_nr +' '+ citing.publn_kind citing_pub
,citing.publn_date date_citing
,citing_app.docdb_family_id
FROM tls211_pat_publn as citing
JOIN tls212_citation as j ON citing.pat_publn_id = j.pat_publn_id
JOIN tls211_pat_publn AS cited ON j.cited_pat_publn_id = cited.pat_publn_id
join tls201_appln citing_app on citing_app.appln_id = citing.appln_id
WHERE
cited.publn_auth = 'US' and cited.publn_nr = '7437338' and cited.publn_kind = 'B1'
order by cited.pat_publn_id, citing.publn_date
The result is 31 citing publications, but one has the publication year 2021 (US10965622 B2--> 2021-03-30) which is outside the range you specified. If you then count the number of distinct families you come up to 20, which is exactly the same as what we see in Espacenet.
Use the same approach to double check a couple of publication, and you hopefully comes to similar results.