I want to limit my search to pharmaceutical patent applications filed in the US from 2005 to 2015. I would much appreciate it if anyone could help me with some considerations.
According to Schmoch's 2009 report to WIPO on the IPC and Technology Concordance Table (https://www.wipo.int/meetings/en/doc_de ... _id=117672), the old classification for pharmaceuticals used to be A61K and A61P. The new version of the table suggests A61K (not A61K-008/). The document also notes that organic fine chemistry (C07B, C07C, C07D, C07F, C07H, C07J, C40B) and biotechnology (C07G, C07K, C12M, C12N, C12P, C12Q, C12R, C12S) overlap considerably with pharmaceuticals, and that applications in those fields that are co-classified with A61K (not A61K-008/) should be excluded from them. I therefore understand that these co-classified cases may be counted as pharmaceuticals.
On the other hand, WIPO's 2019 Concordance Table (https://www.wipo.int/export/sites/www/i ... nology.xls) indicates another classification: A61K (not A61K-008/) and A61P.
Lastly, the document comments on a way to ensure proper classification by classifying applications based on the first/primary classification, which should be available for the USPTO.
My questions are:
1- In sectoral studies like this, is it right to just search for any IPC class (no matter the position) that corresponds to the identified relevant codes or should weights of positions be considered?
2- Can it be assumed that all USPTO applications have at least one first classification recorded in PATSTAT?
3- If an application is reclassified, can this new classification change the first position or will it always be added as a new last/undefined position?
4- If the analysis only considers first classifications, is it right to define pharmaceuticals just as A61K (not A61K-008) or should A61P also be included?
5- If the analysis considers both first and last (or undefined) classifications, is it right to define pharmaceuticals also with those defined as organic fine chemistry and biotechnology (in the first position) that are co-classified with these pharmaceutical codes (in the last/undefined position)?
6- How can I code a search for applications where ipc_class_symbol in any position is restricted to A61K (not A61K-008)?
7- How can I do the same search but defining that the ipc_class_symbol is in the first position?
Thanks for the help!
Best regards,
Eduardo Mercadante - LSE
IPC classification
Re: IPC classification, F and L, Schmoch technology scheme
Dear Eduardo,
That is a whole lot of questions in one go, so let me tackle them one by one.
On question 1) In PATSTAT, we have pre-calculated the weighted technology shares for each application based on the classification picture of each individual patent, not on the position. This allows for more granular analysis than using the concept of “first” classification. (The source document does not clearly specify whether the methodology is applied to the IPC codes “as published”, or whether it takes into account possible re-classifications resulting from IPC revisions as well as examiner decisions.)
You can clearly see that when you run the SQL query below:
Code: Select all
SELECT appln_auth, appln_nr, appln_filing_date,
       tls230_appln_techn_field.*, tech_field.*
FROM tls201_appln
JOIN tls230_appln_techn_field
  ON tls201_appln.appln_id = tls230_appln_techn_field.appln_id
JOIN (SELECT DISTINCT techn_field_nr, techn_sector, techn_field
      FROM tls901_techn_field_ipc) tech_field
  ON tls230_appln_techn_field.techn_field_nr = tech_field.techn_field_nr
WHERE tls201_appln.appln_id = 1 -- just to limit the result
Examiners of the publishing patent authority (also at the EPO) will indicate the so-called “first” classification codes, but I have never seen this being used for econometric analysis. The reason why I think it has no (or limited) use is that the IPC codes published on the A publication (EP) can differ from those on the B publication, and can even change due to a revision of the classification scheme and the reclassifications that follow. PATSTAT will ALWAYS have the latest classifications, and that could even mean that the original classifications (as printed on the document) are not in PATSTAT Global at all. (You will find the classifications as published in PATSTAT Register!)
On question 2): thinking logically, every patent application should have IPC codes published on the document (and as such have them in PATSTAT). In practice this is not the case, and even less so if you require an IPC code with “F” in the ipc_position attribute. There are different reasons, such as very old applications (before IPC existed; never IPC classified) and data errors. But I would consider these artefacts. Researchers can easily analyse these cases in depth if they think it might make a difference. The query below gives you a quick overview of the numbers.
Code: Select all
SELECT earliest_filing_year,
       -- distinct families with no IPC symbol flagged 'F'
       COUNT(DISTINCT docdb_family_id) AS fam_without_f_ipc
FROM tls201_appln
WHERE appln_auth = 'US' AND appln_kind = 'A'
  AND appln_id NOT IN (SELECT DISTINCT appln_id FROM tls209_appln_ipc WHERE ipc_position = 'F')
  AND appln_id < 900000000
GROUP BY earliest_filing_year
ORDER BY earliest_filing_year
Apart from the above, it seems that re-classifications and changes in the IPC scheme can make IPC codes with an ‘F’ disappear.
I looked up a couple of cases by comparing the PATSTAT 2019b and 2020a editions. When the IPC classification that had been assigned the ‘F’ is removed from the IPC scheme, and therefore from the application in PATSTAT, it seems that “no other classification takes its F-place” in PATSTAT. Run the query below on PATSTAT 2020a and 2019b and you will see the difference for this application. Looking up H04W4/22 (the “removed” IPC code with an “F”) in WIPO’s IPC tool shows that H04W4/22 was only active until the 2018.01 IPC version. So it is correctly no longer in the PATSTAT database.
Code: Select all
SELECT appln_id,
       ipc_class_symbol,
       ipc_class_level,
       ipc_version,
       ipc_value,
       ipc_position,
       ipc_gener_auth
FROM tls209_appln_ipc
WHERE appln_id = 467724167
ORDER BY ipc_class_symbol
On question 3): I would say that the above example illustrates that re-classifications can result in the position indicator being removed from an application. I would not use the ipc_position attribute for any analysis at all.
On question 4) I would follow the scheme exactly as coded in PATSTAT. (But you are free to analyse further in depth, and maybe there is room for fine-tuning.) An expert examiner confirmed that they do use the F flag in their search strategy to narrow down and better target possible prior art; but for econometric analysis, you would rather favour recall over precision (I think).
On question 5) The above makes a case for not using the F and L flags at all. (Use the scheme as implemented and pre-aggregated.)
On question 6) Something like this:
Code: Select all
SELECT tls201_appln.appln_auth, tls201_appln.appln_nr, tls201_appln.appln_filing_date,
       STRING_AGG(ipc_class_symbol, ', ') AS ipc
FROM tls201_appln
JOIN tls209_appln_ipc ON tls201_appln.appln_id = tls209_appln_ipc.appln_id
WHERE appln_auth = 'US' AND appln_kind = 'A'
  -- at least one A61K symbol, in any position
  AND tls201_appln.appln_id IN (SELECT DISTINCT appln_id FROM tls209_appln_ipc WHERE LEFT(ipc_class_symbol, 4) = 'A61K')
  -- no A61K 8/... symbol; the main group is right-aligned in 4 characters, so 3 spaces precede the 8
  AND tls201_appln.appln_id NOT IN (SELECT DISTINCT appln_id FROM tls209_appln_ipc WHERE LEFT(ipc_class_symbol, 8) = 'A61K   8')
  AND tls201_appln.appln_id < 900000000
  AND appln_filing_year = 2015 -- single year here; use BETWEEN 2005 AND 2015 for the full period
GROUP BY tls201_appln.appln_auth, tls201_appln.appln_nr, tls201_appln.appln_filing_date
ORDER BY appln_filing_date DESC
On question 7) Something like this:
Code: Select all
SELECT tls201_appln.appln_auth, tls201_appln.appln_nr, tls201_appln.appln_filing_date,
       STRING_AGG(ipc_class_symbol, ', ') AS ipc
FROM tls201_appln
JOIN tls209_appln_ipc ON tls201_appln.appln_id = tls209_appln_ipc.appln_id
WHERE appln_auth = 'US' AND appln_kind = 'A'
  -- at least one A61K symbol flagged as first ('F')
  AND tls201_appln.appln_id IN (SELECT DISTINCT appln_id FROM tls209_appln_ipc WHERE LEFT(ipc_class_symbol, 4) = 'A61K' AND ipc_position = 'F')
  -- no A61K 8/... symbol in any position (3 spaces before the 8, as above)
  AND tls201_appln.appln_id NOT IN (SELECT DISTINCT appln_id FROM tls209_appln_ipc WHERE LEFT(ipc_class_symbol, 8) = 'A61K   8')
  AND tls201_appln.appln_id < 900000000
  AND appln_filing_year = 2015 -- single year here; use BETWEEN 2005 AND 2015 for the full period
GROUP BY tls201_appln.appln_auth, tls201_appln.appln_nr, tls201_appln.appln_filing_date
ORDER BY appln_filing_date DESC
I hope this answers your questions.
Geert BOEDT
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org
Re: IPC classification
Thank you very much for this amazing response! Your code examples were very helpful for practising and refining my skills. It was also very insightful, because I was still fixated on ipc_position. I will remove that from all of my analyses.
Finally, considering that what I am doing, in the end, is trying to apply WIPO's and the EPO's classification of technologies based on IPC, I have decided to make use of what is already done. I will filter by techn_field_nr = 16 (Pharmaceuticals).
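A minimal sketch of that filter, for anyone following the same route. It reuses the US / kind 'A' / appln_id restrictions from the queries earlier in the thread and assumes that tls230_appln_techn_field holds one row per application and technology field, with the pre-calculated share in a weight column. The table aliases and output column names are illustrative only.

```sql
-- Sketch: US applications filed 2005-2015 with a share in technology field 16 (Pharmaceuticals).
SELECT a.appln_filing_year,
       COUNT(*)      AS applications,    -- applications with any share in field 16
       SUM(t.weight) AS weighted_count   -- fractional count based on the pre-calculated shares
FROM tls201_appln a
JOIN tls230_appln_techn_field t
  ON a.appln_id = t.appln_id
WHERE a.appln_auth = 'US'
  AND a.appln_kind = 'A'
  AND a.appln_id < 900000000
  AND a.appln_filing_year BETWEEN 2005 AND 2015
  AND t.techn_field_nr = 16
GROUP BY a.appln_filing_year
ORDER BY a.appln_filing_year
```

Comparing the two counts per year gives a rough idea of how much of the pharmaceutical total comes from applications that are only partly classified in that field.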
Thank you again!
Eduardo Mercadante - LSE