Hi all,
I would like to know how you calculate the weights assigned to NACE codes in the TLS229_APPLN_NACE2 table.
Regards,
Aris
NACE2 codes weights calculation
Re: NACE2 codes weights calculation
Hello Aris,
you can find more information here:
concordance-table-between-ipc-and-nace2-9756
and in the PATSTAT Data Catalog.
Geert Boedt
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org
Re: NACE2 codes weights calculation
Thank you Geert for the reference.
The referenced documents do not explain how NACE codes are assigned to applications or how their weights are calculated.
Take, for example, appln_id 437252193. The following query returns 9 NACE2 codes with different weights, which sum to 1. It looks like an algorithm assigned those weights. Is there a document describing the algorithm or procedure?
select * from tls229_appln_nace2 where appln_id = 437252193
appln_id   nace2_code  weight
437252193  "20.1"      0.012931035
437252193  "26.2"      0.10344828
437252193  "26.3"      0.79310346
437252193  "26.4"      0.004310345
437252193  "26.5"      0.021551725
437252193  "26.51"     0.004310345
437252193  "26.52"     0.004310345
437252193  "26.7"      0.012931035
437252193  "28.23"     0.04310345
Aris
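As a quick sanity check on the result above, the weights can be summed in a few lines of Python. The sketch also tests one observation that is not documented anywhere in this thread: the nine values look like whole-number counts over a common denominator of 232. The name `DENOMINATOR` and that interpretation are assumptions made purely for illustration.

```python
import math

# Weights copied from the query result for appln_id 437252193.
weights = {
    "20.1": 0.012931035,
    "26.2": 0.10344828,
    "26.3": 0.79310346,
    "26.4": 0.004310345,
    "26.5": 0.021551725,
    "26.51": 0.004310345,
    "26.52": 0.004310345,
    "26.7": 0.012931035,
    "28.23": 0.04310345,
}

total = sum(weights.values())
print(f"sum of weights: {total:.6f}")

# Assumption: the weights are ratios over a common total of 232.
# This is inferred from the numbers themselves, not from PATSTAT documentation.
DENOMINATOR = 232
counts = {code: round(w * DENOMINATOR) for code, w in weights.items()}
for code, n in counts.items():
    print(f"{code}: about {n}/{DENOMINATOR}")
```

If the rounded counts add up to the denominator, the figures are at least consistent with a "share of a summed weight" calculation, whatever the underlying rule turns out to be.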
Re: NACE2 codes weights calculation
The computation of the WEIGHT value is more complicated than the simple ratio used for a technology field, because each NACE2 code gets a weight according to a number of relations to the application:
- First, per application, each IPC symbol (TLS209_APPLN_IPC.IPC_CLASS_SYMBOL matching TLS902_IPC_NACE2.IPC) linked to a NACE2 code (TLS902_IPC_NACE2.NACE2_CODE) gets the default weight defined in TLS902_IPC_NACE2.NACE2_WEIGHT (usually 1.0).
- Some weights are then set to null when the same application also carries certain other IPC symbols (see TLS902_IPC_NACE2.NOT_WITH_IPC), unless it also carries certain further IPC symbols (see TLS902_IPC_NACE2.UNLESS_WITH_IPC).
- Finally, per application, each weight is converted to a ratio: the weight summed per NACE2 code divided by the sum of all weights for the application. This gives a proportional value between 0 and 1.0 (the sum per application of all TLS229_APPLN_NACE2.WEIGHT values must always equal 1.0), as you rightly stated.
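The first two steps above (default weight, then the NOT_WITH_IPC / UNLESS_WITH_IPC exclusion) might be sketched as follows. This is one possible reading, not the EPO implementation. The rows in `RULES` are invented placeholders that only mirror the TLS902_IPC_NACE2 column names. Matching by simple string prefix, and holding a single value in each of the two exclusion columns, are assumptions.

```python
# Hypothetical concordance rows; the exclusion values are made up for illustration
# and are NOT taken from the real TLS902_IPC_NACE2 table.
RULES = [
    {"ipc": "A01N", "nace2_code": "20.2", "nace2_weight": 1.0,
     "not_with_ipc": "A61K", "unless_with_ipc": "C07D"},
    {"ipc": "A22B", "nace2_code": "28.9", "nace2_weight": 1.0,
     "not_with_ipc": None, "unless_with_ipc": None},
]


def has_prefix(symbols, prefix):
    """True if any IPC symbol of the application starts with the given prefix."""
    return any(symbol.startswith(prefix) for symbol in symbols)


def linked_weights(symbols, rules=RULES):
    """Return (symbol, nace2_code, weight) links; weight is None when excluded."""
    links = []
    for symbol in symbols:
        for rule in rules:
            if not symbol.startswith(rule["ipc"]):
                continue
            # Step 1: start from the default weight of the concordance row.
            weight = rule["nace2_weight"]
            # Step 2: null the weight if a NOT_WITH_IPC symbol is present,
            # unless an UNLESS_WITH_IPC symbol is present as well.
            blocked = bool(rule["not_with_ipc"]) and has_prefix(symbols, rule["not_with_ipc"])
            rescued = bool(rule["unless_with_ipc"]) and has_prefix(symbols, rule["unless_with_ipc"])
            if blocked and not rescued:
                weight = None
            links.append((symbol, rule["nace2_code"], weight))
    return links


for link in linked_weights(["A01N 1/00", "A22B 1/00", "A61K 8/00"]):
    print(link)
```

Links whose weight comes back as None would simply drop out of the final ratio step.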
- An application (id="123") has the IPC symbols "A01N 1/00", "A01N 1/02" and "A22B 1/00".
- These are related respectively to NACE2 codes "20.2" (with weight "1"), "20.2" (with weight "1") and "28.9" (with weight "1").
- The TLS229_APPLN_NACE2 table will store the rows ("123", "20.2", "0.6666667") and ("123", "28.9", "0.3333333"), as NACE2 code "20.2" represents 2/3 of all weights ((1 + 1) / (1 + 1 + 1)).
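The arithmetic of the worked example above can be reproduced with a short normalisation routine. This is a minimal sketch of the final ratio step only. The function name `normalise` and the list-of-pairs input are choices made for illustration.

```python
from collections import defaultdict


def normalise(links):
    """Turn (nace2_code, weight) links of one application into shares summing to 1."""
    totals = defaultdict(float)
    for code, weight in links:
        totals[code] += weight           # sum the weights per NACE2 code
    grand_total = sum(totals.values())   # sum of all weights for the application
    return {code: total / grand_total for code, total in totals.items()}


# Application "123": two IPC symbols map to "20.2", one maps to "28.9", all with weight 1.
links = [("20.2", 1.0), ("20.2", 1.0), ("28.9", 1.0)]
shares = normalise(links)
for code, share in shares.items():
    print(code, f"{share:.7f}")
# prints:
# 20.2 0.6666667
# 28.9 0.3333333
```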
Because the concordance table is not strictly based on 4-character IPC codes, a single IPC symbol could match 2 rows.
For example, the symbol "A61K 8/00" in [tls209_appln_ipc] could be linked to both "A61K" and "A61K 8". We decided that only the most detailed IPC code is taken into account, in this case "A61K 8". (The link to "A61K" is then discarded.)
In our view this was the best and most representative way to implement the methodology, and to my knowledge it has been accepted by the PATSTAT community. But nothing stops users from developing their own interpretation of the whitepaper. All the reference data is available in the PATSTAT tables.
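The "most detailed IPC code wins" rule described above could look roughly like this. The sketch assumes concordance entries are either a prefix of up to 4 characters (such as "A61K") or a subclass followed by a main group (such as "A61K 8"). The real formatting of TLS902_IPC_NACE2.IPC may differ, and the helper names are hypothetical.

```python
def entry_matches(entry, symbol):
    """Check whether a concordance entry covers a full IPC symbol such as 'A61K 8/00'."""
    if len(entry) <= 4:
        # Section, class or subclass level: a plain prefix test is enough.
        return symbol.startswith(entry)
    # Entry with a main group: compare subclass and main group exactly,
    # so that "A61K 8" does not accidentally match "A61K 80/00".
    entry_subclass, entry_group = entry[:4], entry[4:].strip()
    symbol_subclass = symbol[:4]
    symbol_group = symbol[4:].split("/")[0].strip()
    return entry_subclass == symbol_subclass and entry_group == symbol_group


def most_detailed_entry(symbol, entries):
    """Return the longest matching concordance entry, or None if nothing matches."""
    candidates = [e for e in entries if entry_matches(e, symbol)]
    if not candidates:
        return None
    return max(candidates, key=lambda e: len(e.replace(" ", "")))


entries = ["A61K", "A61K 8"]
print(most_detailed_entry("A61K 8/00", entries))   # the more detailed "A61K 8" is kept
print(most_detailed_entry("A61K 31/00", entries))  # only "A61K" applies here
```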
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org
Re: NACE2 codes weights calculation
Thank you very much for answering my question.
Best regards,
Aris