The computation of the WEIGHT value is more complicated than the simple ratio used for a technology field, because each NACE2 code gets its weight from several relations to the application:
- First, per application, its IPC symbols (TLS209_APPLN_IPC.IPC_CLASS_SYMBOL matching TLS902_IPC_NACE2.IPC) linked to a NACE2 code (TLS902_IPC_NACE2.NACE2_CODE) get the default weight defined in TLS902_IPC_NACE2.NACE2_WEIGHT (usually 1.0)
- Some weights are then set to null when the same application also carries certain other IPC symbols (see TLS902_IPC_NACE2.NOT_WITH_IPC), unless it additionally carries certain further IPC symbols (see TLS902_IPC_NACE2.UNLESS_WITH_IPC).
- Finally, per application, each weight is converted to a ratio: the sum of the weights for a given NACE2 code divided by the sum of all weights for the application. This gives a proportional value between 0 and 1.0 (the sum of all TLS229_APPLN_NACE2.WEIGHT values per application must always equal 1.0), as you rightly stated.
Sample dummy case:
- An application (id="123") has "A01N 1/00", "A01N 1/02" and "A22B 1/00" IPC symbols.
- Those are related respectively to NACE2 code "20.2" (with weight "1"), "20.2" (with weight "1"), and "28.9" (with weight "1")
- The TLS229_APPLN_NACE2 table will store ("123", "20.2", "0,6666667") and ("123", "28.9", "0,3333333") rows, as NACE2 code "20.2" represents 2/3 of all weights ((1 + 1) / (1 + 1 + 1)).
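The normalization step in this sample case can be sketched in a few lines of Python. This is only an illustration, not the actual PATSTAT production code: the function name `nace2_weights` and the input shape (a list of `(nace2_code, weight)` pairs for one application) are assumptions made for the example, and a weight of `None` stands in for a weight that was set to null by the NOT_WITH_IPC rule.

```python
from collections import defaultdict


def nace2_weights(ipc_links):
    """Normalize NACE2 weights for ONE application (hypothetical helper).

    ipc_links: iterable of (nace2_code, weight) pairs, one pair per IPC
    symbol of the application that matched the concordance table.
    A weight of None represents a weight that was nulled beforehand.
    """
    totals = defaultdict(float)
    for nace2_code, weight in ipc_links:
        if weight is None:
            continue  # nulled weights do not contribute
        totals[nace2_code] += weight

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing left to distribute

    # Ratio of each NACE2 code's summed weight to the application total
    return {code: w / grand_total for code, w in totals.items()}


# Sample dummy case: "A01N 1/00" and "A01N 1/02" map to 20.2, "A22B 1/00" to 28.9
links = [("20.2", 1.0), ("20.2", 1.0), ("28.9", 1.0)]
weights = nace2_weights(links)
for code, w in sorted(weights.items()):
    print(code, round(w, 7))
```

For the sample application this gives 2/3 for "20.2" and 1/3 for "28.9", and the weights of one application always add up to 1.0.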
Because the concordance table is not strictly based on 4-character IPC codes, a single IPC symbol could match 2 concordance rows.
For example, the symbol "A61K 8/00" in [tls209_appln_ipc] could be linked to both "A61K" and "A61K 8". We decided that only the most detailed IPC code is taken into account, in this case "A61K 8". (The link to "A61K" is then discarded.)
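The "most detailed code wins" rule could be sketched as a longest-prefix match. Again, this is a hedged illustration rather than the real implementation: the function name `most_detailed_match` is hypothetical, and it assumes the IPC symbol and the concordance entries are written in the same simple format as in the example above (the actual PATSTAT columns may use different padding).

```python
def most_detailed_match(ipc_symbol, concordance_entries):
    """Return the longest concordance entry that is a prefix of ipc_symbol.

    Hypothetical helper; returns None when no entry matches.
    """
    def matches(entry):
        if not entry or not ipc_symbol.startswith(entry):
            return False
        rest = ipc_symbol[len(entry):]
        # Guard against partial group numbers:
        # "A61K 8" must not match "A61K 80/00".
        return not (entry[-1].isdigit() and rest[:1].isdigit())

    candidates = [e for e in concordance_entries if matches(e)]
    return max(candidates, key=len, default=None)


entries = ["A61K", "A61K 8"]
print(most_detailed_match("A61K 8/00", entries))   # A61K 8
print(most_detailed_match("A61K 80/00", entries))  # A61K
```

Only the entry returned here would then contribute its NACE2 code and weight; the shorter match is dropped before the weights are summed.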
In our view this was the best and most representative way to implement the methodology, and to my knowledge it has been accepted by the PATSTAT community. But nothing stops users from developing their own interpretation of the whitepaper. All the reference data is available in the PATSTAT tables.