Grouping NACE2 Codes

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

nicole.crichton
Posts: 3
Joined: Tue Aug 16, 2022 3:28 pm

Grouping NACE2 Codes

Post by nicole.crichton » Tue Aug 16, 2022 4:56 pm

Hi, I am working on my masters dissertation and I am trying to extract patent data from PATSTAT by NACE2 codes. I have managed to do this successfully, but I would like to group patents with certain NACE2 codes together to reduce the number of rows in my results table. For example, I would like to group patents corresponding to NACE2 10, 11, and 12, so that they are one value in my table and NACE2 13, 14, and 15. Is it possible to do this on PATSTAT? And if so, how would I go about doing this?

Another issue I am having is that when I download the results table all the columns on PATSTAT become one column in Excel, separated by semi-colons, is there a way to rectify this issue?


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Grouping NACE2 Codes

Post by EPO / PATSTAT Support » Wed Aug 17, 2022 10:50 am

Can you post your SQL and give more details on how you want to "group".
The NACE data is based on an IPC concordance table; as a result, 1 patent application can have multiple (weighed) NACE codes. You can of course sum the weights to see how a patent portfolio would look, or even compare patent portfolios. But you will need to give more details.

On the excel: I assume you are using PATSTAT Online to extract data,... you can use excel functionality to specify that the semicolon indicates a new column in the data. But that is outside the scope of this forum (and can vary depending on your default EXCEL settings).
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


nicole.crichton
Posts: 3
Joined: Tue Aug 16, 2022 3:28 pm

Re: Grouping NACE2 Codes

Post by nicole.crichton » Wed Aug 17, 2022 12:35 pm

So this is my query:

Code: Select all

SELECT person_ctry_code, tls201_appln.earliest_filing_year, tls229_appln_nace2.nace2_code, COUNT(DISTINCT tls201_appln.inpadoc_family_id) TotalPatentFamilies
FROM tls201_appln JOIN tls207_pers_appln ON tls201_appln.appln_id = tls207_pers_appln.appln_id
JOIN tls206_person ON tls207_pers_appln.person_id = tls206_person.person_id
JOIN tls801_country ON tls206_person.person_ctry_code = tls801_country.ctry_code
JOIN tls229_appln_nace2 ON tls201_appln.appln_id = tls229_appln_nace2.appln_id
JOIN tls902_ipc_nace2 ON tls229_appln_nace2.nace2_code = tls902_ipc_nace2.nace2_code
WHERE person_ctry_code = 'BE' 
AND earliest_filing_year BETWEEN 2008 AND 2020
AND applt_seq_nr > 0 
AND ipr_type = 'PI'
GROUP BY person_ctry_code, tls201_appln.earliest_filing_year, tls229_appln_nace2.nace2_code
ORDER BY tls201_appln.earliest_filing_year, tls229_appln_nace2.nace2_code
So for the results I get a list of patent families corresponding to the nace2 codes. I would like to group nace2 codes at the 2 digit level, so for instance, any nace2 of 10, 11, and 12 have their patent families summed to one value in the table for each year. And, then nace2 of 13, 14, and 15 have their patent families summed into one value for each year as well. And so on so forth, so that instead of 1026 rows, I get about 96 rows instead.


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Grouping NACE2 Codes

Post by EPO / PATSTAT Support » Thu Sep 01, 2022 10:21 pm

Hello Nicole,
that can easily be done with a SQL CASE Expression. That allows users to create specific result columns based on the data of the withheld records created via your WHERE clause conditions and joined tables.

Have a look at this; you can then expand the logics to your own groups of NACE codes as you deem fit.

Code: Select all

SELECT person_ctry_code, tls201_appln.earliest_filing_year,
count(distinct( case when left(tls229_appln_nace2.nace2_code,2) in ('10','11','12') then tls201_appln.inpadoc_family_id end)) as 'nace_10_11_12_families',
count(distinct( case when left(tls229_appln_nace2.nace2_code,2) in ('13','14','15') then tls201_appln.inpadoc_family_id end)) as 'nace_13_14_15_families'
FROM tls201_appln JOIN tls207_pers_appln ON tls201_appln.appln_id = tls207_pers_appln.appln_id
JOIN tls206_person ON tls207_pers_appln.person_id = tls206_person.person_id
JOIN tls801_country ON tls206_person.person_ctry_code = tls801_country.ctry_code
JOIN tls229_appln_nace2 ON tls201_appln.appln_id = tls229_appln_nace2.appln_id
JOIN tls902_ipc_nace2 ON tls229_appln_nace2.nace2_code = tls902_ipc_nace2.nace2_code
WHERE person_ctry_code = 'BE' 
AND earliest_filing_year BETWEEN 2008 AND 2020
AND applt_seq_nr > 0 
AND ipr_type = 'PI'
GROUP BY person_ctry_code, tls201_appln.earliest_filing_year
ORDER BY tls201_appln.earliest_filing_year
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post Reply