Extracting all swiss priority applications from applicants without duplicates

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

kevinkuri
Posts: 3
Joined: Thu Feb 09, 2023 7:27 pm

Extracting all swiss priority applications from applicants without duplicates

Post by kevinkuri » Thu Mar 16, 2023 4:19 pm

Hello everybody

I'm currently working on my thesis in which I am supposed to build a Swiss Technology Monitor. I am fairly new to PATSTAT Online, using the PATSTAT Online Autum 2022 version.

I trying to extract all Swiss Priority applications over all years with an SQL query however I noticed i still have multiple rows for the same application.

Here is what I've got so far (I got the code from here and modified it help-with-sql-query-caunt-applications- ... try-5691 ):

Code: Select all

SELECT DISTINCT a.*, ac.*
FROM tls201_appln a
JOIN tls207_pers_appln pa on a.appln_id = pa.appln_id
JOIN tls206_person p on pa.person_id = p.person_id
join tls224_appln_cpc ac on a.appln_id = ac.appln_id
WHERE appln_auth = 'EP'
AND appln_kind = 'A'  -- exclude PCT filings where the EPO only served as the Receiving Office
AND a.appln_id < 900000000   -- exclude artificial applications (see PATSTAT Data Catalog for details)
AND appln_filing_year between 1800 and 2022
AND applt_seq_nr > 0  -- consider only applicants
AND person_ctry_code = 'CH'  -- applicant must be from CH
The intended output table should have the following structure:
For each appln_id only 1 cpc_class_code

How do I:
A) Identify which CPC Class Symbol should be selected from the multiple current ones I receive right now?
B) How can I modify the query to give me the intended result?

Thanks alot in advance,

Kevin


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Extracting all swiss priority applications from applicants without duplicates

Post by EPO / PATSTAT Support » Thu Mar 30, 2023 1:00 pm

Hello Kevin,
Most applications will have more then one CPC classification code; and there is no uniform good way to assign one unique CPC classification symbol to one application (or family). (One could also question whether this is a good thing to do, because all classification codes have their purpose.)
One way of narrowing down to 1 CPC code assigned to an application is to take the so called "First" CPC code based on the cpc_position attribute. (in the tls225_docdb_fam_cpc table). The problem is that some applications do not have a so called "first" code (330 applications in your case), some have no CPC code at all (seldom for EP applications) and some have multiple (different) first codes. But a substantial part has simply one 'F' CPC code, so those should not be an issue. The problem of applications having multiple "F" labelled classification codes can partly be mitigated by taking for example only the first 8 (or 4) digits which will further reduce the set of applications having more then 1 CPC value at group or subgroup level. You will then of course lose some granularity, but maybe 8 or even 4 digits are sufficient for your analysis.
Here is the code that takes the first 8 digits for CPC, and if there is none, then it takes the first 8 digits for IPC. The remaining NULL values are those applications that do not have an IPC or CPC 'F' flagged classification.

Code: Select all

SELECT distinct a.appln_id, 
(case when left(ac.cpc_class_symbol,8) is not null then  'CPC '+left(ac.cpc_class_symbol,8) 
when left(ac.cpc_class_symbol,8) is null then 'IPC '+left(ic.ipc_class_symbol,8) 
end)  classfication
FROM tls201_appln a
JOIN tls207_pers_appln pa on a.appln_id = pa.appln_id
JOIN tls206_person p on pa.person_id = p.person_id
left join tls225_docdb_fam_cpc ac on a.docdb_family_id = ac.docdb_Family_id and  cpc_position= 'F'
left join tls209_appln_ipc ic on a.appln_id = ic.appln_id and ipc_position = 'F'
WHERE appln_auth = 'EP'
AND appln_kind = 'A'  -- exclude PCT filings where the EPO only served as the Receiving Office
AND a.appln_id < 900000000   -- exclude artificial applications (see PATSTAT Data Catalog for details)
AND appln_filing_year between 1800 and 2022
AND applt_seq_nr > 0  -- consider only applicants
AND person_ctry_code = 'CH'  -- applicant must be from CH
order by a.appln_id
To make a list that covers all cases and assign exactly 1 CPC (or IPC) code would need much more scripting to have all possible cases covered. If you take the above query and take 4 digits for IPC and CPC codes, then you will reduce it with about 10.000 rows that refer to the same application. But you can see that there are still some there where the first 4 digits are different. (appln_id = 47, 848 , 1422 etc...)

Left to do: looking up the NULL values (at 4 digits there are only a handful left) and de-duplicating the applications that have multiple codes, using whatever decision making process you find suitable. One can easily clean the duplicates out in excel.
On a side note: your post title reads "Extracting all Swiss priority applications". Though your query has no links to CH priorities as such. The query selects patents filed at the EPO that have at least 1 Swiss applicant. If you want to be sure that those EP filings are not preceded by another earlier priority filing, then you have to add "and a.appln_id not in (select appln_id from tls204_appln_prior)" in the WHERE clause. This will then limit the set of patents to "first filings at the EPO by Swiss applicants".
If you want the EP application to have an earlier priority filing that was filed for example in Switzerland, then you could add "and a.appln_id in (select tls204_appln_prior.appln_id from tls204_appln_prior join tls201_appln on tls204_appln_prior.prior_appln_id = tls201_appln.appln_id where appln_auth = 'CH')" to the WHERE clause. You will need to clearly define what patents you want to be included (or excluded) from your data sample.
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


kevinkuri
Posts: 3
Joined: Thu Feb 09, 2023 7:27 pm

Re: Extracting all swiss priority applications from applicants without duplicates

Post by kevinkuri » Mon Apr 03, 2023 6:11 pm

Dear Geert

Thank you so much for your reply. I have adjusted the character count from 8 to 4 as you suggested, added p.person_data as another attribute and will now pull the data! I will then clean the data in Excel and proceed with geocoding it and my further analysis.

Best regards,

Kevin


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Extracting all swiss priority applications from applicants without duplicates

Post by EPO / PATSTAT Support » Thu Apr 06, 2023 2:25 pm

Have a look at the NUTS codes in tls206_person and table tls904_nuts; it might save you some work.
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


kevinkuri
Posts: 3
Joined: Thu Feb 09, 2023 7:27 pm

Re: Extracting all swiss priority applications from applicants without duplicates

Post by kevinkuri » Mon Apr 17, 2023 3:19 pm

Thank you, I will include the NUTS into my dataset, it will sure help me when it comes to regional analyses.


Post Reply