Problem with concatenated fields

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

stefana.vasilache
Posts: 3
Joined: Wed May 06, 2020 2:09 pm

Problem with concatenated fields

Post by stefana.vasilache » Fri May 08, 2020 4:32 pm

Hello,

I am trying to retrieve the technical sector and technical field of an application, between 2014-2020, with both inventor(s) and applicant(s) romanian and i want to concatenate the sector and the field to avoid multiple results for the same application.

The problem is that when i concatenate these 2 fields, i get a very long string with the fields concatenated that repeats over and over like this:

Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering, Other fields, Civil engineering


My question is how can i avoid creating such a string and get only: Other fields, Civil engineering.

The query i used:

Code: Select all

SELECT distinct appln_auth, appln_nr, appln_filing_date
	  , inventor.person_ctry_code Inventor_country, applicant.person_ctry_code Applicant_country, applicant.person_name Aplicant_name, string_agg(cast(concat('', tls209_appln_ipc.ipc_class_symbol) as NVARCHAR(MAX)), ', ') Field
FROM tls201_appln join tls207_pers_appln on tls201_appln.appln_id =tls207_pers_appln.appln_id
join tls209_appln_ipc on tls201_appln.appln_id = tls209_appln_ipc.appln_id
	join (SELECT distinct tls207_pers_appln.appln_id, tls206_person.person_id, person_name, person_ctry_code 
			FROM tls206_person join tls207_pers_appln 
			on tls206_person.person_id = tls207_pers_appln.person_id and applt_seq_nr > 0 ) applicant 
	on tls201_appln.appln_id = applicant.appln_id

	join (SELECT distinct tls207_pers_appln.appln_id, tls206_person.person_id, person_ctry_code 
			FROM tls206_person join tls207_pers_appln
			on tls206_person.person_id = tls207_pers_appln.person_id and invt_seq_nr > 0 ) inventor
	on tls201_appln.appln_id = inventor.appln_id
where tls201_appln.appln_filing_year >= '2014' and  tls201_appln.appln_filing_year <= '2020' and inventor.person_ctry_code like 'RO' and applicant.person_ctry_code like 'RO' and tls201_appln.granted like 'Y'
group by tls201_appln.appln_nr, tls201_appln.appln_auth, tls201_appln.appln_filing_date, inventor.person_ctry_code, applicant.person_ctry_code, applicant.person_name
order by tls201_appln.appln_filing_date desc


mkracker
Posts: 120
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: Problem with concatenated fields

Post by mkracker » Mon May 11, 2020 11:07 am

Hi Stefana,

Interesting question, so I had a look and tried it myself. You will find my code below. My thoughts:
  • I assume you did not consider that the technical field table TLS901_TECHN_FIELD_IPC contains multiple entries for each technical field number. There is one record for each technical field number per IPC main group. This is one reason for the repetition you observed.
  • Additionally to the technical fields your IPC field values also repeat a lot because an application typically has more than 1 applicant / inventor. So I created auxiliary tables for the IPC and also for the technical field, where each row corresponds to a single application and holds all IPCs resp. technical fields for this application.
  • I suppose you are using PATSTAT Online. Because it has some technical limitations, also regarding nested queries, and to simplify the script, I got rid of most of the nested queries. It's still quite a complex script. Note that if you would work with your own, local database, you would rather create and store intermediary tables, which would make creation and testing of the script easier.
  • Note that you still may have more than 1 row per application, because you retrieve the Romanian applicants, and there may be more than 1 for n application.

Code: Select all

SELECT DISTINCT a.appln_id, appln_auth, appln_nr, appln_filing_date
	  , p_invt.person_ctry_code Inventor_country, p_applt.person_ctry_code Applicant_country, 
	  pa_applt.applt_seq_nr, p_applt.person_name Applicant_name
	  , Field, TF_NAMES

	FROM tls201_appln a

	-- "applicant" table
	join tls207_pers_appln pa_applt	on a.appln_id = pa_applt.appln_id and pa_applt.applt_seq_nr > 0
	join tls206_person p_applt		on pa_applt.person_id = p_applt.person_id 
	
	-- "Inventor" table
    	join tls207_pers_appln pa_invt	on a.appln_id = pa_invt	.appln_id and pa_invt.applt_seq_nr > 0
	join tls206_person p_invt		on pa_invt	.person_id = p_invt	.person_id


 	left join -- outer join because IPCs are missing in rare cases
	-- create an auxiliary table aux209 which contains only 1 row per application 
		(SELECT appln_id, string_agg( tls209_appln_ipc.ipc_class_symbol, ', ') as Field
		from tls209_appln_ipc
		group by appln_id) aux209
	on a.appln_id = aux209.appln_id

	 left join -- outer join because there are no technical fields if IPCs are missing in rare cases
	-- create an auxiliary table aux230 which contains only 1 row per application 
		(SELECT appln_id, string_agg( TF_aux, ' / ') as TF_NAMES
		from tls230_appln_techn_field atf 
		join 
			-- create an auxiliary table aux901 which only contains 1 row per technical field
			(SELECT DISTINCT techn_field_nr, techn_sector + ' - ' + techn_field AS TF_aux
			from tls901_techn_field_ipc) aux901
			on atf.techn_field_nr = aux901.techn_field_nr
		group by appln_id
		) aux230
	on a.appln_id = aux230.appln_id

where a.appln_filing_year between 2014 and 2020  
	-- Due to the 18 month delay till the first publication you will not find (m)any application in 2019 and 2020
and p_applt.person_ctry_code like 'RO'  
and p_invt.person_ctry_code like 'RO' 
and a.granted like 'Y'
group by a.appln_id, a.appln_nr, a.appln_auth, a.appln_filing_date, p_invt.person_ctry_code, 
	p_applt.person_ctry_code,pa_applt.applt_seq_nr, p_applt.person_name, Field, TF_NAMES

order by a.appln_filing_date desc, a.appln_id, a.appln_auth, a.appln_nr
-------------------------------------------
Martin Kracker / EPO


Post Reply