PATSTAT Online: use of STRING_AGG(...) and aliases problem

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

PATSTAT Online: use of STRING_AGG(...) and aliases problem

Post by EPO / PATSTAT Support » Tue Jul 14, 2020 1:11 pm

Many PATSTAT Online users ask how to create "lists" that "group" data which is split over a number of different tables. PATSTAT is a relational data base, and -as you might expect- most of the data is what we call "normalised" in order to reduce data redundancy and improve data integrity. A disadvantage is that when users "JOIN" tables, the result list will be a list of "normalised data items", but from a point of "human readability" it feels unnecessary to have the data split into many rows, when our brain can perfectly understand groups of data.
To solve the problem the T-SQL STRING_AGG function was created which takes all expressions from rows and concatenates them into a single string. However, a problem of duplicate data occurs when the query joins a number of tables that are not included in the STRING_AGG function. The STRING_AGG function allows to sort the data items, but does not have a "distinct" function to de-duplicate. To solve that issue, we bring the STRING_AGG function in a sub query which will create a de-duplicated data attribute. PATSTAT Online has a problem with ALIAS renaming of sub queries, so you need to address the attributes in the SELECT clause without the reference to table created by the sub query. (it's a little bug in the third-party parser).
The code: here is a simple query that gives the data on 6 EP applications according to a number of criteria. It results in 172 unique rows of data, giving all the applicants, inventors and CPC codes.

Code: Select all

SELECT distinct a.appln_id,a.appln_auth ,a.appln_nr, a.earliest_filing_date, a.nb_applicants,a.nb_inventors, a.docdb_family_id, 
a.docdb_family_size, applt_seq_nr, invt_seq_nr,
psn_name, cpc_class_symbol
FROM tls201_appln a
join tls207_pers_appln pa on a.appln_id = pa.appln_id
join tls206_person p on pa.person_id = p.person_id 
join tls224_appln_cpc CPC on a.appln_id = cpc.appln_id
WHERE (a.appln_id in (select appln_id from tls224_appln_cpc where left(cpc_class_symbol,8) = 'Y02T  10'))
and  a.earliest_filing_date  = '2003-11-19'
and a.appln_auth = 'EP'
ORDER BY a.appln_id
And this is the query when the same data is "grouped", joining the applicants, inventors and CPC codes in 3 seperate columns - resulting in 6 rows of data.

Code: Select all

SELECT DISTINCT a.appln_id,a.appln_auth ,a.appln_nr, a.earliest_filing_date, a.nb_applicants,a.nb_inventors, a.docdb_family_id, 
a.docdb_family_size, 
APPLICANT_GROUP,
INVENTOR_GROUP,
CPC_GROUP
FROM tls201_appln a
left join (select pa.appln_id ,STRING_AGG (cast((psn_name+' ['+person_ctry_code+']'+' ['+psn_sector+']' )as NVARCHAR(MAX)), '¦ ') APPLICANT_GROUP
FROM  tls207_pers_appln pa join tls206_person pap on pa.person_id = pap.person_id where pa.applt_seq_nr > 0  group by pa.appln_id) applicants
ON a.appln_id = applicants.appln_id
left join (select pa.appln_id ,STRING_AGG (cast((psn_name+' ['+person_ctry_code+']'+' ['+psn_sector+']' )as NVARCHAR(MAX)), '¦ ') INVENTOR_GROUP
FROM  tls207_pers_appln pa join tls206_person pap on pa.person_id = pap.person_id where pa.invt_seq_nr > 0 group by pa.appln_id) inventors
on a.appln_id = inventors.appln_id
join (select appln_id ,STRING_AGG (cast((cpc_class_symbol)as NVARCHAR(MAX)), '¦ ') WITHIN GROUP (ORDER BY cpc_class_symbol asc) CPC_GROUP
FROM  tls224_appln_cpc group by appln_id) CPC
ON a.appln_id = CPC.appln_id
where (a.appln_id in (select appln_id from tls224_appln_cpc where left(cpc_class_symbol,8) = 'Y02T  10'))
and  a.earliest_filing_date  = '2003-11-19'
and a.appln_auth = 'EP'
ORDER BY a.appln_auth,a.earliest_filing_date
(User having a local PATSTAT Global installation with WRITE access will rather use temporary tables or views instead of the above sub-query approach.)
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post Reply