To solve the problem the T-SQL STRING_AGG function was created which takes all expressions from rows and concatenates them into a single string. However, a problem of duplicate data occurs when the query joins a number of tables that are not included in the STRING_AGG function. The STRING_AGG function allows to sort the data items, but does not have a "distinct" function to de-duplicate. To solve that issue, we bring the STRING_AGG function in a sub query which will create a de-duplicated data attribute. PATSTAT Online has a problem with ALIAS renaming of sub queries, so you need to address the attributes in the SELECT clause without the reference to table created by the sub query. (it's a little bug in the third-party parser).
The code: here is a simple query that gives the data on 6 EP applications according to a number of criteria. It results in 172 unique rows of data, giving all the applicants, inventors and CPC codes.
Code: Select all
SELECT distinct a.appln_id,a.appln_auth ,a.appln_nr, a.earliest_filing_date, a.nb_applicants,a.nb_inventors, a.docdb_family_id,
a.docdb_family_size, applt_seq_nr, invt_seq_nr,
psn_name, cpc_class_symbol
FROM tls201_appln a
join tls207_pers_appln pa on a.appln_id = pa.appln_id
join tls206_person p on pa.person_id = p.person_id
join tls224_appln_cpc CPC on a.appln_id = cpc.appln_id
WHERE (a.appln_id in (select appln_id from tls224_appln_cpc where left(cpc_class_symbol,8) = 'Y02T 10'))
and a.earliest_filing_date = '2003-11-19'
and a.appln_auth = 'EP'
ORDER BY a.appln_id
Code: Select all
SELECT DISTINCT a.appln_id,a.appln_auth ,a.appln_nr, a.earliest_filing_date, a.nb_applicants,a.nb_inventors, a.docdb_family_id,
a.docdb_family_size,
APPLICANT_GROUP,
INVENTOR_GROUP,
CPC_GROUP
FROM tls201_appln a
left join (select pa.appln_id ,STRING_AGG (cast((psn_name+' ['+person_ctry_code+']'+' ['+psn_sector+']' )as NVARCHAR(MAX)), '¦ ') APPLICANT_GROUP
FROM tls207_pers_appln pa join tls206_person pap on pa.person_id = pap.person_id where pa.applt_seq_nr > 0 group by pa.appln_id) applicants
ON a.appln_id = applicants.appln_id
left join (select pa.appln_id ,STRING_AGG (cast((psn_name+' ['+person_ctry_code+']'+' ['+psn_sector+']' )as NVARCHAR(MAX)), '¦ ') INVENTOR_GROUP
FROM tls207_pers_appln pa join tls206_person pap on pa.person_id = pap.person_id where pa.invt_seq_nr > 0 group by pa.appln_id) inventors
on a.appln_id = inventors.appln_id
join (select appln_id ,STRING_AGG (cast((cpc_class_symbol)as NVARCHAR(MAX)), '¦ ') WITHIN GROUP (ORDER BY cpc_class_symbol asc) CPC_GROUP
FROM tls224_appln_cpc group by appln_id) CPC
ON a.appln_id = CPC.appln_id
where (a.appln_id in (select appln_id from tls224_appln_cpc where left(cpc_class_symbol,8) = 'Y02T 10'))
and a.earliest_filing_date = '2003-11-19'
and a.appln_auth = 'EP'
ORDER BY a.appln_auth,a.earliest_filing_date