Displaying multiple inventors, applicants, IPC codes in same row (next to each other) rather than below each other

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

fabian.scheifele
Posts: 2
Joined: Wed Dec 15, 2021 11:11 am

Displaying multiple inventors, applicants, IPC codes in same row (next to each other) rather than below each other

Post by fabian.scheifele » Wed Jan 12, 2022 3:13 pm

Dear community,

I want to extract patents that pertain to a series of IPC codes across different application authority and I am interested in having the information about the nationality of the applicant(s) and inventor(s).
I would like to export the data in the format 1 patent application (i.e. unique application_id) = 1 row in the dataset.

At the moment, I receive multiple rows for patents that have more than one inventor, applicant or IPC class. Is it possible to have these additional inventors in an extra column in the same row (i.e. "next to each other rather than below each other")? This would make the dataprocessing easier.

For example I have a patent that has multiple inventors from Austria, Germany and Japan which creates 8 rows for the same patent application, but I would like to have 1 row with columns Applicant1, Applicant2, IPC1, IPC2 etc. I can also add the query as soon as I have but I am waiting for my collaborator to send it to me.
Regards
Fabian


EPO / PATSTAT Support
Posts: 424
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Displaying multiple inventors, applicants, IPC codes in same row (next to each other) rather than below each other

Post by EPO / PATSTAT Support » Thu Jan 13, 2022 7:09 pm

Hello Fabian,
your question is rather a pure SQL question then a PATSTAT question.
The PATSTAT data base is completely normalised, and what you want to do is basically a form of denormalization. This can be useful for reporting purposes (making more compact tables), but it makes it more difficult to further aggregate or visualise the data in for example graphs.
The main stumble block for denormalization is that you (probably) don't know how many applicants, inventors and IPC codes your patents will have. And that raises the question on how many columns you will need to "provide" for your individual applicants, inventors and ipc codes. There are 2 approaches: static or dynamic denormalization.
Static is easy to understand, dynamic can become very complex when using multiple tables. (It is often done via user defined functions that need to be created on the server.)
Static means that the number of columns is hard coded in the query.
In the solution below, I limited to 5 applicants, 5 inventors and 5 ipc codes (maximum). If there are more, they will be omitted, if less, it will show up as "blanks". You can of course expand the query by simply coying the "CASE" lines. The applicant and inventors have sequence numbers, so we can thankfully make use of that in the CASE statement. The IPC codes don't have sequence numbers, so therefore I made a small subroutine to give the IPC codes a ranking (alphabetically), and gave them a number. (You could adapt this subquery to taking only the first 4 our 8 digits of the IPC codes to limit further.)

Don't forget to change the "where tls201_appln.appln_id between 1 and 10000" clause to something that will select the data you want to use.

Code: Select all

SELECT tls201_appln.appln_auth+tls201_appln.appln_nr application, tls201_appln.appln_filing_date,
max(case when applt_seq_nr = 1 then '['+person_ctry_code +'] '+psn_name else '' end) applt_1,
max(case when applt_seq_nr = 2 then '['+person_ctry_code +'] '+psn_name else '' end) applt_2,
max(case when applt_seq_nr = 3 then '['+person_ctry_code +'] '+psn_name else '' end) applt_3,
max(case when applt_seq_nr = 4 then '['+person_ctry_code +'] '+psn_name else '' end) applt_4,
max(case when applt_seq_nr = 5 then '['+person_ctry_code +'] '+psn_name else '' end) applt_5,
max(case when invt_seq_nr = 1 then '['+person_ctry_code +'] '+psn_name else '' end) inv_1,
max(case when invt_seq_nr = 2 then '['+person_ctry_code +'] '+psn_name else '' end) inv_2,
max(case when invt_seq_nr = 3 then '['+person_ctry_code +'] '+psn_name else '' end) inv_3,
max(case when invt_seq_nr = 4 then '['+person_ctry_code +'] '+psn_name else '' end) inv_4,
max(case when invt_seq_nr = 5 then '['+person_ctry_code +'] '+psn_name else '' end) inv_5,
max(case when IPC_rank = 1 then temp.ipc_class_symbol  else '' end) ipc_1,
max(case when IPC_rank = 2 then temp.ipc_class_symbol  else '' end) ipc_2,
max(case when IPC_rank = 3 then temp.ipc_class_symbol  else '' end) ipc_3,
max(case when IPC_rank = 4 then temp.ipc_class_symbol  else '' end) ipc_4,
max(case when IPC_rank = 5 then temp.ipc_class_symbol  else '' end) ipc_5
FROM tls201_appln   join tls207_pers_appln on tls201_appln.appln_id = tls207_pers_appln.appln_id
  join tls206_person on tls207_pers_appln.person_id = tls206_person.person_id
  join
  -- below is the subroutine to rank and number the IPC codes
  (Select * 
  from (Select ST.appln_id
       , ST.ipc_class_symbol
       , ROW_NUMBER() OVER (PARTITION BY ST.appln_id
                    ORDER BY ST.ipc_class_symbol) AS IPC_rank
		From tls209_appln_ipc AS ST) ipc
   ) as temp on temp.appln_id = tls201_appln.appln_id

where tls201_appln.appln_id between 1 and 10000
group by tls201_appln.appln_auth+tls201_appln.appln_nr, tls201_appln.appln_filing_date, tls201_appln.appln_id
order by tls201_appln.appln_id
Dynamic normalisation will create the necessary columns depending on the data, but coding such approaches are explained in a number of specialised SQL blogs - and outside the scope of this forum...
Feel free to post other solutions.
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post Reply