Help with SQL query - Multi-part identifier

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

Moos
Posts: 2
Joined: Wed Jul 13, 2016 10:17 am

Help with SQL query - Multi-part identifier

Post by Moos » Wed Jul 13, 2016 10:40 am

Dear community,

I'm working on a PATSTAT Online (Autumn 2015) patent query since several weeks. My aim is to download the entire patent portfolio without duplicats of specialized companys (in this case 'Tesla Motors').

My current query (see at the end) is bulit on seven sub-queries targeting my desired tsl-tables.

- Sub-query 1 is working fine:
(select *
from tls206_person e
where upper(e.person_name) like 'TESLA ENG%'
or upper(e.person_name) like 'TESLA MOTORS%'
or upper(e.person_name) like 'TESLA CO%'
or upper(e.person_name) like '<weitere konsolidierte Tochtergesellschaften>%')

-Sub-query 2 is working fine:
(select distinct a.person_name, d.appln_id
from
(select *
from tls206_person e
where upper(e.person_name) like 'TESLA ENG%'
or upper(e.person_name) like 'TESLA MOTORS%'
or upper(e.person_name) like 'TESLA CO%'
or upper(e.person_name) like '<weitere konsolidierte Tochtergesellschaften>%') a , tls207_pers_appln d
where a.person_id=d.person_id)

-Sub-query 3 - ERROR CODE
(select distinct a.*, b.docdb_family_id
from
(select distinct a.person_name, d.appln_id
from
(select *
from tls206_person e
where upper(e.person_name) like 'TESLA ENG%'
or upper(e.person_name) like 'TESLA MOTORS%'
or upper(e.person_name) like 'TESLA CO%'
or upper(e.person_name) like '<weitere konsolidierte Tochtergesellschaften>%') a , tls207_pers_appln d
where a.person_id=d.person_id) a, tls201_appln b
where a.appln_id=b.appln_id)
18:31:17 [SELECT - 0 row(s), 0 secs] [Error Code: 4104, SQL State: S0001] The multi-part identifier "d.appln_id" could not be bound.

Now, all following Sub-queries are blocked by errors!
I'm not quite sure what went wrong. However, I suspect the following:
- Maybe there is a problem with the formulation of the JOIN clauses (here with a simple WHERE)?
- MS SQL may have problems integrate the 'd.appln_id' column in the following SELECT commands?

For further questions I am always at your disposal.
Thank you for your're help!

---
Full query:

select distinct priority_year, person_name, docdb_family_id, substring(ipc_class_symbol, 1, 4) ipc_subclass
from
(select distinct a.*, b.publn_auth, min(extract(year from appln_filing_date)) over(partition by docdb_family_id) priority_year
from
(select distinct d.*, a.appln_filing_date, a.appln_auth, b.ipc_class_symbol
from tls201_appln a, tls209_appln_ipc b,
(select distinct a.*, b.appln_id appln_id_fam
FROM
(select distinct a.*, b.docdb_family_id
from
(select distinct a.person_name, d.appln_id
from
(select *
from tls206_person e
where upper(e.person_name) like 'TESLA ENG%'
or upper(e.person_name) like 'TESLA MOTORS%'
or upper(e.person_name) like 'TESLA CO%'
or upper(e.person_name) like '<weitere konsolidierte Tochtergesellschaften>%') a , tls207_pers_appln d
where a.person_id=d.person_id) a, tls201_appln b
where a.appln_id=b.appln_id) a, tls228_docdb_fam_citn b
where a.docdb_family_id=b.docdb_family_id
order by a.person_name, a.docdb_family_id, a.appln_id) d
where a.appln_id=b.appln_id and a.appln_id=d.appln_id_fam
and a.ipr_type='PI'
and extract(year from a.appln_filing_date)>= 2002
and extract(year from a.appln_filing_date)<=2014
order by person_name, docdb_family_id, ipc_class_symbol) a, tls211_pat_publn b
where a.appln_id_fam=b.appln_id
order by person_name, docdb_family_id, ipc_class_symbol)
order by priority_year, person_name, docdb_family_id, ipc_subclass


mkracker
Posts: 96
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: Help with SQL query - Multi-part identifier

Post by mkracker » Wed Jul 20, 2016 8:12 am

Hi Moos,

Bad news: I cannot comprehend your query. In fact, I never saw a query with 7 nested(!) subqueries and I cannot imagine that this is ever necessary.
Good news: Your information need is quite typical and you can solve it in a much easier way.

Instead of analysing your query I propose one which probably comes close to what you need:

Code: Select all

select p.*, a.*   
from tls201_appln a
join tls207_pers_appln pa on a.appln_id = pa.appln_id
join tls206_person p on pa.person_id = p.person_id
where a.appln_id in 
	(select min(a2.appln_id)  -- apply here any logic to find a suitable family representative
	from tls201_appln a2
	join tls207_pers_appln pa2 on a2.appln_id = pa2.appln_id
	join tls206_person p2 on pa2.person_id = p2.person_id
	where appln_filing_year between 2002 and 2014
	and ipr_type = 'PI'
	and (  person_name like 'TESLA ENG%'  
		or person_name like 'TESLA MOTORS%'
		or person_name like 'TESLA CO%'
		or person_name like '<weitere konsolidierte Tochtergesellschaften>%')
	and applt_seq_nr > 0 -- for the sake of good practise: limit to applicants only
	group by docdb_family_id
	)
order by a.appln_id, person_name
There is also one subquery. It has the task to identify the applications you are interested in. Actually, it is only necessary because you want to have the "patent portfolio without duplicates", which I interpret as having only 1 application for each DOCDB family. I do not know which application would best represent a family, so I just take the one with the smallest APPLN_ID. (In practice, you probably would have a more complex rule or select the family representative intellectually offline after having downloaded the data).

The outer query has the task to retrieve all the information you want from the application which is the family representative.

Just a few notes:
  • Use indentation to structure the query. This makes it much more readable
  • String comparison in PATSTAT is by default case insensitive. So the costly UPPER function is not necessary.
  • You do not need the YEAR function, because the year of the application filing date is available as a separate attribute
  • For the sake of good order: with attributes APPLT_SEQ_NR / INVT_SEQ_NR you may restrict names to applicants / inventors
  • You have been using the 2015 Autumn data. The newer 2016 Spring database will give you some more hits
I hope I could help.
-------------------------------------------
Martin Kracker / EPO PATSTAT


Moos
Posts: 2
Joined: Wed Jul 13, 2016 10:17 am

Re: Help with SQL query - Multi-part identifier

Post by Moos » Thu Jul 21, 2016 2:42 pm

Dear Mr. Kracker,

Thank you very much for your help! I've learned a lot from your additional notes.

You have intepreted my oversized query correctly regarding to the "patent portfolio without duplicates" / having only one application for each DOCDB family. Furthermore, your query suggestion works fine (& really quick) for my analysis.

Now I'm wondering, if there is a possibility to insert another row per assigned Patent-IPC to capture all single IPCs per patent application without duplicats (in the result table)?

As a result following rows could be in the result table:
person_id
person_name
...
appln_id
appln_filing_date/year
...
appln_nr_epodoc --> IPC per single application?
...

Thank you and beste wishes!


mkracker
Posts: 96
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: Help with SQL query - Multi-part identifier

Post by mkracker » Fri Jul 22, 2016 10:18 am

I assume that when you write "insert another row per assigned Patent-IPC " you mean that you want additional columns for each IPC symbol of an application. I can imagine that this is possible with SQL, but it would be quite tricky. Anyway,I would discourage you to do so, because that is not the way relational databases should be used.

What is the situation here - and its very typical: You have a 1:n relationship, i. e. 1 application (1 row in TLS201) may have multiple IPC assignments ("n" rows in table TLS209). This can be expressed in 2 ways:
  • a) Applications and IPCs are stored in 2 separate tables. No data is duplicated / is redundant. 1 row in the application table is related to n rows in the IPC table. The tables can be joined via the APPLN_ID attribute.
  • b) All the data is stored in 1 table, each row containing the data of 1 application and 1 IPC. Consequently, if there are multiple IPCs, the application data repeats. Also, if here is no IPC, there will be no row at all (unless you use OUTER JOINS).
How to apply this with PATSTAT Online:
  • ad approach a):
    Run a query which returns the APPLN_ID of the applications you are interested in. It is not relevant whether there are duplicates or not. You may but you do not need to return IPCs or any other data else. Actually, in the query I propose earlier, you may simple run the sub-query only, because it already returns the APPLN_ID.
    Then run in PATSTAT Online the Subset Download (menu Download > Prepare Download > PATSTAT subset- see also the PATSTAT Online user manual on the EPO web page). Select the tables you need, e.g. TLS201, TLS209, TLS206, .. Download the tables in the preferred format (MS Access; or csv for loading each table into Excel or your preferred data management or visualisation tool). Post process as needed.
    The Subset Download feature is a real powerful tool. It's worth having a closer look at it.
  • ad approach b)
    Create a query which returns a table which contains IPC data and (potentially repeating) application data. Go to the Result Window and download the result table (menu Download > Prepare Download > Result table). Again, load the data in you preferred tool and manipulate as needed.
So if you really want to add as many columns as you have IPCs (Think twice about it!), you can do it offline with your preferred tool.

Good luck,
Martin
-------------------------------------------
Martin Kracker / EPO PATSTAT


Post Reply