Searching for help concerning PATSTAT query

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

Student_Constance
Posts: 6
Joined: Thu May 04, 2017 5:20 pm

Searching for help concerning PATSTAT query

Post by Student_Constance » Wed Jun 07, 2017 5:16 pm

Hello everyone,

concerning my recent research project I need the following data:

the number of patents applied concerning a specific company in a specific year.

For example:
Company: Siemens
Year: 2013

Therefore I found this query which works great.

Code: Select all

SELECT DISTINCT a.*
FROM tls201_appln a
JOIN tls207_pers_appln pa on a.appln_id = pa.appln_id
JOIN tls206_person p on pa.person_id = p.person_id
WHERE appln_auth = 'EP'
AND appln_kind = 'A'  -- exclude PCT filings where the EPO only served as the Receiving Office
AND a.appln_id < 900000000   -- exclude artificial applications (see PATSTAT Data Catalog for details)
AND appln_filing_year = 2013
AND applt_seq_nr > 0  -- consider only applicants
AND person_ctry_code = 'DE'  -- applicant must be from Germany
AND psn_name like 'SIEMENS%' -- do not require a blank after SIEMENS, otherwise you would miss the plain name "SIEMENS"
Now I want to add a certain requirement to the query:

The system should display only those patents which were granted subsequently.

Could you help me adjusting the query code?

Thank you!

Kind regards
Student_Constance


mkracker
Posts: 121
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: Searching for help concerning PATSTAT query

Post by mkracker » Thu Jun 08, 2017 6:54 am

Hi Constance,

Just add "and granted = 1" to the WHERE clause of your query.
Look up attribute GRANTED in the Data Catalog for more details.

Best regards,
Martin / EPO
-------------------------------------------
Martin Kracker / EPO


Student_Constance
Posts: 6
Joined: Thu May 04, 2017 5:20 pm

Re: Searching for help concerning PATSTAT query

Post by Student_Constance » Fri Jun 09, 2017 9:48 am

Thanks for your reply!

I have another question:

With the query above I only get the EP applications filed by the German company "Siemens".

Code: Select all

SELECT DISTINCT a.*
FROM tls201_appln a
JOIN tls207_pers_appln pa on a.appln_id = pa.appln_id
JOIN tls206_person p on pa.person_id = p.person_id
WHERE appln_auth = 'EP'
AND appln_kind = 'A'  -- exclude PCT filings where the EPO only served as the Receiving Office
AND a.appln_id < 900000000   -- exclude artificial applications (see PATSTAT Data Catalog for details)
AND appln_filing_year = 2013
AND applt_seq_nr > 0  -- consider only applicants
AND psn_name like 'SIEMENS%' -- do not require a blank after SIEMENS, otherwise you would miss the plain name "SIEMENS"
But: I'm searching for all patent applications worldwide filed by the company "Siemens". (not only EP applications but also PCT applications).

(I'm doing this search query for several companies e.g. Apple, Boeing, Bosch, Siemens etc. which are doing business globally. Therefore I'm searching for all patents worldwide in a specific year)

I've also removed the code line "AND person_ctry_code = 'DE' -- applicant must be from Germany" because I'm not interested in the nationality of the applicant.

Could you help me adjusting the query code?

Thank you!

Kind regards
Student_Constance


mkracker
Posts: 121
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: Searching for help concerning PATSTAT query

Post by mkracker » Fri Jun 09, 2017 2:01 pm

To include worldwide regional / national filings, just remove " appln_auth = 'EP' " in your query.
To also include international filings (= PCT filings at WIPO), remove " appln_kind = 'A' ".

To better understand what a query does and to use SQL effectively, I suggest to read the SQL self-study course "Using PATSTAT with SQL for beginners", downloadable from the bottom of page http://www.epo.org/searching-for-patent ... .html#tab1 or something similar.

You are filtering by name. Even nicely harmonised names like PSN_NAME are not perfect and might return some unexpected results. So I recommend that you also retrieve and check the names themselves ("SELECT *" instead "SELECT a.*"). Sorting the result by name (see ORDER BY clause) is easy and might help you to spot outliners.

Good luck,
Martin / EPO
-------------------------------------------
Martin Kracker / EPO


Student_Constance
Posts: 6
Joined: Thu May 04, 2017 5:20 pm

Re: Searching for help concerning PATSTAT query

Post by Student_Constance » Fri Jun 09, 2017 7:48 pm

Thanks for your help.

I adjusted the query to the following code:

Code: Select all

SELECT DISTINCT *
FROM tls201_appln a
JOIN tls207_pers_appln pa on a.appln_id = pa.appln_id
JOIN tls206_person p on pa.person_id = p.person_id
WHERE a.appln_id < 900000000   -- exclude artificial applications (see PATSTAT Data Catalog for details)
AND appln_filing_year = 2013
AND applt_seq_nr > 0  -- consider only applicants
AND psn_name like 'Siemens%' -- do not require a blank after SIEMENS, otherwise you would miss the plain name "SIEMENS"
AND granted=1
Order by psn_name
Like you mentioned in your last post, I get some unexpected results.
(e.g. Siemens, SIEMENS SHANGHAI MEDICAL EQUIPMENT, SIEMENS SCHWEIZ etc.)
Now I should be able to spot outliners and delete them.

I have two more questions:

First: Why do I get different results using this two search requests?

First request:

Code: Select all

SELECT DISTINCT *
FROM tls201_appln a
JOIN tls207_pers_appln pa on a.appln_id = pa.appln_id
JOIN tls206_person p on pa.person_id = p.person_id
WHERE a.appln_id < 900000000   -- exclude artificial applications (see PATSTAT Data Catalog for details)
AND appln_filing_year = 2013
AND applt_seq_nr > 0  -- consider only applicants
AND psn_name like 'Siemens%' -- do not require a blank after SIEMENS, otherwise you would miss the plain name "SIEMENS"
AND granted=1
Order by psn_name
Result: 2279

Second request:

Code: Select all

SELECT DISTINCT a.*
FROM tls201_appln a
JOIN tls207_pers_appln pa on a.appln_id = pa.appln_id
JOIN tls206_person p on pa.person_id = p.person_id
WHERE a.appln_id < 900000000   -- exclude artificial applications (see PATSTAT Data Catalog for details)
AND appln_filing_year = 2013
AND applt_seq_nr > 0  -- consider only applicants
AND psn_name like 'Siemens%' -- do not require a blank after SIEMENS, otherwise you would miss the plain name "SIEMENS"
AND granted=1
Result: 2222

According to the first request I select all columns and with the second request I select all columns of table "tls201_appln".
Why is there a difference in the two results when I only search for "Siemens" in the column psn_name?

Second: What happens when there is more than one applicant per application?
Are these patents counted twice?


I think after these questions everything should be clarified.
Thanks for you help! The support of the EPO is awesome.


mkracker
Posts: 121
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: Searching for help concerning PATSTAT query

Post by mkracker » Tue Jun 13, 2017 2:15 pm

Hi,

When you do a JOIN, then you get 1 record for each combination of joined tables which fulfil the JOIN conditions (e. g. "ON a.appln_id = pa.appln_id"). As an example: You are joining applications and persons (e. g. applicants) and a specific application has 5 persons associated, you will retrieve 5 records.

There exists some applications which do have more than 1 applicant called "SIEMENS%", so these applications will be returned multiple times. E. g. application with APPLN_ID = 415193109 has even 3 such applicants:
- Siemens Aktiengesellschaft (DE)
- Siemens Corporation (US)
- Siemens Medical Solutions USA, Inc. (US)

Your first query uses "DISTINCT * ". * stands for all attributes of all tables. So you will retrieve all combinations of these tables. In the example above, you will get all 3 SIEMENS applicants, all with the same application data (from table TLS201_APPLN).
Due to the different applicants, all rows will be different, so the DISTINCT clause has no effect.

Your second query uses "DISTINCT a.* ". So you will retrieve only the application data from table TLS201_APPLN, which you call by the alias "a". For the example above, you would normally retrieve 3 records - 1 for each combination application & applicant. But because of the DISTINCT clause, the 3 rows for application with APPLN_ID = 415193109 will be reduced to a single record. So query 2 retrieves less records than query 1.

Martin / PATSTAT EPO
-------------------------------------------
Martin Kracker / EPO


Student_Constance
Posts: 6
Joined: Thu May 04, 2017 5:20 pm

Re: Searching for help concerning PATSTAT query

Post by Student_Constance » Sat Jun 24, 2017 12:06 am

Hi,

I have another question concerning my search query.

First search query:

Code: Select all

SELECT DISTINCT a.*
FROM tls201_appln a
JOIN tls207_pers_appln pa on a.appln_id = pa.appln_id
JOIN tls206_person p on pa.person_id = p.person_id
WHERE appln_auth = 'EP'
AND a.appln_id < 900000000   -- exclude artificial applications (see PATSTAT Data Catalog for details)
AND appln_filing_year between 1998 and 2015
AND applt_seq_nr > 0  -- consider only applicants
AND psn_name like 'ABB %' -- do not require a blank after SIEMENS, otherwise you would miss the plain name "SIEMENS"
Results: 10 291

Second search query:

Code: Select all

SELECT DISTINCT a.*
FROM tls201_appln a
JOIN tls207_pers_appln pa on a.appln_id = pa.appln_id
JOIN tls206_person p on pa.person_id = p.person_id
WHERE appln_auth = 'EP'
AND a.appln_id < 900000000   -- exclude artificial applications (see PATSTAT Data Catalog for details)
AND appln_filing_year between 1998 and 2015
AND applt_seq_nr > 0  -- consider only applicants
AND psn_name like 'ASEA BROWN BOVERI JUMET%' -- do not require a blank after SIEMENS, otherwise you would miss the plain name "SIEMENS"
Results: 2

Now i want to combine both codes into one search query...

I tried this one:

Code: Select all

SELECT DISTINCT a.*
FROM tls201_appln a
JOIN tls207_pers_appln pa on a.appln_id = pa.appln_id
JOIN tls206_person p on pa.person_id = p.person_id
WHERE appln_auth = 'EP'
AND a.appln_id < 900000000   -- exclude artificial applications (see PATSTAT Data Catalog for details)
AND appln_filing_year between 1998 and 2015
AND applt_seq_nr > 0  -- consider only applicants
AND psn_name like 'ABB %' -- do not require a blank after SIEMENS, otherwise you would miss the plain name "SIEMENS"
OR psn_name like 'ASEA BROWN BOVERI JUMET%'
Results: 10320

If you take both results from above and add them, you do not get the new result.
(10 291 + 2 = 10293 != 10320)

How can I combine both codes to get the correct number of patent applications?

Thanks for you help. Great support.


mkracker
Posts: 121
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: Searching for help concerning PATSTAT query

Post by mkracker » Sat Jun 24, 2017 8:40 am

Hi,

in your second query of our previous post you are mixing ANDs and ORs. You can do that, but you have to be careful on how SQL interprets this (see https://docs.microsoft.com/en-us/sql/t- ... ansact-sql.

Ist much safer to explicitely control the order of evaluation by using brackets.
So better use :
AND (psn_name like 'ABB %'
OR psn_name like 'ASEA BROWN BOVERI JUMET%')
which returns 10 292 records.

Still, 10 291 + 2 != 10 292.
This is explained because there is one application (APPLN_ID = 15617493) where both companies are applicants.

Martin
-------------------------------------------
Martin Kracker / EPO


Post Reply