Query Help for UK Patent Data

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

LucaGI1998
Posts: 3
Joined: Thu Mar 26, 2020 11:41 pm

Query Help for UK Patent Data

Post by LucaGI1998 » Fri Mar 27, 2020 1:55 am

Dear PATSTAT community,

I am currently doing some thesis work on UK firm innovation. Since I am new to SQL and PATSTAT, I have been reading through the DATA catalogue to try and make sure I get the data which I need but I'm not entirely sure about a few things. Here is what I'm trying to find:

1) I would like to get all Patents filed by UK based companies in 2008 from all Patenting offices. I realise not all applications will be filed under 1) a company name and 2) there may be multiple versions of said "name"
-Hopefully I would get over this using a harmonised name when I conduct my own analysis (although I may not capture all what I want, which is fine).
2) In order to avoid duplication, e.g. a UK firm patents product 'x' with JPO for japan and identical product 'x' with USPTO for the United States, I want to make sure that an invention patented in multiple jurisdictions is only counted once. It doesn't matter from what patenting office.
3) The respective IPC(s) of all the Patents filed
-I am aware that a single application may have multiple IPC classifications but I am not too worried about this at the moment.

I'd like to order my data as follows, if I can:
-Harmonised company name
-'z' number of patents per harmonised company
-respective IPC(s) of each individual patent filed by harmonised company

From what I understand, this will give me the total number of patents filled by a harmonised company, taking into account patent families and counting them as '1' invention, and the totals of the individual IPCs for all patents of the harmonised company.

This IPCs total is crucial because it'll help me determine the classification weights of harmonised company patents (i.e. company 'x' files 'z' patents, which gives 1 B27D IPCS, 2 A24B IPCS and 4 C12C IPC, tells me that majority of classifications is in Manufacture of Beverages) to figure out what industry I should classify the "harmonised company" as.

I greatly appreciate the help in writing the appropriate query code.

Thanks a lot :) :)


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Query Help for UK Patent Data

Post by EPO / PATSTAT Support » Mon Mar 30, 2020 10:28 pm

Hello Luca,
your request is not so easy to answer because it means looking at a data sample from different angles: applicants, harmonised names, family picture (authorities where family members were filed), filing date, ipc_codes. Trying to aggregate this all into some kind of 1 single table is rather impossible because there are plenty of variations across the various date items. As you rightly pointed out: company names can differ across family members but also: applications can have multiple applicants, family members can be filed in 2008, but just as well in 2007 or 2009 (or even further away), applications can have different IPC pictures, etc...

The easiest is to start extracting a data set according to your basic requirements: filed in 2008, GB applicants (and you specified explicitly companies.) You could use the query below, and then extract a subset. (without family members or forward/backward citations).

Code: Select all

SELECT psn_name, person_name, person_ctry_code,appln_nr_epodoc, psn_name, tls201_appln.appln_id, docdb_family_id , STRING_AGG ((ipc_class_symbol), ', ') ipc
FROM tls201_appln
JOIN tls207_pers_appln on tls201_appln.appln_id = tls207_pers_appln.appln_id
JOIN tls206_person on tls206_person.person_id = tls207_pers_appln.person_id
left join tls209_appln_ipc on tls201_appln.appln_id = tls209_appln_ipc.appln_id
WHERE tls201_appln.APPLN_FILING_YEAR = 2008
AND TLS206_PERSON.PERSON_CTRY_CODE='GB'
and applt_seq_nr >0 
and invt_seq_nr = 0 --to exclude as much as possible applicant-inventors; natural persons 
and ipr_type = 'PI'
group by psn_name, person_name ,person_ctry_code,appln_nr_epodoc, psn_name, tls201_appln.appln_id, docdb_family_id
order by docdb_family_id, tls201_appln.appln_id
This query will give you exactly 26.361 records - PATSTAT 2019b-
Not applications ! There are applications that have >1 applicant.
This data set can then be use to do your "counting".
If you want to count "inventions", you will need to count distinct docdb family ids's. Because 1 family can have multiple applications with different IPC pictures, you might have to aggregate your IPC-codes at family level - for example by simply de-duplicating the IPC-codes. Some researchers might apply weighing as well, but as you already limit your data set to applications filed in 2008, some family members will anyway fall outside the scope so personally I don't think it makes sense to go as far as counting weights. (-but it's your research- )
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


LucaGI1998
Posts: 3
Joined: Thu Mar 26, 2020 11:41 pm

Re: Query Help for UK Patent Data

Post by LucaGI1998 » Thu Apr 02, 2020 3:01 pm

Thanks a lot for the reply, it was very helpful.

A quick follow up, I now want to extend my year range between 2006-2011, and flesh my subset out a little. The tables generated from your very helpful code that you provided gave docbd family IDs, which can be used to "count" inventions. Now with this extended time range, how could I make sure I don't double count inventions which have been applied for in different jurisdictions?
If you count docdb_family id's then you count inventions. Extending the range can be done by changing the where clause in to tls201_appln.APPLN_FILING_YEAR between 2006 and 2011
Is there a way somehow I find out patent family size (the total number of jurisdictions in which each invention is patented) which would solve this issue?
in the tls201 table you will find the attribute [docdb_family_size] that does exactly that
Some added information on how I should "count" these inventions would be greatly appreciated. Perhaps, is there a way with the docbd family IDs I'll generate, I can order my data by docbd family ID, the jurisdictions/patent offices each one of these applications was filed and application date to alleviate these problems? How would I go about doing this?
If you need to see in which countries patent family members have been filed, then you can simply look at the patent authority. - keep in mind that a filing at the EPO can result in a patent covering multiple EPO member states.
Second, as you mentioned "1 family can have multiple applications with different IPC pictures" and that I might have to aggregate my IPC-codes at family level "by simply de-duplicating the IPC-codes". What does this mean exactly, how would I do this.
It is fairly simply, but this kind of data aggregations becomes too complex in PATSTAT Online. Making a data base extraction will not work because the data set will become too big.
You also mentioned some researchers might apply weightings. Is this something I can do in PATSTAT? I would still like to include the table with the number of patent classes a patent is classified into (the IPC table).
Yes, that is perfectly possible, but you will need a locally installed PATSTAT data base; it requires the use of intermediate tables.
Lastly, to try and distinguish patent quality, I have a few indicators which I want to try and calculate with information from PATSTAT. I want to try and calculate an originality index, measuring how diverse a patent’s backward citations are and a generality index, measuring of how diverse a patent’s forward citations are. Therefore, I would need to include information on any backward and forward citations on these patents filled in this 2006-2011 year range. How would I go about doing this? Can these tables be added to the already existing query you gave or the new one which I'll try to generate.
This becomes too much for PATSTAT Online, your initial set of all applications filed between 2006 and 2011 having a GB applicant will already return 150.000 patent applications - the maximum number to create an extraction in PATSTAT Online is 100.000

Code: Select all

SELECT distinct tls201_appln.appln_id
FROM tls201_appln
JOIN tls207_pers_appln on tls201_appln.appln_id = tls207_pers_appln.appln_id
JOIN tls206_person on tls206_person.person_id = tls207_pers_appln.person_id
left join tls209_appln_ipc on tls201_appln.appln_id = tls209_appln_ipc.appln_id
WHERE tls201_appln.APPLN_FILING_YEAR between 2006 and 2011
AND TLS206_PERSON.PERSON_CTRY_CODE='GB'
and applt_seq_nr >0 
and invt_seq_nr = 0 --to exclude as much as possible applicant-inventors; natural persons 
and ipr_type = 'PI'
I just wanted to add, I have a list of 6,786 firms which I am really specifically looking at in my analysis. The above query that I'll generate will give the information of ALL GB firms' patents. Would it therefore be easier to conduct this research using "harmonised names" from the 6,786 firms I have instead of finding information of ALL GB firms' patents?
It will be rather impossible to write a query in PATSTAT online to match 7000 companies. These are the kind of projects where you need PATSTAT on a local server.
Can they be inputed into PATSTAT whereby if there are matches, I'd give tables of the above criterium I'm searching for? How would I go about doing this.
You need a full PATSTAT data base for that. Your first hurdle would be to find best possible matches between your set of 7000 companies and the names of the patent applicants. Then you have to make a data model of what you need for your research, and then you need to check how the PATSTAT data should be aggregated to fit the model you have in mind. I have pointed out a couple of difficulties -multiple filing dates per family, multiple IPC pictures per family, ...- It can all be dealt with,but this is not something that can be done in one single query for 7000 companies.
Again, I greatly appreciate all the help in getting this query right

Thanks a lot :) :)

Luca


EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

Re: Query Help for UK Patent Data

Post by EPO / PATSTAT Support » Mon Apr 06, 2020 7:28 pm

Hello Luca,
kindly keep in mind that we do not have the resources to support individual researchers with drafting SQL queries for their research project. Many of your questions have been dealt with in previous forum posts, or are rather "SQL" syntax related.
I took the freedom to answer a couple of your answers in-line your post and for the rest I refer to the many publications by other researchers that have been using PATSTAT.
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


LucaGI1998
Posts: 3
Joined: Thu Mar 26, 2020 11:41 pm

Re: Query Help for UK Patent Data

Post by LucaGI1998 » Tue Apr 07, 2020 3:12 pm

Thank you for your response to my questions!

As I said in my first post, I am new to PATSTAT and so whilst I am everyday trying to familiarise myself with it, I felt it appropriate to use the help forums, like many others have, to get started on the right foot. I am by no means asking anyone to do my thesis for me, the individual work I am putting in extends far past the tips I have asked for.

I simply asked for further clarification on advice which you had generously offered. I have been looking at other forum posts and felt, given the style of the back and forth of other threads, this was appropriate to do so (given I am also a customer of the PATSTAT dataset). I will continue to look at other threads to help me - as you pointed out many of my questions had been answered - but I don't really understand what I have done wrong here.

If I am not using the function of the forum in the correct way, I would prefer if you messaged me privately next time instead of telling my on a public thread. I hope I am still allowed to ask for help on forums if I need it in the future.


Post Reply