Thanks a lot for the reply, it was very helpful.
A quick follow up, I now want to extend my year range between 2006-2011, and flesh my subset out a little. The tables generated from your very helpful code that you provided gave docbd family IDs, which can be used to "count" inventions. Now with this extended time range, how could I make sure I don't double count inventions which have been applied for in different jurisdictions?
If you count docdb_family id's then you count inventions. Extending the range can be done by changing the where clause in to tls201_appln.APPLN_FILING_YEAR between 2006 and 2011
Is there a way somehow I find out patent family size (the total number of jurisdictions in which each invention is patented) which would solve this issue?
in the tls201 table you will find the attribute [docdb_family_size] that does exactly that
Some added information on how I should "count" these inventions would be greatly appreciated. Perhaps, is there a way with the docbd family IDs I'll generate, I can order my data by docbd family ID, the jurisdictions/patent offices each one of these applications was filed and application date to alleviate these problems? How would I go about doing this?
If you need to see in which countries patent family members have been filed, then you can simply look at the patent authority. - keep in mind that a filing at the EPO can result in a patent covering multiple EPO member states.
Second, as you mentioned "1 family can have multiple applications with different IPC pictures" and that I might have to aggregate my IPC-codes at family level "by simply de-duplicating the IPC-codes". What does this mean exactly, how would I do this.
It is fairly simply, but this kind of data aggregations becomes too complex in PATSTAT Online. Making a data base extraction will not work because the data set will become too big.
You also mentioned some researchers might apply weightings. Is this something I can do in PATSTAT? I would still like to include the table with the number of patent classes a patent is classified into (the IPC table).
Yes, that is perfectly possible, but you will need a locally installed PATSTAT data base; it requires the use of intermediate tables.
Lastly, to try and distinguish patent quality, I have a few indicators which I want to try and calculate with information from PATSTAT. I want to try and calculate an originality index, measuring how diverse a patent’s backward citations are and a generality index, measuring of how diverse a patent’s forward citations are. Therefore, I would need to include information on any backward and forward citations on these patents filled in this 2006-2011 year range. How would I go about doing this? Can these tables be added to the already existing query you gave or the new one which I'll try to generate.
This becomes too much for PATSTAT Online, your initial set of all applications filed between 2006 and 2011 having a GB applicant will already return 150.000 patent applications - the maximum number to create an extraction in PATSTAT Online is 100.000
Code: Select all
SELECT distinct tls201_appln.appln_id
FROM tls201_appln
JOIN tls207_pers_appln on tls201_appln.appln_id = tls207_pers_appln.appln_id
JOIN tls206_person on tls206_person.person_id = tls207_pers_appln.person_id
left join tls209_appln_ipc on tls201_appln.appln_id = tls209_appln_ipc.appln_id
WHERE tls201_appln.APPLN_FILING_YEAR between 2006 and 2011
AND TLS206_PERSON.PERSON_CTRY_CODE='GB'
and applt_seq_nr >0
and invt_seq_nr = 0 --to exclude as much as possible applicant-inventors; natural persons
and ipr_type = 'PI'
I just wanted to add, I have a list of 6,786 firms which I am really specifically looking at in my analysis. The above query that I'll generate will give the information of ALL GB firms' patents. Would it therefore be easier to conduct this research using "harmonised names" from the 6,786 firms I have instead of finding information of ALL GB firms' patents?
It will be rather impossible to write a query in PATSTAT online to match 7000 companies. These are the kind of projects where you need PATSTAT on a local server.
Can they be inputed into PATSTAT whereby if there are matches, I'd give tables of the above criterium I'm searching for? How would I go about doing this.
You need a full PATSTAT data base for that. Your first hurdle would be to find best possible matches between your set of 7000 companies and the names of the patent applicants. Then you have to make a data model of what you need for your research, and then you need to check how the PATSTAT data should be aggregated to fit the model you have in mind. I have pointed out a couple of difficulties -multiple filing dates per family, multiple IPC pictures per family, ...- It can all be dealt with,but this is not something that can be done in one single query for 7000 companies.
Again, I greatly appreciate all the help in getting this query right
Thanks a lot
Luca