How to count inventors

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

knikonor
Posts: 19
Joined: Fri Nov 01, 2013 1:50 pm

How to count inventors

Post by knikonor » Mon Dec 16, 2013 10:49 am

Dear All,

can anyone point me to an idea on how to count a number of inventors for a given
year for a selected CPC or IPC class? I should distinguish between authors and inventors.

If I use online version of PATSTAT may I use count(distinct(tls201_appln.nb_inventors))
construction for each year? What construction may I use in offline edition?


mkracker
Posts: 120
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: How to count inventors

Post by mkracker » Wed Dec 18, 2013 5:27 pm

Have a look at the "SQL Self Study Course", downloadable from the PATSTAT online page on http://www.epo.org/searching/subscripti ... nline.html .
It describes the basics and some advanced features of SQL and contains loads of examples using PATSTAT.

Page 78 explains how to retrieve applicants and / or inventors.

Hope that helps,
Martin

Martin Kracker
EPO - PATSTAT
-------------------------------------------
Martin Kracker / EPO


knikonor
Posts: 19
Joined: Fri Nov 01, 2013 1:50 pm

Re: How to count inventors

Post by knikonor » Fri Dec 20, 2013 6:13 pm

Hello Martin,

thanks for pointing me to that document. Actually I have a slightly different problem.
I need to extract a total number of inventors (maybe including applicants) for a given
year(s) regardles of who is first or second etc for a given application.

Am I correct that a maximum applt_seq_nr or invt_seq_nr represents a total number
of applicants and I have to somehow SUM it for each application in a year to get a total number?

Best regards,
Kirill.


nico.rasters
Posts: 140
Joined: Wed Jul 08, 2009 5:51 pm
Contact:

Re: How to count inventors

Post by nico.rasters » Fri Dec 20, 2013 8:59 pm

My answer is based on PATSTAT October 2011... but it's probably valid for all versions.

If you sum the MAX(`applt_seq_nr`) you are going to double count applicants. The same applies to `invt_seq_nr` for inventors.

For example, if we take DOCDB family with ID 10341 we see that it has multiple appln_ids. These are basically the same patents with the same applicants and inventors.
SELECT * FROM `TLS218_DOCDB_FAM` WHERE `DOCDB_FAMILY_ID` =10341;

SELECT * FROM `TLS207_PERS_APPLN` WHERE `APPLN_ID` IN ( 932134, 2963200, 12099883, 20364395, 21815643, 51358704 ) ORDER BY `TLS207_PERS_APPLN`.`PERSON_ID` ASC;

PERSON_ID, APPLN_ID, APPLT_SEQ_NR, INVT_SEQ_NR
15042239, 12099883, 1, 0
15042239, 51358704, 1, 0
15042258, 932134, 1, 0
15042258, 21815643, 1, 0
15042289, 20364395, 1, 0
15469883, 51358704, 0, 1
15469884, 12099883, 0, 1
26431679, 51358704, 0, 2
26431680, 12099883, 0, 2

See how `person_id` 15042239 is mentioned as an applicant for both `appln_id` 12099883 and 51358704? Also notice `person_id` 26431679 and 26431680. This is actually the same person, namely Dr. Alex Pedolin.

SELECT * FROM `TLS206_PERSON` WHERE `PERSON_ID` IN ( 26431679, 26431680 );
This query returns "PEDOLIN ALEX" and "PEDOLIN ALEX DR.".

In order to count inventors, I'd focus on the priority patents only (any patent that you can not find in `tls204_appln_prior`) and then go for your solution to sum the max sequence numbers.
________________________________________
Nico Doranov
Data Manager

Daigu Academic Services & Data Stewardship
http://www.daigu.nl/


knikonor
Posts: 19
Joined: Fri Nov 01, 2013 1:50 pm

Re: How to count inventors

Post by knikonor » Sat Dec 21, 2013 6:15 pm

Dear Nico,

thanks you so much for the comments. As far as we need to count only published patents inventors,
should we use primary tls211_pat_publn table? We actually don't need applications in other stages.
So if I use the query like this - what is not correct with the data I'll get?
BTW it executes VERY long...

SELECT YEAR(pub.publn_date) as Year, COUNT(p.person_id) AS Person

FROM tls211_pat_publn pub

LEFT OUTER JOIN tls201_appln a ON pub.appln_id = a.appln_id
LEFT OUTER JOIN tls207_pers_appln pa ON a.appln_id = pa.appln_id
LEFT OUTER JOIN tls206_person p ON pa.person_id = p.person_id
LEFT OUTER JOIN tls224_appln_cpc cpc ON cpc.appln_id = pub.appln_id
LEFT OUTER JOIN tls209_appln_ipc ipc ON ipc.appln_id = pub.appln_id

WHERE (pa.applt_seq_nr > 0) AND (pa.invt_seq_nr > 0)

AND (a.ipr_type = 'PI')
AND (YEAR(pub.publn_date) BETWEEN 1900 AND 2012)
AND (cpc.cpc_class_symbol LIKE 'B82Y%') OR (ipc.ipc_class_symbol LIKE 'B82Y%')

GROUP BY YEAR(pub.publn_date)
ORDER BY YEAR(pub.publn_date) ASC

And returning to your previous post, as I am not an SQL expert - I have no idea how
to implement counting SUM(MAX(invt_seq_nr)) for each year...

Thanks in advance,
Kirill.


nico.rasters
Posts: 140
Joined: Wed Jul 08, 2009 5:51 pm
Contact:

Re: How to count inventors

Post by nico.rasters » Sat Dec 21, 2013 7:34 pm

That's quite a complex query, and you probably do not need the ORDER BY statement.
You seem to be focusing on nanotechnology patents. Any specific reason for the YEAR(pub.publn_date) BETWEEN 1900 AND 2012?

Anyway, what I would do in your case is first to create a table that holds the appln_ids of the nanotechnology patents you are interested in. That will be a SELECT DISTINCT `appln_id` query with ipr_type, cpc_class_symbol and ipc_class_symbol as criteria. Then you can build a table with publn_ids.

Btw, WHERE (pa.applt_seq_nr > 0) AND (pa.invt_seq_nr > 0) means the applicant is also an inventor.
Usually it's one or the other.

I'll have a better look tomorrow. You are using PATSTAT October 2013 and I have only just finished installing that version.
________________________________________
Nico Doranov
Data Manager

Daigu Academic Services & Data Stewardship
http://www.daigu.nl/


knikonor
Posts: 19
Joined: Fri Nov 01, 2013 1:50 pm

Re: How to count inventors

Post by knikonor » Sat Dec 21, 2013 7:50 pm

Thanks Nico!

Currently I'm trying construction like:

SELECT YEAR(pub.publn_date) as Year, SUM(Inventors.Max) as Max_Inventors
FROM
(
SELECT MAX(tls207_pers_appln.invt_seq_nr) as Max
FROM tls207_pers_appln
) Inventors, tls201_appln a
...

WHERE a.appln_id NOT IN
(SELECT appln_id
FROM tls204_appln_prior)

AND (a.ipr_type = 'PI')
AND (YEAR(pub.publn_date) BETWEEN 2011 AND 2012)
AND (cpc.cpc_class_symbol LIKE 'B82Y%') -- OR (ipc.ipc_class_symbol LIKE 'B82Y%')

GROUP BY YEAR(pub.publn_date)
ORDER BY YEAR(pub.publn_date)

Grammaticaly it's ok but will I get data I want - is a question. Currently I'm using April 2013.
As for years - it's just a draft, actually I'm trying to find ALL inventors for B82Y and count them
by year.


nico.rasters
Posts: 140
Joined: Wed Jul 08, 2009 5:51 pm
Contact:

Re: How to count inventors

Post by nico.rasters » Sat Dec 21, 2013 8:11 pm

Start with:
INSERT INTO `tmp` (`appln_id`)
SELECT `appln_id` FROM `tls209_appln_ipc` WHERE `ipc_class_symbol` LIKE "B82Y%";
It helps if this field is indexed... even if it's just the four leftmost characters.

INSERT INTO `tmp` (`appln_id`)
SELECT `appln_id` FROM `tls224_appln_cpc` WHERE `cpc_class_symbol` LIKE "B82Y%";

INSERT INTO `nano` (`appln_id`)
SELECT DISTINCT `appln_id` FROM `tmp`;

You can truncate `tmp` now.

INSERT INTO `nano_priorities` (`appln_id`)
SELECT `appln_id` FROM `nano` WHERE `appln_id` NOT IN (SELECT `appln_id` FROM `tls204_appln_prior`);

More later... I don't have April 2013 installed yet either :roll:
________________________________________
Nico Doranov
Data Manager

Daigu Academic Services & Data Stewardship
http://www.daigu.nl/


nico.rasters
Posts: 140
Joined: Wed Jul 08, 2009 5:51 pm
Contact:

Re: How to count inventors

Post by nico.rasters » Sun Dec 22, 2013 9:52 pm

This is based on PATSTAT October 2011, but the concept applies to all versions.

First retrieve your nanotechnology patents.
INSERT INTO `tmp` (`appln_id`)
SELECT `appln_id` FROM `tls209_appln_ipc` WHERE `ipc_class_symbol` LIKE "B82Y%";
# 1981 rows inserted. ( Query took 3.4008 sec )
Feel free to add more appln_ids to `tmp` based on cpc_class_symbol or whatever.

I could have done a DISTINCT in the earlier query, but not if you run multiple queries (e.g. if you use cpc_class_symbol as well)
INSERT INTO `nano` (`appln_id`)
SELECT DISTINCT `appln_id` FROM `tmp`;
# 1675 rows inserted. ( Query took 0.0017 sec )

Now we take only the priority patents. You can complicate your life by also looking at continuations and so on, and the people at Leuven wrote a nice script for it... but for now this should be fine.
INSERT INTO `nano_priorities` (`appln_id`)
SELECT `appln_id` FROM `nano` WHERE `appln_id` NOT IN (SELECT `appln_id` FROM `tls204_appln_prior`);
# 549 rows inserted. ( Query took 1.0839 sec )

Just to show you that one application might have several publications linked to it:
SELECT * FROM `tls211_pat_publn` INNER JOIN `nano_priorities` ON `tls211_pat_publn`.`appln_id` = `nano_priorities`.`appln_id`;
# 731 total, Query took 0.4159 sec

This is just to get an overview of what the inventor table looks like.
SELECT *
FROM `tls207_pers_appln`
INNER JOIN `nano_priorities` ON `tls207_pers_appln`.`appln_id` = `nano_priorities`.`appln_id`
WHERE `invt_seq_nr` >0;
# 1,035 total, Query took 0.1419 sec

But we need the application year (not the publication year, that does not make much sense) so we have to bring in the tls201_appln table. This query shows all the data.
SELECT *
FROM `tls201_appln`
INNER JOIN `tls207_pers_appln` ON `tls201_appln`.`appln_id` = `tls207_pers_appln`.`appln_id`
INNER JOIN `nano_priorities` ON `tls201_appln`.`appln_id` = `nano_priorities`.`appln_id`
WHERE `invt_seq_nr`>0;
# 1,035 total, Query took 0.2424 sec

We want the total number of inventors per year. The SUM(MAX(... idea was nice, but we really shouldn't count the same inventor twice, therefore I propose a COUNT(DISTINCT(`person_id`)).
SELECT YEAR(`appln_filing_date`), COUNT(DISTINCT(`person_id`))
FROM `tls201_appln`
INNER JOIN `tls207_pers_appln` ON `tls201_appln`.`appln_id` = `tls207_pers_appln`.`appln_id`
INNER JOIN `nano_priorities` ON `tls201_appln`.`appln_id` = `nano_priorities`.`appln_id`
WHERE `invt_seq_nr`>0
GROUP BY YEAR(`appln_filing_date`);

My results:
2003,5
2004,26
2005,3
2006,4
2007,26
2008,415
2009,204
2010,145
2011,129

You should get higher numbers because PATSTAT October 2011 had an issue with nanotechnology: http://forums.epo.org/epo-worldwide-pat ... c2001.html
________________________________________
Nico Doranov
Data Manager

Daigu Academic Services & Data Stewardship
http://www.daigu.nl/


knikonor
Posts: 19
Joined: Fri Nov 01, 2013 1:50 pm

Re: How to count inventors

Post by knikonor » Thu Dec 26, 2013 6:13 pm

Hello Nico,

I do appreciate your help very much. It works for me by now.

Regards,
Kirill.


Post Reply