How to count inventors
How to count inventors
Dear All,
can anyone point me to an idea on how to count a number of inventors for a given
year for a selected CPC or IPC class? I should distinguish between authors and inventors.
If I use online version of PATSTAT may I use count(distinct(tls201_appln.nb_inventors))
construction for each year? What construction may I use in offline edition?
can anyone point me to an idea on how to count a number of inventors for a given
year for a selected CPC or IPC class? I should distinguish between authors and inventors.
If I use online version of PATSTAT may I use count(distinct(tls201_appln.nb_inventors))
construction for each year? What construction may I use in offline edition?
Re: How to count inventors
Have a look at the "SQL Self Study Course", downloadable from the PATSTAT online page on http://www.epo.org/searching/subscripti ... nline.html .
It describes the basics and some advanced features of SQL and contains loads of examples using PATSTAT.
Page 78 explains how to retrieve applicants and / or inventors.
Hope that helps,
Martin
Martin Kracker
EPO - PATSTAT
It describes the basics and some advanced features of SQL and contains loads of examples using PATSTAT.
Page 78 explains how to retrieve applicants and / or inventors.
Hope that helps,
Martin
Martin Kracker
EPO - PATSTAT
-------------------------------------------
Martin Kracker / EPO
Martin Kracker / EPO
Re: How to count inventors
Hello Martin,
thanks for pointing me to that document. Actually I have a slightly different problem.
I need to extract a total number of inventors (maybe including applicants) for a given
year(s) regardles of who is first or second etc for a given application.
Am I correct that a maximum applt_seq_nr or invt_seq_nr represents a total number
of applicants and I have to somehow SUM it for each application in a year to get a total number?
Best regards,
Kirill.
thanks for pointing me to that document. Actually I have a slightly different problem.
I need to extract a total number of inventors (maybe including applicants) for a given
year(s) regardles of who is first or second etc for a given application.
Am I correct that a maximum applt_seq_nr or invt_seq_nr represents a total number
of applicants and I have to somehow SUM it for each application in a year to get a total number?
Best regards,
Kirill.
-
- Posts: 140
- Joined: Wed Jul 08, 2009 5:51 pm
- Contact:
Re: How to count inventors
My answer is based on PATSTAT October 2011... but it's probably valid for all versions.
If you sum the MAX(`applt_seq_nr`) you are going to double count applicants. The same applies to `invt_seq_nr` for inventors.
For example, if we take DOCDB family with ID 10341 we see that it has multiple appln_ids. These are basically the same patents with the same applicants and inventors.
SELECT * FROM `TLS218_DOCDB_FAM` WHERE `DOCDB_FAMILY_ID` =10341;
SELECT * FROM `TLS207_PERS_APPLN` WHERE `APPLN_ID` IN ( 932134, 2963200, 12099883, 20364395, 21815643, 51358704 ) ORDER BY `TLS207_PERS_APPLN`.`PERSON_ID` ASC;
PERSON_ID, APPLN_ID, APPLT_SEQ_NR, INVT_SEQ_NR
15042239, 12099883, 1, 0
15042239, 51358704, 1, 0
15042258, 932134, 1, 0
15042258, 21815643, 1, 0
15042289, 20364395, 1, 0
15469883, 51358704, 0, 1
15469884, 12099883, 0, 1
26431679, 51358704, 0, 2
26431680, 12099883, 0, 2
See how `person_id` 15042239 is mentioned as an applicant for both `appln_id` 12099883 and 51358704? Also notice `person_id` 26431679 and 26431680. This is actually the same person, namely Dr. Alex Pedolin.
SELECT * FROM `TLS206_PERSON` WHERE `PERSON_ID` IN ( 26431679, 26431680 );
This query returns "PEDOLIN ALEX" and "PEDOLIN ALEX DR.".
In order to count inventors, I'd focus on the priority patents only (any patent that you can not find in `tls204_appln_prior`) and then go for your solution to sum the max sequence numbers.
If you sum the MAX(`applt_seq_nr`) you are going to double count applicants. The same applies to `invt_seq_nr` for inventors.
For example, if we take DOCDB family with ID 10341 we see that it has multiple appln_ids. These are basically the same patents with the same applicants and inventors.
SELECT * FROM `TLS218_DOCDB_FAM` WHERE `DOCDB_FAMILY_ID` =10341;
SELECT * FROM `TLS207_PERS_APPLN` WHERE `APPLN_ID` IN ( 932134, 2963200, 12099883, 20364395, 21815643, 51358704 ) ORDER BY `TLS207_PERS_APPLN`.`PERSON_ID` ASC;
PERSON_ID, APPLN_ID, APPLT_SEQ_NR, INVT_SEQ_NR
15042239, 12099883, 1, 0
15042239, 51358704, 1, 0
15042258, 932134, 1, 0
15042258, 21815643, 1, 0
15042289, 20364395, 1, 0
15469883, 51358704, 0, 1
15469884, 12099883, 0, 1
26431679, 51358704, 0, 2
26431680, 12099883, 0, 2
See how `person_id` 15042239 is mentioned as an applicant for both `appln_id` 12099883 and 51358704? Also notice `person_id` 26431679 and 26431680. This is actually the same person, namely Dr. Alex Pedolin.
SELECT * FROM `TLS206_PERSON` WHERE `PERSON_ID` IN ( 26431679, 26431680 );
This query returns "PEDOLIN ALEX" and "PEDOLIN ALEX DR.".
In order to count inventors, I'd focus on the priority patents only (any patent that you can not find in `tls204_appln_prior`) and then go for your solution to sum the max sequence numbers.
________________________________________
Nico Doranov
Data Manager
Daigu Academic Services & Data Stewardship
http://www.daigu.nl/
Nico Doranov
Data Manager
Daigu Academic Services & Data Stewardship
http://www.daigu.nl/
Re: How to count inventors
Dear Nico,
thanks you so much for the comments. As far as we need to count only published patents inventors,
should we use primary tls211_pat_publn table? We actually don't need applications in other stages.
So if I use the query like this - what is not correct with the data I'll get?
BTW it executes VERY long...
SELECT YEAR(pub.publn_date) as Year, COUNT(p.person_id) AS Person
FROM tls211_pat_publn pub
LEFT OUTER JOIN tls201_appln a ON pub.appln_id = a.appln_id
LEFT OUTER JOIN tls207_pers_appln pa ON a.appln_id = pa.appln_id
LEFT OUTER JOIN tls206_person p ON pa.person_id = p.person_id
LEFT OUTER JOIN tls224_appln_cpc cpc ON cpc.appln_id = pub.appln_id
LEFT OUTER JOIN tls209_appln_ipc ipc ON ipc.appln_id = pub.appln_id
WHERE (pa.applt_seq_nr > 0) AND (pa.invt_seq_nr > 0)
AND (a.ipr_type = 'PI')
AND (YEAR(pub.publn_date) BETWEEN 1900 AND 2012)
AND (cpc.cpc_class_symbol LIKE 'B82Y%') OR (ipc.ipc_class_symbol LIKE 'B82Y%')
GROUP BY YEAR(pub.publn_date)
ORDER BY YEAR(pub.publn_date) ASC
And returning to your previous post, as I am not an SQL expert - I have no idea how
to implement counting SUM(MAX(invt_seq_nr)) for each year...
Thanks in advance,
Kirill.
thanks you so much for the comments. As far as we need to count only published patents inventors,
should we use primary tls211_pat_publn table? We actually don't need applications in other stages.
So if I use the query like this - what is not correct with the data I'll get?
BTW it executes VERY long...
SELECT YEAR(pub.publn_date) as Year, COUNT(p.person_id) AS Person
FROM tls211_pat_publn pub
LEFT OUTER JOIN tls201_appln a ON pub.appln_id = a.appln_id
LEFT OUTER JOIN tls207_pers_appln pa ON a.appln_id = pa.appln_id
LEFT OUTER JOIN tls206_person p ON pa.person_id = p.person_id
LEFT OUTER JOIN tls224_appln_cpc cpc ON cpc.appln_id = pub.appln_id
LEFT OUTER JOIN tls209_appln_ipc ipc ON ipc.appln_id = pub.appln_id
WHERE (pa.applt_seq_nr > 0) AND (pa.invt_seq_nr > 0)
AND (a.ipr_type = 'PI')
AND (YEAR(pub.publn_date) BETWEEN 1900 AND 2012)
AND (cpc.cpc_class_symbol LIKE 'B82Y%') OR (ipc.ipc_class_symbol LIKE 'B82Y%')
GROUP BY YEAR(pub.publn_date)
ORDER BY YEAR(pub.publn_date) ASC
And returning to your previous post, as I am not an SQL expert - I have no idea how
to implement counting SUM(MAX(invt_seq_nr)) for each year...
Thanks in advance,
Kirill.
-
- Posts: 140
- Joined: Wed Jul 08, 2009 5:51 pm
- Contact:
Re: How to count inventors
That's quite a complex query, and you probably do not need the ORDER BY statement.
You seem to be focusing on nanotechnology patents. Any specific reason for the YEAR(pub.publn_date) BETWEEN 1900 AND 2012?
Anyway, what I would do in your case is first to create a table that holds the appln_ids of the nanotechnology patents you are interested in. That will be a SELECT DISTINCT `appln_id` query with ipr_type, cpc_class_symbol and ipc_class_symbol as criteria. Then you can build a table with publn_ids.
Btw, WHERE (pa.applt_seq_nr > 0) AND (pa.invt_seq_nr > 0) means the applicant is also an inventor.
Usually it's one or the other.
I'll have a better look tomorrow. You are using PATSTAT October 2013 and I have only just finished installing that version.
You seem to be focusing on nanotechnology patents. Any specific reason for the YEAR(pub.publn_date) BETWEEN 1900 AND 2012?
Anyway, what I would do in your case is first to create a table that holds the appln_ids of the nanotechnology patents you are interested in. That will be a SELECT DISTINCT `appln_id` query with ipr_type, cpc_class_symbol and ipc_class_symbol as criteria. Then you can build a table with publn_ids.
Btw, WHERE (pa.applt_seq_nr > 0) AND (pa.invt_seq_nr > 0) means the applicant is also an inventor.
Usually it's one or the other.
I'll have a better look tomorrow. You are using PATSTAT October 2013 and I have only just finished installing that version.
________________________________________
Nico Doranov
Data Manager
Daigu Academic Services & Data Stewardship
http://www.daigu.nl/
Nico Doranov
Data Manager
Daigu Academic Services & Data Stewardship
http://www.daigu.nl/
Re: How to count inventors
Thanks Nico!
Currently I'm trying construction like:
SELECT YEAR(pub.publn_date) as Year, SUM(Inventors.Max) as Max_Inventors
FROM
(
SELECT MAX(tls207_pers_appln.invt_seq_nr) as Max
FROM tls207_pers_appln
) Inventors, tls201_appln a
...
WHERE a.appln_id NOT IN
(SELECT appln_id
FROM tls204_appln_prior)
AND (a.ipr_type = 'PI')
AND (YEAR(pub.publn_date) BETWEEN 2011 AND 2012)
AND (cpc.cpc_class_symbol LIKE 'B82Y%') -- OR (ipc.ipc_class_symbol LIKE 'B82Y%')
GROUP BY YEAR(pub.publn_date)
ORDER BY YEAR(pub.publn_date)
Grammaticaly it's ok but will I get data I want - is a question. Currently I'm using April 2013.
As for years - it's just a draft, actually I'm trying to find ALL inventors for B82Y and count them
by year.
Currently I'm trying construction like:
SELECT YEAR(pub.publn_date) as Year, SUM(Inventors.Max) as Max_Inventors
FROM
(
SELECT MAX(tls207_pers_appln.invt_seq_nr) as Max
FROM tls207_pers_appln
) Inventors, tls201_appln a
...
WHERE a.appln_id NOT IN
(SELECT appln_id
FROM tls204_appln_prior)
AND (a.ipr_type = 'PI')
AND (YEAR(pub.publn_date) BETWEEN 2011 AND 2012)
AND (cpc.cpc_class_symbol LIKE 'B82Y%') -- OR (ipc.ipc_class_symbol LIKE 'B82Y%')
GROUP BY YEAR(pub.publn_date)
ORDER BY YEAR(pub.publn_date)
Grammaticaly it's ok but will I get data I want - is a question. Currently I'm using April 2013.
As for years - it's just a draft, actually I'm trying to find ALL inventors for B82Y and count them
by year.
-
- Posts: 140
- Joined: Wed Jul 08, 2009 5:51 pm
- Contact:
Re: How to count inventors
Start with:
INSERT INTO `tmp` (`appln_id`)
SELECT `appln_id` FROM `tls209_appln_ipc` WHERE `ipc_class_symbol` LIKE "B82Y%";
It helps if this field is indexed... even if it's just the four leftmost characters.
INSERT INTO `tmp` (`appln_id`)
SELECT `appln_id` FROM `tls224_appln_cpc` WHERE `cpc_class_symbol` LIKE "B82Y%";
INSERT INTO `nano` (`appln_id`)
SELECT DISTINCT `appln_id` FROM `tmp`;
You can truncate `tmp` now.
INSERT INTO `nano_priorities` (`appln_id`)
SELECT `appln_id` FROM `nano` WHERE `appln_id` NOT IN (SELECT `appln_id` FROM `tls204_appln_prior`);
More later... I don't have April 2013 installed yet either
INSERT INTO `tmp` (`appln_id`)
SELECT `appln_id` FROM `tls209_appln_ipc` WHERE `ipc_class_symbol` LIKE "B82Y%";
It helps if this field is indexed... even if it's just the four leftmost characters.
INSERT INTO `tmp` (`appln_id`)
SELECT `appln_id` FROM `tls224_appln_cpc` WHERE `cpc_class_symbol` LIKE "B82Y%";
INSERT INTO `nano` (`appln_id`)
SELECT DISTINCT `appln_id` FROM `tmp`;
You can truncate `tmp` now.
INSERT INTO `nano_priorities` (`appln_id`)
SELECT `appln_id` FROM `nano` WHERE `appln_id` NOT IN (SELECT `appln_id` FROM `tls204_appln_prior`);
More later... I don't have April 2013 installed yet either
________________________________________
Nico Doranov
Data Manager
Daigu Academic Services & Data Stewardship
http://www.daigu.nl/
Nico Doranov
Data Manager
Daigu Academic Services & Data Stewardship
http://www.daigu.nl/
-
- Posts: 140
- Joined: Wed Jul 08, 2009 5:51 pm
- Contact:
Re: How to count inventors
This is based on PATSTAT October 2011, but the concept applies to all versions.
First retrieve your nanotechnology patents.
INSERT INTO `tmp` (`appln_id`)
SELECT `appln_id` FROM `tls209_appln_ipc` WHERE `ipc_class_symbol` LIKE "B82Y%";
# 1981 rows inserted. ( Query took 3.4008 sec )
Feel free to add more appln_ids to `tmp` based on cpc_class_symbol or whatever.
I could have done a DISTINCT in the earlier query, but not if you run multiple queries (e.g. if you use cpc_class_symbol as well)
INSERT INTO `nano` (`appln_id`)
SELECT DISTINCT `appln_id` FROM `tmp`;
# 1675 rows inserted. ( Query took 0.0017 sec )
Now we take only the priority patents. You can complicate your life by also looking at continuations and so on, and the people at Leuven wrote a nice script for it... but for now this should be fine.
INSERT INTO `nano_priorities` (`appln_id`)
SELECT `appln_id` FROM `nano` WHERE `appln_id` NOT IN (SELECT `appln_id` FROM `tls204_appln_prior`);
# 549 rows inserted. ( Query took 1.0839 sec )
Just to show you that one application might have several publications linked to it:
SELECT * FROM `tls211_pat_publn` INNER JOIN `nano_priorities` ON `tls211_pat_publn`.`appln_id` = `nano_priorities`.`appln_id`;
# 731 total, Query took 0.4159 sec
This is just to get an overview of what the inventor table looks like.
SELECT *
FROM `tls207_pers_appln`
INNER JOIN `nano_priorities` ON `tls207_pers_appln`.`appln_id` = `nano_priorities`.`appln_id`
WHERE `invt_seq_nr` >0;
# 1,035 total, Query took 0.1419 sec
But we need the application year (not the publication year, that does not make much sense) so we have to bring in the tls201_appln table. This query shows all the data.
SELECT *
FROM `tls201_appln`
INNER JOIN `tls207_pers_appln` ON `tls201_appln`.`appln_id` = `tls207_pers_appln`.`appln_id`
INNER JOIN `nano_priorities` ON `tls201_appln`.`appln_id` = `nano_priorities`.`appln_id`
WHERE `invt_seq_nr`>0;
# 1,035 total, Query took 0.2424 sec
We want the total number of inventors per year. The SUM(MAX(... idea was nice, but we really shouldn't count the same inventor twice, therefore I propose a COUNT(DISTINCT(`person_id`)).
SELECT YEAR(`appln_filing_date`), COUNT(DISTINCT(`person_id`))
FROM `tls201_appln`
INNER JOIN `tls207_pers_appln` ON `tls201_appln`.`appln_id` = `tls207_pers_appln`.`appln_id`
INNER JOIN `nano_priorities` ON `tls201_appln`.`appln_id` = `nano_priorities`.`appln_id`
WHERE `invt_seq_nr`>0
GROUP BY YEAR(`appln_filing_date`);
My results:
2003,5
2004,26
2005,3
2006,4
2007,26
2008,415
2009,204
2010,145
2011,129
You should get higher numbers because PATSTAT October 2011 had an issue with nanotechnology: http://forums.epo.org/epo-worldwide-pat ... c2001.html
First retrieve your nanotechnology patents.
INSERT INTO `tmp` (`appln_id`)
SELECT `appln_id` FROM `tls209_appln_ipc` WHERE `ipc_class_symbol` LIKE "B82Y%";
# 1981 rows inserted. ( Query took 3.4008 sec )
Feel free to add more appln_ids to `tmp` based on cpc_class_symbol or whatever.
I could have done a DISTINCT in the earlier query, but not if you run multiple queries (e.g. if you use cpc_class_symbol as well)
INSERT INTO `nano` (`appln_id`)
SELECT DISTINCT `appln_id` FROM `tmp`;
# 1675 rows inserted. ( Query took 0.0017 sec )
Now we take only the priority patents. You can complicate your life by also looking at continuations and so on, and the people at Leuven wrote a nice script for it... but for now this should be fine.
INSERT INTO `nano_priorities` (`appln_id`)
SELECT `appln_id` FROM `nano` WHERE `appln_id` NOT IN (SELECT `appln_id` FROM `tls204_appln_prior`);
# 549 rows inserted. ( Query took 1.0839 sec )
Just to show you that one application might have several publications linked to it:
SELECT * FROM `tls211_pat_publn` INNER JOIN `nano_priorities` ON `tls211_pat_publn`.`appln_id` = `nano_priorities`.`appln_id`;
# 731 total, Query took 0.4159 sec
This is just to get an overview of what the inventor table looks like.
SELECT *
FROM `tls207_pers_appln`
INNER JOIN `nano_priorities` ON `tls207_pers_appln`.`appln_id` = `nano_priorities`.`appln_id`
WHERE `invt_seq_nr` >0;
# 1,035 total, Query took 0.1419 sec
But we need the application year (not the publication year, that does not make much sense) so we have to bring in the tls201_appln table. This query shows all the data.
SELECT *
FROM `tls201_appln`
INNER JOIN `tls207_pers_appln` ON `tls201_appln`.`appln_id` = `tls207_pers_appln`.`appln_id`
INNER JOIN `nano_priorities` ON `tls201_appln`.`appln_id` = `nano_priorities`.`appln_id`
WHERE `invt_seq_nr`>0;
# 1,035 total, Query took 0.2424 sec
We want the total number of inventors per year. The SUM(MAX(... idea was nice, but we really shouldn't count the same inventor twice, therefore I propose a COUNT(DISTINCT(`person_id`)).
SELECT YEAR(`appln_filing_date`), COUNT(DISTINCT(`person_id`))
FROM `tls201_appln`
INNER JOIN `tls207_pers_appln` ON `tls201_appln`.`appln_id` = `tls207_pers_appln`.`appln_id`
INNER JOIN `nano_priorities` ON `tls201_appln`.`appln_id` = `nano_priorities`.`appln_id`
WHERE `invt_seq_nr`>0
GROUP BY YEAR(`appln_filing_date`);
My results:
2003,5
2004,26
2005,3
2006,4
2007,26
2008,415
2009,204
2010,145
2011,129
You should get higher numbers because PATSTAT October 2011 had an issue with nanotechnology: http://forums.epo.org/epo-worldwide-pat ... c2001.html
________________________________________
Nico Doranov
Data Manager
Daigu Academic Services & Data Stewardship
http://www.daigu.nl/
Nico Doranov
Data Manager
Daigu Academic Services & Data Stewardship
http://www.daigu.nl/
Re: How to count inventors
Hello Nico,
I do appreciate your help very much. It works for me by now.
Regards,
Kirill.
I do appreciate your help very much. It works for me by now.
Regards,
Kirill.