Page 1 of 1

Inventor Country of Origin

Posted: Sat Mar 21, 2015 4:16 pm
by flow
Hello everyone,

I'm working on my master thesis which is about internationalisation of R&D in software companies.
For this reason, I would like to collect some patent data about some numbers of companies.

More specifically, I need to:

1) Collect the number of patents application by company where the IPC code is G06F

This is the query I'm using (in this example the company is Microsoft):

SELECT *
FROM tls206_person i
JOIN tls207_pers_appln a ON i.person_id = a.person_id
JOIN tls201_appln g ON a.appln_id = g.appln_id
JOIN tls209_appln_ipc f ON a.appln_id =f.appln_id
WHERE ( ipc_subclass_symbol ='G06F' AND i.person_name like 'microsoft%')

Is it correct?


2) get information about the inventor(s) country in order to compare it with the applicant country .
Is there a way to do that or at least an attribute that specifically identifies the inventor country ?

Thank you very much for your help!

Re: Inventor Country of Origin

Posted: Fri Mar 27, 2015 7:12 pm
by nico.rasters
I'd first try to identify "Microsoft". In tls206 you'll find plenty of Microsofts, also because the inventors put c/o Microsoft in their name. So this is when people look a shortcut in the form of a standardized name.
There are several options: tls208, OECD HAN, EEE-PPAT (from KU Leuven)
For example the query SELECT * FROM `tls208_doc_std_nms` WHERE `doc_std_name` LIKE "Microsoft%"; gives you 78 results in PATSTAT October 2014.
In the end it's up to you which way to go. You are probably not looking at Microsoft subsidiaries (shortcut 1), so you might as well take the standardized way approach (shortcut 2). Just keep in mind that it does not have to be the mother company that does the patenting. Letting some vague subsidiary do the application could be part of the strategy.

Assuming you now have a set of patents that belong to "Microsoft" (store these in a new table), you can narrow it down to the G06F patents (store these in a new table as well). Then for these Microsoft-G06F patents you can retrieve the list of inventors. And that's when you'll discover that the country information isn't all that. I'll save you a headache: see http://gder.phpnet.org/rassenfosse/data.html
They use all sorts of tricks to fill in the inventor's country. Uhm, you are talking about country of origin though. That sounds more like RESIDENCE_CTRY_CODE than PERSON_CTRY_CODE (which is just the country where someone works). And even then it's not the country of origin, but just where you currently live. Best to forget about the "of origin" part.

Not much of a walkthrough, but I wouldn't want to take the challenge out of your Master thesis ;)
If you have more questions, don't hesitate to ask though.

Re: Inventor Country of Origin

Posted: Fri Mar 27, 2015 8:57 pm
by flow
I just forgot to say that I'm using Patstat Online so I'm not able to create tables nor to use the csv from Rassenfosse :( .

Anyway, thanks a lot for the explanation about the standardized names.
By the way, the "of origin" part was exactly about the inventor country (PERSON_CTRY_CODE) so now I only need to figure out how to implement your suggestions on Patstat Online.

Thank you very much for your help!

Re: Inventor Country of Origin

Posted: Sat Mar 28, 2015 11:35 pm
by nico.rasters
My usual approach (=setting up intermediate tables) will unfortunately not work in PATSTAT Online.
Lucky you though, in PATSTAT Online the tls206 table has been enriched with EEE-PPAT and OECD HAN standard names. Take your pick.
To narrow down on applicant names, you could use tls207 `applt_seq_nr`>0... but there are some patents that list the inventor as the applicant as well, so instead of using it as a search criteria just add it to your SELECT statement.
Keep in mind that while the availability of inventor information on average might not be so good, certain patents have a higher quality than others. So for example a European filing might have all the information, while some subsequent filing in the middle of nowhere will have no information. And this brings us to the concept of patent families. The same invention, but filed in various countries. Basically only one of the patents in the whole family needs to have detailed information. And if there is no information at all? Then you could simply assume that the inventor did not list his country because it is the same as the country of the applicant.

Let's assume that we've decided that Microsoft Corporation is doc_std_name_id 2292. There are actually more IDs that match Microsoft. Let's take 4537916 as well. You probably need to use a subquery. Running the query on tls209 -without distinct- gave me 117,929 results btw. Even after using distinct there are still far too many appln_ids to use WHERE `appln_id` IN (1, 2, 3, 4, ...). See below for an example.

SELECT SQL_NO_CACHE *
FROM `tls207_pers_appln`
INNER JOIN `tls206_person` ON `tls207_pers_appln`.`person_id` = `tls206_person`.`person_id`
WHERE `tls207_pers_appln`.`appln_id` IN
(SELECT DISTINCT `tls209_appln_ipc`.`appln_id`
FROM `tls209_appln_ipc`
INNER JOIN `tls207_pers_appln` ON `tls209_appln_ipc`.`appln_id` = `tls207_pers_appln`.`appln_id`
INNER JOIN `tls206_person` ON `tls207_pers_appln`.`person_id` = `tls206_person`.`person_id`
WHERE `doc_std_name_id` IN ( 2292, 4537916 ) AND `ipc_class_symbol` LIKE "G06F%");

The first 10 rows already look funny. Six inventors and they're all the same? It's interesting to note that person_id 263 is blank, except for doc_std_name_id 2041689. And that's a blank name. Shouldn't be in the database if you ask me. That's an error in their algorithm to distinguish persons. They should create a new person_id for each fully blank person, not assume that all blanks are equal.

Code: Select all

+-----------+-----------+--------------+-------------+-----------+------------------+-----------------+-----------------------+-------------------------------------+
| person_id | appln_id  | applt_seq_nr | invt_seq_nr | person_id | person_ctry_code | doc_std_name_id | person_name           | person_address                      |
+-----------+-----------+--------------+-------------+-----------+------------------+-----------------+-----------------------+-------------------------------------+
|       263 |    304883 |            0 |           1 |       263 |                  |         2041689 |                       |                                     |
|       263 |    304883 |            0 |           2 |       263 |                  |         2041689 |                       |                                     |
|       263 |    304883 |            0 |           3 |       263 |                  |         2041689 |                       |                                     |
|       263 |    304883 |            0 |           4 |       263 |                  |         2041689 |                       |                                     |
|       263 |    304883 |            0 |           5 |       263 |                  |         2041689 |                       |                                     |
|       263 |    304883 |            0 |           6 |       263 |                  |         2041689 |                       |                                     |
|       263 | 419191813 |            2 |           0 |       263 |                  |         2041689 |                       |                                     |
|      2682 |      1224 |            1 |           0 |      2682 | US               |            2292 | MICROSOFT CORPORATION | One Microsoft Way,Redmond, WA 98052 |
|      2682 |     95326 |            1 |           0 |      2682 | US               |            2292 | MICROSOFT CORPORATION | One Microsoft Way,Redmond, WA 98052 |
|      2682 |    124940 |            1 |           0 |      2682 | US               |            2292 | MICROSOFT CORPORATION | One Microsoft Way,Redmond, WA 98052 |
+-----------+-----------+--------------+-------------+-----------+------------------+-----------------+-----------------------+-------------------------------------+
10 rows in set (2.03 sec)
If you want me to check some things in PATSTAT October 2014, send me an email (info@daigu.nl).

Re: Inventor Country of Origin

Posted: Mon Mar 30, 2015 12:38 pm
by flow
Thank you very much for your help, again.
I would be totally lost without you!