I have two questions related to the coverage of citation data, and the classification of persons into companies or individuals. I am trying to compute the citations that companies headquartered in different countries make to each other, combining the data in table TLS212_CITATION and TLS206_PERSON using TLS227_PERS_PUBLN as a link between the two.
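To make the setup concrete, the join I have in mind looks roughly like the sketch below. It is only a sketch, not a tested query: the aliases and output column names are my own, and I am assuming that CITED_PAT_PUBLN_ID = 0 marks citations that do not point to a patent publication and that APPLT_SEQ_NR > 0 in TLS227_PERS_PUBLN identifies applicants (as opposed to inventors).

```sql
-- Sketch: count citing/cited applicant pairs by country of each applicant.
-- Note: a citation between publications with several applicants is counted
-- once per applicant pair, so this is not a count of unique citations.
SELECT
    citing.PERSON_CTRY_CODE AS CITING_CTRY,
    cited.PERSON_CTRY_CODE  AS CITED_CTRY,
    COUNT(*)                AS NUM_APPLICANT_PAIRS
FROM
    TLS212_CITATION tCIT
JOIN
    TLS227_PERS_PUBLN pp_citing
    ON pp_citing.PAT_PUBLN_ID = tCIT.PAT_PUBLN_ID
JOIN
    TLS206_PERSON citing
    ON citing.PERSON_ID = pp_citing.PERSON_ID
JOIN
    TLS227_PERS_PUBLN pp_cited
    ON pp_cited.PAT_PUBLN_ID = tCIT.CITED_PAT_PUBLN_ID
JOIN
    TLS206_PERSON cited
    ON cited.PERSON_ID = pp_cited.PERSON_ID
WHERE
    tCIT.CITED_PAT_PUBLN_ID > 0      -- assumption: 0 means no patent publication cited
    AND pp_citing.APPLT_SEQ_NR > 0   -- assumption: applicants only
    AND pp_cited.APPLT_SEQ_NR > 0
GROUP BY
    citing.PERSON_CTRY_CODE,
    cited.PERSON_CTRY_CODE;
```

The company restriction from my second question would be added as a further condition on PSN_SECTOR for both person aliases.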
My first question concerns the coverage of table TLS212_CITATION. I have seen the answer to this thread: forward-citations-and-coverage-9180 and had a look at these two documents: https://documents.epo.org/projects/baby ... 202023.pdf, https://documents.epo.org/projects/baby ... 202301.pdf. My understanding is that these contain the number of citations made and received that I can find in PATSTAT.
However, I am not able to reproduce these numbers at all. What do they represent? Are they unique citations from one document to another? How are the countries in this table assigned?
Admittedly, I only have the Spring 2020 edition of PATSTAT, but when I run the simplest possible query:
Code:
    SELECT
        tCIT.PAT_PUBLN_ID,
        tCIT.CITED_PAT_PUBLN_ID
    FROM
        TLS212_CITATION tCIT;
I get just under 367 million rows, over 100 million fewer than the number of cited documents reported in the table "Overview public citation data in EPO's citation database (REFI)".
It would be very helpful to know which query reproduces that table (number of citing and cited by country), so that I can understand what I am doing wrong.
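In case it helps to pin down which definition the overview table uses, these are the alternative counts I could compare against it. This is a sketch under one assumption of mine: that CITED_PAT_PUBLN_ID = 0 marks rows that do not cite a patent publication (for example, citations of applications or non-patent literature).

```sql
-- Sketch: three candidate definitions of "number of citations" in TLS212_CITATION.
SELECT
    COUNT(*)                                AS NUM_ROWS,           -- one row per citation record
    COUNT(DISTINCT tCIT.PAT_PUBLN_ID)       AS NUM_CITING_PUBLNS,  -- unique citing publications
    COUNT(DISTINCT tCIT.CITED_PAT_PUBLN_ID) AS NUM_CITED_PUBLNS    -- unique cited publications
FROM
    TLS212_CITATION tCIT
WHERE
    tCIT.CITED_PAT_PUBLN_ID > 0;  -- assumption: 0 means no patent publication cited
```

Dropping the WHERE clause would give the corresponding counts over all citation records.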
My second question concerns the classification of persons into companies or individuals. As part of my exercise, I need to understand in which countries companies are incorporated. My approach was to use the field PSN_SECTOR in TLS206_PERSON and restrict the sample to observations that contain "COMPANY" in that field. However, I noticed that the resulting counts are quite small and exhibit substantial breaks over time. For example, this is the query I run:
Code:
    SELECT
        COUNT(t3.EARLIEST_PUBLN_YEAR) AS NUM_PER_YEAR,
        t3.EARLIEST_PUBLN_YEAR,
        t2.PERSON_CTRY_CODE
    FROM
        TLS201_APPLN t3
    JOIN
        TLS207_PERS_APPLN t1
        ON t3.APPLN_ID = t1.APPLN_ID
    JOIN
        TLS206_PERSON t2
        ON t1.PERSON_ID = t2.PERSON_ID
    WHERE
        t2.PSN_SECTOR LIKE '%COMPANY%'
        AND t2.PSN_SECTOR NOT LIKE '%GOV%'
        AND t2.PSN_SECTOR NOT LIKE '%NON-PROFIT%'
        AND t2.PSN_SECTOR NOT LIKE '%UNIVERSITY%'
    GROUP BY
        t3.EARLIEST_PUBLN_YEAR,
        t2.PERSON_CTRY_CODE;
Can this data be relied upon, or do I have to look only at the country of persons, pooling all sectors together? I would like to identify the actual corporate assignees of the patents, so I would appreciate any alternative suggestions for getting there.
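If it is useful for diagnosing the problem, a simple check I can run and report back on is the distribution of sector values in the person table. This is only a sketch; the output column name is my own.

```sql
-- Sketch: how many persons fall into each PSN_SECTOR value,
-- to see how much of TLS206_PERSON is classified as a company at all.
SELECT
    t2.PSN_SECTOR,
    COUNT(*) AS NUM_PERSONS
FROM
    TLS206_PERSON t2
GROUP BY
    t2.PSN_SECTOR
ORDER BY
    NUM_PERSONS DESC;
```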
Thanks a lot for your help.