Problem with CPC

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

ansianla
Posts: 2
Joined: Sat Feb 06, 2016 9:03 pm

Problem with CPC

Post by ansianla » Sat Feb 06, 2016 9:15 pm

Hi there,

I try to extract all the patent in CPC subclass of Y02B 20/72 and download the bibliographic data. I have no problem getting the result though one thing I could not figure out is that how to get all the CPC associated with the patent. In my result table, I have the CPC code that I searched with and if the code is the first one in the list or not.

Thanks in advance!

P.S. I can get the all the CPC associated with a patent by searching EPO website classification class search, but the problem with that search is limited to 500 results and do not have citation information.


Geert Boedt
Posts: 178
Joined: Tue Oct 19, 2004 10:36 am
Location: Vienna

Re: Problem with CPC

Post by Geert Boedt » Wed Feb 10, 2016 10:18 am

Hello,
could you post your query, so that we can have a look at it ?
PATSTAT does not limit queries to 500 results, so I wonder if you are using maybe another data base (ESPACENET) ?
Also keep in mind that, contrary to other patent authorities, the EPO does not apply a concept of "first" classification code.
Best regards,

Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna


ansianla
Posts: 2
Joined: Sat Feb 06, 2016 9:03 pm

Re: Problem with CPC

Post by ansianla » Thu Feb 18, 2016 7:43 pm

Yes, I was using ESPACENET and got limited results. Though I like to have bibliographic data similar to ESPACENET result without limitation.

What I like to do is to co-classification study for CPC class of Y02B 20/72. Following is my query to search PATSTAT database:

SELECT
DISTINCT t1.appln_id, t1.appln_nr, t1.nb_citing_docdb_fam, t2.cpc_class_symbol,t2.cpc_position,t4.citn_id,t4.citn_origin
FROM
tls201_appln t1
INNER JOIN
tls224_appln_cpc t2 ON t1.appln_id = t2.appln_id
INNER JOIN
tls211_pat_publn t3 ON t2.appln_id = t3.appln_id
INNER JOIN
tls212_citation t4 ON t3.pat_publn_id = t4.pat_publn_id
WHERE
year(t1.appln_filing_date) = 2012
AND (t1.appln_kind = 'A' OR t1.appln_kind = 'W')
AND t2.cpc_class_symbol like 'Y02B 20/72%';

In the result table, only CPC class I see is Y02B 20/72 and whether it is first class assigned to patent or later with using cpc_position, my question is how I can have all the CPC class associated with a patent in my result table, not just Y02B 20/72 and its position.

Besides, for technology convergence analysis, I like to have more information about the backward citations like their patent number or their class, though the only thing I can get with my query is the number of citations, how I can get more data about citing patent.

Thanks in advance for your help.


Geert Boedt
Posts: 178
Joined: Tue Oct 19, 2004 10:36 am
Location: Vienna

Re: Problem with CPC

Post by Geert Boedt » Fri Feb 19, 2016 10:35 am

Hello Ansiania,
here is a query that will give you all the CPC codes from patents that have been classified with "Y02B 20/72" (Energy efficient lighting technologies -->in street lighting).

Code: Select all

select tls201_appln.appln_id, appln_auth, appln_nr, appln_kind, appln_filing_date, granted, nb_citing_docdb_fam, cpc_position, cpc_class_symbol
from tls201_appln join tls224_appln_cpc  on tls201_appln.appln_id = tls224_appln_cpc.appln_id
where 
tls201_appln.appln_id in (select distinct appln_id from  tls224_appln_cpc where cpc_class_symbol =  'Y02B  20/72')
order by tls201_appln.appln_id, cpc_position, cpc_class_symbol
The above query will give duplicates at application level because for each different CPC classification there will be created a row (record).
With regards to the cited (backward) or citing (forward) citations, you need to use the tls212_citation table.
If you only want to know the family/family citation ratio (forward citations), then you can simply use the pre-aggregated attribute nb_citing_docdb_fam that will give you the number. If you intend to analyse "citing technologies", then it becomes more complicated because you need to look at the classifications of the citing documents and the number of records will explode quickly. Assume you have 5 applications ( where cpc_class_symbol = 'Y02B 20/72'), each cited by 5 other applications, each having 3 classification codes = 75 rows. In reality there are 12.520 applications in that group. See query below. You can of course reduce the number by limiting year, authorities, etc...

Code: Select all

select distinct appln_id from tls224_appln_cpc  
where cpc_class_symbol =  'Y02B  20/72'
Here is an example query that retrieves all the citing publications and respective CPC classification codes for 1 application. (appln_id = 25740 --> EP08158122) . This single application has 5 citing publications, resulting in 57 rows when the CPC codes are added. If you want to have the list for all 12.520 applications, then you remove the /* */ and adapt the appln_id condition in the "WHERE" clause, and you will have 67.300 records.

Code: Select all

SELECT cited_ap.appln_id
      ,appln_auth
      ,appln_nr
      ,appln_kind
      ,appln_filing_date
      ,nb_citing_docdb_fam
	  ,citing_pub.publn_auth
	  ,citing_pub.publn_nr
	  ,citing_pub.publn_kind
	  ,citing_pub.publn_date
	  ,cpc_position
	  ,cpc_class_symbol
FROM tls201_appln cited_ap join tls211_pat_publn cited_pub on cited_ap.appln_id = cited_pub.appln_id
join tls212_citation on cited_pub.pat_publn_id = tls212_citation.cited_pat_publn_id
join tls211_pat_publn citing_pub on tls212_citation.pat_publn_id = citing_pub.pat_publn_id
join tls224_appln_cpc on citing_pub.appln_id = tls224_appln_cpc.appln_id
where cited_ap.appln_id =  25740 /*in (select distinct appln_id from tls224_appln_cpc  
							where cpc_class_symbol =  'Y02B  20/72') */
order by cited_ap.appln_filing_date,cited_ap.appln_id,citing_pub.publn_date, cpc_position, cpc_class_symbol
For this kind of (rather extensive) analysis, I would recommend you to purchase PATSTAT and install it on a local server. (Or you could download the extended data set as a MS ACCESS database via PATSTAT Online.) Many researchers use family/family citation concepts to avoid duplication when doing the counts. You could then use the tls228_docdb_fam_citn table and de-duplicate the CPC classification codes from the citing family members. But this depends of course on the nature and granularity needed for your research.
Best regards,

Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna


r.vdiemen
Posts: 4
Joined: Fri Dec 04, 2015 5:23 pm

Re: Problem with CPC

Post by r.vdiemen » Wed Feb 24, 2016 1:08 pm

Hi Geert,

Your reply was very useful for my research as well, but I am interested in 'cited technologies' (the technologies these patents have drawn on), instead of 'citing technologies'. I have purchased PATSTAT, but I'm having difficulties constructing a query for backward citations. Could you please help me with this? Would it be the same as the first half of the query above, but without the citing_pub table?

Many thanks in advance,
Renée


Geert Boedt
Posts: 178
Joined: Tue Oct 19, 2004 10:36 am
Location: Vienna

Re: Problem with CPC

Post by Geert Boedt » Wed Feb 24, 2016 6:36 pm

Hello Renée,
continuing on the same example: you can use more or less the same query, but instead of "where cited_ap.appln_id = 25740 " you replace it by "where citing_pub.appln_id = 25740 " (Observe the aliases I used in the query to make it a bit more understandable.) I also had to join the CPC table with the cited applications instead of the citing publications.
If you are only interested in seeing the classification codes of the cited documents (as there will many duplicates because some cited documents have the same classifications), then you remove the detailed attributes of the cited_ap table from the select statement and add a GROUP BY.

Code: Select all

SELECT cited_ap.appln_id
      ,appln_auth
      ,appln_nr
      ,appln_kind
      ,appln_filing_date
      ,nb_citing_docdb_fam
     ,citing_pub.publn_auth
     ,citing_pub.publn_nr
     ,citing_pub.publn_kind
     ,citing_pub.publn_date
     ,cpc_class_symbol
FROM tls201_appln cited_ap join tls211_pat_publn cited_pub on cited_ap.appln_id = cited_pub.appln_id
join tls212_citation on cited_pub.pat_publn_id = tls212_citation.cited_pat_publn_id
join tls211_pat_publn citing_pub on tls212_citation.pat_publn_id = citing_pub.pat_publn_id
join tls224_appln_cpc on cited_ap.appln_id = tls224_appln_cpc.appln_id
where citing_pub.appln_id =  25740 
order by cited_ap.appln_filing_date,cited_ap.appln_id,citing_pub.publn_date
[/i]
Best regards,

Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna


r.vdiemen
Posts: 4
Joined: Fri Dec 04, 2015 5:23 pm

Re: Problem with CPC

Post by r.vdiemen » Thu Apr 28, 2016 2:41 pm

Hi Geert,

Thank you very much for all of your help so far. I was hoping you'd be able to help me on one further question.

I want to examine which technologies have influenced the development of the technology I'm examining (hence the backward citations). Specifically, I want to identify the 'most important influencing technologies', ie the technology knowledge base.

Can this be done by simply running the above query, and then counting the number of times each CPC code associated with a backward citation comes up? Or is this too simplified?

Many thanks,
Renée


Geert Boedt
Posts: 178
Joined: Tue Oct 19, 2004 10:36 am
Location: Vienna

Re: Problem with CPC

Post by Geert Boedt » Thu Apr 28, 2016 6:27 pm

Hello Renée,
it is a bit too simple; the concept of counting the number of applications (or better patent families) having a certain classification code (be it CPC or IPC or whatever other criteria you use), is correct. But you need to see if the cited documents belong the same "class" you want to analyse.
knowledge_transfer.png
knowledge_transfer.png (74.1 KiB) Viewed 2515 times
The picture says more then a thousand words. The example here is referring to "industry", which could be defined through the NACE classification codes (table tls229_appln_nace2). Equally we could use a technology classification such as the one in table tls230_appln_techn_field. Many other variations are possible: CPC, IPC, Applicants (competitor analysis), Application authorities (to analyse technology flows between countries), Inventors (social networking ?), etc...
It is impossible to create queries for all of the examples but the concept is such that you have to aggregate the data at the correct level. For example applicant_applicant citations can only be done if you move form the cited-citing publications to tls206_person tables that contains the cited and citing applicants.
Best regards,

Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna


nico.rasters
Posts: 140
Joined: Wed Jul 08, 2009 5:51 pm
Contact:

Re: Problem with CPC

Post by nico.rasters » Fri May 13, 2016 12:18 pm

When looking at citations to study influencing technologies, keep in mind that there's a difference in citation behaviour per patent office and that the EPO uses search codes to indicate the relevance of a citation. Plus you will have to correct for the source of origin to avoid double count.

I don't know if "industry" comes into play (but I'm not a fan of CPC-NACE and such schemes).
I also don't know if self-citations play a role. Wouldn't hurt to add a dummy for this though.

The CPC classes themselves contains several levels: section, class, subclass, maingroup, subgroup. Comparing on a section level will give different results than comparing on subclass level.

Also consider the following two scenarios:
A patent cites three patents P1, P2, P3 that each hold only one class, respectively A, B, C.
A patent cites one patent P1 with CPC classes A,B,C.
Do you want these two scenarios to give you the same outcome, or not?
________________________________________
Nico Doranov
Data Manager

Daigu Academic Services & Data Stewardship
http://www.daigu.nl/


Post Reply