counting documents in CPC groups

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

b.paolo45
Posts: 2
Joined: Wed Oct 22, 2014 5:24 pm

counting documents in CPC groups

Post by b.paolo45 » Wed Oct 22, 2014 5:31 pm

Hello

is there a simple way, using a PATSTAT query, to count all the documents in each group of the CPC classification?

Regards

Paolo


Geert Boedt
Posts: 176
Joined: Tue Oct 19, 2004 10:36 am
Location: Vienna

Re: counting documents in CPC groups

Post by Geert Boedt » Tue Nov 11, 2014 12:59 pm

Hello Paolo,
if you mean by "main group". being the first 8 digits of the classification code then the following query will give you the result.
select left(tls224_appln_cpc.cpc_class_symbol,8) as CPC_main_group, count(distinct(appln_id)) from tls224_appln_cpc
group by left(tls224_appln_cpc.cpc_class_symbol,8)
order by left(tls224_appln_cpc.cpc_class_symbol,8)


The result of such query does not tell so much,apart maybe from the distribution of applications over CPC main group level.

So here comes a similar more interesting query, where we have joined the CPC table with the application table, and extracted a new table that gives the evolution (based on application year) of patents filed at the EPO that have a classification starting with 'Y02E 10/7' (patents in the field of "wind energy").

SELECT tls201_appln.appln_filing_year, tls201_appln.appln_auth, Count(tls224_appln_cpc.appln_id) AS no_of_appplications
FROM tls201_appln INNER JOIN tls224_appln_cpc ON tls201_appln.appln_id = tls224_appln_cpc.appln_id
WHERE Left(cpc_class_symbol,10)='Y02E 10/7' and tls201_appln.appln_auth = 'EP'
GROUP BY tls201_appln.appln_filing_year, tls201_appln.appln_auth
ORDER BY tls201_appln.appln_filing_year;


The result will look like this:

appln_filing_year appln_auth no_of_appplications
1978 EP 3
1979 EP 11
1980 EP 33
1981 EP 43
1982 EP 37
1983 EP 31
1984 EP 26
1985 EP 25
1986 EP 16
1987 EP 21
1988 EP 17
1989 EP 30
1990 EP 18
1991 EP 26
1992 EP 27
1993 EP 15
1994 EP 37
1995 EP 37
1996 EP 35
1997 EP 43
1998 EP 62
1999 EP 84
2000 EP 202
2001 EP 230
2002 EP 375
2003 EP 346
2004 EP 392
2005 EP 419
2006 EP 472
2007 EP 792
2008 EP 1091
2009 EP 1538
2010 EP 1999
2011 EP 1979
2012 EP 1098
2013 EP 272

Geert BOEDT
EPO - PATSTAT
patstat @ epo.org
Best regards,

Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna


nico.rasters
Posts: 140
Joined: Wed Jul 08, 2009 5:51 pm
Contact:

Re: counting documents in CPC groups

Post by nico.rasters » Mon Nov 17, 2014 7:07 pm

Be careful with:
  • data availability: do all patents have CPC codes? it wasn't true for IPC, I don't know what's the status for CPC.
  • whole counts vs fractional counts: patents tend to have more than one patent class. Do you count each class as 1 or as 1/(number of classes)?
  • double counting: not really an issue in Geert's example but when you're looking at patents worldwide you should keep patent families in mind in order to avoid double counting CPC codes.
  • terminology: you write "documents", but in the database you'll find applications and publications. You probably need applications, but to me publications sound more like "documents".
________________________________________
Nico Doranov
Data Manager

Daigu Academic Services & Data Stewardship
http://www.daigu.nl/


b.paolo45
Posts: 2
Joined: Wed Oct 22, 2014 5:24 pm

Re: counting documents in CPC groups

Post by b.paolo45 » Sun Dec 07, 2014 11:00 am

Hi Geert

thanks for the example, I was able to re-use it somehow for my purpose.
(There is however a problem with your query since it returns 0 results.)

I was trying to retrieve all the applications belonging to some particular main groups
(i.e. having /00 after the subclass).

Paolo


Geert Boedt
Posts: 176
Joined: Tue Oct 19, 2004 10:36 am
Location: Vienna

Re: counting documents in CPC groups

Post by Geert Boedt » Thu Jan 08, 2015 5:07 pm

Hello Paolo,

This query should do the job:

select left(tls224_appln_cpc.cpc_class_symbol,11)
as CPC, count(distinct(appln_id)) from tls224_appln_cpc
where SUBSTRING(left(tls224_appln_cpc.cpc_class_symbol,11),9,3 )='/00'
group by left(tls224_appln_cpc.cpc_class_symbol,11)
order by left(tls224_appln_cpc.cpc_class_symbol,11)


Nico's remarks are of course to be taken into account: applications can have multiple CPC (or IPC) codes, and fractional counting might be the most appropriate method. (this depends on what you want to prove).
And it is true that many patent applications do not have a CPC (or IPC) classification. When doing your data aggregation it is important the check the data coverage of the attributes you will be using to see if it will introduce any errors.

This query will give you the number of applications for which there is no CPC classification available, grouped per application authority. (28 million in total) As you can observe, there are even 8.000 EP applications without CPC classifications.

SELECT appln_auth, count(tls201_appln.appln_id)
FROM tls201_appln
left join tls224_appln_cpc on tls201_appln.appln_id
= tls224_appln_cpc.appln_id
where cpc_class_symbol is null and tls201_appln.appln_id < 900000000
group by appln_auth
order by appln_auth


Best regards,
Geert Boedt
Patent information for Business Use
EPO Vienna
patstat @ epo.org
Best regards,

Geert Boedt
PATSTAT support
Business Use of Patent Information
EPO Vienna


Post Reply