Question: How to effeciently retrieve in a same table with different timescale and different CPC number?

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

hz2019
Posts: 4
Joined: Tue Jun 30, 2020 2:41 pm

Question: How to effeciently retrieve in a same table with different timescale and different CPC number?

Post by hz2019 » Tue Jun 30, 2020 3:00 pm

Dear Community,

thanks for your help in advance!

I'd like to retrieve on
"Which are the top 10 most cited applications in related with (CPC: E04B 1/762) filed between 2000-2005, 2006-2010,2010-2015 ? "

I finished to retrieve in one particular timescale 2000-2005 as follows:

Code: Select all

SELECT DISTINCT TOP 10 docdb_family_id,nb_citing_docdb_fam,a.appln_id,CONCAT(appln_auth, appln_nr, appln_kind) AS country_id, appln_filing_date,earliest_filing_id,earliest_filing_date
FROM tls201_appln  a
JOIN tls224_appln_cpc c ON a.appln_id=c.appln_id
WHERE cpc_class_symbol= 'E04B   1/762'
AND appln_filing_year <='2005'
AND appln_filing_year >='2000'
AND a.appln_id=earliest_filing_id
ORDER BY nb_citing_docdb_fam DESC, docdb_family_id DESC

but I'm just wondering how can I retrieve the three time scale at the same statement ( to get the top 10 at each 3 time scale, at one same retrieval), I tried the UNION ALL in this way:

Code: Select all

SELECT * FROM (
SELECT  TOP 10 docdb_family_id,nb_citing_docdb_fam,a.appln_id,CONCAT(appln_auth, appln_nr, appln_kind) AS country_id, appln_filing_date,earliest_filing_id,earliest_filing_date
FROM tls201_appln  a
JOIN tls224_appln_cpc c ON a.appln_id=c.appln_id
WHERE cpc_class_symbol= 'E04B   1/762'
AND appln_filing_year <='2005'
AND appln_filing_year >='2000'
AND a.appln_id=earliest_filing_id
ORDER BY nb_citing_docdb_fam DESC, docdb_family_id DESC) d

UNION ALL

SELECT * FROM (
SELECT  TOP 10 docdb_family_id,nb_citing_docdb_fam,a.appln_id,CONCAT(appln_auth, appln_nr, appln_kind) AS country_id, appln_filing_date,earliest_filing_id,earliest_filing_date
FROM tls201_appln  a
JOIN tls224_appln_cpc c ON a.appln_id=c.appln_id
WHERE cpc_class_symbol= 'E04B   1/762'
AND appln_filing_year <='2010'
AND appln_filing_year >='2005'
AND a.appln_id=earliest_filing_id
ORDER BY nb_citing_docdb_fam DESC, docdb_family_id DESC) e
but there is always a multi-part identifier which is very annoyning.

Any suggestion on this? Thanks!

And I after the retrieval I still need to retrieve in about 100 different CPC symbols, how can I batch on that at one same retrieval?

Thank you so much !
HZ


mkracker
Posts: 114
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: Question: How to effeciently retrieve in a same table with different timescale and different CPC number?

Post by mkracker » Wed Jul 01, 2020 10:25 am

Dear HZ,

PATSTAT Online and its 3rd party components unfortunately is not free of bugs. In certain complex queries (e.g. nested queries with aliases) it detects syntax errors where there are none. Often there is a workaround by slightly re-writing the query, like - what I did below - to replace "SELECT *" by the corresponding list of qualified attributes.

The more difficult part was to get the top 10 applications for each of multiple CPC codes. You can do this by using in SQL a so-called "Windows function" and the ROW_NUMBER function. Check the Internet for an explanation how they works.

2 issues I saw:
  • You obviously want to retrieve only 1 application per family. Filtering by earliest application ID ("a.appln_id=earliest_filing_id") is not a bad idea, but does not work if the earliest application e.g. has been used as priority, but has never been published. We call these applications "artificial applications" because we nevertheless include them in PATSTAT for consistency reasons (with an APPLN_ID > 900 000 000), but they will not be part of the family you want to retrieve. So the comparison above will fail and you will miss this family.
    See an alternative solution I propose below.
  • I noted that your time ranges overlap.
The query I show below is doable in PATSTAT Online. It's quite complex because everything has to be computed in a single query. It would be simpler using PATSTAT data loaded in your own DB system, because you could split the computation in multiple steps and store temporary results for testing and further processing. Depending on your project, this might be an option.

Code: Select all

SELECT  r, temp1.cpc_class_symbol,  temp1.docdb_family_id, nb_citing_docdb_fam, appln_id, country_id, 
		appln_filing_date,earliest_filing_id,earliest_filing_date
FROM ( 
	SELECT ROW_NUMBER() OVER (PARTITION BY cpc_class_symbol ORDER BY nb_citing_docdb_fam DESC) r,
		cpc_class_symbol, docdb_family_id, nb_citing_docdb_fam, a.appln_id, CONCAT(appln_auth, appln_nr, appln_kind) AS country_id, 
		appln_filing_date,earliest_filing_id,earliest_filing_date
	from tls201_appln a
	join tls224_appln_cpc c on a.appln_id = c.appln_id
	WHERE appln_filing_year between 2000 and 2005
	and cpc_class_symbol IN ('E04B   1/762', 'A61k   9/00') -- add more CPC symbols here if needed
	and not exists -- make sure it is the earliest filed application in teh DOCDB family
				-- if there are multiple early applications filed on the same day, take the one with the smallest APPLN_ID
		(select *
		from tls201_appln x
		where x.docdb_family_id = a.docdb_family_id
		AND (x.appln_filing_date < a.appln_filing_date
			OR (x.appln_filing_date = a.appln_filing_date AND x.appln_id < a.appln_id))
		)
	) temp1
where r <= 10 -- to get the top 10 for each window (defined by the OVER() clause)
UNION ALL

SELECT  r, temp2. cpc_class_symbol,  temp2.docdb_family_id, nb_citing_docdb_fam, appln_id, country_id, 
		appln_filing_date,earliest_filing_id,earliest_filing_date
FROM ( 
	SELECT ROW_NUMBER() OVER (PARTITION BY cpc_class_symbol ORDER BY nb_citing_docdb_fam DESC) r,
		cpc_class_symbol, docdb_family_id, nb_citing_docdb_fam, a.appln_id, CONCA
        T(appln_auth, appln_nr, appln_kind) AS country_id, 
		appln_filing_date,earliest_filing_id,earliest_filing_date
	from tls201_appln a
	join tls224_appln_cpc c on a.appln_id = c.appln_id
	WHERE appln_filing_year between 2006 and 2010
	and cpc_class_symbol IN ('E04B   1/762', 'A61k   9/00') -- add more CPC symbols here if needed
	and not exists -- make sure it is the earliest filed application in teh DOCDB family
				-- if there are multiple early applications filed on the same day, take the one with the smallest APPLN_ID
		(select *
		from tls201_appln x
		where x.docdb_family_id = a.docdb_family_id
		AND (x.appln_filing_date < a.appln_filing_date
			OR (x.appln_filing_date = a.appln_filing_date AND x.appln_id < a.appln_id))
		)
	) temp2
where r <= 10 -- to get the top 10 for each window (defined by the OVER() clause)
Best regards,
-------------------------------------------
Martin Kracker / EPO


hz2019
Posts: 4
Joined: Tue Jun 30, 2020 2:41 pm

Re: Question: How to effeciently retrieve in a same table with different timescale and different CPC number?

Post by hz2019 » Thu Jul 16, 2020 9:03 pm

Thank you Martin, Super! That is a very useful suggestion!

By the way, where can I download PATSTA locally? I only find the link to use it online.

Best,
HZ


mkracker
Posts: 114
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: Question: How to effeciently retrieve in a same table with different timescale and different CPC number?

Post by mkracker » Mon Jul 20, 2020 7:14 am

Dear HZ,

The data on which PATSTAT Online is based is also available as bulk data for download and import into your own data base management system. It is a subscription product. If you just need a single edition, PATSTAT Global (worldwide patents) currently costs 975 EUR and PATSTAT EP Register (only EP patents, but even more detailed data) costs 1 420 EUR, plus a 150 EUR service fee for download.
You may use the data for unlimited time within the site of your organisation, e.g. campus of a university.

All information can be found here: https://www.epo.org/searching-for-paten ... tstat.html
In case you have questions regarding commercial conditions or the purchase, you may contact our Customer Service Center at csc (at) epo (dot) org.

Best regards,
-------------------------------------------
Martin Kracker / EPO


Post Reply