PATSTAT MSSQL Loading Script

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

skim20
Posts: 1
Joined: Fri Feb 07, 2020 5:36 pm

PATSTAT MSSQL Loading Script

Post by skim20 » Fri Feb 07, 2020 5:45 pm

Dear PATSTAT community,

I am using PATSTAT for the first time and am trying to understand how to load my data into MSSQL. I've run the the database creation script and the table creation scripts that were provided by the EPO, but am having trouble understanding how to actually load the data into the empty tables that were created.

The documentation on how to load PATSTAT (http://documents.epo.org/projects/babyl ... .11_en.pdf), provides links to user generated loading scripts on Oracle, MySQL, etc., but doesn't have any links to loading scripts for MSSQL, which seem a bit strange since the EPO provided MSSQL scripts for database and table creation.

Am I missing something obvious? If you could please point me to a loading script for MSSQL, I would be grateful. Thank you!


EPO / PATSTAT Support
Posts: 160
Joined: Thu Feb 22, 2007 5:33 pm

Re: PATSTAT MSSQL Loading Script

Post by EPO / PATSTAT Support » Wed Feb 12, 2020 4:24 pm

Hello skim20,
the loading procedure is very much dependant on your MS SQL server edition and your server setting.
At the EPO we use MS SQL Enterprise edition 2017.
This means we can use the Bulk data tools to import the data. Earlier versions of MS SQL server might struggle with the scripts.
So what works for us is:
1) create database (similar to what you find in the documentation):

Code: Select all

USE [master]
GO
CREATE DATABASE [patstat2019b_test] ON  PRIMARY 
( NAME = N'patstat2019b_test_dat', FILENAME = N'G:\workdata\DB\patstat2019b_testdat.mdf' , SIZE = 200MB , MAXSIZE = UNLIMITED , FILEGROWTH = 51200KB )
 LOG ON 
( NAME = N'patstat2019b_test_log', FILENAME = N'G:\workdata\DB\patstat2019b_testlog.ldf' , SIZE = 10MB , MAXSIZE = UNLIMITED , FILEGROWTH = 15360KB )
 COLLATE SQL_Latin1_General_CP1_CI_AS
GO
EXEC dbo.sp_dbcmptlevel @dbname=N'patstat2019b_test', @new_cmptlevel=100
GO
IF (1 = FULLTEXTSERVICEPROPERTY('IsFullTextInstalled'))
begin
EXEC [patstat2019b_test].[dbo].[sp_fulltext_database] @action = 'enable'
end
GO
ALTER DATABASE [patstat2019b_test] SET ANSI_NULL_DEFAULT OFF 
GO
ALTER DATABASE [patstat2019b_test] SET ANSI_NULLS OFF 
GO
ALTER DATABASE [patstat2019b_test] SET ANSI_PADDING OFF 
GO
ALTER DATABASE [patstat2019b_test] SET ANSI_WARNINGS OFF 
GO
ALTER DATABASE [patstat2019b_test] SET ARITHABORT OFF 
GO
ALTER DATABASE [patstat2019b_test] SET AUTO_CLOSE OFF 
GO
ALTER DATABASE [patstat2019b_test] SET AUTO_CREATE_STATISTICS ON 
GO
ALTER DATABASE [patstat2019b_test] SET AUTO_SHRINK OFF 
GO
ALTER DATABASE [patstat2019b_test] SET AUTO_UPDATE_STATISTICS ON 
GO
ALTER DATABASE [patstat2019b_test] SET CURSOR_CLOSE_ON_COMMIT OFF 
GO
ALTER DATABASE [patstat2019b_test] SET CURSOR_DEFAULT  GLOBAL 
GO
ALTER DATABASE [patstat2019b_test] SET CONCAT_NULL_YIELDS_NULL OFF 
GO
ALTER DATABASE [patstat2019b_test] SET NUMERIC_ROUNDABORT OFF 
GO
ALTER DATABASE [patstat2019b_test] SET QUOTED_IDENTIFIER OFF 
GO
ALTER DATABASE [patstat2019b_test] SET RECURSIVE_TRIGGERS OFF 
GO
ALTER DATABASE [patstat2019b_test] SET  DISABLE_BROKER 
GO
ALTER DATABASE [patstat2019b_test] SET AUTO_UPDATE_STATISTICS_ASYNC OFF 
GO
ALTER DATABASE [patstat2019b_test] SET DATE_CORRELATION_OPTIMIZATION OFF 
GO
ALTER DATABASE [patstat2019b_test] SET TRUSTWORTHY OFF 
GO
ALTER DATABASE [patstat2019b_test] SET ALLOW_SNAPSHOT_ISOLATION OFF 
GO
ALTER DATABASE [patstat2019b_test] SET PARAMETERIZATION SIMPLE 
GO
ALTER DATABASE [patstat2019b_test] SET  READ_WRITE 
GO
ALTER DATABASE [patstat2019b_test] SET RECOVERY FULL 
GO
ALTER DATABASE [patstat2019b_test] SET  MULTI_USER 
GO
ALTER DATABASE [patstat2019b_test] SET PAGE_VERIFY CHECKSUM  
GO
ALTER DATABASE [patstat2019b_test] SET DB_CHAINING OFF 
2) Create table(s) (similar to what you find in the documentation)--> repeat for each table

Code: Select all

USE [patstat2019b_test]
GO
/****** Object:  Table [dbo].[tls201_appln]    Script Date: 06/22/2018 08:13:34 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[tls201_appln](
	[appln_id] [int] NOT NULL DEFAULT ('0'),
	[appln_auth] [char](2) NOT NULL DEFAULT (''),
	[appln_nr] [varchar](15)  NOT NULL DEFAULT (''),
	[appln_kind] [char](2) NOT NULL DEFAULT ('  '),
	[appln_filing_date] [date] NOT NULL DEFAULT ('9999-12-31'),
	[appln_filing_year] [smallint] NOT NULL DEFAULT '9999',
	[appln_nr_epodoc] [varchar](20)  NOT NULL DEFAULT (''),
	[appln_nr_original] [varchar](100) NOT NULL DEFAULT (''),
	[ipr_type] [char](2) NOT NULL DEFAULT (''),
	[receiving_office] [char](2) NOT NULL DEFAULT (''),
	[internat_appln_id] [int] NOT NULL DEFAULT ('0'),
	[int_phase] [char](1) NOT NULL DEFAULT ('N'),
	[reg_phase] [char](1) NOT NULL DEFAULT ('N'),
	[nat_phase] [char](1) NOT NULL DEFAULT ('N'),
	[earliest_filing_date] [date] NOT NULL DEFAULT ('9999-12-31'),
	[earliest_filing_year] [smallint] NOT NULL DEFAULT '9999',
	[earliest_filing_id] [int] NOT NULL DEFAULT '0',
	[earliest_publn_date] [date] NOT NULL DEFAULT ('9999-12-31'),
	[earliest_publn_year] [smallint] NOT NULL DEFAULT '9999',
	[earliest_pat_publn_id] [int] NOT NULL DEFAULT '0',
	[granted] [char](1) NOT NULL DEFAULT ('N'),
	[docdb_family_id] [int] NOT NULL DEFAULT ('0'),
	[inpadoc_family_id] [int] NOT NULL DEFAULT ('0'),
	[docdb_family_size] [smallint] NOT NULL default '0',
	[nb_citing_docdb_fam] [smallint] NOT NULL default '0',
	[nb_applicants] [smallint] NOT NULL default '0',
	[nb_inventors] [smallint] NOT NULL default '0',
PRIMARY KEY CLUSTERED 
(
	[appln_id] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
3)Launch bulk data query for each of the unzipped files:
Observations: the row terminator according to our documentation should be \r\n; MS SQL server 2017 adds this automatically when the FORMAT = 'CSV' is specified. (If you add it anyway, it will give an error !)
FIELDQUOTE removes the quotes from text fields on import; it does not work with pre-2017 MS SQL server editions.

Code: Select all

BULK INSERT tls201_appln
FROM '\\172.21.1.151\TempData\tls201_part01\tls201_part01.txt'
WITH (FIRSTROW = 2,
    FIELDTERMINATOR = ',',
    ROWTERMINATOR='\n',
FIELDQUOTE = '"',
BATCHSIZE=250000,
FORMAT = 'CSV',
    MAXERRORS=2);	
Another method is to use the wizard; right click on the data base name, then take "IMPORT data" option and work yourself through the wizard. (import flat file will not work as it only allows to create a table from a single file, and not to "insert" more records from the other files)
The IMPORT data allows you to let the wizard define type and length of columns in the table. You can in principle even skip the seperate creation of the tables, but don't forget to define that you want to ADD records from the second CSV file onwards.
You WILL HAVE TO CHECK/ADAPT field length and type to the settings as defined in the data catalog. (see also attached file)
PATSTAT_field_data_types.xlsx
(19.08 KiB) Downloaded 2 times
The PATSTAT team does not provide help on scripting
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post Reply