MS SQL server bulk loading script example

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

EPO / PATSTAT Support
Posts: 425
Joined: Thu Feb 22, 2007 5:33 pm
Contact:

MS SQL server bulk loading script example

Post by EPO / PATSTAT Support » Wed Feb 12, 2020 4:24 pm

Loading PATSTAT in a MS SQL server is dependant on your MS SQL server edition and your server settings.
At the EPO we use MS SQL Enterprise edition 2017.
This means we can use the Bulk data tools to import the data. The scripts might not work on earlier versions of MS SQL server. So, this is what works for us:
1) create database (similar to what you find in the documentation):

Code: Select all

USE [master]
GO
CREATE DATABASE [patstat2019b_test] ON  PRIMARY 
( NAME = N'patstat2019b_test_dat', FILENAME = N'G:\workdata\DB\patstat2019b_testdat.mdf' , SIZE = 200MB , MAXSIZE = UNLIMITED , FILEGROWTH = 51200KB )
 LOG ON 
( NAME = N'patstat2019b_test_log', FILENAME = N'G:\workdata\DB\patstat2019b_testlog.ldf' , SIZE = 10MB , MAXSIZE = UNLIMITED , FILEGROWTH = 15360KB )
 COLLATE SQL_Latin1_General_CP1_CI_AS
GO
EXEC dbo.sp_dbcmptlevel @dbname=N'patstat2019b_test', @new_cmptlevel=100
GO
IF (1 = FULLTEXTSERVICEPROPERTY('IsFullTextInstalled'))
begin
EXEC [patstat2019b_test].[dbo].[sp_fulltext_database] @action = 'enable'
end
GO
ALTER DATABASE [patstat2019b_test] SET ANSI_NULL_DEFAULT OFF 
GO
ALTER DATABASE [patstat2019b_test] SET ANSI_NULLS OFF 
GO
ALTER DATABASE [patstat2019b_test] SET ANSI_PADDING OFF 
GO
ALTER DATABASE [patstat2019b_test] SET ANSI_WARNINGS OFF 
GO
ALTER DATABASE [patstat2019b_test] SET ARITHABORT OFF 
GO
ALTER DATABASE [patstat2019b_test] SET AUTO_CLOSE OFF 
GO
ALTER DATABASE [patstat2019b_test] SET AUTO_CREATE_STATISTICS ON 
GO
ALTER DATABASE [patstat2019b_test] SET AUTO_SHRINK OFF 
GO
ALTER DATABASE [patstat2019b_test] SET AUTO_UPDATE_STATISTICS ON 
GO
ALTER DATABASE [patstat2019b_test] SET CURSOR_CLOSE_ON_COMMIT OFF 
GO
ALTER DATABASE [patstat2019b_test] SET CURSOR_DEFAULT  GLOBAL 
GO
ALTER DATABASE [patstat2019b_test] SET CONCAT_NULL_YIELDS_NULL OFF 
GO
ALTER DATABASE [patstat2019b_test] SET NUMERIC_ROUNDABORT OFF 
GO
ALTER DATABASE [patstat2019b_test] SET QUOTED_IDENTIFIER OFF 
GO
ALTER DATABASE [patstat2019b_test] SET RECURSIVE_TRIGGERS OFF 
GO
ALTER DATABASE [patstat2019b_test] SET  DISABLE_BROKER 
GO
ALTER DATABASE [patstat2019b_test] SET AUTO_UPDATE_STATISTICS_ASYNC OFF 
GO
ALTER DATABASE [patstat2019b_test] SET DATE_CORRELATION_OPTIMIZATION OFF 
GO
ALTER DATABASE [patstat2019b_test] SET TRUSTWORTHY OFF 
GO
ALTER DATABASE [patstat2019b_test] SET ALLOW_SNAPSHOT_ISOLATION OFF 
GO
ALTER DATABASE [patstat2019b_test] SET PARAMETERIZATION SIMPLE 
GO
ALTER DATABASE [patstat2019b_test] SET  READ_WRITE 
GO
ALTER DATABASE [patstat2019b_test] SET RECOVERY FULL 
GO
ALTER DATABASE [patstat2019b_test] SET  MULTI_USER 
GO
ALTER DATABASE [patstat2019b_test] SET PAGE_VERIFY CHECKSUM  
GO
ALTER DATABASE [patstat2019b_test] SET DB_CHAINING OFF 
2) Create all table(s) (similar to what you find in the documentation delivered with the data)

Code: Select all

USE [patstat2019b_test]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[tls201_appln](
	[appln_id] [int] NOT NULL DEFAULT ('0'),
	[appln_auth] [char](2) NOT NULL DEFAULT (''),
	[appln_nr] [varchar](15)  NOT NULL DEFAULT (''),
	[appln_kind] [char](2) NOT NULL DEFAULT ('  '),
	[appln_filing_date] [date] NOT NULL DEFAULT ('9999-12-31'),
	[appln_filing_year] [smallint] NOT NULL DEFAULT '9999',
	[appln_nr_epodoc] [varchar](20)  NOT NULL DEFAULT (''),
	[appln_nr_original] [varchar](100) NOT NULL DEFAULT (''),
	[ipr_type] [char](2) NOT NULL DEFAULT (''),
	[receiving_office] [char](2) NOT NULL DEFAULT (''),
	[internat_appln_id] [int] NOT NULL DEFAULT ('0'),
	[int_phase] [char](1) NOT NULL DEFAULT ('N'),
	[reg_phase] [char](1) NOT NULL DEFAULT ('N'),
	[nat_phase] [char](1) NOT NULL DEFAULT ('N'),
	[earliest_filing_date] [date] NOT NULL DEFAULT ('9999-12-31'),
	[earliest_filing_year] [smallint] NOT NULL DEFAULT '9999',
	[earliest_filing_id] [int] NOT NULL DEFAULT '0',
	[earliest_publn_date] [date] NOT NULL DEFAULT ('9999-12-31'),
	[earliest_publn_year] [smallint] NOT NULL DEFAULT '9999',
	[earliest_pat_publn_id] [int] NOT NULL DEFAULT '0',
	[granted] [char](1) NOT NULL DEFAULT ('N'),
	[docdb_family_id] [int] NOT NULL DEFAULT ('0'),
	[inpadoc_family_id] [int] NOT NULL DEFAULT ('0'),
	[docdb_family_size] [smallint] NOT NULL default '0',
	[nb_citing_docdb_fam] [smallint] NOT NULL default '0',
	[nb_applicants] [smallint] NOT NULL default '0',
	[nb_inventors] [smallint] NOT NULL default '0',
PRIMARY KEY CLUSTERED 
(
	[appln_id] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
3)Launch bulk data query for each of the unzipped files.
Observations: the row terminator according to our documentation should be \r\n;
MS SQL server 2017 adds this automatically when the FORMAT = 'CSV' is specified. If you add it anyway, it will give an error !
FIELDQUOTE removes the quotes from text fields on import; this does not work with pre-2017 MS SQL server editions. And CODEPAGE = '65001' needs to be explicitly defined.

Code: Select all

BULK INSERT tls201_appln
FROM '\\172.21.1.151\TempData\tls201_part01\tls201_part01.txt'
WITH (FIRSTROW = 2,
    FIELDTERMINATOR = ',',
    ROWTERMINATOR='\n',
FIELDQUOTE = '"',
BATCHSIZE=250000,
FORMAT = 'CSV',
CODEPAGE = '65001',
    MAXERRORS=2);	
Another method to load the data is to use the wizard; right click on the data base name, then take "IMPORT data" option and work yourself through the wizard. (import flat file will not work as it only allows to create a table from a single file, and not to "insert" more records from the other files)
The IMPORT data allows you to let the wizard define type and length of columns in the table. You can in principle even skip the separate step of creating the tables because the wizard will do it for you.
But don't forget to define that you want to ADD records starting from the second CSV file onwards for each table.
You WILL HAVE TO CHECK/ADAPT field length and type to the settings as defined in the data catalog. (see also attached file)
PATSTAT_field_data_types.xlsx
(19.08 KiB) Downloaded 42 times
In the attached excel, you will also find an overview of all tables and attributes as defined on our own local PATSTAT setup on a MS SQL server.
I added the query to create the data in the excel sheet so that you can run that on your MS SQL system and check for possible differences. In principle we do not provide loading scripts because experience has proven that it becomes a rather tedious exercise to discuss server settings, platforms and release updates on MS SQL. (And we are better at understanding Patent data then server installations. :) )
We hope you find it useful:
overview_PATSTAT_tables_columns.xlsx
(34.21 KiB) Downloaded 275 times
Attachments
PATSTAT_field_data_types.xlsx
(19.08 KiB) Downloaded 300 times
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org


Post Reply