At the EPO we use MS SQL Enterprise edition 2017.
This means we can use the Bulk data tools to import the data. The scripts might not work on earlier versions of MS SQL server. So, this is what works for us:
1) create database (similar to what you find in the documentation):
Code: Select all
USE [master]
GO
CREATE DATABASE [patstat2019b_test] ON PRIMARY
( NAME = N'patstat2019b_test_dat', FILENAME = N'G:\workdata\DB\patstat2019b_testdat.mdf' , SIZE = 200MB , MAXSIZE = UNLIMITED , FILEGROWTH = 51200KB )
LOG ON
( NAME = N'patstat2019b_test_log', FILENAME = N'G:\workdata\DB\patstat2019b_testlog.ldf' , SIZE = 10MB , MAXSIZE = UNLIMITED , FILEGROWTH = 15360KB )
COLLATE SQL_Latin1_General_CP1_CI_AS
GO
EXEC dbo.sp_dbcmptlevel @dbname=N'patstat2019b_test', @new_cmptlevel=100
GO
IF (1 = FULLTEXTSERVICEPROPERTY('IsFullTextInstalled'))
begin
EXEC [patstat2019b_test].[dbo].[sp_fulltext_database] @action = 'enable'
end
GO
ALTER DATABASE [patstat2019b_test] SET ANSI_NULL_DEFAULT OFF
GO
ALTER DATABASE [patstat2019b_test] SET ANSI_NULLS OFF
GO
ALTER DATABASE [patstat2019b_test] SET ANSI_PADDING OFF
GO
ALTER DATABASE [patstat2019b_test] SET ANSI_WARNINGS OFF
GO
ALTER DATABASE [patstat2019b_test] SET ARITHABORT OFF
GO
ALTER DATABASE [patstat2019b_test] SET AUTO_CLOSE OFF
GO
ALTER DATABASE [patstat2019b_test] SET AUTO_CREATE_STATISTICS ON
GO
ALTER DATABASE [patstat2019b_test] SET AUTO_SHRINK OFF
GO
ALTER DATABASE [patstat2019b_test] SET AUTO_UPDATE_STATISTICS ON
GO
ALTER DATABASE [patstat2019b_test] SET CURSOR_CLOSE_ON_COMMIT OFF
GO
ALTER DATABASE [patstat2019b_test] SET CURSOR_DEFAULT GLOBAL
GO
ALTER DATABASE [patstat2019b_test] SET CONCAT_NULL_YIELDS_NULL OFF
GO
ALTER DATABASE [patstat2019b_test] SET NUMERIC_ROUNDABORT OFF
GO
ALTER DATABASE [patstat2019b_test] SET QUOTED_IDENTIFIER OFF
GO
ALTER DATABASE [patstat2019b_test] SET RECURSIVE_TRIGGERS OFF
GO
ALTER DATABASE [patstat2019b_test] SET DISABLE_BROKER
GO
ALTER DATABASE [patstat2019b_test] SET AUTO_UPDATE_STATISTICS_ASYNC OFF
GO
ALTER DATABASE [patstat2019b_test] SET DATE_CORRELATION_OPTIMIZATION OFF
GO
ALTER DATABASE [patstat2019b_test] SET TRUSTWORTHY OFF
GO
ALTER DATABASE [patstat2019b_test] SET ALLOW_SNAPSHOT_ISOLATION OFF
GO
ALTER DATABASE [patstat2019b_test] SET PARAMETERIZATION SIMPLE
GO
ALTER DATABASE [patstat2019b_test] SET READ_WRITE
GO
ALTER DATABASE [patstat2019b_test] SET RECOVERY FULL
GO
ALTER DATABASE [patstat2019b_test] SET MULTI_USER
GO
ALTER DATABASE [patstat2019b_test] SET PAGE_VERIFY CHECKSUM
GO
ALTER DATABASE [patstat2019b_test] SET DB_CHAINING OFF
Code: Select all
USE [patstat2019b_test]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[tls201_appln](
[appln_id] [int] NOT NULL DEFAULT ('0'),
[appln_auth] [char](2) NOT NULL DEFAULT (''),
[appln_nr] [varchar](15) NOT NULL DEFAULT (''),
[appln_kind] [char](2) NOT NULL DEFAULT (' '),
[appln_filing_date] [date] NOT NULL DEFAULT ('9999-12-31'),
[appln_filing_year] [smallint] NOT NULL DEFAULT '9999',
[appln_nr_epodoc] [varchar](20) NOT NULL DEFAULT (''),
[appln_nr_original] [varchar](100) NOT NULL DEFAULT (''),
[ipr_type] [char](2) NOT NULL DEFAULT (''),
[receiving_office] [char](2) NOT NULL DEFAULT (''),
[internat_appln_id] [int] NOT NULL DEFAULT ('0'),
[int_phase] [char](1) NOT NULL DEFAULT ('N'),
[reg_phase] [char](1) NOT NULL DEFAULT ('N'),
[nat_phase] [char](1) NOT NULL DEFAULT ('N'),
[earliest_filing_date] [date] NOT NULL DEFAULT ('9999-12-31'),
[earliest_filing_year] [smallint] NOT NULL DEFAULT '9999',
[earliest_filing_id] [int] NOT NULL DEFAULT '0',
[earliest_publn_date] [date] NOT NULL DEFAULT ('9999-12-31'),
[earliest_publn_year] [smallint] NOT NULL DEFAULT '9999',
[earliest_pat_publn_id] [int] NOT NULL DEFAULT '0',
[granted] [char](1) NOT NULL DEFAULT ('N'),
[docdb_family_id] [int] NOT NULL DEFAULT ('0'),
[inpadoc_family_id] [int] NOT NULL DEFAULT ('0'),
[docdb_family_size] [smallint] NOT NULL default '0',
[nb_citing_docdb_fam] [smallint] NOT NULL default '0',
[nb_applicants] [smallint] NOT NULL default '0',
[nb_inventors] [smallint] NOT NULL default '0',
PRIMARY KEY CLUSTERED
(
[appln_id] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
Observations: the row terminator according to our documentation should be \r\n;
MS SQL server 2017 adds this automatically when the FORMAT = 'CSV' is specified. If you add it anyway, it will give an error !
FIELDQUOTE removes the quotes from text fields on import; this does not work with pre-2017 MS SQL server editions. And CODEPAGE = '65001' needs to be explicitly defined.
Code: Select all
BULK INSERT tls201_appln
FROM '\\172.21.1.151\TempData\tls201_part01\tls201_part01.txt'
WITH (FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR='\n',
FIELDQUOTE = '"',
BATCHSIZE=250000,
FORMAT = 'CSV',
CODEPAGE = '65001',
MAXERRORS=2);
The IMPORT data allows you to let the wizard define type and length of columns in the table. You can in principle even skip the separate step of creating the tables because the wizard will do it for you.
But don't forget to define that you want to ADD records starting from the second CSV file onwards for each table.
You WILL HAVE TO CHECK/ADAPT field length and type to the settings as defined in the data catalog. (see also attached file) In the attached excel, you will also find an overview of all tables and attributes as defined on our own local PATSTAT setup on a MS SQL server.
I added the query to create the data in the excel sheet so that you can run that on your MS SQL system and check for possible differences. In principle we do not provide loading scripts because experience has proven that it becomes a rather tedious exercise to discuss server settings, platforms and release updates on MS SQL. (And we are better at understanding Patent data then server installations. )
We hope you find it useful: