Data import error of tls206_part02.csv

Here you can post your opinions, ask questions and share experiences on the PATSTAT product line. Please always indicate the PATSTAT edition (e.g. 2015 Autumn Edition) and the database (e.g. PATSTAT Online, MySQL, MS SQL Server, ...) you are using.
Post Reply

frankcpp
Posts: 2
Joined: Mon Mar 29, 2021 3:33 pm

Data import error of tls206_part02.csv

Post by frankcpp » Mon Mar 29, 2021 3:44 pm

Dear all,

Environment:
PATSTAT Autumn 2020
MS SQL Server 2019 on Windows 10

I used the build-in scripts (createDatabaseSQLserver.sql and alltablescreate.sql) to create the database and tables. When I imported tls206_part02.csv, I encountered some problems. My command and error log are provided as follows. I will be very grateful for your advice or help.

Table: dbo.tls206_person
Data: tls206_part02.csv

BULK INSERT dbo.tls206_person FROM 'D:\PATSTAT_Autumn_2020\data\tls206_part02.csv' WITH (FORMAT = 'CSV', FIRSTROW = 2, TABLOCK, ERRORFILE = 'F:\Error\tls206_part02.csv');

Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 8092915, column 2 (person_name).
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 8179139, column 3 (person_name_orig_lg).
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 8594950, column 3 (person_name_orig_lg).
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 8684855, column 3 (person_name_orig_lg).
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 9337528, column 3 (person_name_orig_lg).
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 10067197, column 3 (person_name_orig_lg).
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 10452815, column 3 (person_name_orig_lg).
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 10470125, column 3 (person_name_orig_lg).
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 10837299, column 2 (person_name).
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 10862858, column 2 (person_name).
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 10862902, column 2 (person_name).
Msg 4865, Level 16, State 1, Line 85
Cannot bulk load because the maximum number of errors (10) was exceeded.
Msg 7399, Level 16, State 1, Line 85
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 85
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".

Many thanks,
Frank


mkracker
Posts: 119
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: Data import error of tls206_part02.csv

Post by mkracker » Tue Mar 30, 2021 9:37 am

Hi Frank,

It is difficult to comment regarding data loading, because it depends on the tools and the configuration used. So I had a look at the data to see what's so special for the data you reported. The first error in your log file refers to row 8092915, column 2 (person_name). I checked this row in the csv file: It has the PERSON_ID = 58108440 and an unusual long person name in Cyrillic. I retrieved this record in my PATSTAT database:

select len(person_name) as NumberOfCharacters, datalength(person_name) as NumberOfBytes, *
from tls206_person
where person_id = 58108440

I see that the PERSON_NAME has 326 characters with 652 bytes. The CSV files are UTF-8 encoded, so Cyrillic characters to my knowledge need 2 bytes, ASCII characters (blanks, quotes, ..) need 1 byte.
I suppose your DB column PERSON_NAME is defined as nvarchar(500), as suggested in the PATSTAT Data Catalog.
So I assume that your data import does not handle UTF-8 encodings correctly.

You reported that you are using MS SQL Sever version 2019 - which I never used. In this version there are new UTF-8 collations, so things may work differently.

Can anybody else with more experience with data loading into MS SQL server 2019 give advise?
-------------------------------------------
Martin Kracker / EPO


frankcpp
Posts: 2
Joined: Mon Mar 29, 2021 3:33 pm

Re: Data import error of tls206_part02.csv

Post by frankcpp » Tue Mar 30, 2021 12:55 pm

Hi Martin,

I have uninstalled MS SQL 2019 and changed it to MS SQL 2017. However, the problems are still the same. I used the built-in scripts to create the database and tables.

The collation defined in createDatabaseSQLserver.sql is: SQL_Latin1_General_CP1_CI_AS. And I didn't change it.

My data import command is very simple: BULK INSERT dbo.tls206_person FROM 'D:\PATSTAT_Autumn_2020\data\tls206_part02.csv' WITH (FORMAT = 'CSV', FIRSTROW = 2, TABLOCK, ERRORFILE = 'F:\Error\tls206_part02.csv');

Could you please tell me which version you are using?

Thanks,
Frank


mkracker
Posts: 119
Joined: Wed Sep 04, 2013 6:17 am
Location: Vienna

Re: Data import error of tls206_part02.csv

Post by mkracker » Thu Apr 01, 2021 7:00 am

Hi Frank,

We work with an MS SQL Server 2017 database, from which we are extracting the CSV files,. So I myself have no hands-on experience in loading CSV files into the database.

Nevertheless, according to the documentation of the BULK INSERT https://docs.microsoft.com/en-us/sql/t- ... rver-ver15 you might need to add something like CODEPAGE = '65001' to your statement.

I hope this helps,
Martin
-------------------------------------------
Martin Kracker / EPO


Post Reply