Dear all,
Environment:
PATSTAT Autumn 2020
MS SQL Server 2019 on Windows 10
I used the build-in scripts (createDatabaseSQLserver.sql and alltablescreate.sql) to create the database and tables. When I imported tls206_part02.csv, I encountered some problems. My command and error log are provided as follows. I will be very grateful for your advice or help.
Table: dbo.tls206_person
Data: tls206_part02.csv
BULK INSERT dbo.tls206_person FROM 'D:\PATSTAT_Autumn_2020\data\tls206_part02.csv' WITH (FORMAT = 'CSV', FIRSTROW = 2, TABLOCK, ERRORFILE = 'F:\Error\tls206_part02.csv');
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 8092915, column 2 (person_name).
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 8179139, column 3 (person_name_orig_lg).
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 8594950, column 3 (person_name_orig_lg).
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 8684855, column 3 (person_name_orig_lg).
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 9337528, column 3 (person_name_orig_lg).
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 10067197, column 3 (person_name_orig_lg).
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 10452815, column 3 (person_name_orig_lg).
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 10470125, column 3 (person_name_orig_lg).
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 10837299, column 2 (person_name).
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 10862858, column 2 (person_name).
Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 10862902, column 2 (person_name).
Msg 4865, Level 16, State 1, Line 85
Cannot bulk load because the maximum number of errors (10) was exceeded.
Msg 7399, Level 16, State 1, Line 85
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 85
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
Many thanks,
Frank
Data import error of tls206_part02.csv
Re: Data import error of tls206_part02.csv
Hi Frank,
It is difficult to comment regarding data loading, because it depends on the tools and the configuration used. So I had a look at the data to see what's so special for the data you reported. The first error in your log file refers to row 8092915, column 2 (person_name). I checked this row in the csv file: It has the PERSON_ID = 58108440 and an unusual long person name in Cyrillic. I retrieved this record in my PATSTAT database:
select len(person_name) as NumberOfCharacters, datalength(person_name) as NumberOfBytes, *
from tls206_person
where person_id = 58108440
I see that the PERSON_NAME has 326 characters with 652 bytes. The CSV files are UTF-8 encoded, so Cyrillic characters to my knowledge need 2 bytes, ASCII characters (blanks, quotes, ..) need 1 byte.
I suppose your DB column PERSON_NAME is defined as nvarchar(500), as suggested in the PATSTAT Data Catalog.
So I assume that your data import does not handle UTF-8 encodings correctly.
You reported that you are using MS SQL Sever version 2019 - which I never used. In this version there are new UTF-8 collations, so things may work differently.
Can anybody else with more experience with data loading into MS SQL server 2019 give advise?
It is difficult to comment regarding data loading, because it depends on the tools and the configuration used. So I had a look at the data to see what's so special for the data you reported. The first error in your log file refers to row 8092915, column 2 (person_name). I checked this row in the csv file: It has the PERSON_ID = 58108440 and an unusual long person name in Cyrillic. I retrieved this record in my PATSTAT database:
select len(person_name) as NumberOfCharacters, datalength(person_name) as NumberOfBytes, *
from tls206_person
where person_id = 58108440
I see that the PERSON_NAME has 326 characters with 652 bytes. The CSV files are UTF-8 encoded, so Cyrillic characters to my knowledge need 2 bytes, ASCII characters (blanks, quotes, ..) need 1 byte.
I suppose your DB column PERSON_NAME is defined as nvarchar(500), as suggested in the PATSTAT Data Catalog.
So I assume that your data import does not handle UTF-8 encodings correctly.
You reported that you are using MS SQL Sever version 2019 - which I never used. In this version there are new UTF-8 collations, so things may work differently.
Can anybody else with more experience with data loading into MS SQL server 2019 give advise?
-------------------------------------------
Martin Kracker / EPO
Martin Kracker / EPO
Re: Data import error of tls206_part02.csv
Hi Martin,
I have uninstalled MS SQL 2019 and changed it to MS SQL 2017. However, the problems are still the same. I used the built-in scripts to create the database and tables.
The collation defined in createDatabaseSQLserver.sql is: SQL_Latin1_General_CP1_CI_AS. And I didn't change it.
My data import command is very simple: BULK INSERT dbo.tls206_person FROM 'D:\PATSTAT_Autumn_2020\data\tls206_part02.csv' WITH (FORMAT = 'CSV', FIRSTROW = 2, TABLOCK, ERRORFILE = 'F:\Error\tls206_part02.csv');
Could you please tell me which version you are using?
Thanks,
Frank
I have uninstalled MS SQL 2019 and changed it to MS SQL 2017. However, the problems are still the same. I used the built-in scripts to create the database and tables.
The collation defined in createDatabaseSQLserver.sql is: SQL_Latin1_General_CP1_CI_AS. And I didn't change it.
My data import command is very simple: BULK INSERT dbo.tls206_person FROM 'D:\PATSTAT_Autumn_2020\data\tls206_part02.csv' WITH (FORMAT = 'CSV', FIRSTROW = 2, TABLOCK, ERRORFILE = 'F:\Error\tls206_part02.csv');
Could you please tell me which version you are using?
Thanks,
Frank
Re: Data import error of tls206_part02.csv
Hi Frank,
We work with an MS SQL Server 2017 database, from which we are extracting the CSV files,. So I myself have no hands-on experience in loading CSV files into the database.
Nevertheless, according to the documentation of the BULK INSERT https://docs.microsoft.com/en-us/sql/t- ... rver-ver15 you might need to add something like CODEPAGE = '65001' to your statement.
I hope this helps,
Martin
We work with an MS SQL Server 2017 database, from which we are extracting the CSV files,. So I myself have no hands-on experience in loading CSV files into the database.
Nevertheless, according to the documentation of the BULK INSERT https://docs.microsoft.com/en-us/sql/t- ... rver-ver15 you might need to add something like CODEPAGE = '65001' to your statement.
I hope this helps,
Martin
-------------------------------------------
Martin Kracker / EPO
Martin Kracker / EPO
-
- Posts: 3
- Joined: Thu May 06, 2021 1:50 pm
Re: Data import error of tls206_part02.csv
Hi people,
I encountered an identical error while importing CSV tls206_part02.csv
Have you found any solution in the meantime?
Best regards,
Adnan
I encountered an identical error while importing CSV tls206_part02.csv
Have you found any solution in the meantime?
Best regards,
Adnan
-
- Posts: 440
- Joined: Thu Feb 22, 2007 5:33 pm
- Contact:
Re: Data import error of tls206_part02.csv
Hello Adnan,
here is the script that works fine on our MS SQL server to create the table: (from the scripts in the documentation):
And to bulk load data from the second part of the tls206_person file on the above table:
This took about 2.5 minutes, no errors.
here is the script that works fine on our MS SQL server to create the table: (from the scripts in the documentation):
Code: Select all
USE [patstat2021a]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[tls206_person](
[person_id] [int] NOT NULL DEFAULT ('0'),
[person_name] [nvarchar](500) NOT NULL DEFAULT (''),
[person_name_orig_lg] [nvarchar](500) NOT NULL DEFAULT (''),
[person_address] [nvarchar](1000) NOT NULL DEFAULT (''),
[person_ctry_code] [char](2) NOT NULL DEFAULT (''),
[nuts] [varchar](5) NOT NULL DEFAULT '',
[nuts_level] [tinyint] NOT NULL DEFAULT ('9'),
[doc_std_name_id] [int] NOT NULL DEFAULT ('0'),
[doc_std_name] [nvarchar](500) NOT NULL DEFAULT (''),
[psn_id] [int] NOT NULL DEFAULT ('0'),
[psn_name] [nvarchar](500) NOT NULL DEFAULT (''),
[psn_level] [tinyint] NOT NULL DEFAULT ('0'),
[psn_sector] [varchar](50) NOT NULL DEFAULT (''),
[han_id] [int] NOT NULL DEFAULT ('0'),
[han_name] [nvarchar](500) NOT NULL DEFAULT (''),
[han_harmonized] [int] NOT NULL DEFAULT ('0'),
PRIMARY KEY CLUSTERED
(
[person_id] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
Code: Select all
BULK INSERT tls206_person
FROM '\\172.21.1.151\TempData\tls206_part02\tls206_part02.csv'
WITH (FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR='\n',
CODEPAGE = '65001',
FIELDQUOTE = '"',
BATCHSIZE=250000,
FORMAT = 'CSV',
MAXERRORS=2);
PATSTAT Support Team
EPO - Vienna
patstat @ epo.org
EPO - Vienna
patstat @ epo.org
-
- Posts: 3
- Joined: Thu May 06, 2021 1:50 pm
Re: Data import error of tls206_part02.csv
@EPO / PATSAT Support - Thank you very much for your fast reply.
I also use similar scripts for IMPORT, but in this case for the tls206_person table it didn't help me much. Here is what one row looked like in this particular CSV file:
After a deep analysis I was able to solve this issue. There was a problem with Encoding, at the operating system level. The solution is as follows:
For Windows 10 operating systems: If you notice that Encoding or the display of special characters is incorrect (this especially applies to text fields) you must do the following:
Control Panel > Region > Tab Administrative
Hit button "Change system locale".
And check the checkbox labeled "Beta: Use Unicode UTF-8 for worldwide language support".
Hit the ok button. You will be asked to restart the computer.
After this change, the import went smoothly. And CSV records were displayed correctly.
I also use similar scripts for IMPORT, but in this case for the tls206_person table it didn't help me much. Here is what one row looked like in this particular CSV file:
After a deep analysis I was able to solve this issue. There was a problem with Encoding, at the operating system level. The solution is as follows:
For Windows 10 operating systems: If you notice that Encoding or the display of special characters is incorrect (this especially applies to text fields) you must do the following:
Control Panel > Region > Tab Administrative
Hit button "Change system locale".
And check the checkbox labeled "Beta: Use Unicode UTF-8 for worldwide language support".
Hit the ok button. You will be asked to restart the computer.
After this change, the import went smoothly. And CSV records were displayed correctly.
-
- Posts: 1
- Joined: Wed Mar 23, 2022 8:52 am
Re: Data import error of tls206_part02.csv
Hi guys,
You said that you're using Microsoft SQL Server 2019 - which I've never used. There are new UTF-8 collations in this release, so things may operate differently.
You said that you're using Microsoft SQL Server 2019 - which I've never used. There are new UTF-8 collations in this release, so things may operate differently.