Tuesday 19 May 2015

Issues while Importing data from Excel to SQL using SSIS - 2 - Alphanumeric data in a column

This is something I faced a long, long time ago...

Here is what happened. I was given an Excel sheet to import into a database table, so I created a package with an Excel source and an OLE DB destination. Yes, I knowwwwww, you geniuses: I could have done that with the Import/Export Wizard, which also creates a package. But I wanted to build the package myself... so I created it and ran it. The data got imported, and without even querying the table I announced that the import was done and could be verified. To my surprise, I got an immediate reply, with my senior in CC, stating that the imported data was incorrect: in one of the columns, the text values had been replaced by NULL during the import...

Now, to find out what went wrong during the import, I had to run it all over again, and geniuses, you know what: I had the package ready with me... On checking, I found that this particular column had mixed values (numeric and character), so ideally Excel should have treated it as a TEXT column. Interestingly, it did not; it treated it as a numeric column... The question is, why did this happen? If we find out how Excel (strictly speaking, the Excel driver used by SSIS) decides the data type of a column, we will have the solution.



We all know that every installation makes registry entries/changes in Windows that control how that software behaves afterwards. Since the MS Office suite was installed, let us go to the registry and see whether we can gather any evidence...

Open the Run prompt, type regedit.exe, and the Registry Editor will be displayed. Then navigate to the following path:

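HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel

(On 64-bit Windows, the 32-bit Jet settings are under HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel.)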


You will see the following registry entries:

[Screenshot: registry entries under the Jet Excel engine key]

The highlighted entries are:


  • ImportMixedTypes – the data type to use for a column that contains mixed types of data. The default is Text.
  • TypeGuessRows – the number of rows the driver scans before deciding the data type of a column. The default value is 8.


http://msdn.microsoft.com/en-us/library/ms141683.aspx

So we now know that, by default, Excel treats a column with mixed types of data as TEXT, yet in our case it did not. We also know that Excel scans the first 8 rows before deciding the data type of a column.

Let us check the first 8 rows of the column that did not import properly:

1000
2000
ASBD
230
AHGS
1000
asd
1.25

Out of these 8 rows, 5 are numeric and 3 contain characters. Because the driver was not in import mode, it did not apply the ImportMixedTypes setting; instead, the majority type among the rows sampled through TypeGuessRows (numeric) decided the data type of the column. As a result, all the text values were imported as NULL, as the screenshot below shows.

[Screenshot: imported data, with the text values of the column shown as NULL]
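To make the mechanism concrete, here is a minimal Python sketch of the behavior described above. It is a simplified model built on my own assumptions, not the actual Jet/ACE driver code: the first TypeGuessRows values are sampled, the majority type wins, and every value that does not convert to that type comes back as NULL (None in Python). The function names are hypothetical, and the tie rule is an arbitrary choice of the sketch.

```python
# Simplified model (an assumption, NOT the real Jet/ACE implementation) of how
# the Excel driver picks a column type when IMEX is not set: sample the first
# `guess_rows` values, let the majority type win, and return None (NULL) for
# every value that does not fit that type.

def is_numeric(value: str) -> bool:
    """True if the cell text parses as a number (e.g. '1000', '1.25')."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def guess_type(values: list[str], guess_rows: int = 8) -> str:
    """Return 'numeric' or 'text' by majority vote over the sampled rows."""
    sample = values[:guess_rows]
    numeric = sum(is_numeric(v) for v in sample)
    # Ties go to 'text' here; that tie rule is an arbitrary choice of this sketch.
    return "numeric" if numeric > len(sample) - numeric else "text"


def import_column(values: list[str], guess_rows: int = 8) -> list:
    """Convert each cell to the guessed type; cells that don't fit become None."""
    if guess_type(values, guess_rows) == "text":
        return list(values)
    return [float(v) if is_numeric(v) else None for v in values]


column = ["1000", "2000", "ASBD", "230", "AHGS", "1000", "asd", "1.25"]
print(guess_type(column))     # numeric
print(import_column(column))  # [1000.0, 2000.0, None, 230.0, None, 1000.0, None, 1.25]
```

Running it on the eight sample values gives a numeric column with None in exactly the three positions that held ASBD, AHGS and asd, which mirrors the NULLs seen in the imported table.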

Now, how do we make Excel treat this column's data type as TEXT? Well, if we make the Excel driver work in IMPORT mode, we should be good. Did I hear you ask what IMPORT mode is? Import mode is where we explicitly tell the Excel driver to honor the ImportMixedTypes registry setting. The next question is: how do we do that? By adding IMEX=1 to the connection string of the Excel connection manager.

Right-click the Excel connection manager -> Properties -> ConnectionString

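Provider=Microsoft.ACE.OLEDB.12.0;Data Source=Excel file path\Test_IMEX.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES;"

After adding IMEX=1, it should look like this:

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=Excel file path\Test_IMEX.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES;IMEX=1;"

(An .xlsx workbook needs the Microsoft.ACE.OLEDB.12.0 provider; Microsoft.Jet.OLEDB.4.0 can only open the older .xls format, for which the Extended Properties value uses "Excel 8.0".)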
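If several packages need the same change, the edit can be scripted. Below is a minimal sketch of a hypothetical helper, add_imex, that appends IMEX=1 to the Extended Properties value of a connection string. It assumes that value is wrapped in double quotes, as in the strings above; it is not part of SSIS or any Microsoft API, and the file path in the example is made up.

```python
import re


def add_imex(conn_str: str, imex: int = 1) -> str:
    """Return conn_str with IMEX=<imex> inside the quoted Extended Properties.

    Hypothetical helper for illustration only; assumes the value is written
    as Extended Properties="...".
    """
    match = re.search(r'Extended Properties="([^"]*)"', conn_str)
    if match is None:
        raise ValueError("no quoted Extended Properties value found")
    # Split the inner value into its ';'-separated settings, dropping blanks.
    props = [p for p in match.group(1).split(";") if p.strip()]
    # Drop any existing IMEX setting so it is never specified twice.
    props = [p for p in props if not p.strip().upper().startswith("IMEX=")]
    props.append(f"IMEX={imex}")
    new_value = ";".join(props) + ";"
    # Splice the rebuilt value back between the original quotes.
    return conn_str[:match.start(1)] + new_value + conn_str[match.end(1):]


before = (r'Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data\Test_IMEX.xlsx;'
          r'Extended Properties="Excel 12.0 Xml;HDR=YES;"')
print(add_imex(before))
```

The helper replaces any existing IMEX value instead of adding a second one, so running it twice gives the same string. The result still has to be pasted into the ConnectionString property of the connection manager.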

Save the connection manager and re-run the package.
[Screenshot: imported data after adding IMEX=1, with the text values preserved]

Now, as you can see, the text values are imported as they are. This way we can make the Excel driver treat the column as TEXT. However, the same setting will not help if the first 8 rows all hold one data type (say, all numeric or all text) and the mixed values appear only further down the sheet: the driver sees a single type in its sample, uses that type for the whole column, and values of the other type can come through as NULL. That is why it is good practice to set TypeGuessRows=0, which makes the driver scan a much larger sample (documented as up to 16,384 rows) before deciding. But if you have millions of rows in the sheet, this will hamper performance, since many more rows have to be read before the data type of the column is decided.
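The limitation above can be seen with the same kind of simplified model. Again, this is an assumption about the driver's behavior rather than its real code, and the function names are hypothetical: with IMEX=1 a column is read as text only when the sampled rows actually contain mixed types, and TypeGuessRows=0 is modeled as sampling up to 16,384 rows.

```python
# Simplified model (an assumption, NOT the real Jet/ACE driver) of how the
# sample size (TypeGuessRows) and import mode (IMEX) decide a column's type.

def is_numeric(value: str) -> bool:
    """True if the cell text parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def guessed_type(values: list[str], guess_rows: int = 8, imex: int = 1) -> str:
    # Model TypeGuessRows=0 as "sample up to 16,384 rows".
    limit = 16384 if guess_rows == 0 else guess_rows
    sample = values[:limit]
    kinds = {"numeric" if is_numeric(v) else "text" for v in sample}
    if len(kinds) == 1:
        # The sample holds only one type, so IMEX never comes into play.
        return kinds.pop()
    if imex == 1:
        # Import mode with ImportMixedTypes=Text: a mixed sample means text.
        return "text"
    # No import mode: the majority type in the sample wins
    # (ties go to text, an arbitrary choice of this sketch).
    numeric = sum(is_numeric(v) for v in sample)
    return "numeric" if numeric > len(sample) - numeric else "text"


def read_column(values: list[str], guess_rows: int = 8, imex: int = 1) -> list:
    """Convert cells to the guessed type; non-matching cells become None."""
    if guessed_type(values, guess_rows, imex) == "text":
        return list(values)
    return [float(v) if is_numeric(v) else None for v in values]


# Rows 1-8 are all numeric; the only text value is in row 9.
column = ["1", "2", "3", "4", "5", "6", "7", "8", "ABC"]
print(read_column(column))                # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, None]
print(read_column(column, guess_rows=0))  # ['1', '2', '3', '4', '5', '6', '7', '8', 'ABC']
```

With the default sample of 8 rows the model never sees ABC, so the column is typed numeric and ABC becomes None even with IMEX=1; widening the sample lets the mixed types show up, and the whole column is kept as text.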

Note:

If your Excel provider is Microsoft.ACE.OLEDB.12.0, you will have to check the Excel registry settings at:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel

(14.0 is the Office 2010 version; other Office versions use their own version number in this path.)

