PowerShell load data from Excel to SQL Server table

    Question

  • I am dynamically loading different files from Excel files to SQL Server tables. Some of the files have millions of rows. Since this is a dynamic process I have decided to use PowerShell. The script below works perfectly well for small files, but big files are only partially loaded before the process errors out. I was wondering if someone can help me improve my script so that it can load in batches (say 100,000 rows at a time) instead of trying to get everything into memory first and then bulk loading it to the destination table. I have seen a few examples online where the process loops through the data file line by line. While that works, it takes AGES.

     

    $filepath = 'directory_path\my_excel_file.xls'

    #Connection String for Excel 2007:
    $connString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=`"$filepath`";Extended Properties=`"Excel 12.0 Xml;HDR=YES`";"
    #Connection String for Excel 2003:
    #$connString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=`"$filepath`";Extended Properties=`"Excel 8.0;HDR=Yes;IMEX=1`";"

    $qry = 'select * from [Sheet1$]'
    $sqlserver = 'my_server_name'
    $dbname = 'my_target_database'
    #Create a table in the destination database with the referenced columns and table name.
    $tblname = 'temp_PATALGCD'

    #######################
    function Get-ExcelData
    {
        param($connString, $qry='select * from [sheet1$]')
        $conn = new-object System.Data.OleDb.OleDbConnection($connString)
        $conn.open()
        $cmd = new-object System.Data.OleDb.OleDbCommand($qry,$conn)
        $da = new-object System.Data.OleDb.OleDbDataAdapter($cmd)
        $dt = new-object System.Data.dataTable
        [void]$da.fill($dt)
        $conn.close()
        $dt
    } #Get-ExcelData

    #######################
    function Write-DataTableToDatabase
    {
        param($dt,$destServer,$destDb,$destTbl)
        $connectionString = "Data Source=$destServer;Integrated Security=true;Initial Catalog=$destDb;"
        $bulkCopy = new-object ("Data.SqlClient.SqlBulkCopy") $connectionString
        $bulkCopy.DestinationTableName = "$destTbl"
        $bulkCopy.WriteToServer($dt)
    }# Write-DataTableToDatabase

    #######################
    $dt = Get-ExcelData $connString $qry
    Write-DataTableToDatabase $dt $sqlserver $dbname $tblname

     


    Mpumelelo

    • Moved by ArthurZ Monday, February 1, 2016 3:09 PM Does not appear SSIS related
    Monday, February 1, 2016 2:47 PM

Answers

  • Try it like this to see if it will adjust on its own.

    $filepath = 'my_excel_file.xls'
    $excelConnection = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=$filepath;Extended Properties='Excel 12.0 Xml;HDR=YES;IMEX=1'"
    $sqlConnection = 'Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=my_database_name;Data Source=my_server_name;'
    $excelQuery = 'select * from [Sheet1$]'
    $tablename = 'temp_PATALLERGY'
    
    Try {
    	$conn = New-Object System.Data.OleDb.OleDbConnection($excelConnection)
    	$conn.open()
    	$cmd = $conn.CreateCommand()
    	$cmd.CommandText = $excelQuery
    	$rdr = $cmd.ExecuteReader()
    	$sqlbc = New-Object System.Data.SqlClient.SqlBulkCopy($sqlConnection, [System.Data.SqlClient.SqlBulkCopyOptions]::TableLock)
    	$sqlbc.DestinationTableName = $tableName
    	#$sqlbc.Batchsize = 1000
    	#$sqlbc.BulkCopyTimeout = 60
    	#$sqlbc.EnableStreaming=$true
    	
    	# add all  columns - you can  add as few  as you like.
    	for ($i = 0; $i -lt $rdr.FieldCount; $i++) {
    		$fname = $rdr.GetName($i)
    		Write-Host $fname -ForegroundColor green
    		[void]$sqlbc.ColumnMappings.Add($fname, $fname)
    	}
    	
    	# write all of the data to the table
    	Try {
    		$sqlbc.WriteToServer($rdr)
    	} Catch {
    		Write-Host "$_" -Fore red -back white
    	} Finally {
    		$rdr.Close()	
    	}
    } Catch {
    	Write-Host "$_" -ForegroundColor red
    }
    
    

    I generally never set the batch size.

    If this works, then try uncommenting EnableStreaming, as that should be what makes the batches complete. The issue is that if patches are missing or the database is not set up correctly, the streaming reader might fail.


    \_(ツ)_/

    • Marked as answer by Mpumelelo S Wednesday, February 3, 2016 2:09 PM
    Tuesday, February 2, 2016 7:16 PM
    Moderator
  • You need to use PowerShell 4 to use EnableStreaming; it is only valid with .NET 4.5. You also need to be sure you have the latest patches and service packs on the client system.

    Here is more background: https://blog.netnerds.net/2015/01/powershell-high-performance-techniques-for-importing-csv-to-sql-server/


    \_(ツ)_/




    • Edited by jrvModerator Tuesday, February 2, 2016 7:25 PM
    • Marked as answer by Mpumelelo S Wednesday, February 3, 2016 2:09 PM
    Tuesday, February 2, 2016 7:23 PM
    Moderator

All replies

  • Use BCP to load large data files. It is extremely fast. On a server we can load millions of rows; the load can exceed 100,000 records/second.

    Search for the docs on BCP. They are in the SQL Server documentation set, which is downloadable from Microsoft.
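
    As an illustration only, a bcp call of the kind meant here might look like the sketch below. It assumes the data has already been exported to a tab-delimited text file and that the target table exists; the server, database, and file names are placeholders.

    # Hypothetical example only - load a tab-delimited file with bcp
    # -S server, -T trusted connection, -c character data, -t field terminator,
    # -F 2 skips the header row, -b sets the batch size, -e captures rejected rows
    bcp my_target_database.dbo.temp_PATALGCD in "C:\data\my_excel_file.txt" `
        -S my_server_name -T -c -t "`t" -F 2 -b 100000 -e "C:\data\my_excel_file.err"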


    \_(ツ)_/

    Monday, February 1, 2016 6:00 PM
    Moderator
  • Here is how to use a data reader to do a bulk upload. It does not load the data into memory; it just reads the file a few records at a time. You can set options on the bulk copy to determine how big the batches are. Batches need to be tuned to the server and data, so look up and read how they affect the process.

    # Copy-SQLBulk.ps1
    # This shows how to use BulkCopy in PowerShell by uploading a spreadsheet to an MSSQLServer data table.
    
    $filepath = 'C:\scripts\All.xlsx'
    $excelConnection="Provider=Microsoft.ACE.OLEDB.12.0;Data Source=$filepath;Extended Properties='Excel 12.0 Xml;HDR=YES;IMEX=1'"
    $sqlConnection='Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=issue;Data Source=OMEGA\SQLEXPRESS;'
    $excelQuery='select * from [SearchResult$]'
    $tablename='SearchResult'
    
    Try{
        $conn = New-Object System.Data.OleDb.OleDbConnection($excelConnection) 
        $conn.open()
        $cmd=$conn.CreateCommand()
        $cmd.CommandText=$excelQuery
        $rdr=$cmd.ExecuteReader()
        
        # create the BC object
        $sqlbc=[System.Data.SqlClient.SqlBulkCopy]$sqlConnection
        $sqlbc.DestinationTableName=$tableName
        
        # add all columns - you can add as few  as you like.
        for($i=0; $i -lt $rdr.FieldCount;$i++){
            $fname=$rdr.GetName($i)
            Write-Host $fname -ForegroundColor green
            [void]$sqlbc.ColumnMappings.Add($fname, $fname)
        }
        
        # write all of the data to the table
        $sqlbc.WriteToServer($rdr)
    }
    Catch{
        Write-Host "$_" -ForegroundColor red
    }
    
    $sqlbc.Close()
    $conn.Close()
    
    

    Notice that SqlBulkCopy can take a reader as well as a table object. The reader is much more efficient and only buffers as much as needed to keep up with the server. There is no reasonable limit to the size of the file.


    \_(ツ)_/


    • Edited by jrvModerator Monday, February 1, 2016 6:16 PM
    • Marked as answer by Mpumelelo S Tuesday, February 2, 2016 3:10 PM
    • Unmarked as answer by Mpumelelo S Wednesday, February 3, 2016 2:11 PM
    Monday, February 1, 2016 6:11 PM
    Moderator
  • Thank you jrv. I have just seen your responses and the day is over where I am. I will try your suggested solutions tomorrow at work. However, I have one question about BCP. Can BCP work on Excel spreadsheets? Actually, before your reply about BCP came in I was already thinking about something along those lines. I’m planning to programmatically convert the Excel files to tab-delimited text files, from which I will run BCP. However, if it is possible to run BCP directly on Excel files that will be even better for me. I would appreciate a sample BCP script for Excel files.
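
    For context, this is roughly the conversion step I have in mind, an untested sketch using the Excel COM object (the output path and the FileFormat code 20 for tab-delimited text are my own assumptions):

    # Rough sketch (untested): save the active worksheet as a tab-delimited text file
    $xlTextWindows = 20                                   # assumed FileFormat code for tab-delimited text
    $excel = New-Object -ComObject Excel.Application
    $excel.Visible = $false
    $excel.DisplayAlerts = $false
    $wb = $excel.Workbooks.Open('directory_path\my_excel_file.xls')
    $wb.SaveAs('directory_path\my_excel_file.txt', $xlTextWindows)
    $wb.Close($false)
    $excel.Quit()
    [System.Runtime.InteropServices.Marshal]::ReleaseComObject($excel) | Out-Null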

    Kind regards,


    Mpumelelo

    • Edited by Mpumelelo S Monday, February 1, 2016 7:55 PM
    Monday, February 1, 2016 7:52 PM
  • Simple answer - no.

    SQL Server can load Excel files with SSIS. If this is something you do often then I recommend using SSIS to stage the data. It is extremely powerful and can be tightly managed and secured under SQL Server.


    \_(ツ)_/

    Monday, February 1, 2016 8:11 PM
    Moderator
  • I am very much aware of the abilities of SSIS. But my situation is that I have more than 500 files to load, all with different structures. The most convenient way of dealing with that is loading those files dynamically into 500+ destination tables. I could have done that using C# in the Script Task in SSIS, but my C# skills are not that great. I did manage to come up with a C# script but, just like what happened in PowerShell, the system complained about memory; the DataTable was too big to be kept in memory.

    I think I will try the PowerShell script that you have provided. I will also try converting the Excel files to tab-delimited files and then using BCP. I believe one of those options will work.
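
    For what it's worth, the dynamic part I am picturing is just a loop over the files, something like the sketch below (the folder path and the temp_<file name> naming rule are only my assumptions):

    # Sketch only: load every workbook in a folder into its own staging table
    $folder = 'directory_path\excel_files'
    Get-ChildItem -Path $folder -Filter *.xls | ForEach-Object {
        $filepath  = $_.FullName
        $tablename = 'temp_' + $_.BaseName            # assumed naming rule: temp_<file name>
        $excelConnection = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=$filepath;Extended Properties='Excel 12.0 Xml;HDR=YES;IMEX=1'"
        Write-Host "Loading $filepath into $tablename"
        # ... open the OleDb reader and call SqlBulkCopy here, exactly as in the script above ...
    }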

    Thanks,


    Mpumelelo

    Monday, February 1, 2016 8:48 PM
  • The SqlBulkCopy creates the target table based on the source; SSIS does the same thing. A simple table loader is nearly identical to the code I posted above. The same code works on all tables.

    I have set up loaders that load 400 to 700+ million records per day into nearly one hundred tables as a staging server for a data warehouse data transform system.  Files were of all types including Excel.
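
    As an illustration only (not part of the loader code above), a staging table can be generated from the reader's schema before the bulk copy. This sketch types every column as NVARCHAR(MAX), which is a simplifying assumption, and reuses the $rdr, $sqlConnection and $tablename variables from the earlier script:

    # Sketch: build and run a CREATE TABLE statement from the OleDb reader's schema
    $schema  = $rdr.GetSchemaTable()
    $columns = ($schema.Rows | ForEach-Object { "[$($_['ColumnName'])] NVARCHAR(MAX) NULL" }) -join ", "
    $createSql = "CREATE TABLE [dbo].[$tablename] ($columns)"

    $sqlConn = New-Object System.Data.SqlClient.SqlConnection($sqlConnection)
    $sqlConn.Open()
    $createCmd = New-Object System.Data.SqlClient.SqlCommand($createSql, $sqlConn)
    [void]$createCmd.ExecuteNonQuery()
    $sqlConn.Close()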


    \_(ツ)_/

    Monday, February 1, 2016 9:15 PM
    Moderator
  • I have tried to implement the PowerShell script that you provided. It looks like it's working. However, I'm receiving this error:

    Exception calling "WriteToServer" with "1" argument(s): "Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding."


    Mpumelelo

    Tuesday, February 2, 2016 2:14 PM
  •  

    You need to either increase the timeout or decrease the batch size. Look up the SqlBulkCopy options.

    https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy_properties(v=vs.110).aspx


    \_(ツ)_/


    Tuesday, February 2, 2016 2:25 PM
    Moderator
  • I'm sorry, this might sound like a dumb question, but is this (increasing the timeout) something that I can do in the script that you have provided, or is it a .NET operation that I have to carry out? Either way, I'm not sure how I can do it. I know that in the Execute Process Task I can adjust the TimeOut settings. However, I'm testing this script directly in the PowerShell command window to make sure that it works before using the Execute Process Task to fire it.

    Thanks,


    Mpumelelo

    Tuesday, February 2, 2016 2:38 PM
  • $sqlbc=New-Object System.Data.SqlClient.SqlBulkCopy($sqlConnection,[System.Data.SqlClient.SqlBulkCopyOptions]::TableLock)
    $sqlbc.EnableStreaming=$true
    $sqlbc.Batchsize=1000
    $sqlbc.BulkCopyTimeout=60
    
    Using a table lock will reduce resource usage and improve performance. Adjust the batch size upward for best performance. Allowing streaming should stop the timeout.



    \_(ツ)_/

    • Marked as answer by Mpumelelo S Tuesday, February 2, 2016 3:10 PM
    • Unmarked as answer by Mpumelelo S Wednesday, February 3, 2016 2:10 PM
    Tuesday, February 2, 2016 2:39 PM
    Moderator
  • Thanks. Let me try this. I will be back with the results.

    Mpumelelo

    Tuesday, February 2, 2016 2:41 PM
  • Another error is coming up now:


    Mpumelelo


    • Edited by Mpumelelo S Tuesday, February 2, 2016 2:52 PM
    Tuesday, February 2, 2016 2:51 PM
  • It's working now. Many thanks jrv!!! Your help has been very useful. This is the final code that I have used:

    # Copy-SQLBulk.ps1
    # This shows how to use BulkCopy in PowerShell by uploading a spreadsheet to an MSSQLServer data table.

    $filepath = 'my_excel_file.xls'
    $excelConnection="Provider=Microsoft.ACE.OLEDB.12.0;Data Source=$filepath;Extended Properties='Excel 12.0 Xml;HDR=YES;IMEX=1'"
    $sqlConnection='Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=my_database_name;Data Source=my_server_name;'
    $excelQuery='select * from [Sheet1$]'
    $tablename='temp_PATALLERGY'

    Try{
        $conn = New-Object System.Data.OleDb.OleDbConnection($excelConnection)
        $conn.open()
        $cmd=$conn.CreateCommand()
        $cmd.CommandText=$excelQuery
        $rdr=$cmd.ExecuteReader()

        # create the BC object
        #$sqlbc=[System.Data.SqlClient.SqlBulkCopy]$sqlConnection
        $sqlbc=New-Object System.Data.SqlClient.SqlBulkCopy($sqlConnection,[System.Data.SqlClient.SqlBulkCopyOptions]::TableLock)
        #$sqlbc.EnableStreaming=$true
        $sqlbc.Batchsize=1000
        $sqlbc.BulkCopyTimeout=60
        $sqlbc.DestinationTableName=$tableName

        # add all columns - you can add as few as you like.
        for($i=0; $i -lt $rdr.FieldCount;$i++){
            $fname=$rdr.GetName($i)
            Write-Host $fname -ForegroundColor green
            [void]$sqlbc.ColumnMappings.Add($fname, $fname)
        }

        # write all of the data to the table
        $sqlbc.WriteToServer($rdr)
    }
    Catch{
        Write-Host "$_" -ForegroundColor red
    }

    $sqlbc.Close()
    $conn.Close()


    Mpumelelo

    • Marked as answer by Mpumelelo S Tuesday, February 2, 2016 3:10 PM
    • Unmarked as answer by Mpumelelo S Wednesday, February 3, 2016 2:10 PM
    Tuesday, February 2, 2016 3:10 PM
  • Great.  It should be extremely fast as long as the server is not too busy.

    \_(ツ)_/

    Tuesday, February 2, 2016 4:00 PM
    Moderator
  • Unfortunately I’m back. The script is working, syntactically, that is. However, it looks like it is inconsistent in the way that it loads data. The table below shows the results that I have received from three different SSIS test runs of the data coming from the same file that has 921,872 rows.

    Rows loaded          Batch size
    Just over 400,000    10,000
    200,000 exactly      50,000
    280,000 exactly      35,000

     

    What could be the reason for the different load sizes?


    Mpumelelo

    Tuesday, February 2, 2016 5:51 PM
  • No idea.  What are you setting the batch size to?
    What difference does the batch size make?  Where are you getting this information?

    \_(ツ)_/

    Tuesday, February 2, 2016 6:00 PM
    Moderator
  • The information is coming from an Excel spreadsheet (.xls). The batch sizes are shown in my previous message. It looks like the smaller the batch size, the more data is loaded.

    Thanks,


    Mpumelelo

    Tuesday, February 2, 2016 6:03 PM
  • You are not clear. How are you setting the batch size? How are you getting the table you posted?

    \_(ツ)_/

    Tuesday, February 2, 2016 6:07 PM
    Moderator
  • In the snippet below, I change the batch size accordingly. For example, at the moment the batch size shown is 1000. I simply change that to another figure, say 10,000. I hope that's how it should be done.

     

    $sqlbc.Batchsize=1000
    $sqlbc.BulkCopyTimeout=60
    $sqlbc.DestinationTableName=$tableName

     


    Mpumelelo

    Tuesday, February 2, 2016 6:48 PM
  • The table that I posted is manually created, just for clarity; it is meant to illustrate my point.

     

                       Rows loaded          Batch size
    First test run     Just over 400,000    10,000
    Second test run    200,000 exactly      50,000
    Third test run     280,000 exactly      35,000

     

    Thanks,

    Mpumelelo


    • Edited by Mpumelelo S Tuesday, February 2, 2016 6:56 PM
    Tuesday, February 2, 2016 6:53 PM
  • So you are saying that if you set the batch size, it doesn't load all of the rows. I cannot understand that, if that is what you are trying to say.


    \_(ツ)_/

    Tuesday, February 2, 2016 7:02 PM
    Moderator
  • It appears that if I set the batch size to a bigger value, the process finishes running prematurely. The bigger the batch size, the fewer the total number of rows inserted.

    Mpumelelo

    Tuesday, February 2, 2016 7:06 PM
  • Try it like this to see if it will adjust on its own.

    $filepath = 'my_excel_file.xls'
    $excelConnection = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=$filepath;Extended Properties='Excel 12.0 Xml;HDR=YES;IMEX=1'"
    $sqlConnection = 'Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=my_database_name;Data Source=my_server_name;'
    $excelQuery = 'select * from [Sheet1$]'
    $tablename = 'temp_PATALLERGY'
    
    Try {
    	$conn = New-Object System.Data.OleDb.OleDbConnection($excelConnection)
    	$conn.open()
    	$cmd = $conn.CreateCommand()
    	$cmd.CommandText = $excelQuery
    	$rdr = $cmd.ExecuteReader()
    	$sqlbc = New-Object System.Data.SqlClient.SqlBulkCopy($sqlConnection, [System.Data.SqlClient.SqlBulkCopyOptions]::TableLock)
    	$sqlbc.DestinationTableName = $tableName
    	#$sqlbc.Batchsize = 1000
    	#$sqlbc.BulkCopyTimeout = 60
    	#$sqlbc.EnableStreaming=$true
    	
    	# add all  columns - you can  add as few  as you like.
    	for ($i = 0; $i -lt $rdr.FieldCount; $i++) {
    		$fname = $rdr.GetName($i)
    		Write-Host $fname -ForegroundColor green
    		[void]$sqlbc.ColumnMappings.Add($fname, $fname)
    	}
    	
    	# write all of the data to the table
    	Try {
    		$sqlbc.WriteToServer($rdr)
    	} Catch {
    		Write-Host "$_" -Fore red -back white
    	} Finally {
    		$rdr.Close()	
    	}
    } Catch {
    	Write-Host "$_" -ForegroundColor red
    }
    
    

    I generally never set the batch size.

    If this works, then try uncommenting EnableStreaming, as that should be what makes the batches complete. The issue is that if patches are missing or the database is not set up correctly, the streaming reader might fail.


    \_(ツ)_/

    • Marked as answer by Mpumelelo S Wednesday, February 3, 2016 2:09 PM
    Tuesday, February 2, 2016 7:16 PM
    Moderator
  • Thank you jrv. I will try that tomorrow at work.

    Mpumelelo

    Tuesday, February 2, 2016 7:19 PM
  • You need to use PowerShell 4 to use EnableStreaming; it is only valid with .NET 4.5. You also need to be sure you have the latest patches and service packs on the client system.

    Here is more background: https://blog.netnerds.net/2015/01/powershell-high-performance-techniques-for-importing-csv-to-sql-server/
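
    A quick way to confirm what the client is running before trying EnableStreaming:

    # Check the PowerShell and CLR versions on the client
    $PSVersionTable.PSVersion     # needs to be 4.0 or later
    $PSVersionTable.CLRVersion    # the .NET 4.x runtime reports 4.0.30319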


    \_(ツ)_/




    • Edited by jrvModerator Tuesday, February 2, 2016 7:25 PM
    • Marked as answer by Mpumelelo S Wednesday, February 3, 2016 2:09 PM
    Tuesday, February 2, 2016 7:23 PM
    Moderator
  • Thank you. I will check if my system has those specs.

    Mpumelelo

    Tuesday, February 2, 2016 7:34 PM
  • I have checked the specs on my system. The .NET Framework is okay, but the PowerShell version was 2.0. I have upgraded it to version 4.0. After that I ran your latest script without uncommenting anything, since you said you have never had to change anything in it. Unfortunately it timed out. I then uncommented the batch and timeout lines together with EnableStreaming. I increased the batch size to 80,000 and increased the timeout to 480. Everything worked alright. All the rows loaded as expected.

    Many thanks.


    Mpumelelo

    Wednesday, February 3, 2016 2:08 PM
  • You misunderstand. You have to update the SQLClient, as it may well have issues; all versions of the client have many updates. The batch behavior tells me that it is likely that either the server or the client is missing patches.

    You also have to manually install .NET 4.5 and its service packs so that streaming can be used.

    https://www.microsoft.com/en-us/download/details.aspx?id=42642


    \_(ツ)_/


    • Edited by jrvModerator Wednesday, February 3, 2016 2:18 PM
    Wednesday, February 3, 2016 2:15 PM
    Moderator
  • I checked the .NET version that we have. It's 4.5. That is why in my reply I said that the .NET Framework is okay. Apologies, I did not clarify that I had checked the .NET version; I can confirm that it is 4.5. PowerShell was version 2 and I have now upgraded it to version 4. The good thing is that everything is working now after upgrading PowerShell, including streaming (EnableStreaming), which hitherto did not work. All the data has been loaded. Thank you for your help and patience.

    Mpumelelo


    • Edited by Mpumelelo S Wednesday, February 3, 2016 2:41 PM
    Wednesday, February 3, 2016 2:41 PM
  • That is good. You shouldn't need to batch with streaming. Batching can slow things down, although without it a failure can cause a huge rollback. To gain more performance we can also turn off default transactions and check the table row count at the end; if it is wrong, then drop the table.

    The BCP command-line utility has more options, including the ability to capture rejected records. I believe it is also somewhat faster.


    \_(ツ)_/

    Wednesday, February 3, 2016 2:46 PM
    Moderator
  • That’s interesting. I think what you have said sounds useful. How do I turn off default transactions and check the table row count at the end using PowerShell? Those sound like important aspects that I can add to my code to improve performance.


    Mpumelelo

    Wednesday, February 3, 2016 2:53 PM
  • That’s interesting. I think what you have said sounds useful. How do I turn off default transactions and check the table row count at the end using PowerShell? Those sound like important aspects that I can add to my code to improve performance.


    Mpumelelo

    I am not sure I correctly remember how to do this. It is set on the connection, so it may be a decoration on the connection string.

    To check the row count, just use a SQL query: SELECT COUNT(*) FROM table_name
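
    As a rough sketch of that check from PowerShell (the connection string and table name follow the script above; the expected count is only an illustration):

    # Sketch: verify the loaded row count and drop the table if it is wrong
    $expectedRows = 921872                      # e.g. the row count in the source file
    $sqlConn = New-Object System.Data.SqlClient.SqlConnection($sqlConnection)
    $sqlConn.Open()
    $countCmd = New-Object System.Data.SqlClient.SqlCommand("SELECT COUNT(*) FROM [$tablename]", $sqlConn)
    $actualRows = [int]$countCmd.ExecuteScalar()
    if ($actualRows -ne $expectedRows) {
        Write-Host "Expected $expectedRows rows but found $actualRows - dropping table" -ForegroundColor red
        $dropCmd = New-Object System.Data.SqlClient.SqlCommand("DROP TABLE [$tablename]", $sqlConn)
        [void]$dropCmd.ExecuteNonQuery()
    }
    $sqlConn.Close()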


    \_(ツ)_/

    Wednesday, February 3, 2016 2:57 PM
    Moderator
  • That is okay, thanks.

    Mpumelelo

    Wednesday, February 3, 2016 3:00 PM
  • That’s interesting. I think what you have said sounds useful. How do I turn off default transactions and check the table row count at the end using PowerShell? Those sound like important aspects that I can add to my code to improve performance.


    Mpumelelo

    Use this option: SqlBulkCopyOptions.UseInternalTransaction

    As best I can remember, this removes the default transaction on the connection and only uses a bulk copy transaction state, which I do not have much info on. It should allow all posts, but it will not fail with an exception, which is why you have to check the row count.
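
    In the script above that would mean passing the combined flags when the SqlBulkCopy object is created, for example:

    # Combine TableLock with UseInternalTransaction (enum flags are combined with -bor)
    $tableLock   = [System.Data.SqlClient.SqlBulkCopyOptions]::TableLock
    $internalTxn = [System.Data.SqlClient.SqlBulkCopyOptions]::UseInternalTransaction
    $sqlbc = New-Object System.Data.SqlClient.SqlBulkCopy($sqlConnection, ($tableLock -bor $internalTxn))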


    \_(ツ)_/

    Wednesday, February 3, 2016 3:05 PM
    Moderator
  • That is useful. Thank you.

    Mpumelelo

    Wednesday, February 3, 2016 3:22 PM
  • Hi Mpume, was the possible fix below implemented and working?

    I have a task whereby I have to import these into the system also.

    Thursday, June 14, 2018 6:33 AM