none
Loop through excel files and load second sheet data only

    Dotaz

  • Hi gurus,

    1. Basically I have some excel files in a folder. Each excel file has  5 tabs (sheets) in it.

    2. So, I have to loop through all these files and load second (#2 )sheet data named as 'SECOND SHEET 'into SQL table.I dont care about the other sheets data and all these files has same structure and same sheet name like mentioned above.

    3. Also, while looping through these sheets I want to skip first 3 rows.

    I have looked online and read many articles but I cudn't find anything that match this criteria.

    Note: I want to do this with SSIS and not with script component as I am not that good at it.

    Please help me with this.!!


    • Upravený Spartaa 9. dubna 2012 17:54
    9. dubna 2012 17:25

Odpovědi

  • Step1: Have 2 string variables FullPath and SheetName with default value as SecondSheet

    Step2: Have a for each loop container anc configure it to point to the appropriate folder using the Collection tab. In the Files section use *.xls and select the "Fully Qualified" option for Retrieve file name section. Use Variable Mappings tab to save the Fully qualified name into FullPath variable (using drop down box) by setting the index as 0.

    Step3: Within for each loop take a data flow task. Inside data flow task take an excel source. Rt click the excel source->Edit. Click on new to create an excel connection manager. Create the excel connection manager by specifying one of correct excel file. Once you have created the excel connection manager, select the data access mode (in excel source editor) as "Table name or view name variable" and select the Sheetname variable from the dropdown box.

    Step4: Select the excel connection manager created in Step3, hit F4 go to expressions, click on ellipsis, select connection string from drop down and set following expression:

    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source="+ @[User::FullPath] +";Extended Properties=" +"\"Excel 8.0;HDR=YES\""+";"

    Make sure that you have a proper default value set for variable FullPath so that a valid connection string gets created.

    Above steps would allow you to fetch all the records from SecondSheet from all the excel files.

    In order to skip first four rows, is there any key column in your data or some specific value(s) that differentiate first 4  rows from the rest? If theare such values, a conditional split can be used to filter the unwanted records.


    Nitesh Rai- Please mark the post as answered if it answers your question


    • Upravený Nitesh Rai 10. dubna 2012 16:05
    • Označen jako odpověď Spartaa 13. dubna 2012 19:07
    10. dubna 2012 16:04
  • Hi Spartaa,

    What i believe your requirement is

    1. Loading multiple EXCEL files.

    2. Importing data only for Second Sheet of Excel files named "SECOND SHEET".

    3. Want to skip first 3 row.

    Please follow below steps.

    1. Use for each loop for loading multiple excel files from same folder(I believe u have done this)

    2. For selecting only SECOND Sheet.

       2. 1 . Go to Data Flow Task

       2.2. Under Connection Manager, you will find "Name of Excel Sheet".

       2.3. Select "SecondSheet" from drop down.

    above step will always look for second sheet.

    3. For data retreving from third row, do the following

      3.1.  Go to Data Flow task

      3.2.  Right Click Excel Source, select Properties

      3.3.  On Openset give value SecondSheet$A3

           Have  a look on link for further ..

        http://geekepisodes.com/sqlbi/2010/ssis-skipping-rows-and-columns-in-excel/


    Amit
    Please mark as answer if helpful
    http://fascinatingsql.wordpress.com/


    12. dubna 2012 10:33

Všechny reakce

  • Hi,

    Please refer the link below,

    http://www.bidn.com/blogs/mikedavis/ssis/625/loop-through-excel-file-in-ssis

    http://msdn.microsoft.com/en-us/library/ms345182(v=sql.90).aspx

    also working with excel worksheets in below link

    http://stackoverflow.com/questions/7411741/looping-through-excel-files-and-tables-in-ssis-using-foreach-loop-containers

    hope this helps,


    Thanks,
    Suresh
    Please use Mark or Propose as Answer if my post solved your problem and use Vote As Helpful if a post was useful.


    • Upravený svgSuresh 9. dubna 2012 17:32 added more links
    9. dubna 2012 17:29
  • Hi,

    Please refer the link below,

    http://www.bidn.com/blogs/mikedavis/ssis/625/loop-through-excel-file-in-ssis

    http://msdn.microsoft.com/en-us/library/ms345182(v=sql.90).aspx

    also working with excel worksheets in below link

    http://stackoverflow.com/questions/7411741/looping-through-excel-files-and-tables-in-ssis-using-foreach-loop-containers

    hope this helps,

    HI SvgSuresh,

    I have gone through all these links well before. Also, I am familiar with loading multiple files having single sheet. But nothing above suits my requirement. Hope you understand my problem.

    9. dubna 2012 17:51
  • Can anyone help me with this??
    10. dubna 2012 12:51
  • Hi Spartaa,

    Is the sheet to be loaded has the same name in all the excel files? (as you mentioned "Second Sheet").


    Nitesh Rai- Please mark the post as answered if it answers your question

    10. dubna 2012 14:15
  • I believe you have same sheet name. Have a variable (sheetname) with value as "Second Sheet". Have a DFT within a for each loop. In DFT, take a excel source and configure it with data access mode as "Table name or view name variable" and select the sheetname variable.

    Set an expression for excel source connection string, so that connection string points to the excel file that is being iterated by for-each-loop container, in order to fetch the data from appropriate excel file.


    Nitesh Rai- Please mark the post as answered if it answers your question

    10. dubna 2012 14:18
  • I believe you have same sheet name. Have a variable (sheetname) with value as "Second Sheet". Have a DFT within a for each loop. In DFT, take a excel source and configure it with data access mode as "Table name or view name variable" and select the sheetname variable.

    Set an expression for excel source connection string, so that connection string points to the excel file that is being iterated by for-each-loop container, in order to fetch the data from appropriate excel file.


    Nitesh Rai- Please mark the post as answered if it answers your question

    Hi Nitesh,

    Yes, As you believed all my excel files has same name as 'SECOND SHEET'. But here, I also have to skip first 4 rows from that sheet everytime.

    So, I created variable called 'sheetname' ='[SECOND SHEET]$a5:j50000'  as the value. But I was getting follwoing error.: I even tried removing the range and just giving value as 'SECOND SHEET' still same error.

    Error: Variable "User::Sheetname" does not contain a valid data object

    Before all this I have tell you that. I designed a OUTER_foreachloop container with "Foreach ADO.NET Schema Rowset Enumerator" and inner for eachloop with "foreach file enumarator". and have DFT in the inner For each loop container,. And have three variables at present

    1) Filepath = complete path with file name at the end(till .xlsx)

    2) Sheetname= [SECOND SHEET]$a5:j5000

    3) Filename=exp: substring(User::filepath- To get only the file name)

    Can you you help me with this.??

    10. dubna 2012 15:16
  • Also, While configuring  ADO.NET connection manager im getting the following error:

    TITLE: Connection Manager
    ------------------------------

    Test connection failed because of an error in initializing provider. External table is not in the expected format.

    ------------------------------
    BUTTONS:

    OK
    ------------------------------

    So, the this might causing the problem. But I am not sure what the problem exactly. I have followed some posts (I guess some by you).

    Help me with this.Thanks

    10. dubna 2012 15:23
  • Step1: Have 2 string variables FullPath and SheetName with default value as SecondSheet

    Step2: Have a for each loop container anc configure it to point to the appropriate folder using the Collection tab. In the Files section use *.xls and select the "Fully Qualified" option for Retrieve file name section. Use Variable Mappings tab to save the Fully qualified name into FullPath variable (using drop down box) by setting the index as 0.

    Step3: Within for each loop take a data flow task. Inside data flow task take an excel source. Rt click the excel source->Edit. Click on new to create an excel connection manager. Create the excel connection manager by specifying one of correct excel file. Once you have created the excel connection manager, select the data access mode (in excel source editor) as "Table name or view name variable" and select the Sheetname variable from the dropdown box.

    Step4: Select the excel connection manager created in Step3, hit F4 go to expressions, click on ellipsis, select connection string from drop down and set following expression:

    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source="+ @[User::FullPath] +";Extended Properties=" +"\"Excel 8.0;HDR=YES\""+";"

    Make sure that you have a proper default value set for variable FullPath so that a valid connection string gets created.

    Above steps would allow you to fetch all the records from SecondSheet from all the excel files.

    In order to skip first four rows, is there any key column in your data or some specific value(s) that differentiate first 4  rows from the rest? If theare such values, a conditional split can be used to filter the unwanted records.


    Nitesh Rai- Please mark the post as answered if it answers your question


    • Upravený Nitesh Rai 10. dubna 2012 16:05
    • Označen jako odpověď Spartaa 13. dubna 2012 19:07
    10. dubna 2012 16:04
  • Step1: Have 2 string variables FullPath and SheetName with default value as SecondSheet

    Step2: Have a for each loop container anc configure it to point to the appropriate folder using the Collection tab. In the Files section use *.xls and select the "Fully Qualified" option for Retrieve file name section. Use Variable Mappings tab to save the Fully qualified name into FullPath variable (using drop down box) by setting the index as 0.

    Step3: Within for each loop take a data flow task. Inside data flow task take an excel source. Rt click the excel source->Edit. Click on new to create an excel connection manager. Create the excel connection manager by specifying one of correct excel file. Once you have created the excel connection manager, select the data access mode (in excel source editor) as "Table name or view name variable" and select the Sheetname variable from the dropdown box.

    Step4: Select the excel connection manager created in Step3, hit F4 go to expressions, click on ellipsis, select connection string from drop down and set following expression:

    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source="+ @[User::FullPath] +";Extended Properties=" +"\"Excel 8.0;HDR=YES\""+";"

    Make sure that you have a proper default value set for variable FullPath so that a valid connection string gets created.

    Above steps would allow you to fetch all the records from SecondSheet from all the excel files.

    In order to skip first four rows, is there any key column in your data or some specific value(s) that differentiate first 4  rows from the rest? If theare such values, a conditional split can be used to filter the unwanted records.


    Nitesh Rai- Please mark the post as answered if it answers your question


    Thanks for your explanation. I will try that now.

    And

    There is no such Key column to identify like that. Also, that sheet has some blank rows which I need to avoid. Can I use row count transformation and eliiminate blank rows. Or Is there any other work around for this.?

    10. dubna 2012 17:27
  • I just tried as you said and I was getting the folliowing message when trying preview in the excel source component.

    "Index and length must refer to a location within the string.
    Parameter name: length (mscorlib)"

    10. dubna 2012 17:38
  • I just tried as you said and I was getting the folliowing message when trying preview in the excel source component.

    "Index and length must refer to a location within the string.
    Parameter name: length (mscorlib)"

    Check this link: http://stackoverflow.com/questions/2048681/sql-server-2005-ssis-data-transfer-sql-to-excel-corrupts-excel-file

     To avoid the blank rows, you can use a conditional split transform and check for blank values in each column and thus filter the records.


    Nitesh Rai- Please mark the post as answered if it answers your question

    10. dubna 2012 21:29
  • I just tried as you said and I was getting the folliowing message when trying preview in the excel source component.

    "Index and length must refer to a location within the string.
    Parameter name: length (mscorlib)"

    Check this link: http://stackoverflow.com/questions/2048681/sql-server-2005-ssis-data-transfer-sql-to-excel-corrupts-excel-file

    Hi Nitesh,

    I was able to load the first file successfully but while loading the second file in the valiation stage itself I encountered the following error:

     [Connection manager "Excel Connection Manager"] Error: SSIS Error Code DTS_E_OLEDBERROR.  An OLE DB error has occurred. Error code: 0x80004005.
    An OLE DB record is available.  Source: "Microsoft JET Database Engine"  Hresult: 0x80004005  Description: "External table is not in the expected format.".

    I even tried setting the delay validation =true for DFT.

    But all the sheets in all the files are having same n.of columns and same structure.

    Please help me with this one.

    11. dubna 2012 15:11
  • Hi Spartaa,

    What i believe your requirement is

    1. Loading multiple EXCEL files.

    2. Importing data only for Second Sheet of Excel files named "SECOND SHEET".

    3. Want to skip first 3 row.

    Please follow below steps.

    1. Use for each loop for loading multiple excel files from same folder(I believe u have done this)

    2. For selecting only SECOND Sheet.

       2. 1 . Go to Data Flow Task

       2.2. Under Connection Manager, you will find "Name of Excel Sheet".

       2.3. Select "SecondSheet" from drop down.

    above step will always look for second sheet.

    3. For data retreving from third row, do the following

      3.1.  Go to Data Flow task

      3.2.  Right Click Excel Source, select Properties

      3.3.  On Openset give value SecondSheet$A3

           Have  a look on link for further ..

        http://geekepisodes.com/sqlbi/2010/ssis-skipping-rows-and-columns-in-excel/


    Amit
    Please mark as answer if helpful
    http://fascinatingsql.wordpress.com/


    12. dubna 2012 10:33
  • Hi Spartaa,

    What i believe your requirement is

    1. Loading multiple EXCEL files.

    2. Importing data only for Second Sheet of Excel files named "SECOND SHEET".

    3. Want to skip first 3 row.

    Please follow below steps.

    1. Use for each loop for loading multiple excel files from same folder(I believe u have done this)

    2. For selecting only SECOND Sheet.

       2. 1 . Go to Data Flow Task

       2.2. Under Connection Manager, you will find "Name of Excel Sheet".

       2.3. Select "SecondSheet" from drop down.

    above step will always look for second sheet.

    3. For data retreving from third row, do the following

      3.1.  Go to Data Flow task

      3.2.  Right Click Excel Source, select Properties

      3.3.  On Openset give value SecondSheet$A3

           Have  a look on link for further ..

        http://geekepisodes.com/sqlbi/2010/ssis-skipping-rows-and-columns-in-excel/

    Hi Amit,

    Everything seems to work fine until I enter the expression for the excel connection manager for looping throgh the folder.

    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source="+ @[User::FullPath] +";Extended Properties=" +"\"Excel 8.0;HDR=YES\""+";"

    As soon as I entered this expression the Excel source dataflow component showing red mark with the following error.:

    An OLE DB record is available.  Source: "Microsoft JET Database Engine"  Hresult: 0x80004005  Description: "External table is not in the expected format.".

     I tried to changes everything I cud but it keep giving me some or the error .

    Please help me witht his.!!

    12. dubna 2012 12:44
  • HI Spartaa,

    I am not sure about the issue you have, but try to add IMEX property to be 1 in COnnection

    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source="+ @[User::FullPath] +";Extended Properties=" +"\"Excel 8.0;HDR=YES;IMEX=1\""+";"

    Hope it will work, try for the above.


    Amit
    Please mark as answer if helpful
    http://fascinatingsql.wordpress.com/


    13. dubna 2012 8:06
  • HI Spartaa,

    I am not sure about the issue you have, but try to add IMEX property to be 1 in COnnection

    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source="+ @[User::FullPath] +";Extended Properties=" +"\"Excel 8.0;HDR=YES;IMEX=1\""+";"

    Hope it will work, try for the above.


    Amit
    Please mark as answer if helpful
    http://fascinatingsql.wordpress.com/


    Hi Amit,

    I tried this but it did not solve the issue.
    But I found an alternative to get out of this!!

    Thanks Nitesh and Ahit for your knowledge sharing. !!

    13. dubna 2012 19:06
  • hi spartaa,

    can u tel me what alternative u have???


    Amit
    Please mark as answer if helpful
    http://fascinatingsql.wordpress.com/

    15. dubna 2012 14:46
  • Hi Amit,

    Please add the following steps to after Spartaa's steps,

    Create an Ado.net connection with following parameters

    Provider-Microsoft Jey 4.0 OLE Db provider

    DatabaseFilename- the file you wanted to copy from

    Click\ Select "All"

    Set Exteded  properties to - Excel 8.0

    Test connection

    On Foreach loop Editor(Considered you run the DFT within FELoop)

    go to Collection -

    select Enumerator- For each EDO.NET Schema Rowset...

    Enumerator Config:

    Select the Ado.net connection manager

    Schema-Tables

    Under variable mapping- Select the varibale saves the sheet value, set Index =2

    Try This, This would help you.

    Cheers.


    JN

    25. dubna 2012 14:44