How to conditionally re-execute a Data Flow Task?

    Question

  • Hi All,

    Question: How can I re-execute a data flow task in the same package execution if the DFT transported zero rows?

    Background: The source table has accumulated hundreds of millions of rows over the last 7 years, and I want to move it to a new database in chunks, so each execution of the package copies only one month of data at a time.  The problem is that some years have only 11 months of data.  In these cases, when the package is called, it writes the latest month loaded to a variable and then queries the data source for the next month of data.  If there's no data for that month, the package completes successfully.  The next time it runs, it repeats the process identically, finding the same latest month load date and then copying zero rows.  And then again and again, never progressing past that one empty month.

    Here's where I want to step in.  I've got the DFT rowcount in a variable but I can't figure out how to loop the execution back around to make the data flow task execute again. 

    I could execute a count query against the source using an Execute SQL Task before the DFT, but under normal circumstances those counts are so high that the count query takes a long time to complete.  At any rate, I'd like to avoid the count query if I can.

    Is there a way, or am I looking at this all wrong?

    Thanks,
    Eric B.

    Wednesday, June 6, 2018 20:50

Answers

  • Thanks to everybody for their suggestions.  In the end, I think the true answer to my question is to put the data flow task into a loop task. I can't elaborate any further because I decided to solve my problem in a completely different way, thus avoiding any need for repeating a DFT. Using "WITH TIES," I automatically pull the rows for the next appropriate date period. I use a Script Task inside the DFT to determine the actual date of those rows. That date gets written to a very small log table. That stored date then gets read and used in the next execution of the package to filter the source query, the same query that uses "WITH TIES." This worked very well for me, eliminating any need to run a DFT that moves zero rows.
    Thursday, June 14, 2018 16:25
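    The "WITH TIES" technique described above might be sketched roughly as follows; dbo.YourTable, DateTimeColumn, and @LastLoadedDate are illustrative names, with @LastLoadedDate assumed to be read from the small log table written by the previous execution:

    ```sql
    -- Hypothetical sketch of the "WITH TIES" pattern described above.
    -- Pulls all rows for the next month that actually contains data.
    SELECT TOP (1) WITH TIES t.*
    FROM dbo.YourTable AS t
    WHERE t.DateTimeColumn > @LastLoadedDate
    ORDER BY YEAR(t.DateTimeColumn), MONTH(t.DateTimeColumn);
    ```

    Because TOP (1) WITH TIES returns every row tied on the ORDER BY keys, this pulls the entire earliest month that follows the logged date, so empty months are skipped automatically.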

All replies

  • Get the following result set first:

    SELECT YEAR(DateTimeColumn) AS Year, MONTH(DateTimeColumn) AS Month
    FROM YourTable
    GROUP BY YEAR(DateTimeColumn), MONTH(DateTimeColumn)

    And then loop through the record set for the data flow task.


    A Fan of SSIS, SSRS and SSAS

    Wednesday, June 6, 2018 21:12
  • You can even get the MIN and MAX of DateTimeColumn for each month and put them into variables.  Then the WHERE clause in the query for your data flow task can be WHERE DateTimeColumn >= @Min AND DateTimeColumn <= @Max.


    A Fan of SSIS, SSRS and SSAS

    Wednesday, June 6, 2018 21:19
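    As a sketch of this suggestion (table, column, and parameter names are illustrative), an Execute SQL Task could populate the two variables for a given month, and the data flow source query would then use them:

    ```sql
    -- Populate the SSIS variables for one month (mapped via the
    -- Execute SQL Task's parameter and result-set bindings).
    SELECT MIN(DateTimeColumn) AS MinDateTime,
           MAX(DateTimeColumn) AS MaxDateTime
    FROM YourTable
    WHERE YEAR(DateTimeColumn) = @Year
      AND MONTH(DateTimeColumn) = @Month;

    -- Data flow source query using those variables:
    SELECT *
    FROM YourTable
    WHERE DateTimeColumn >= @Min
      AND DateTimeColumn <= @Max;
    ```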
  • Thanks, Guoxiong.  I can see how that would work, but it requires an additional query against my source database.  That is time-consuming because the table is very large and because there are no indexes on the DateTimeColumn.  I was hoping for a way to use the number of rows transferred to determine whether or not to execute the DFT for another month.
    Wednesday, June 6, 2018 22:02
  • Hi SQL Server dude,

    -->>The next time it runs, it repeats the process identically, finding the same latest month load date and then copying zero rows. And then again and again, never progressing past that one empty month.

    How do you build your data source query?  How do you check the latest month load date in SSIS?

    Because of the performance issue you cannot use the COUNT() function in your scenario, so could you try to skip the empty month and go ahead with the next month?

    It would be helpful if you could share sample data and the desired result.

    Regards,

    Pirlo Zhang 


    MSDN Community Support
    Please remember to click "Mark as Answer" on the responses that resolved your issue, and to click "Unmark as Answer" if they did not.  This can be beneficial to other community members reading this thread.  If you have any compliments or complaints about MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Thursday, June 7, 2018 6:12
    Moderator
  • Thanks, Pirlo. Due to time constraints, I've decided to compare a list of dates to be imported to a list of dates that have been imported, then import the next date on the list.  It's adding a lot of time to the package execution duration, but I need to get this done soon. I realize one way I could save some of that time is by writing the last imported date to a table, then querying that table the next time the package runs as opposed to querying the destination table for the MAX(DateTimeColumn).

    Thanks,

    Eric B.

    Thursday, June 7, 2018 15:31
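    One way the date-list comparison described above might be written as a single source query; dbo.LoadLog is an assumed name for the small log table mentioned, holding one row per month already imported:

    ```sql
    -- Find the earliest month present in the source that has not yet
    -- been logged as imported (all object names are hypothetical).
    SELECT TOP (1) s.LoadMonth
    FROM (
        SELECT DISTINCT
               DATEFROMPARTS(YEAR(DateTimeColumn), MONTH(DateTimeColumn), 1) AS LoadMonth
        FROM dbo.YourTable
    ) AS s
    WHERE s.LoadMonth NOT IN (SELECT LoadMonth FROM dbo.LoadLog)
    ORDER BY s.LoadMonth;
    ```

    Writing the imported month to the log table at the end of each run would then replace the MAX(DateTimeColumn) lookup against the large destination table.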
  • You can create a view that holds four columns: the year, the month, and the MIN and MAX of DateTimeColumn.  Then, looping over the view, you can transfer the data one month at a time.

    CREATE VIEW MinAndMaxOfDateTimeByMonth
    AS
    SELECT YEAR(DateTimeColumn) AS Year, MONTH(DateTimeColumn) AS Month, MIN(DateTimeColumn) AS MinDateTime, MAX(DateTimeColumn) AS MaxDateTime
    FROM YourTable
    GROUP BY YEAR(DateTimeColumn), MONTH(DateTimeColumn)


    A Fan of SSIS, SSRS and SSAS


    • Edited by Guoxiong Thursday, June 7, 2018 16:27
    Thursday, June 7, 2018 16:23
  • I think you may be better off getting the list of months with an Execute SQL Task and storing it in an SSIS object variable.

    Then use a Foreach Loop with the ADO enumerator to iterate through it, and put the logic to transfer that month's data inside the loop.

    This way you don't have to worry about gaps in the data, nor rely on the latest date to continue with the next month's load.


    Please Mark This As Answer if it solved your issue
    Please Vote This As Helpful if it helps to solve your issue
    Visakh
    ----------------------------
    My Wiki User Page
    My MSDN Page
    My Personal Blog
    My Facebook Page

    Thursday, June 7, 2018 16:53