Importing Data from the Most Recent Timestamped Flat File

  • Question

  • Hello,

    I'm hoping my question is basic with an easy answer; however, I am not able to find a solution on my own. Here is my scenario:

    I have a directory containing a series of flat files which are timestamped (Capacity_2009_12_01.txt, for example). These flat files are created every week and are kept for historical reasons. I would like to build a package which pulls the data from the most recent flat file on a weekly basis.

    My question:

    How do I configure the package so that it knows which flat file is the most recent? Do I configure this in the Flat File source in the data flow?

    Any help will be appreciated; again, I am not able to find this in a book or anywhere online.

    Thanks!!
    Wednesday, December 9, 2009 7:45 PM

All replies

  • Your problem comes down to the fact that there are very few "directory" filtering functions available in SSIS - not that I know how you would build such a component to handle the multitude of features people would probably want from it. Your answer isn't easy...

    However, here's one way you could get what you need:

    Add a DateTime-typed variable to your package, called "newest timestamp".
    Add a Foreach Loop container and configure it to use the Foreach File Enumerator. This will include creating a new SSIS variable to store the file name.
    Place a Script Task inside the loop. Give the script read/write access to the "newest timestamp" variable and read-only access to the file name variable. Inside the script itself, you'll have to do something like this (in C#):
    // Read the current file name and the newest date seen so far from the SSIS variables.
    // Assumes the Foreach File Enumerator returns "Name and extension" (e.g. Capacity_2009_12_01.txt).
    string fileName = (string)Dts.Variables["FileName"].Value;
    DateTime newestFileDate = (DateTime)Dts.Variables["NewestDate"].Value;
    
    // Parse the date out of the file name; Substring positions are zero-based,
    // so the year starts at index 9, the month at 14, and the day at 17.
    DateTime fileDate = new DateTime(
        Convert.ToInt32(fileName.Substring(9, 4)),
        Convert.ToInt32(fileName.Substring(14, 2)),
        Convert.ToInt32(fileName.Substring(17, 2)));
    
    // Remember the most recent date seen across all iterations of the loop.
    if (fileDate > newestFileDate)
    {
        Dts.Variables["NewestDate"].Value = fileDate;
    }
    Now do your "processing" after the Foreach Loop.  You may want to reconstruct the "newest filename" from the newest timestamp variable (which you can do with Expressions).
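    For example, here is a minimal sketch of that expression, assuming the DateTime variable "NewestDate" from the script above and the Capacity_YYYY_MM_DD.txt naming in your question - you could set it on a string variable with "Evaluate As Expression" set to True:

    "Capacity_" + (DT_WSTR, 4)YEAR(@[User::NewestDate]) + "_"
        + RIGHT("0" + (DT_WSTR, 2)MONTH(@[User::NewestDate]), 2) + "_"
        + RIGHT("0" + (DT_WSTR, 2)DAY(@[User::NewestDate]), 2) + ".txt"

    That string variable could then drive an expression on your Flat File connection manager's ConnectionString property so the rest of the package reads the newest file.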
    Todd McDermid's Blog
    Wednesday, December 9, 2009 8:03 PM
    Moderator
  • Thanks Todd.

    I'm working on this now, but I'm a total newbie to SSIS and don't understand exactly where the "newest timestamp" variable should go. You mention this step before adding the Foreach container - is there something else which should be in the Control Flow?

    In other words, right now in my package I have a Foreach Loop Container with a Script Task control flow item inside of it.  What am I missing?

    Thanks!!
    Wednesday, December 9, 2009 11:34 PM
  • The variable I'm talking about is an SSIS variable.  Depending on what you have selected in the Control Flow, it will be "scoped" to that - so if you have the Foreach or Script Task selected when you create the variable, you'll have to redo it later once you figure out you can't use it where you want to use it.

    Click on the Control Flow background, open the Variables window, and create a new variable.

    Other than that, the only thing missing is "the rest of your package" that does the real work to the file you just picked in that Foreach.
    Todd McDermid's Blog
    Thursday, December 10, 2009 12:06 AM
    Moderator
  • Since your files are timestamped, I am assuming that you are using the filename (which is always in the same format) to decide whether a file is the most recent or not. First I would save the last processed file's datetime stamp in a variable, or in a table with one column, ProcessedDate, of type bigint. Then I would compare the timestamp values of all the text files with this stored value. Based on that I would process my file(s).
    The first task would be an Execute SQL Task with the statement:
    SELECT ProcessedDate FROM Table, storing the result in a string variable OldLastDate.
    Then take a Foreach Loop and loop through all the text files matching Capacity_*.txt, saving the filename (select the "Name only" option) in a string variable FileName.
    Create one more string variable, NewLastDate, and set its "Evaluate As Expression" property to True, with this expression: SUBSTRING(REPLACE(@[User::FileName],"_",""),9,LEN(@[User::FileName])-8)
    Inside the Foreach Loop, take a Script Task and connect it to the actual Data Flow Task. Double-click the green connector and set:
    Evaluation Operation: Expression and Constraint
    Value: Success
    Expression: @[User::NewLastDate] > @[User::OldLastDate]
    Then, after the Foreach Loop, take one more Execute SQL Task and update the table that stores the last processed date: UPDATE Table SET ProcessedDate = ?, and do the parameter mapping using the NewLastDate variable.

    Note: Suppose you have 3 files in the folder:
    Capacity_2009_12_01.txt, Capacity_2009_11_23.txt and Capacity_2009_11_15.txt. Of these 3 files, two have already been processed and only Capacity_2009_12_01.txt has not. So you have to make a manual entry in the table for ProcessedDate as 20091123 (the date of the most recent file already processed). After that the package will take care of updating the table and filtering the files.
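    For reference, here is a minimal T-SQL sketch of the tracking table and the two statements the Execute SQL Tasks would run - the table name dbo.ProcessedFiles is just an example, not something prescribed above:

    -- Hypothetical one-row tracking table holding the last processed date as a bigint (yyyymmdd).
    CREATE TABLE dbo.ProcessedFiles (ProcessedDate bigint NOT NULL);
    
    -- One-time manual seed: the newest file already processed in the example above.
    INSERT INTO dbo.ProcessedFiles (ProcessedDate) VALUES (20091123);
    
    -- First Execute SQL Task: read the stored value into the OldLastDate variable.
    SELECT ProcessedDate FROM dbo.ProcessedFiles;
    
    -- Final Execute SQL Task: write back the new value, with the parameter mapped to NewLastDate.
    UPDATE dbo.ProcessedFiles SET ProcessedDate = ?;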


    Nitesh Rai- Please mark the post as answered if it answers your question
    Thursday, December 10, 2009 9:11 AM