How to load data into several tables while passing ID numbers generated by a stored procedure

    Question

  • Environment: SQL Server 2005 and Windows 7.
    Tools: MS SQL Server Management Studio 2008 R2.
    Data: The flat file source contains a large amount of data, but for this example let's say nine records in CSV format. The attributes are [SSN], [First name], and [Last name], listed in order:
    ('001456789', 'Jim', 'Kramer'),
    ('008610321', 'Samanta', 'Cohen'),
    ('097754378', 'Ray', 'elvicio'),
    ('112233350', 'Mike', 'Smith'),
    ('123456789', 'Dana', 'Kobo'),
    ('123456789', 'Dalia', 'Budd'),
    ('987654321', 'George', 'Jackson'),
    ('987654321', 'Pamella', 'Slattery'),
    ('987654321', 'Jose', 'Slattery')

    Alternatively, this data is already stored in a table; let's call it "Table_imported".

    Table#1:

    DECLARE @Table_people TABLE 
    (
     SSN CHAR(9) NOT NULL PRIMARY KEY
    ,pn_id CHAR(9) NOT NULL -- intended to reference @Table_SSN (pn_ID); table variables cannot declare foreign key constraints
    ,first_nm VARCHAR(10)
    ,last_nm VARCHAR(20)
    )

    Table#2

    DECLARE @Table_SSN TABLE 
    (
    ssn CHAR(9) NOT NULL 
    ,pn_ID CHAR(9) NOT NULL
    )

    Stored procedure designed to randomly generate a pn_id of 9 characters:

    CREATE PROCEDURE [dbo].[proc_gen_pn_id]
        (@new_pn_id CHAR(9) OUTPUT)
    AS
    BEGIN
        DECLARE
            @new_nbr CHAR(9),
            @cnt     INT,
            @len     INT
        SELECT @cnt = 1
        WHILE (@cnt = 1)
        BEGIN
            -- Generate a random number and left-pad it with zeros to 9 characters.
            SELECT @new_nbr = CONVERT(CHAR(9), CONVERT(INT, RAND() * 1000000000))
            SELECT @len = DATALENGTH(LTRIM(RTRIM(@new_nbr)))
            WHILE (@len < 9)
            BEGIN
                SELECT @new_nbr = '0' + RTRIM(@new_nbr)
                SELECT @len = @len + 1
            END
            -- A duplicate check against the target tables belongs here;
            -- leave the loop once an unused number has been produced.
            SELECT @cnt = 0
        END
        SELECT @new_pn_id = @new_nbr
    END
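A quick usage sketch of the call, assuming the procedure above compiles:

```sql
-- Draw one generated pn_id into a local variable via the OUTPUT parameter.
DECLARE @id CHAR(9);
EXEC dbo.proc_gen_pn_id @new_pn_id = @id OUTPUT;
SELECT @id;
```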
    Problem: Imagine you have thousands of records (large scale) in a flat file (CSV format). How would you load that data into the above tables? The process should be automated to save time and prevent errors. Creativity is welcome!

    Expected results:

    1. Table_ssn should contain the following data

    .....
    .....
    001456789   |  791011616
    008610321   |  565059001
    097754378   |  698173008
    112233350   |  964173440
    123456789   |  708173550
    123456789   |  259872340
    987654321   |  011034567
    987654321   |  098210001
    987654321   |  167952525

    2. Table_people

    .....
    .....
    001456789   |  791011616  | Jim      | Kramer
    008610321   |  565059001  | Samanta  | Cohen
    097754378   |  698173008  | Ray      | elvicio
    112233350   |  964173440  | Mike     | Smith
    123456789   |  708173550  | Dana     | Kobo
    123456789   |  259872340  | Dalia    | Budd
    987654321   |  011034567  | George   | Jackson
    987654321   |  098210001  | Pamella  | Slattery
    987654321   |  167952525  | Jose     | Slattery

    Please advise


    • Edited by Sandra VO Thursday, June 20, 2013 11:26 PM added line "this data is stored on Table and let call it "Table_imported""
    Thursday, June 20, 2013 10:37 PM

Answers

  • The basic question is: Why do you need a random ID, and must it be CHAR(9)?

    It seems that this is about protecting access to the SSNs, so using UNIQUEIDENTIFIER should be sufficient:

     

    /*
      C:\Temp\Data.txt 
      ---
      001456789, Jim, Kramer
      008610321, Samanta, Cohen
      ---
    
      C:\Temp\Format.txt
      ---
      9.0
      3
      1       SQLCHAR       0       255     ","      1     Ssn         SQL_Latin1_General_CP1_CI_AS
      2       SQLCHAR       0       255     ","      2     FirstName   SQL_Latin1_General_CP1_CI_AS
      3       SQLCHAR       0       255     "\r\n"   3     LastName    SQL_Latin1_General_CP1_CI_AS
      ---
    */
    
    DECLARE @Persistent_People TABLE
        (
          pn_id UNIQUEIDENTIFIER NOT NULL
                                 PRIMARY KEY , -- Foreign key to @Persistent_Ssn.pn_id.
          first_nm NVARCHAR(255) NOT NULL , -- Maybe NULL, but imho shouldn't.
          last_nm NVARCHAR(255) NOT NULL
        );
    
    DECLARE @Persistent_Ssn TABLE
        (
          Ssn CHAR(9) NOT NULL , -- Unique constraint / index.
          pn_ID UNIQUEIDENTIFIER NOT NULL -- Primary Key.
        );
    
    DECLARE @Temporary_RawData TABLE
        (
          RandomID UNIQUEIDENTIFIER NULL ,
          Ssn NVARCHAR(255) NULL ,
          FirstName NVARCHAR(255) NULL ,
          LastName NVARCHAR(255) NULL
        );
    
    INSERT  INTO @Temporary_RawData
            ( RandomID ,
              Ssn ,
              FirstName ,
              LastName 
            )
            SELECT  NEWID() ,
                    O.Ssn ,
                    O.FirstName ,
                    O.LastName
            FROM    OPENROWSET(BULK 'C:\Temp\Data.txt', FORMATFILE= 'C:\Temp\Format.txt') O;
    
    SELECT  RD.*
    FROM    @Temporary_RawData RD;
    
    -- While UNIQUEIDENTIFIER creates a unique value with a very high probability,
    -- you should validate this before inserting it into the final tables.
    
    INSERT  INTO @Persistent_Ssn
            ( Ssn ,
              pn_ID 
            )
            SELECT  RD.Ssn ,
                    RD.RandomID
            FROM    @Temporary_RawData RD;
    
    INSERT  INTO @Persistent_People
            ( pn_id ,
              first_nm ,
              last_nm 
            )
            SELECT  RD.RandomID ,
                    RD.FirstName ,
                    RD.LastName
            FROM    @Temporary_RawData RD;
    
    SELECT  S.*
    FROM    @Persistent_Ssn S;
    SELECT  P.*
    FROM    @Persistent_People P;
    
    If you really need a CHAR(9) number, then you should use an UPDATE and a pick table instead of NEWID().
    • Proposed as answer by Naomi N (Moderator) Friday, June 21, 2013 9:20 PM
    • Marked as answer by Sandra VO Monday, June 24, 2013 9:06 PM
    Friday, June 21, 2013 7:35 AM

All replies

  • 1. Import your CSV file to a temp table, for example #Temp1, using BCP IN; or you could do it manually via the Import Data task available for your database (so that you can query that table).

    2. Instead of a procedure, create a function which will return the random pn_id.

    3. Then you can write a script to insert records into the target tables from the #Temp1 table.

    Example:

    INSERT INTO @Table_SSN
    SELECT column1 AS SSN, dbo.Function() AS pn_id
    FROM #Temp1

    INSERT INTO @Table_people
    SELECT *
    FROM #Temp1

    UPDATE t
    SET t.pn_id = s.pn_id
    FROM @Table_people t
    JOIN @Table_SSN s ON t.SSN = s.SSN
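    If BCP or the Import wizard is not convenient, the same staging step can also be scripted. This is a minimal sketch, assuming a comma-delimited file at the hypothetical path C:\Temp\Data.csv:

```sql
-- Hypothetical staging table matching the CSV layout from the question.
CREATE TABLE #Temp1
(
    SSN      CHAR(9)     NOT NULL,
    first_nm VARCHAR(10) NULL,
    last_nm  VARCHAR(20) NULL
);

-- Load the flat file in one statement (path and terminators are assumptions).
BULK INSERT #Temp1
FROM 'C:\Temp\Data.csv'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n'
);
```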

    Friday, June 21, 2013 7:27 AM
  • @Stefan Hoffmann: I like your solution, and I feel very close to loading the data. However, it did not address the stored procedure or function to generate a random pn_id of 9 characters. pn_id is a foreign key associated with SSN, employee ID, salary, evaluation, etc., and it is also designed to protect direct access to the SSNs. Using NEWID() does not match my requirements, because it generates alphanumeric values such as BD1FEEE9-813F-4835-9358-9FED264D8CE4, which take up space and hinder performance, especially if I have tens of thousands of records. I am currently stuck converting the stored procedure to a function. Overall, a very creative approach; it will be a complete solution, and I am ready to test it, if you create a function which randomly generates a pn_id (CHAR(9)) while checking whether those numbers already exist in @Table_SSN or @Table_people.

    Friday, June 21, 2013 4:03 PM
  • Create a table storing all numbers with a usage flag. Then pick a random number from the unused ones. Use an UPDATE and a function for this; that way you'll get optimal performance.
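    A rough sketch of that pick table, with names of my own choosing (dbo.Numbers, a Used flag):

```sql
-- Hypothetical pick table: pre-populated once with candidate numbers.
CREATE TABLE dbo.Numbers
(
    Number CHAR(9) NOT NULL PRIMARY KEY,
    Used   BIT     NOT NULL DEFAULT 0
);

-- Fill it with zero-padded 9-digit numbers (row count reduced here for brevity).
INSERT INTO dbo.Numbers (Number)
SELECT RIGHT('000000000' + CONVERT(VARCHAR(9), n), 9)
FROM (SELECT TOP (100000)
             ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects a CROSS JOIN sys.all_objects b) AS t;
```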
    Friday, June 21, 2013 8:56 PM
  • Stefan,

    I fear this solution may lead to concurrency problems. You need to first get the number and then update the table to make this number unavailable. This may be slightly tricky - what is your idea in detail?


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    Friday, June 21, 2013 9:22 PM
    Moderator
  • Stefan/Naomi: How do we first get the number and then update the table to make this number unavailable?

    Please advise.

    Friday, June 21, 2013 9:33 PM
  • SELECT TOP (1) @NewNumber = Number
    FROM dbo.Numbers
    WHERE Used = 0
    ORDER BY NEWID()

    UPDATE dbo.Numbers SET Used = 1 WHERE Number = @NewNumber

    -----------

    That's the idea, but there's a potential for collisions.



    Friday, June 21, 2013 9:48 PM
    Moderator
  • Thanks, it makes sense. I am also discouraged from using the NEWID() function and then updating it, for the reasons mentioned above!
    Friday, June 21, 2013 9:55 PM
  • The random approach also has its drawbacks, as you may get duplicate numbers.

    Since Sandra wants a batch update, this can only be handled by locking the mapping table. Using a separate number table gives us a smaller table, which should result in a shorter locking period.

    And the trick is an UPDATE with an OUTPUT clause.
    Saturday, June 22, 2013 11:34 AM
  • In reference to the random approach, which may cause duplicate records: the code checks whether the generated number already exists in the target tables. As for the code, I wonder if you could explain it, your thought process, and why you chose to do it this way. It will help me better understand and learn from you.
    Monday, June 24, 2013 4:37 PM
  • What Stefan proposes is to have a big bucket of available numbers and to draw from that bucket instead of generating the number every time. Once a number is taken, you can either mark it as taken (say, with a flag and a date, if needed) or just remove it from the bucket. The bucket is assumed to be big enough to cover all the need for numbers. The trick Stefan also suggests is to use the OUTPUT clause of the UPDATE command so that updating and returning the new number is a single operation.
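    A minimal sketch of that single-statement draw, assuming the dbo.Numbers(Number, Used) pick table discussed in this thread:

```sql
DECLARE @Drawn TABLE (Number CHAR(9) NOT NULL);

-- Pick one random unused number and mark it taken in a single atomic statement.
;WITH PickOne AS
(
    SELECT TOP (1) Number, Used
    FROM   dbo.Numbers
    WHERE  Used = 0
    ORDER BY NEWID()   -- random draw among the unused numbers
)
UPDATE PickOne
SET    Used = 1
OUTPUT inserted.Number INTO @Drawn;

SELECT Number FROM @Drawn;
```

    Because marking the number and returning it happen in one statement, two concurrent callers cannot draw the same number.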


    Monday, June 24, 2013 5:16 PM
    Moderator
  • All the attributes/values in the bucket file will be consumed and transferred. I just wanted to know how to load the data while capturing ... I just wanted to minimize the complexity of the problem.

    Monday, June 24, 2013 5:53 PM