locked
add and populate new column in a fact table RRS feed

  • Question

  • Hi,

    I have a table with ~1m rows and it will grow .I need to add new columns and populate them.Some of these columns are fact data(transactional).The table contains 10 columns and no key columns.How do i populate new 2 columns that i want to add with data ?

    Do i copy it to secondary table ,alter structure ,truncate and reload.Then rename this copy table with original

    Or 

    is there another way to populate the new columns?

    Saturday, June 13, 2020 3:43 AM

Answers

  • Hi Ronen,

    I didnt understand this :

    * if the columns are in-row variable-length data type then it is important to fill the data in both columns together and not in two separate transactions (not like a fixed-length column when variable-length might take different space on the disk and each time to update an in-line variable-length column you might end up with creating a new row on the disk and moving all the data of the row under the scenes)

    Since it is a fact table,i cant update records to fill in those columns.So I guess i will have to go with :

    * In many cases the best approach is to: (1) copy the data to a Heap table or export it for example to CSV file. (2) truncate the table (3) add the columns. (4) rebuild table! (5) Import the data.

    Hi

    >> I didnt understand this

    It is related to advanced topic of how the severer actually works behind the scenes. To fully understand the reasons to my recommendation, You can search a recording of my lecture (there are versions in English and Hebrew) about the physical structure of a table. But again, this is level 500 lecture about undocumented information. The short point is that what you see as tables is nothing more then logical layer which the server creates for our use. For example, a table can have X columns, but the row in the disk will store X+Y columns while Y columns are hidden.

    The important point is the conclusion from knowing the internals fo the table and the conclusion is as I wrote :-)

    >> Since it is a fact table,i cant update records to fill in those columns.

    Not clear to me what is your procedure now. Why you cannot update a fact table? It is not the normal way we use fact table, but if needed for one time use since you need to add columns, so there is no technical issue that prevent it

    You said that you add 2 column to existing table. Why do you need more columns if no data will be filled there?

    If the table already include rows and you add 2 columns then I assumed that you will want to add data in these columns.

    If this is not the case then it change everything!

    For example, if there is no data in the table when you add the rows, then you should simply add the rows (in this case it is not important if you do it in one query or separate) -> REBUILD the table while ALTER it or after you add the column execute REBUILD. Next you can add the data

    >> So I guess i will have to go with :

    Yes, this is a good approach usually if the table already have data (mostly fit cases that there are many rows) and you need to add columns.


    signature   Ronen Ariely
     [Personal Site]    [Blog]    [Facebook]    [Linkedin]

    Tuesday, June 16, 2020 12:51 PM

All replies

  • Hi,

    I have a table with ~1m rows and it will grow .I need to add new columns and populate them.Some of these columns are fact data(transactional).The table contains 10 columns and no key columns.How do i populate new 2 columns that i want to add with data ?

    Do i copy it to secondary table ,alter structure ,truncate and reload.Then rename this copy table with original

    Or 

    is there another way to populate the new columns?

    Good day,

    some internals information...

    * Drop a column does not really drop the column but only marks it a "not exists", by marking the physical column name to null, while the leaf-offset stays the same (in simple words - the row structure stays the same and includes the space of the dropped column). In fact, you can even read the data of a dropped fixed-length columns from the disk (Using DBCC PAGE) from the rows which were written before the drop. To reclaim the data of Fixed-length you can to rebuild the table. reclaim some of the space of dropped variable-length columns can be done using Search Results Web results DBCC CLEANTABLE.

    * Creating new column in some cases can use the place of the a dropped column or add the columns at the end of the rows, but in most cases it will lead to re-write the entire table, since the structure of the rows will changed. We are working with row stores data which mean that it based on rows and each row has specific structure. If you change the structure of the rows and need more space for the rows as result then the server have to relocate the data and build new rows. This can cost a lot.

    >> How do i populate new 2 columns that i want to add with data ?

    * Do it when the server is less active.

    * Do it together and not in two steps! meaning add the two columns in one query

    * if the columns are in-row variable-length data type then it is important to fill the data in both columns together and not in two separate transactions (not like a fixed-length column when variable-length might take different space on the disk and each time to update an in-line variable-length column you might end up with creating a new row on the disk and moving all the data of the row under the scenes)

    * You might want to disable some relevant constraints and indexes before the update.

    * In many cases the best approach is to: (1) copy the data to a Heap table or export it for example to CSV file. (2) truncate the table (3) add the columns. (4) rebuild table! (5) Import the data.

    Do i copy it to secondary table ,alter structure ,truncate and reload.Then rename this copy table with original

    * Personally I do not like re-naming tables, but it can be done as explained here. It can have some dramatic unexpected dramatic impact.

    Note! I cannot advice you what to do in your specific case since for this I will need to fully familiar with your system. I can only give you points to take into consideration - my 2 cents.


    signature   Ronen Ariely
     [Pfersonal Site]    [Blog]    [Facebook]    [Linkedin]



    • Edited by pituachMVP Sunday, June 14, 2020 1:41 PM
    • Proposed as answer by Lily Lii Tuesday, June 16, 2020 5:40 AM
    Sunday, June 14, 2020 12:59 PM
  • Hi Ronen,

    I didnt understand this :

    * if the columns are in-row variable-length data type then it is important to fill the data in both columns together and not in two separate transactions (not like a fixed-length column when variable-length might take different space on the disk and each time to update an in-line variable-length column you might end up with creating a new row on the disk and moving all the data of the row under the scenes)

    Since it is a fact table,i cant update records to fill in those columns.So I guess i will have to go with :

    * In many cases the best approach is to: (1) copy the data to a Heap table or export it for example to CSV file. (2) truncate the table (3) add the columns. (4) rebuild table! (5) Import the data.

    Monday, June 15, 2020 1:43 AM
  • Hi msdnpublic1234,

    --Do you mean this? SELECT TOP(1) * INTO mytable_new FROM mytable; ALTER TABLE mytable_new ADD column_1 int, column_2 int; TRUNCATE TABLE mytable_new; INSERT INTO mytable_new (column1, column2, column3, ...) SELECT column1, column2, column3, ... FROM mytable; --or just SELECT *, 1 AS column_1,2 AS column_2 INTO mytable_new FROM mytable;

    UPDATE mytable_new set column_1=... ,column_2=...

    Best Regards,

    Lily


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com

    Monday, June 15, 2020 6:00 AM
  • Hi Ronen,

    I didnt understand this :

    * if the columns are in-row variable-length data type then it is important to fill the data in both columns together and not in two separate transactions (not like a fixed-length column when variable-length might take different space on the disk and each time to update an in-line variable-length column you might end up with creating a new row on the disk and moving all the data of the row under the scenes)

    Since it is a fact table,i cant update records to fill in those columns.So I guess i will have to go with :

    * In many cases the best approach is to: (1) copy the data to a Heap table or export it for example to CSV file. (2) truncate the table (3) add the columns. (4) rebuild table! (5) Import the data.

    Hi

    >> I didnt understand this

    It is related to advanced topic of how the severer actually works behind the scenes. To fully understand the reasons to my recommendation, You can search a recording of my lecture (there are versions in English and Hebrew) about the physical structure of a table. But again, this is level 500 lecture about undocumented information. The short point is that what you see as tables is nothing more then logical layer which the server creates for our use. For example, a table can have X columns, but the row in the disk will store X+Y columns while Y columns are hidden.

    The important point is the conclusion from knowing the internals fo the table and the conclusion is as I wrote :-)

    >> Since it is a fact table,i cant update records to fill in those columns.

    Not clear to me what is your procedure now. Why you cannot update a fact table? It is not the normal way we use fact table, but if needed for one time use since you need to add columns, so there is no technical issue that prevent it

    You said that you add 2 column to existing table. Why do you need more columns if no data will be filled there?

    If the table already include rows and you add 2 columns then I assumed that you will want to add data in these columns.

    If this is not the case then it change everything!

    For example, if there is no data in the table when you add the rows, then you should simply add the rows (in this case it is not important if you do it in one query or separate) -> REBUILD the table while ALTER it or after you add the column execute REBUILD. Next you can add the data

    >> So I guess i will have to go with :

    Yes, this is a good approach usually if the table already have data (mostly fit cases that there are many rows) and you need to add columns.


    signature   Ronen Ariely
     [Personal Site]    [Blog]    [Facebook]    [Linkedin]

    Tuesday, June 16, 2020 12:51 PM