There is a table that has 13 million + rows on another server that I bring into my database so my queries run faster. I used to append new data every day, but realized some changes were happening for past data. Thinking about fragmentation on my database, it is better to drop and bring in all the data weekly or update the past data with a MERGE statement? My database is simple recovery model and I shrink the log file often (I do not use it for restores), so I am only concerned about the impacts on my database file. Thank you.
- Edited by Jay_Michael Sunday, July 21, 2013 7:54 PM
Regularly shrinking the log is not a good idea, even in SIMPLE recovery model. Read my blog about common myths about transaction log.
Fragmentation will occur if your update make rows wider, or your insert does not fall at the end of the index, AND the fillfactor was left at default of 100%. Fragmentation can be measured and also fixed e.g. with this fine Olla's jobs. Also, you set smaller fillfactor to prevent fast fragmentation.
But, why you worry so much about fragmentation? You could worry about performance (refreshing all 13m rows every time?), concurrency (tablock during data move). Maybe you could employ some of the technique to capture just the changes and copy them, e.g. Change Tracking - it will capture not only inserted and updated rows, but also deleted ones. Change Data Capture or some form of replication would probably be an overkill.
You could also tune the performance with multiple threads, multiple files in the filegroup, lock management (lock whole table upfront, or do the job in batches of <5000 rows not to lock the table and maintain concurrency during the process). There are tons of stuff that can be done, but this is just to give you an idea.
I appreciate all that insight. I am not concerned about performance while the updates or inserts are occurring. This is a personal data warehouse i use for reporting, so myself and maybe 2 or 3 others on my team are using this for reporting, and all that data movement/change occurs over night when no one is using the DB. I probably should have stated that in my first thread. That is why my main concern was fragmentation and slowing down my DB in other ways. I will read your blog and the other link to see if i can find ideas. Thank you.