I currently need to process about 400 txt files, roughly 1 TB of data, in at most 5-6 hours. I need to run a very simple script that captures about 0.10% of the data, something like a SELECT with a WHERE clause. Originally I was thinking of importing this data into SQL Server, but I'm not sure my computer could handle the workload in a reasonable time frame.
What would you suggest? Are there any other products Microsoft offers for processing extremely large flat files?
The solution you mentioned should be the best one in my opinion. The performance bottlenecks in this case are:
Loading data from the 400 text files, about 1 TB in total (roughly 2.5 GB per file)
Bulk inserting the data into SQL Server
Filtering the 1 TB of data
For the first challenge, it won't be an issue in SSIS per my testing. I have a 2.5 GB file with 23,504,761 rows, and it can be imported into a SQL Server table within one minute.
For the second challenge, we can split the bulk insert into batches to improve performance.
For the third challenge, with the proper index(es) created, performance won't be a problem.
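The batch-splitting idea for the second challenge can be sketched as grouping rows into fixed-size chunks and committing each chunk separately, analogous to the BATCHSIZE option of T-SQL BULK INSERT. The batch size of 50,000 below is an illustrative assumption, not a recommendation from the thread.

```python
def batched(rows, batch_size=50_000):
    """Group an iterable of rows into lists of at most batch_size rows.

    Committing each batch separately keeps transaction-log growth bounded
    and lets a failed batch be retried without redoing the whole file.
    """
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final partial batch, if any
        yield batch
```

Each yielded batch would then be handed to one bulk-insert call; tuning the batch size trades log pressure against per-batch overhead.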