Friday, October 17, 2008 8:40 PM
Hello everyone,
This is a question about Windows Server 2003, although if there's a different answer for 2008 I'd like to know.
I understand there are very large numerical limits for the number of files in an NTFS filesystem, but I'm wondering what the practical limits are.
We have a directory with 3.2M files (57GB, 65GB on disk), including subdirectories. If I assume that this will get to 5M files in a year or so, are there any issues?
Although "supported", I'm wondering whether I should be recommending that this application be redesigned to utilize a database. The average file size is 17kB, but I suspect that the median is much closer to 5-6kB.
Clearly any reasonable workstation can handle 60GB-80GB of data, so raw capacity is not the concern. What I would like to know is whether NTFS adds overhead (in the way it manages pointers, searching, and metadata, for example) that becomes significant at this scale, to the point that we should reconsider how we have architected this application.
Does anyone have experience in something around this size? Clearly, we could add hardware, like RAID1/0, more CPU, etc., and it is supported, but that's a cost that needs to be balanced against possibly rethinking how this data should be stored.
I very much appreciate any experience you might have, or pointers to offer. Thanks!
Saturday, October 18, 2008 6:20 AM Moderator
My belief is that the answers are the same for Windows Server 2008 and Windows Server 2003.
The theoretical limits for NTFS are documented in the resource kits.
There is no hard formula to decide what a practical limit is. But I can offer some pointers as to what considerations matter:
- NTFS is well documented to use B-trees. One can work out the B-tree balancing algorithms and come up with a number where things become unacceptably slow. Unfortunately, your CPU speed, system load, bus I/O capability, cache hits/misses, hard disk speed, etc., not to mention what one person considers unacceptably slow, are all ill defined.
- Some gurus believe that handle-based renames and deletes are better than path-based renames and deletes, because NTFS already has the relevant data structures located for handle-based APIs.
- You can assume that somewhere along the line, somebody is using hashes. The higher the number of files in a directory, the higher the chances that you get a hash collision
- Believe it or not, even the names of the files matter. An application that generates "ABCDEFG.00001", "ABCDEFG.00002" would be pretty bad if your system was also generating short file names, not to mention hash collisions.
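
To illustrate the file-name point, here is a rough Python sketch. It is not the real NTFS algorithm: short_name_basis is a hypothetical helper that assumes the 8.3 name starts from the first six characters of the name plus the first three characters of the extension, and it ignores the "~N" suffix, character stripping, and any hash-based fallback. It only shows how many long names end up competing for the same starting point.

```python
from collections import Counter


def short_name_basis(long_name: str) -> str:
    """Simplified 8.3 basis (assumption, not the real NTFS routine):
    first 6 characters of the stem plus first 3 of the extension."""
    stem, dot, ext = long_name.rpartition(".")
    if not dot:  # name has no extension
        stem, ext = long_name, ""
    return f"{stem[:6].upper()}.{ext[:3].upper()}"


def count_bases(names):
    """Count how many long names map to each simplified basis."""
    return Counter(short_name_basis(name) for name in names)


# Counter at the end of the name: every file shares one basis.
suffix_style = [f"ABCDEFG.{i:05d}" for i in range(1, 100)]

# Counter at the front of the name: every file gets its own basis.
prefix_style = [f"{i:05d}ABCDEFG.dat" for i in range(1, 100)]

if __name__ == "__main__":
    print(len(count_bases(suffix_style)), "distinct bases for suffix-style names")
    print(len(count_bases(prefix_style)), "distinct bases for prefix-style names")
```

Under these assumptions, names that differ only near the end all collapse onto the same basis, so the file system has to do extra work to find a unique short name for each one. Putting the varying part of the name first avoids that.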
You are already past my personal comfort zone now with 3.2 million files, not to mention 5 million.
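
To put the B-tree consideration in rough numbers, here is a back-of-the-envelope Python sketch. The fanout of 100 entries per node is purely an assumption for illustration (the real value depends on name lengths and index buffer size), and it treats the worst case where all the files sit in a single directory.

```python
def btree_height(entries: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout**h >= entries.

    Uses integer arithmetic to avoid floating-point log rounding.
    """
    height = 1
    capacity = fanout
    while capacity < entries:
        capacity *= fanout
        height += 1
    return height


if __name__ == "__main__":
    ASSUMED_FANOUT = 100  # hypothetical entries per index node
    for files in (3_200_000, 5_000_000):
        levels = btree_height(files, ASSUMED_FANOUT)
        print(f"{files:>9,} files -> about {levels} levels")
```

The depth grows only logarithmically, so going from 3.2 million to 5 million entries does not by itself add a level under this assumption. That is why the other factors (cache behavior, disk speed, short-name generation, and what you consider acceptable) tend to decide where the practical limit falls.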