Large number of files/folders?

  • Question

  • I have read several posts about the issues that can be encountered when there are many files (~a million) in a single folder. I have also read the referenced articles about NTFS limits etc., and seen the general recommendation to keep the number of files in a single folder to around ~10,000 and to spread a large number of files across some sort of folder structure.

    My question is this:

    Would the same issues/recommendations apply to many folders on a single volume or say number of subfolders in a single parent folder?

    Example:

    F:\Folder1

    F:\Folder1\Subfolder1

    <SNIP>

    F:\Folder1\Subfolder1000000

    Friday, November 11, 2011 9:35 PM

Answers

  • I have been trying to research this, but haven't come up with anything (yet) that supports my answer.

    However, as far as I recall, folders are really just files flagged as folders.

    The entries you see listed are really just pointers to where each file's data begins on the disk, and it's the number of such entries in a folder that causes the limitation.

    I believe the limitation is the same for sub-folders.

    But, again, this is only based on my own theory/memory, as I haven't been able to find anything regarding this yet.

    I will let you know if I do.

    EDIT:

    Found something. According to http://www.pcguide.com/ref/hdd/file/ntfs/filesDir-c.html, everything on an NTFS volume is considered a file, and that includes folders.


    • Edited by Ole Drews Jensen Monday, November 14, 2011 3:28 PM
    • Proposed as answer by Ole Drews Jensen Monday, November 14, 2011 4:46 PM
    • Marked as answer by Nagorg Monday, November 28, 2011 10:09 PM
    Monday, November 14, 2011 3:20 PM
  • It seems that there is actually a difference between files vs. folders, at least as it relates to the MFT.

    Given this information, would many (~millions of) folders in a single parent folder be less likely to introduce performance-related symptoms than many files in a single folder?

    I'm inclined to think not, mainly because of the referenced indexes.

    From MSDN on the topic:

    http://technet.microsoft.com/en-us/library/cc976808.aspx

    "NTFS creates a file record for each file and a directory record for each directory created on an NTFS volume. The MFT includes a separate file record for the MFT itself. These file and directory records are stored on the MFT. The attributes of the file are written to the allocated space in the MFT. Besides file attributes, each file record contains information about the position of the file record in the MFT.

    Each file usually uses one file record. However, if a file has a large number of attributes or becomes highly fragmented, it may need more than one file record. If this is the case, the first record for the file, called the base file record, stores the location of the other file records required by the file. Small files and directories (typically 1,500 bytes or smaller) are entirely contained within the file's MFT record.

    Directory records contain index information. Small directories might reside entirely within the MFT structure, while large directories are organized into B-tree structures and have records with pointers to external clusters that contain directory entries that could not be contained within the MFT structure."

    • Marked as answer by Nagorg Monday, November 28, 2011 10:08 PM
    Tuesday, November 15, 2011 11:20 AM

All replies

    • NTFS, or "New Technology File System", introduced with Windows NT, is a completely redesigned file system.

      • Maximum disk size: 256 terabytes

      • Maximum file size: 256 terabytes

      • Maximum number of files on disk: 4,294,967,295

      • Maximum number of files in a single folder: 4,294,967,295

    • Proposed as answer by Vincent Hu Monday, November 14, 2011 6:38 AM
    • Unproposed as answer by Vincent Hu Thursday, November 17, 2011 1:26 PM
    Sunday, November 13, 2011 5:28 PM
  • Thanks for recycling some of the info I mentioned I had already read. These "maximums" don't really hold water when it comes to the various performance issues/perceived hang symptoms that many folks encounter when dealing with these large numbers of files/folders.

    My question was also targeting the number of folders rather than files. Would it be accurate to classify both files and folders as "objects" and apply the same strategy of reducing the number of child objects under a single parent object?

    Monday, November 14, 2011 2:02 PM
  • Like mentioned, files or folders don't make a difference.

    When you speak about the 10K entry "limit", I assume you mean the point at which Explorer gets roughly unresponsive when listing the contents of the folder/drive.

    It depends a bit on how you plan to use the files in your system. If they actually get browsed with Explorer, then it is a good idea to stay below that 5-digit object count.

    If you use those files mostly from a process/programmatically, you can take a look at the functions FindFirstFile (http://msdn.microsoft.com/en-us/library/windows/desktop/aa364418(v=vs.85).aspx) and FindNextFile; they don't retrieve the whole folder content, so you don't see the performance impact you see in Explorer. If you use .NET, this was implemented with Directory.EnumerateFiles in framework version 4; the older method GetFiles shows the same behaviour as Explorer.
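    As an aside, the same lazy-enumeration idea exists outside Win32. The following is a cross-platform sketch in Python (an illustration of the streaming pattern, not the Win32 API itself): os.scandir yields directory entries one at a time, the way FindFirstFile/FindNextFile does, instead of materializing the whole listing the way Explorer or the old GetFiles does.

```python
import os
import tempfile

def count_entries_streaming(path):
    """Walk a directory one entry at a time, FindFirstFile/FindNextFile
    style, without building the full listing in memory first."""
    count = 0
    with os.scandir(path) as entries:  # lazy iterator over directory entries
        for _ in entries:
            count += 1
    return count

# Demo on a small throwaway directory.
with tempfile.TemporaryDirectory() as d:
    for i in range(1000):
        open(os.path.join(d, "file%04d.txt" % i), "w").close()
    print(count_entries_streaming(d))  # prints 1000
```

    With a million entries this keeps memory flat; the cost that remains is the directory's own index, which no enumeration API can avoid.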

    Monday, November 14, 2011 4:05 PM
  • Thanks for the additional information. I think I am closer to getting the answer to my question thanks to the link posted by Ole. It would be nice to get some info from a MSFT source though. But, I'll take what I can get! (Thanks Ole)

    Since you brought up the Explorer scenario, I'd like to clarify that this isn't the only scenario where the performance symptoms can manifest. Take a look at the following blog post, which reflects a programmatic manifestation of the issue:

    http://blogs.msdn.com/b/lagdas/archive/2010/04/29/managing-a-folder-with-a-million-files-in-an-asp-application.aspx

    The "~10K files" figure I mentioned wasn't meant to imply an actual limit; it's more of a guideline that has been offered to many folks who have experienced the related symptoms.

    I'm just trying to gather information that would be helpful to those who are trying to architect a strategy for dealing with millions of files while avoiding "known" pitfalls.
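    For reference, the fan-out strategy discussed here is usually implemented by hashing the file name into a short chain of subfolders. Below is a minimal Python sketch (the shard_path helper and the two-level, two-hex-character layout are illustrative assumptions, not anything prescribed in this thread): two levels of two hex characters give 65,536 leaf folders, so a million files averages roughly 15 entries per folder.

```python
import hashlib
import os.path

def shard_path(root, filename, levels=2, width=2):
    """Map a filename to a hashed subfolder chain, e.g. root/3a/f1/filename,
    so no single folder accumulates millions of entries. The same name
    always maps to the same shard, so no lookup table is needed."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    shards = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *shards, filename)

print(shard_path("/data", "report-000001.pdf"))
```

    The trade-off is that enumerating everything now requires walking the whole tree, but any single open/lookup touches only small folders.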

    Monday, November 14, 2011 4:50 PM
  • Glad I could help.

    Please mark my post as answer to your question if you feel it qualifies as one.

    Thanks. :)

    Monday, November 14, 2011 5:31 PM