Is DFS-R the right tool for this job?

  • Question

  • Hi,

    Hopefully this is an easy (and perhaps interesting) question for a DFS pro. I have no prior DFS experience, but I have been using file services (for the longest time, Macintosh file services) since NT4. I support a 24x7 infrastructure and purposely try to keep things simple, because they just work better that way. We have typically built a file server, turned it on, and let it run for 2-3+ years. That introduced us to fun things like servers (NT4, 2K) that would gladly store large numbers of files under Mac Services, but would run out of memory trying to index those files if you ever rebooted the machine!

    So, we'd like to enjoy the same (nearly) non-stop, simply administered performance that Win2003 has delivered over the past several years, but also (1) keep multiple copies online, and (2) have the luxury of actually maintaining the file server(s) - that is, take one down and patch it without interrupting file operations. I'd prefer not to use a 3rd-party app, and we really want distinct, relatively independent copies of the data. So, is DFS-R the right thing to use?

    Here's the setup: the data consists of 1 million files averaging about 350KB each (so roughly 350GB). Our I/O is pretty light, I think, and simple. We would likely peak around 100 read+write operations per second from clients, but typically would see no more than half that. And as for simple, we really could move almost all of the file operations through HTTP puts and gets. We'll have one primary file server that all clients access, a second that is a local replication copy, and a third at our DR site over a very reliable 30Mbps+ WAN link. For these servers I've allocated decent hardware: ProLiant DL380 G6, 2.26GHz, 6GB RAM, and a directly attached 6 x 300GB 10K SAS RAID-10 data array.

    So, the questions that come to mind are:

    1. How real-time will replication be in such a setup? Do you see any red flags for performance?

    2. I haven't studied enough to understand the client connection methods. We have a heterogeneous mess of clients: Mac OS X, Linux, Win 2xxx, and even NT4. Do these all connect effectively - I assume - through SMB? Can we serve the files through IIS? (To use this as we envision, I assume the clients will need some awareness of which server is the primary.)

    3. Am I introducing more complexity (and potential problems) that will in fact reduce file availability? And if there is a problem, what is the potential recovery time?

    In a related vein, what is an efficient way to organize these files in the file system? We want to change our current method, which was developed to perform well with DOS clients on an 80286-based server! Today we spread the files over about 5K directories (4 levels deep, like \\server\1\2\3\4\file), so there's an average of 200 files per directory. In the new scheme, virtually all of the files will be named as GUID.xxx, something like E48E77D0-D3F4-4783-BD10-79F24210A2EA.xxx. Our thought is to store these as \\server\ab\cd\file, where ab and cd are the first two and the next two characters of the file name, respectively. We could obviously take that another level (\\server\ab\cd\ef\file), but that seems like overkill.
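
    To make the scheme concrete, here is a minimal sketch (Python, with a placeholder share root, since our final share name isn't decided) of how a file's path would be derived from its GUID name:

        import os

        SHARE_ROOT = r"\\server\data"  # placeholder share root

        def shard_path(filename):
            r"""Map a GUID-named file to \\server\data\ab\cd\file, where ab and cd
            are the first two and next two characters of the file name."""
            name = filename.upper()
            return os.path.join(SHARE_ROOT, name[0:2], name[2:4], filename)

        # E48E77D0-D3F4-4783-BD10-79F24210A2EA.xxx
        #   -> \\server\data\E4\8E\E48E77D0-D3F4-4783-BD10-79F24210A2EA.xxx
        print(shard_path("E48E77D0-D3F4-4783-BD10-79F24210A2EA.xxx"))

    Since the names are hex, each two-character level gives at most 256 folders.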

    Since we don't have any experience using DFS, we'd appreciate hearing a little of yours. Thank you!

    Thursday, April 8, 2010 9:05 PM

Answers

  • DFS-R is a great product for something like this, with a few limitations. DFS-R does not coordinate file locks between members when files are changed, so having multiple users edit a shared Excel document is not a good idea, as conflicting changes (effectively data corruption) can occur. DFS-R uses remote differential compression (RDC) to replicate only the changed portions of a file, as long as the file is over 64KB in size. We use DFS-R for our application distribution points, and we deploy software using Software Installation Settings within Group Policy. Replication can be configured to whatever schedule you wish, and bandwidth throttling applies if configured.

    Assuming Active Directory Sites and Services is configured with the appropriate subnets, and the DCs are in the proper containers, clients will be directed only to the DFS-R servers in the site in which they reside; if none is present, they'll be redirected to the next most appropriate server. If you want to enable web access, you can also install the Web Server role and configure WebDAV with SSL. This gives users the familiar drag-and-drop interface of Windows Explorer, but the traffic is handled over the web. It also lets them upload data to the directories, provided they have the appropriate NTFS permissions and the web site is configured to allow changes. Clients see DFS-R through a DFS namespace that is handled by DNS/Active Directory.

    For example, say you have two servers hosting a folder named Bob:
    \\Server1\Bob
    \\Server2\Bob

    A DFS namespace is created to host these shares and is configured for replication.  The domain is Contoso.msft.
    \\Contoso.msft\Bob   (The folder can actually be given any name at this level, since it references the underlying shares internally.)

    All clients now have their mapped drives updated to this new DFS namespace and no longer need to care about individual server names. Mapped drives and file server replacements just got easy.
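
    As a small, hypothetical illustration (Python; the local file name is made up), a client writes through the namespace path rather than a specific server, and DFS resolves it to an appropriate member behind the scenes:

        import shutil

        # Copy a file through the domain-based namespace path; DFS resolves
        # \\Contoso.msft\Bob to an available folder target.
        src = r"C:\temp\E48E77D0-D3F4-4783-BD10-79F24210A2EA.xxx"   # made-up local file
        shutil.copy2(src, r"\\Contoso.msft\Bob\E48E77D0-D3F4-4783-BD10-79F24210A2EA.xxx")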

    Direct Question Answers:
    1) Replication is as real-time as your bandwidth allows, and it respects the schedule, so it can run off-hours if you like. The only performance caveat is that a given file should only be changed by one person at a time.

    2) SMB/CIFS will be the default access method, but you could also install IIS, FTP, and/or WebDAV on the servers to expose the same directories over HTTP; NTFS permissions still apply (see the HTTP sketch after this list).

    3) Stability should be just fine. Consider NIC teaming to give users more bandwidth when accessing files. Recovery time isn't any different from a traditional file server. Note that this is a high-availability solution, not a disaster recovery solution, although I'm sure it has been used for DR. If data is deleted in one location, it's deleted on all replication partners, so backups are a must.

    4) (Files) I'm not sure exactly what you're after here. Just make sure to break out folders wherever you need permissions to be unique. Apply permissions no more than 2 or 3 levels deep, and document them; inheritance is your friend when it comes to managing permissions, and I cannot stress enough how much easier auditing is this way. Folder depth can be an issue with some backup software, because the full path names get longer, so I'd research that aspect before making your decision. Some tools cap the full path at around 256 characters.
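
    Since the question mentioned moving most file operations through HTTP puts and gets, here is a rough sketch of that pattern against a WebDAV-enabled IIS site as described in answer 2, using Python's requests library; the URL and credentials are placeholders, and the site and NTFS permissions must already allow writes:

        import requests
        from requests.auth import HTTPBasicAuth

        BASE = "https://files.contoso.msft/Bob"                 # hypothetical WebDAV URL
        AUTH = HTTPBasicAuth(r"CONTOSO\someuser", "password")   # placeholder credentials

        name = "E48E77D0-D3F4-4783-BD10-79F24210A2EA.xxx"

        # Upload the file (HTTP PUT) ...
        with open(name, "rb") as f:
            requests.put(f"{BASE}/{name}", data=f, auth=AUTH).raise_for_status()

        # ... and fetch it back (HTTP GET).
        resp = requests.get(f"{BASE}/{name}", auth=AUTH)
        resp.raise_for_status()
        with open("copy_of_" + name, "wb") as f:
            f.write(resp.content)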

    Friday, April 9, 2010 6:18 PM

All replies

  • Thank you for the reply! It sounds very promising for us... our files are almost exclusively write-once, read-many; and when they are written/updated, it is always by a single client. So, that sounds like a good fit.

    Further, we are concerned primarily with high availability here. The DR aspect for us is that if we lose our central site, the replication target at the remote site will have a current copy of the data. I'm with you on the backups, and understand; so really, we're good there too.

    The access isn't quite as straightforward for us, since most of the clients attaching now are Mac OS X clients, and I'm not sure how cleanly they will support such things through AD. Right now, we do authenticate users through the DCs, and obviously enforce file permissions that way. Otherwise, the Macs connect to a \\server\share using their version of Explorer ("Finder"), which just references the DNS name and share. It's not as if they are connecting through a listing of directory resources. Do any readers have experience with DFS and Mac OS clients?

    My performance question was twofold: first, I wanted to make sure that there was enough horsepower to support DFS on top of the file sharing. It doesn't sound like that's a problem, and since we're not really thrashing these servers, I wouldn't expect one.

    Second, item (4) was really a question about an optimal folder structure - or at least about not creating something that's poorly suited to the file system. In "the olden days", a folder with 1,000 objects was noticeably slow to access, so to keep our server response more real-time, we divided things up into thousands of folders four levels deep. Under our proposed method, we could have 65K folders in two levels (256 x 256). I wanted to make sure that wasn't inviting problems on its own. Good point on the folder\file string length, but we should be around 50 characters plus a server name, so it should be OK.
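
    For what it's worth, a quick back-of-the-envelope check of the two proposed layouts (each two-hex-character level gives 256 folders):

        files = 1000000
        for levels in (2, 3):
            folders = 256 ** levels
            print(f"{levels} levels: {folders:,} folders, about {files // folders} files per folder")

        # 2 levels: 65,536 folders, about 15 files per folder
        # 3 levels: 16,777,216 folders, about 0 files per folder (mostly empty - overkill)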

    Thanks again for your comments, and I appreciate any additional!

    Monday, April 12, 2010 2:18 PM
  • DFS-R should just appear as any regular SMB share to the Macs. I have used Macs to target SMB shares at a previous job, so I know it's quite possible to do. The biggest issue is making sure they do not use special characters such as ,/%\^* in file names - it kept killing our backups. :) I haven't noticed any limitation on the maximum number of files in a folder. Honestly, I would just set up a test server, lump all the files into as few directories as reasonable, and see what kind of performance you get. DFS-R's impact on performance seems to be negligible. I have a feeling the limitations you experienced before are gone on any modern operating system; building a folder listing just shouldn't be a big deal these days.
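
    A throwaway sketch of the kind of test suggested above, assuming nothing beyond the Python standard library: create a pile of empty files in one directory and time a listing (the count and paths are arbitrary):

        import os
        import tempfile
        import time

        N = 100000  # arbitrary test size
        testdir = tempfile.mkdtemp(prefix="dirscale_")

        # Create N empty files in a single directory ...
        for i in range(N):
            open(os.path.join(testdir, f"{i:06d}.xxx"), "w").close()

        # ... then time a full listing of that directory.
        start = time.perf_counter()
        entries = os.listdir(testdir)
        print(f"Listed {len(entries)} files in {time.perf_counter() - start:.3f} seconds")
        # (Delete testdir when finished.)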

    Monday, April 12, 2010 3:32 PM
  • Thanks again, RS. Yes, I think it's time to test something out, because I don't see any real roadblocks.

    I guess the one point I was (am) unclear on is whether the clients must necessarily connect through the DFS root in AD, or whether it's still possible to connect to the (equivalent) share on a specific server. We've used Macs connecting to Windows servers since NT4, so I'm pretty well acquainted with a number of pitfalls and problems over the years. They're pretty well behaved now, but I'm not too confident about introducing something that requires true AD integration. I'd feel better if there were a workaround such as "direct" server access (where DFS would then just be doing the replication).

    Monday, April 12, 2010 5:56 PM
  • You can directly access the server share as well. All DFS-R does is replicate the data and give you one target namespace. Honestly, in almost all cases you should target the namespace rather than a server directly, because if the servers ever change, the clients won't need to know. DFS will just handle it for you: configure the new server, allow it to replicate, then remove the old one.
    Monday, April 12, 2010 8:15 PM