NLB, CAS array or OTHER issue?

  • Question

  • So I set up NLB with all ports enabled for my CAS servers (there are two). I tested it and it seemed to work: I stopped each node in turn and tested OWA, and I could always get to OWA as long as one CAS was up.

    I then created a CAS array and pointed my DB to it. My Outlook choked for about 15 minutes and then, poof, it worked. All seemed well.

    Now I have this situation: if I go to one computer, open Control Panel --> E-mail Accounts, and point my account to the CAS array, it works fine. It resolves the server name, finds my account, and I can open my mailbox.

    If I go to another server and try the same thing, when I click Check Name it reports "Outlook cannot log on. Verify you are connected to the network and are using the proper server and mailbox name. The connection to Microsoft Exchange is unavailable". If I put in one node of the CAS array and check the name, it DOES resolve my account, BUT it also flips my server back to being the CAS array.

    Does this make any sense? Why would some computers work fine and others not? (For example, one BES server works and the other does not; my desktop is fine, but my boss's doesn't work.)

    I am stumped.

    Tuesday, August 10, 2010 5:34 PM

Answers

  • We had the same exact issue and it is now resolved.  We had to do the following:

     

    1.  Switch the NLB Cluster to multicast mode.

    2.  In VMware, change the port groups for the VLAN the Exchange servers are on.  Check the "Notify Switches" box and select No.  This stops the ESX host from sending RARP notifications to the physical switch.  This must also be done on every ESX host that will host CAS servers in the array.

    3.  Create a static ARP entry in your Cisco switch.

     

    Once the above was implemented we had zero issues with the NLB/CAS Array.

    Tuesday, August 24, 2010 4:56 PM
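
    For step 3, the static entry needs the cluster MAC address, which NLB derives from the cluster IP address. A minimal Python sketch of the commonly documented derivation follows. The function name and the 10.0.0.50 address are made up for illustration, and the MAC shown in NLB Manager for your own cluster should be treated as authoritative.

```python
import ipaddress


def nlb_cluster_mac(cluster_ip: str, mode: str = "multicast") -> str:
    """Return the MAC address NLB normally derives from an IPv4 cluster IP.

    mode is one of "unicast", "multicast" or "igmp" (IGMP multicast).
    """
    octets = ipaddress.IPv4Address(cluster_ip).packed  # the four bytes w.x.y.z
    if mode == "unicast":
        mac = bytes([0x02, 0xBF]) + octets              # 02-BF-w-x-y-z
    elif mode == "multicast":
        mac = bytes([0x03, 0xBF]) + octets              # 03-BF-w-x-y-z
    elif mode == "igmp":
        mac = bytes([0x01, 0x00, 0x5E, 0x7F]) + octets[2:]  # 01-00-5E-7F-y-z
    else:
        raise ValueError(f"unknown NLB mode: {mode!r}")
    return "-".join(f"{b:02X}" for b in mac)
```

    For the hypothetical 10.0.0.50 cluster address, multicast mode gives 03-BF-0A-00-00-32 and IGMP multicast gives 01-00-5E-7F-00-32.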

All replies

  • I should mention that I considered a DNS issue, but I can ping the CAS array from both servers, so they both resolve the array through DNS OK. Anyone?
    Tuesday, August 10, 2010 6:30 PM
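
    A successful ping shows that the name resolves and that ICMP gets through, but Outlook reaches the CAS over RPC, which starts with the endpoint mapper on TCP port 135. A rough Python sketch for comparing the array name with the individual nodes from a failing workstation is below. The function name is made up for illustration, and an open port 135 alone does not prove that the dynamic RPC ports behind it are reachable.

```python
import socket


def tcp_reachable(host: str, port: int = 135, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Port 135 is the RPC endpoint mapper. Name resolution failures,
    refusals and timeouts all count as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

    Running tcp_reachable() against the array FQDN and then against each CAS node, from both a working and a failing machine, would show whether the failure is specific to the load-balanced address.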
  • Do you have a default gateway on both clients? And are you specifying the FQDN of the CAS array name on the databases?
    Tamer Sherif Mahmoud
    Tuesday, August 10, 2010 9:59 PM
  • Yes to the FQDN.

    No default gateway for the NLB NICs, but otherwise yes. What is odd is that only certain machines have a problem connecting to the CAS.

    workstation-a works fine; workstation-b can't resolve mailboxes using the CAS array. From workstation-b, if I specify one of the CAS nodes for the mailbox lookup, it resolves the mailbox (but also redirects to the CAS array, which then doesn't work).

    Could this be an issue with my NLB? If I turn off NLB as a test, would the CAS array still work?

    Wednesday, August 11, 2010 1:04 PM
  • I meant: do workstation-a and workstation-b both have a default gateway?
    Tamer Sherif Mahmoud
    Wednesday, August 11, 2010 1:06 PM
  • Are the workstations all on the same subnet? Or are they located on different LANs?
    Wednesday, August 11, 2010 1:09 PM
  • Yes, both workstations are on the same subnet, switch, etc. as each other and as the CAS array.

    wsA and wsB both have the same default gateway as the CAS servers.

    Wednesday, August 11, 2010 1:43 PM
  • Did you tell the databases to use this array?

    If you run: Get-MailboxDatabase %STORE% | fl RpcClientAccessServer
    Do you see the name of the array?

    If not, you need to set it:
    Set-MailboxDatabase -Identity %STORE% -RpcClientAccessServer %Array FQDN%

    Wednesday, August 11, 2010 2:14 PM
  • ok.. some progress.

    I have two CAS servers in a CAS array. They are CAS1 and CAS2

    The two CAS servers are also in an MSNLB cluster.

    If I take CAS1 offline the issue is resolved.

    If I bring CAS1 online and take CAS2 offline the issue comes back.

    If both CAS1 and CAS2 are online then the issue persists.

    What is odd is that CAS1 is the first CAS I created, and it had been working before the CAS array was created.

    [PS] C:\Windows\system32>get-clientaccessarray | fl


    RunspaceId        : fb45f073-4f57-4227-a057-2042ae07a48e
    Fqdn              : casvs.domain.dom
    Site              : domain.CA/Configuration/Sites/Colocation
    SiteName          : Colocation
    Members           : {CAS01, CAS02}
    AdminDisplayName  :
    ExchangeVersion   : 0.1 (8.0.535.0)
    Name              : casvs
    DistinguishedName : CN=casvs,CN=Arrays,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=domain,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=domain,DC=CA
    Identity          : casvs
    Guid              : 4308fc15-552c-45ce-addc-ae8c6daa466f
    ObjectCategory    : domain.CA/Configuration/Schema/ms-Exch-Client-Access-Array-2
    ObjectClass       : {top, server, msExchExchangeServer, msExchClientAccessArray}
    WhenChanged       : 7/22/2010 8:58:47 AM
    WhenCreated       : 7/22/2010 8:58:32 AM
    WhenChangedUTC    : 7/22/2010 12:58:47 PM
    WhenCreatedUTC    : 7/22/2010 12:58:32 PM
    OrganizationId    :
    OriginatingServer : pdc07.domain.dom
    IsValid           : True

    Wednesday, August 11, 2010 4:14 PM
  • I'm following your thread because I'm having similar issues in a larger environment that was set up without a CAS array.

    Are these VMs using virtual networks in a VMware or Hyper-V environment?


    -J
    Wednesday, August 11, 2010 6:46 PM
  • The CAS servers are VMs.

    One MBX server is a VM and the other is hardware.

    I am going over the NLB config to see if I missed something.

    Wednesday, August 11, 2010 6:54 PM
  • Unicast NLB on VMware vSwitches is hit or miss. VMware has some great documentation in its knowledge base for NLB unicast setups.

    I also have two CAS VMs set up using a specific port group and VLAN for NLB on a vSphere vdSwitch.

    I migrated them to the same host so NLB was consistent. I'm working toward multicast NLB once we have time to set up static ARP entries on the physical switches our hosts are plugged into. But for now, it seems to have solved our NLB funkiness.

    For testing, if you can, migrate both CAS servers to the same physical host and try it again.


    -J
    Wednesday, August 11, 2010 7:52 PM
  • Thanks J,

    Didn't work. But I learned something about VMware.

    I don't see anything wrong in the NLB settings.

    OWA works fine, by the way, so NLB IS working at some level.

     

    Thursday, August 12, 2010 1:41 PM
  • Silly question.. do I need WINS entries for these servers?

    None of the CAS servers appear in WINS. Neither does the CAS array.

    Thursday, August 12, 2010 2:32 PM
  • Andrew,

    In virtualized environments it is recommended to configure WNLB in multicast mode; otherwise you can expect the WNLB cluster not to work properly. Here is the VMware KB: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1556

    If you are using Hyper-V, you can check the Microsoft KB: http://support.microsoft.com/kb/953828/en-us

    Regards,

    Leonardo Artese

    MCSE - MCTS: Exchange 2007 - MCITP: EMA 2010 - MCITP: EA

     

    Thursday, August 12, 2010 3:55 PM
  • Thanks Leonardo. Spiral had already pointed me to this issue.

    I am still not getting it. I have turned off NIC teaming notifications for the vSwitches. Nothing.

    Set the cluster to multicast mode. Still nothing. One host resolves fine. The other doesn't.

    Did I mention that I am using two NICs per host?

    Recreated the cluster. Everything works fine until I add a second node.

    Funny thing is that OWA works fine and always connects. It is only RPC connections that fail when both nodes are on (and even then only from certain machines).

    Thursday, August 12, 2010 4:48 PM
  • That's right, because the CAS array is only used for MAPI connections. Did you configure the Autodiscover host (A) record correctly?

    Regards,

    Leonardo Artese

    MCSE - MCTS: Exchange 2007 - MCITP: EMA 2010 - MCITP: EA

     

    Thursday, August 12, 2010 7:35 PM
  • Ok. Here is the last little tidbit of oddness.

    Moved both CAS servers to the same ESX host and the issue no longer occurs.

    This takes the Cisco switch out of the equation. I have read about needing to install a static ARP entry on the Cisco.

    We aren't using Autodiscover yet. Does that matter?

     

    Thursday, August 12, 2010 7:57 PM
  • Hi Andrew,

    Glad to hear your CAS array works well now.
    Autodiscover is a service on the CAS server that is used by both internal and external clients. You can refer to:
    http://technet.microsoft.com/en-us/library/bb124251.aspx
    As for your issue, it may not be related to Autodiscover at all.

    Regards!
    Gavin
    Friday, August 13, 2010 7:20 AM
    Moderator
  • Ok. Here is the last little tidbit of oddness.

    Moved both CAS servers to the same ESX host and the issue no longer occurs.

    This takes the Cisco switch out of the equation. I have read about needing to install a static ARP entry on the Cisco.

    We aren't using Autodiscover yet. Does that matter?

     


    Moving the CAS servers to the same host confirms that you have a conflict between the NLB MAC address being announced from both hosts and the ESX host's vSwitch updating the physical switch's ARP table with a single route to one CAS box. Setting up a static ARP entry that covers all the ports that could possibly host the CAS boxes (or, more specifically, the NLB NICs) will resolve this issue.

    There are some really good resources in the first few links of this search:

    http://www.google.com/search?q=vmware+nlb+multicast

     


    -J
    Sunday, August 15, 2010 4:15 PM
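
    The conflict described above can be pictured with a toy model of a switch's MAC address table. This Python sketch is only an illustration under simplifying assumptions (one learned port per MAC, made-up port names and a made-up cluster MAC); it is not how any particular switch is implemented.

```python
class LearningSwitch:
    """Toy layer-2 switch: learned entries map a MAC to one port."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.dynamic = {}  # MAC -> single port, last announcement wins
        self.static = {}   # MAC -> set of ports, configured by an admin

    def learn(self, src_mac, port):
        # Learned source addresses never override a static entry.
        if src_mac not in self.static:
            self.dynamic[src_mac] = port

    def add_static(self, mac, ports):
        self.static[mac] = set(ports)
        self.dynamic.pop(mac, None)

    def egress_ports(self, dst_mac, ingress):
        if dst_mac in self.static:
            return self.static[dst_mac] - {ingress}
        if dst_mac in self.dynamic:
            return {self.dynamic[dst_mac]} - {ingress}
        return self.ports - {ingress}  # unknown destination: flood


CLUSTER_MAC = "02-BF-0A-00-00-32"  # hypothetical NLB cluster MAC


def demo():
    sw = LearningSwitch(["esx-a", "esx-b", "client"])
    # Nothing learned yet: frames for the cluster MAC reach both hosts.
    flooded = sw.egress_ports(CLUSTER_MAC, ingress="client")
    # Host A announces the MAC (for example a RARP notification).
    sw.learn(CLUSTER_MAC, "esx-a")
    pinned = sw.egress_ports(CLUSTER_MAC, ingress="client")
    # A static entry listing both ports survives later announcements.
    sw.add_static(CLUSTER_MAC, ["esx-a", "esx-b"])
    sw.learn(CLUSTER_MAC, "esx-b")
    fixed = sw.egress_ports(CLUSTER_MAC, ingress="client")
    return flooded, pinned, fixed
```

    In the middle case only one NLB node ever sees client traffic, so clients that NLB assigns to the other node would get no answer. That is one plausible reading of why some machines worked and others did not.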
  • I talked to one of our network consultants on Thursday. He has seen this issue before and suggested a static ARP entry on our hardware switch. This seems to validate what Jason has suggested. I will try it and let y'all know this afternoon.

     

    Monday, August 16, 2010 12:56 PM
  • This takes the Cisco switch out of the equation. I have read about needing to install a static ARP entry on the Cisco.


    This is very common when NLB is running in multicast mode, because Cisco devices (and probably others) will not automatically add a multicast MAC address to their ARP table.
    Microsoft Premier Field Engineer, Exchange
    MCSA 2000/2003, CCNA
    MCITP: Enterprise Messaging Administrator 2010
    Former Microsoft MVP, Exchange Server
    My posts are provided “AS IS” with no guarantees, no warranties, and they confer no rights.
    Monday, August 16, 2010 1:30 PM
  • OK. I should mention that we are actually running unicast mode, so I am trying to wrap my brain around this.

     

    Monday, August 16, 2010 6:04 PM
  • It seems to be working (I can't break it).

    Created the ARP entry in the Cisco switch

    Created the static MAC entry in the Cisco switch

    Set Notify Switches in the NIC teaming settings for BOTH the virtual switch and the associated network.

    Set the NLB to IGMP multicast (not just multicast).

    All this might be for nothing if we go for a hardware load balancer .. sigh.

     

    Thanks to all for your help! AND your patience.

    Thursday, August 26, 2010 7:11 PM
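
    For anyone repeating the fix, the two switch entries mentioned above might look like the lines this Python sketch builds. The command forms are an assumption based on common Cisco IOS syntax (some versions spell it mac-address-table, and not every platform accepts several interfaces in one static entry). The function names, IP address, MAC, VLAN, and interface names are placeholders.

```python
def cisco_mac(mac: str) -> str:
    """Convert a MAC such as 03-BF-0A-00-00-32 to dotted form 03bf.0a00.0032."""
    digits = "".join(ch for ch in mac if ch.isalnum()).lower()
    if len(digits) != 12 or any(ch not in "0123456789abcdef" for ch in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def static_entries(cluster_ip, cluster_mac, vlan, interfaces):
    """Build candidate config lines for a static ARP and a static MAC entry.

    Assumed syntax; check it against the documentation for your platform.
    """
    mac = cisco_mac(cluster_mac)
    return [
        f"arp {cluster_ip} {mac} ARPA",
        f"mac address-table static {mac} vlan {vlan} "
        f"interface {' '.join(interfaces)}",
    ]
```

    The interface list would be the switch ports facing every ESX host that can run a CAS node, so that frames for the cluster MAC reach all of them.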