No traffic received from gateways - Sporadic health issue. Cause for concern?

  • Question

  • I have a very large global forest with 9 domains spread across multiple geographic sites. There are over 200 domain controllers supporting approximately 50,000 users, so I am trying to make sure all traffic from all DCs is captured for threat analysis by ATA. No small task!

    I have begun the ATA deployment with two Gateways, one in each of the largest data centers. The Gateways were sized according to the sizing tool, which was run before deployment (4 x Quad CPU, 32 GB memory), and each has been set up to receive port-mirrored packets from roughly 14 domain controllers.

    ATA is working (I have seen a few alerts) and NETMON shows LDAP and Kerberos packets from the DCs hitting the gateway NIC.

    ATA itself repeatedly shows 'No traffic received from Domain controller FQDN 'x' for 'y' hours'. The DC named in the alert changes from one occurrence to the next.
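
    To see whether these alerts cluster on a few DCs or rotate across all of them, the alert text can be tallied per DC. The following is only a rough sketch: the alert wording in the pattern is assumed from the paraphrase above and may not match the exact text ATA emits, and the host names in the usage lines are made up.

```python
import re
from collections import Counter

# Assumed alert wording, modeled on the paraphrase in this post.
# Adjust the pattern to the exact text shown in your ATA health page.
ALERT_RE = re.compile(
    r"No traffic received from Domain controller (?P<dc>\S+) for (?P<hours>\d+) hours",
    re.IGNORECASE,
)


def summarize_alerts(lines):
    """Return (alert count per DC, longest reported gap in hours per DC)."""
    counts = Counter()
    max_hours = {}
    for line in lines:
        match = ALERT_RE.search(line)
        if not match:
            continue  # some other health alert, ignore it
        # Normalize the name: drop surrounding quotes and ignore case.
        dc = match.group("dc").strip("'\"").lower()
        hours = int(match.group("hours"))
        counts[dc] += 1
        max_hours[dc] = max(hours, max_hours.get(dc, 0))
    return counts, max_hours


if __name__ == "__main__":
    # Hypothetical alert lines, for illustration only.
    sample = [
        "No traffic received from Domain controller dc01.example.com for 2 hours",
        "No traffic received from Domain controller dc02.example.com for 1 hours",
        "No traffic received from Domain controller dc01.example.com for 5 hours",
    ]
    counts, max_hours = summarize_alerts(sample)
    for dc, n in counts.most_common():
        print(dc, n, max_hours[dc])
```

    If every DC shows up with a similar count, that points away from a single misconfigured mirror port and toward something common to the whole Gateway.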

    The support model in the global org I work at means that configuration of the DCs and networking is handled by a separate offshore team, and I have been working with them to figure out why these health issues occur.

    They assure me that port mirroring is correct (virtual servers talking to virtual switches in a VMware environment).

    They are talking to VMware, so I wondered whether anyone out there with a similarly large organization has run into similar issues or has any tips regarding complete ATA coverage for a global org.

    Thanks

    Chris

    Friday, December 28, 2018 10:16 AM

All replies

  • Any chance those DCs are rebooted for maintenance?

    Or, if they are VMs, they may temporarily migrate to a different host and then come back (that would break the mirroring).

    Any other health issues reported by the system besides this type of alert?

    Friday, December 28, 2018 12:23 PM
  • I have had a "Gateway, xxx, is receiving more network traffic than it can process. A portion of the network traffic is not analyzed." alert.

    As far as I am aware, the DCs aren't shut down for maintenance and have been continually up. Moving to a different hypervisor is a good point, which I will have to check with my server team.

    I have also installed the Gateway on a few DCs (making them Lightweight Gateways). I chose ones that the sizing tool said could support the agent without any hardware upgrade, based on number of packets.

    The Lightweight Gateways do occasionally reach the memory resource limit, and I get a notification that the service is temporarily stopping, but that has nothing to do with the messages from the Gateways themselves.

    Friday, December 28, 2018 3:09 PM
  • If traffic is dropped for long periods, that can also cause the first alert, once we reach a point where we drop all the traffic from a given DC.

    I would focus on checking the hypervisor migration and on mitigating the dropped traffic. You don't want to keep running this way if you keep getting these alerts over time, because you lose coverage.

    Friday, December 28, 2018 10:23 PM
  • I'm assured that hypervisor migration isn't taking place. Looking at the series of 'No traffic received' messages, all domain controllers appear on the list at some stage, but the health issues seem sporadic.

    Some DC health issues are 'open', some are 'closed'. Netmon shows traffic still hitting the gateways.

    Before I deploy further gateways, I want to be confident that all systems are being monitored so I don't lose coverage. I can find no other reference to anyone getting these alerts in this sporadic fashion.



    • Edited by Choll152 Friday, January 4, 2019 2:22 PM
    Friday, January 4, 2019 2:14 PM
  • The alert does not mean we don't get the traffic on the mirrored NIC.

    It means we try to process what comes in there but can't keep up, so we drop some of the data.

    (You won't be able to see that in netmon...)

    You can see that in the ATA Gateway perf counters:

    https://docs.microsoft.com/en-us/advanced-threat-analytics/troubleshooting-ata-using-perf-counters
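
    As a rough illustration of how such counters could be reviewed offline, the sketch below assumes the samples have been exported to CSV with one column per counter. The column name used in the usage lines (`Dropped`) is a placeholder, not a real ATA counter name; the page linked above lists the actual counters to watch.

```python
import csv
import io


def nonzero_sample_fraction(csv_text, column):
    """Fraction of parseable samples in `column` whose value is above zero.

    A high fraction for a "dropped" style counter would suggest the
    Gateway is regularly failing to keep up with incoming traffic.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    total = 0
    nonzero = 0
    for row in reader:
        raw = (row.get(column) or "").strip()
        try:
            value = float(raw)
        except ValueError:
            continue  # blank or non-numeric sample, skip it
        total += 1
        if value > 0:
            nonzero += 1
    return nonzero / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical export; "Dropped" stands in for the real counter column.
    export = "Time,Dropped\n10:00,0\n10:01,12.5\n10:02, \n10:03,0\n10:04,3\n"
    print(nonzero_sample_fraction(export, "Dropped"))
```

    Comparing this fraction across Gateways, or before and after a hardware change, gives a simple way to tell whether the dropped-traffic condition is improving.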

    In most cases this is a perf issue, where the DC hardware does not meet the spec that the ATA sizing tool recommended. Can you verify whether it does?

    In rare cases there could be other, more complex issues, but those should be handled by customer support, where it's easier to exchange data safely.

    Sunday, January 6, 2019 10:54 AM