KB4019472 on Server 2016 causing AFD event 16002 and DHCP / ADDS / DHCP to fail

  • Question

  • I have a pair of 2016 VM DCs running on Hyper-V, in a brand new domain as of January this year. Both machines are pretty much identical, built within 2 days of each other, with the same services installed on both, and up to date with KB updates. Both worked perfectly until the May KB update was installed on them.

    Since the update, DC1 has been falling over every 26 or 27 hours, without fail. When it occurs, AFD logs a warning in the event log about UDP ports, DNS is uncontactable as it can't talk to AD, AD replication stops working, DHCP fails as it cannot contact ADDS, and so on. However, other network services (file shares, ping, RDP) work correctly.

    After a reboot of the affected machine, all affected services kick back into life, replication starts again, dcdiag reports no errors, repadmin reports everything working fine, and 27 hours later the same thing happens again.

    I uninstalled the latest KB in case there was an issue with the update (installed via SCCM), and DC1 once again worked perfectly, with no issues at all. I reinstalled it a few days later using online Windows Update on the device, and sure enough the same issue occurred again.

    The warning in Event Viewer suggests network drivers could be to blame; however, both devices were installed from the same ISO and both currently sit on the same Hyper-V host (I know... new servers coming soon).

    I've tried searching for similar issues, but the only potentially related ones involve KB4019215 and a system running out of ephemeral ports. This is my next point of investigation. I've considered replacing the VM driver for the network device but am a little hesitant, as the update is clearly causing the issue.

    In the meantime I've pulled the update and was going to reinstall when the June update is released, but I have a sinking feeling I'll see the same issue.

    Thursday, June 1, 2017 3:33 PM

All replies

  • Hi Deequeue

    >>In the meantime I've pulled the update for now and was going to reinstall when the June update is released

    Based on your situation, we also agree with this method. If you have further information after reinstalling June updates, you could post it on the forum.

    Best Regards,

    Candy


    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.

    Friday, June 2, 2017 6:43 AM
  • Hi Deequeue,

    Do you have any updates?

    Best Regards,

    Candy



    Monday, June 12, 2017 4:40 AM
  • Hello,

    We are encountering the same issue with one of our customers.
    The server is an HP ProLiant ML350 Gen9 with up-to-date drivers/BIOS.
    Other customers with a Windows Server 2016 server are not having the issue.

    Regards,
    Tim Van Engeland
    Monday, June 12, 2017 6:51 AM
  • Hi Candy

    I installed the June update rollup yesterday, and after just under 2 weeks of running correctly without the patch installed, we are experiencing exactly the same AFD issue with the new patch.


    Any additional advice would be appreciated. I am starting to see a few more people online with the issue, but still not in great numbers.

    This doesn't appear to be a hardware issue for us, as the VM host is running 3 other fully patched 2016 servers without issue.

    Dale

    Friday, June 16, 2017 12:06 PM
  • Hello,

    Are you using iSCSI on those servers? 

    Regards,
    Tim Van Engeland

    Friday, June 16, 2017 1:32 PM
  • Hello,

    I ask because we use an iSCSI path to a NAS at this customer. Other customers don't have the issue, as they are not using iSCSI.

    Windows Server 2012 R2 and Server 2016 computers that experience disconnections to iSCSI attached targets may show many different symptoms. These include, but are not limited to:

    • The operating system stops responding
    • You receive Stop errors (Bugcheck errors) 0x80, 0x111, 0x1C8, 0xE2, 0x161, 0x00, 0xF4, 0xEF, 0xEA, 0x101, 0x133, or 0xDEADDEAD.
    • User logon failures occur together with a "No Logon Servers Available" error.
    • Application and service failures occur because of ephemeral port exhaustion.
    • An unusually high number of ephemeral ports are being used by the System process.
    • An unusually high number of threads are being used by the System process.
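To check the symptom about the System process holding an unusual number of ephemeral ports, something along these lines could be run against saved `netstat -ano` output. This is a minimal sketch, not a Microsoft tool: the function name and the sample data are hypothetical, and it assumes the default Windows dynamic port range of 49152-65535 and that the System process has PID 4.

```python
from collections import Counter

# Assumption: default Windows dynamic (ephemeral) range, 16384 ports.
DYNAMIC_START, DYNAMIC_END = 49152, 65535


def dynamic_ports_by_pid(netstat_text):
    """Count distinct dynamic-range local ports per owning PID.

    Expects text in the layout produced by `netstat -ano`, where the
    last column of each TCP/UDP row is the PID.
    """
    seen = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        if not fields or fields[0] not in ("TCP", "UDP"):
            continue  # skip headers and blank lines
        proto, local, pid = fields[0], fields[1], fields[-1]
        port_text = local.rsplit(":", 1)[-1]
        if not (port_text.isdigit() and pid.isdigit()):
            continue
        port = int(port_text)
        if DYNAMIC_START <= port <= DYNAMIC_END:
            # De-duplicate the same port bound on IPv4 and IPv6.
            seen.add((proto, port, int(pid)))
    return Counter(pid for _, _, pid in seen)


# Made-up sample rows, for illustration only.
SAMPLE = """
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       888
  TCP    10.0.0.5:49670         10.0.0.9:3260          ESTABLISHED     4
  TCP    10.0.0.5:49671         10.0.0.9:3260          ESTABLISHED     4
  UDP    0.0.0.0:50001          *:*                                    1520
  UDP    [::]:50001             *:*                                    1520
"""

for pid, count in dynamic_ports_by_pid(SAMPLE).most_common():
    print(f"PID {pid}: {count} dynamic ports")
```

On a real server, the text would come from redirecting `netstat -ano` to a file; a very large count against PID 4 would match the symptom described above.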

    Cause

    This issue is caused by a locking issue on Windows Server 2012 R2 and Windows Server 2016 RS1 computers, causing connectivity issues to the iSCSI targets. The issue can occur after installing any of the following updates:

    https://support.microsoft.com/en-us/help/4019472/windows-10-update-kb4019472 

    https://support.microsoft.com/en-us/help/4022715

    The issue is still present in KB4022715.

    Regards,
    Tim Van Engeland

    Friday, June 16, 2017 1:37 PM
  • Hi Tim,

    Yes, we use iSCSI on both DCs, one with and one without the issue. There are no events listed with regard to iSCSI disconnections.

    We had another failure over the weekend, but I was actually able to log on to the server and do some checks before it fell over.

    Our monitoring software reported the DNS server at 100% CPU utilisation. This was confirmed, and is an issue that we have not seen previously.

    iSCSI was still accessible, and has been every time we have seen the error: shared home folders on the device were accessible across the network, as were netlogon and sysvol. Shared printers were also visible, although I didn't have time to check if they worked.

    Checking ephemeral ports, only 5406 of the 16384 available were in use. I also ran netstat -an | find /c "TIME_WAIT", but it returned 0.
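The same two checks (ports in use out of 16384, and the TIME_WAIT count) can be done in one pass over saved `netstat -an` output. The following is a rough sketch under stated assumptions: the function name and sample rows are hypothetical, and the 16384 figure is taken to be the default dynamic range 49152-65535, counted separately for TCP and UDP.

```python
from collections import Counter

# Assumption: default Windows dynamic range, 49152-65535 (16384 ports).
DYNAMIC_RANGE = range(49152, 65536)


def summarize(netstat_text):
    """Return (TCP state counts, dynamic ports in use per protocol)."""
    states = Counter()
    used = {"TCP": set(), "UDP": set()}
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] not in used:
            continue  # skip headers and blank lines
        proto, local = fields[0], fields[1]
        if proto == "TCP" and len(fields) >= 4:
            states[fields[3]] += 1  # UDP rows carry no state column
        port_text = local.rsplit(":", 1)[-1]
        if port_text.isdigit() and int(port_text) in DYNAMIC_RANGE:
            used[proto].add(int(port_text))
    return states, {proto: len(ports) for proto, ports in used.items()}


# Made-up sample rows, for illustration only.
SAMPLE = """
  Proto  Local Address          Foreign Address        State
  TCP    10.0.0.5:49670         10.0.0.9:3260          ESTABLISHED
  TCP    10.0.0.5:49671         10.0.0.9:445           TIME_WAIT
  TCP    0.0.0.0:53             0.0.0.0:0              LISTENING
  UDP    0.0.0.0:53             *:*
  UDP    0.0.0.0:50001          *:*
  UDP    0.0.0.0:50002          *:*
"""

states, in_use = summarize(SAMPLE)
print("TIME_WAIT:", states["TIME_WAIT"])
for proto, count in in_use.items():
    print(f"{proto}: {count} of {len(DYNAMIC_RANGE)} dynamic ports in use")
```

Splitting the count by protocol may be useful here because the AFD warning in this thread refers to UDP ports, while TIME_WAIT only applies to TCP.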

    DHCP was completely unresponsive from other devices, and the service would not respond to a stop request. Annoyingly, even though DHCP failover is configured, and the DHCP service does not respond, the service does not fail over correctly to the other DC when this happens. I have tested failover since the issue has been happening by stopping the service forcefully, and it correctly switches to the other DC.

    Intersite messaging is also unresponsive.

    DNS server was restarted, but still inaccessible on reboot.

    I have applied an update to the network driver for the VM host this morning, but I do not expect it to resolve the issue.

    Dale

    Monday, June 19, 2017 8:05 AM
  • Hi Again Tim,

    So, after your post regarding iSCSI initiators, I thought I would do a little digging.

    The only difference I can find between my 2 devices is that, on the failing server, an iSCSI target was still attempting to reconnect from an old initiator that had been disabled and (I thought) removed.

    Not sure if this is my issue or not, but if so, I will report back.

    Dale

    Monday, June 19, 2017 11:04 AM
  • Thanks for this suggestion. "Faulty" iSCSI connections do indeed seem to be causing this. I'll update in a few days.

    Thursday, August 31, 2017 8:43 AM