NETLOGON 5719 and DHCP 50024, applied KB2459530 and Broadcast flag set to 1, still getting timeouts

  • Question

  • Good afternoon,

    I've bumped into this issue, and have yet to find a good solution.

    The environment

    Catalyst 3560G running 15.0(1)SE with following port config

     switchport access vlan <some number>
     switchport mode access
     spanning-tree portfast
     spanning-tree bpduguard enable

    Catalyst 6509 w/ VS-SUP2T-10G running 15.1(2)SY with WS-X6148E-GE-45AT blades and the following port config

     switchport
     switchport access vlan <some number>
     switchport mode access
     logging event link-status
     spanning-tree portfast edge
     spanning-tree bpduguard enable

    The vlan interface on the 6509 is configured as...

     ip address 10.xx.xx.252 255.255.255.0
     ip broadcast-address 10.xx.xx.255
     ip helper-address <ip of DHCP server>
     ip helper-address <ip of SCCM server>
     no ip redirects
     ip directed-broadcast
     ip pim sparse-dense-mode
    

    We are using ip helper on the switches. There is no 802.1x configuration that might be fiddling with port settings.

    I've been testing with this problem against both switches to rule out network differences.

    Monitor ports have been configured on the switches so we can watch the traffic to/from the workstations that are experiencing the DHCP timeouts.

    The endpoints are workstations running Windows 7 with SP1.

    The problem

    We're seeing lots of NETLOGON 5719 errors on boot up. This is breaking group policy processing and a few other boot-time processes. The root cause appears to be DHCP requests timing out, which show up in the DHCP Operational Log as Event ID 50024. I need to find out why the requests time out and fix it so our endpoints start working as expected.

    The tests performed

    I've taken a sample of machines that consistently exhibit problems. Some have the Gigabyte GA-890GPA-UD3H and others a Gigabyte F2A88XM-D3H. Both systems use the onboard Realtek NIC. From the PCI IDs, they use the exact same NIC.

    F2A88XM-D3H -    PCI\VEN_10EC&DEV_8168&SUBSYS_E0001458
    GA-890GPA-UD3H - PCI\VEN_10EC&DEV_8168&SUBSYS_E0001458
    

    I've tested this with drivers from Realtek, both versions 7.73.618.2013 and 7.92.115.2015 (current as of 2015-05-22).

    I've already read up on and deployed the hotfix from KB2459530. Checking the file versions on stuff like dhcpcore.dll and friends confirms the hotfix is installed. KB2459530 also talks about manually tweaking the DhcpGlobalForceBroadcastFlag and DhcpConnForceBroadcastFlag values. I've made the required changes and confirmed via the monitor port that requests are leaving the workstation with the Broadcast flag set to 1, instead of Unicast (0).
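
    For anyone who wants to verify the same thing in their own capture, here is a minimal Python sketch of what the Broadcast flag check amounts to. It assumes you already have the raw BOOTP/DHCP message (the UDP payload, with the Ethernet/IP/UDP headers stripped). The function name and the sample bytes are made up for illustration. Per RFC 2131, the flags field is the 16-bit value at byte offset 10, and the broadcast bit is its most significant bit.

```python
import struct

BROADCAST_BIT = 0x8000  # most significant bit of the 16-bit BOOTP flags field
FLAGS_OFFSET = 10       # op, htype, hlen, hops (4 bytes) + xid (4) + secs (2)


def dhcp_broadcast_flag(bootp_payload: bytes) -> bool:
    """Return True if the BOOTP/DHCP message has the Broadcast flag set."""
    if len(bootp_payload) < FLAGS_OFFSET + 2:
        raise ValueError("payload too short to contain the flags field")
    (flags,) = struct.unpack_from("!H", bootp_payload, FLAGS_OFFSET)
    return bool(flags & BROADCAST_BIT)


# Hypothetical header fragments: a BOOTREQUEST with xid 0x12345678, secs 0.
header = bytes([1, 1, 6, 0]) + bytes.fromhex("12345678") + b"\x00\x00"
broadcast_request = header + b"\x80\x00"  # flags = 0x8000
unicast_request = header + b"\x00\x00"    # flags = 0x0000

print(dhcp_broadcast_flag(broadcast_request))  # True
print(dhcp_broadcast_flag(unicast_request))    # False
```

    Wireshark shows the same bit as the "Broadcast flag" under the Bootp flags field, so this is only useful if you are scripting the check across many captures.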

    All of that said, I am still seeing inconsistencies between the workstation DHCP operational event log and the captured traffic on the monitor port.

    Here is an example that is consistent across all the test machines...

    5/22/2015 1:00:26 PM         50044 Information      Inform ack is received in the adapter 11.
    5/22/2015 1:00:26 PM         50018 Information      Inform is sent in the adapter 11. Status code is 0x0
    5/22/2015 1:00:26 PM         50058 Information      Your computer was successfully assigned an address from the network, and it can now connect to other computers.
    5/22/2015 1:00:26 PM         50042 Information      Dns registration has happened for the adapter 11. Status Code is 0x0. DNS Flag settings is 64.
    5/22/2015 1:00:26 PM         50028 Information      Address 10.40.250.2 is plumbed to the adapter 11. Status code is 0x0
    5/22/2015 1:00:23 PM         50063 Information      Dhcp has notified NLA for the configuration changes for the interface 11
    5/22/2015 1:00:23 PM         50035 Information      Routes are updated in the adapter 11. Status Code is 0x0
    5/22/2015 1:00:23 PM         50059 Information      Route is added with the values Dest = 0.0.0.0, DestMask = 0.0.0.0, NextHop = 10.40.250.254, Address = 10.40.250.2
    5/22/2015 1:00:23 PM         60000 Information      PERFTRACK (Request-Ack): Address confirmed for the adapter 11.Confirmed Address is 10.40.250.2.Server address is 10.0.10.21
    5/22/2015 1:00:23 PM         60010 Information      PERFTRACK (Request-Ack): Address confirmed for the adapter 11.Confirmed Address is 10.40.250.2.Server address is 10.0.10.21
    5/22/2015 1:00:23 PM         50013 Information      Ack is accepted in the adapter 11. Received Address is 10.40.250.2.Server address is 10.0.10.21
    5/22/2015 1:00:23 PM         50012 Information      Request is sent from the adapter 11. Status code is 0x0
    5/22/2015 1:00:23 PM         50024 Warning          Ack Receive Timeout has happened in the Interface Id 11
    5/22/2015 1:00:20 PM         50012 Information      Request is sent from the adapter 11. Status code is 0x0
    5/22/2015 1:00:20 PM         50006 Information      Request-Ack is initiated on the adapter with Interface Id 11
    5/22/2015 1:00:20 PM         60018 Information      PERFTRACK (DHCPv4): Media Connect on adapter 11
    5/22/2015 1:00:20 PM         60019 Information      PERFTRACK (DHCPv4): End of Media Connect on adapter 11
    5/22/2015 1:00:20 PM         50025 Information      Cancelling pending renewals on the adapter in the Interface Id 11
    5/22/2015 1:00:20 PM         50033 Information      An interface is added whose interface index is 11 and Status Code is 0x0.
    5/22/2015 1:00:20 PM         50004 Information      Dhcp is enabled on the adapter with Interface Id 11
    5/22/2015 1:00:20 PM         50001 Information      Media Connect notification received with Interface Id 11
    5/22/2015 1:00:20 PM         50002 Information      Media Disconnect notification received with Interface Id 11
    5/22/2015 1:00:20 PM         50001 Information      Media Connect notification received with Interface Id 1
    

    The initial request (Event 50012) was sent at 1:00:20. The timeout is reached at 1:00:23 (Event 50024) and the request is subsequently resent (50012). The second request gets a response and the DHCP service binds the provided IP to the interface.

    However, on the monitored port, Wireshark doesn't see ANY of the traffic from 1:00:20. The first DHCP Request we see on the wire is at 1:00:23. The rest of the conversation in Wireshark matches what is listed in the Event log.

    I have confirmed that the switches and workstations are pulling NTP from the same source, so the timestamps in Wireshark can be compared directly against the event log entries.
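
    The comparison described above can be scripted. The following is a rough sketch, not a tool I actually ran: it assumes the log lines are pasted in the same format as the excerpt above (newest first), and the function names are my own. It pairs each Request (50012) with the Ack Receive Timeout (50024) that follows it, which gives the timestamps to look for in the capture.

```python
from datetime import datetime

# A few lines copied from the excerpt above (newest first, as in Event Viewer).
LOG = """\
5/22/2015 1:00:23 PM         50012 Information      Request is sent from the adapter 11. Status code is 0x0
5/22/2015 1:00:23 PM         50024 Warning          Ack Receive Timeout has happened in the Interface Id 11
5/22/2015 1:00:20 PM         50012 Information      Request is sent from the adapter 11. Status code is 0x0
"""


def parse_line(line):
    """Split one log line into (timestamp, event_id)."""
    parts = line.split()
    stamp = datetime.strptime(" ".join(parts[:3]), "%m/%d/%Y %I:%M:%S %p")
    return stamp, int(parts[3])


def timed_out_requests(log_text):
    """Return (request_time, timeout_time) pairs for requests that hit 50024."""
    events = [parse_line(l) for l in log_text.splitlines() if l.strip()]
    events.reverse()  # the excerpt is newest first; walk it oldest first
    pairs, pending = [], None
    for stamp, event_id in events:
        if event_id == 50012:
            pending = stamp          # remember the most recent Request
        elif event_id == 50024 and pending is not None:
            pairs.append((pending, stamp))
            pending = None
    return pairs


for sent, timed_out in timed_out_requests(LOG):
    print(f"Request logged at {sent:%H:%M:%S}, timed out at {timed_out:%H:%M:%S}")
```

    Each reported "Request logged at" time is a point where a DHCP Request should be visible on the monitor port. In my case it is not.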

    With the Realtek drivers, I have experimented with Energy Efficient Ethernet (EEE, 802.3az) and Green Ethernet with no change in results. They remain disabled while we continue testing.

    So, although the symptoms match those described in KB2459530, that hotfix addresses a problem where the DHCP request is sent with the Broadcast flag set to 0 and the Windows Firewall drops the DHCP ACK. Since I don't even see the initial request on the wire, I do not think my problem is resolved by KB2459530.

    Has anyone else seen problems like this? Any additional information would be helpful.

    Thank you,

    -nils


    Friday, May 22, 2015 10:01 PM

All replies

  • Just for more information, I just tried the Realtek LAN driver from Gigabyte's support site, as it's a different version (7.082.0317.2014) than I had previously tested with. Same settings with EEE and Green Ethernet disabled, and the DHCP Broadcast flag still set.

    Same results. The event log indicates a request (50012) was sent across the wire at 3:16:43 PM, the timeout (50024) was hit at 3:16:48 PM, and the second request (50012) was sent at 3:16:48 PM and succeeded. Wireshark doesn't see the first request at 43 seconds, only the second request at 48 seconds.

    Friday, May 22, 2015 10:21 PM
  • I would also like to report that when the switch port is forced to auto-negotiate only 10/100, this problem completely disappears.


    EDIT: Please note, this is a point of data. It is not a solution.
    • Proposed as answer by MeipoXu (Microsoft contingent staff) Monday, May 25, 2015 7:13 AM
    • Unproposed as answer by nf_ Monday, May 25, 2015 3:36 PM
    • Edited by nf_ Monday, May 25, 2015 3:38 PM Adding comments that this is not a solution.
    Friday, May 22, 2015 10:36 PM
  • Hi nf_,

    So the issue is resolved by changing the port negotiation settings?

    I am glad the issue has been resolved, and thanks for updating. It will be very useful as a reference for anyone who comes across a similar issue in the future.

    Best regards


    Please remember to mark the replies as answers if they help, and unmark the answers if they provide no help. If you have feedback for TechNet Support, contact tnmff@microsoft.com.

    Monday, May 25, 2015 7:15 AM
  • Good morning MeipoXu,

    Thank you for your reply. I've updated my post about the 10/100 connectivity. I believe my comment was not clear in that it was simply a data point. Degrading gigabit switchports to 10/100 is not a solution.

    Thank you,

    -nils

    Monday, May 25, 2015 5:16 PM
  • I have opened a case (Request ID 115052512767289) for this issue.
    Monday, May 25, 2015 10:11 PM
  • Hi nf_,

    We would appreciate it if you could update this thread when there is any progress on that case.

    Best regards




    Tuesday, May 26, 2015 2:00 AM
  • The MS support case for this is progressing. Due to some scheduling conflicts, it's taken longer than I would like, but work is still being done.

    In our last session, the MS tech gathered some ETL data for further processing in house. They had previously confirmed that KB2459530 was installed, tried twiddling the DhcpGlobalForceBroadcastFlag and DhcpConnForceBroadcastFlag values, and confirmed that those changes do not fix the problem I'm seeing.

    Will post more information as it becomes available.

    Wednesday, June 3, 2015 7:56 PM
  • Hi nf_

    Thanks for updating, we are looking forward to the good news.

    Best regards



    Thursday, June 4, 2015 1:12 AM
  • Just an update, because I don't want to be that guy.

    My support ticket with Microsoft didn't go too far, with the last communication indicating the problem was with our network. Regardless of my troubleshooting, since we don't use a Microsoft DHCP relay agent and the devices are on separate network segments, I too couldn't rule out the network. Unfortunately, other projects took priority and I had to stop working on this issue.

    I'm sure my findings would have been different if I had plugged a workstation into the same VLAN the DHCP server lives on, and maybe I'll get time to do that in the future. However, we're moving to Windows 10 in the next year, with new problems, and this one will be left alone.

    Hopefully others have better success with this than I did.

    Wednesday, December 23, 2015 12:30 AM