SCCM 1606 - Very intermittent PXE issue

  • Question

  • I've been having a very odd issue with PXE: a client only PXE boots correctly, with an IP address, around once in every five attempts.  Some background information:

    • SCCM server is SCCM 1606 (CB) with Update Rollup 1 and additional KB (it uses hybrid Intune MDM)
    • Distribution point and management point roles are on the same server
    • PXE support has been enabled and WDS services are operational
    • IP Helpers have been enabled on the network routers
    • Boot images are on the distribution point, and the tick box that makes them available from the PXE service point is enabled
    • The task sequences are deployed as available to the unknown computers collection and unknown computer support for PXE is enabled (and also password protected)

    Effectively, when you try and PXE boot, around 4 times out of 5 it'll receive a PXE error E-53 stating that the boot filename wasn't received properly. On the successful boot, it shows its IP address, DHCP server etc, brings down the boot file from smsboot/x64 and then (as it's available) correctly prompts for F12 to continue, which then brings up WinPE and allows us to choose the task sequence from the ones deployed to the unknown computers.

    SMSPXE.log shows the MAC address of the PC in question and sees the most recent task sequence deployment advertised to the collection (with the deployment ID of that, and the Package ID of the Windows boot image) and shows it's communicating with the management point as well.

    If DHCP options 66 and 67 on the DHCP server are manually configured to specify the PXE server and the boot filename it does then boot, but obviously that isn't recommended, and as there's a mix of legacy BIOS and UEFI clients, I'd rather have the IP helper do the work and therefore give out the correct boot filename.

    A few questions to ask:

    • Would antivirus software on the SCCM server somehow be interfering with the WDS services so not always allowing PXE transmissions?
    • If SMSPXE.log shows the MAC address of the client in all attempted boots, is the IP helper correctly set up?
    • Would removing the PXE options from the DP, removing the WDS role, restarting and re-adding the PXE options (thus reinstalling WDS) fix the issue?
    Wednesday, October 26, 2016 7:30 PM

Answers

  • Good news,

    We've managed to resolve the issue, and this may help other users as well.

    It transpired that the DHCP server was responding too quickly to requests on the affected vLAN (the same subnet / vLAN that the clients and the SCCM server were on), so the client was not able to pick up which boot image etc. it needed.

    So, in DHCP, right-click the scope that covers that subnet / vLAN and select Properties.

    In the Advanced tab, set the DHCP delay to an appropriate number (we used 100 milliseconds).

    This way, the DHCP response is slowed down enough for a suitable PXE response to get back in time, meaning clients can then communicate accordingly.

    On the other subnets and vLANs, requests had to be relayed through the correctly configured IP helpers, so the DHCP response was already delayed long enough. That is why PXE worked there without issues (and still does).
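
The timing argument above can be sketched as a toy model. This is only an illustration of the reasoning, not a description of how any particular PXE firmware behaves: it assumes the client moves on as soon as the DHCP offer arrives and only has a boot filename if the PXE reply has arrived by then. The function name and all millisecond values are hypothetical.

```python
def gets_boot_filename(dhcp_offer_ms, pxe_offer_ms, dhcp_delay_ms=0):
    """Toy model: the client proceeds when the (possibly delayed) DHCP offer
    arrives, and has a boot filename only if the PXE reply arrived by then."""
    return pxe_offer_ms <= dhcp_offer_ms + dhcp_delay_ms


# Hypothetical timings: DHCP on the same vLAN answers in 5 ms,
# the PXE service point needs 60 ms.
same_vlan_no_delay = gets_boot_filename(dhcp_offer_ms=5, pxe_offer_ms=60)
same_vlan_with_delay = gets_boot_filename(dhcp_offer_ms=5, pxe_offer_ms=60, dhcp_delay_ms=100)

print("no delay:", same_vlan_no_delay)
print("100 ms delay:", same_vlan_with_delay)
```

In this model the scope delay plays the same role as the extra hop through an IP helper: both push the DHCP offer later, which gives the PXE reply time to arrive.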


    • Marked as answer by Zawtowers Monday, April 10, 2017 12:56 PM
    • Edited by Zawtowers Monday, April 10, 2017 12:57 PM
    Monday, April 10, 2017 12:31 PM

All replies

  • #1: unlikely, but possible
    #2: yes
    #3: I don't think so. You should find the root cause. Can you provide smspxe.log from a working and non-working attempt?


    Torsten Meringer | http://www.mssccmfaq.de

    Wednesday, October 26, 2016 7:45 PM
  • Thanks Torsten.  I've been battling with this issue attempting to locate the root cause.

    From a successful attempt, the snippet of SMSPXE.log is below.  Originally it looks at architecture 0 and only when it then sees architecture 9 as below does it then do the following:

    Client boot action reply: <ClientIDReply><Identification Unknown="0" ItemKey="2046820352" ServerName=""><Machine><ClientID>cb5a4ee1-5dc2-40b4-97ea-1a3628523267</ClientID><NetbiosName/></Machine></Identification><PXEBootAction LastPXEAdvertisementID="" LastPXEAdvertisementTime="" OfferID="S12200AC" OfferIDTime="09/09/2016 11:37:00" PkgID="S12000E1" PackageVersion="" PackagePath="http://SCCM1.AD.LOCAL/SMS_DP_SMSPKG$/S1200055" BootImageID="S1200055" Mandatory="0"/></ClientIDReply>
        SMSPXE    26/10/2016 21:46:32    11216 (0x2BD0)
    00:15:5D:0A:41:39, 095A957F-A189-45D8-9FCE-80964337902E: found optional advertisement S12200AC    SMSPXE    26/10/2016 21:46:32    11216 (0x2BD0)
    Getting boot action for unknown machine: item key: 2046820353    SMSPXE    26/10/2016 21:46:32    11216 (0x2BD0)
    Prioritizing local MP https://SCCM1.AD.LOCAL.    SMSPXE    26/10/2016 21:46:32    11216 (0x2BD0)
    In SSL, but with no client cert    SMSPXE    26/10/2016 21:46:32    11216 (0x2BD0)
    Request using architecture 9.    SMSPXE    26/10/2016 21:46:32    11216 (0x2BD0)
    In SSL, but with no client cert    SMSPXE    26/10/2016 21:46:32    11216 (0x2BD0)
    Client boot action reply: <ClientIDReply><Identification Unknown="0" ItemKey="2046820353" ServerName=""><Machine><ClientID>d7d4abf5-cca4-4e79-a525-be793e94ecc3</ClientID><NetbiosName/></Machine></Identification><PXEBootAction LastPXEAdvertisementID="" LastPXEAdvertisementTime="" OfferID="S12200AC" OfferIDTime="09/09/2016 11:37:00" PkgID="S12000E1" PackageVersion="" PackagePath="http://SCCM1.AD.LOCAL/SMS_DP_SMSPKG$/S1200055" BootImageID="S1200055" Mandatory="0"/></ClientIDReply>
        SMSPXE    26/10/2016 21:46:32    11216 (0x2BD0)
    00:15:5D:0A:41:39, 095A957F-A189-45D8-9FCE-80964337902E: found optional advertisement S12200AC    SMSPXE    26/10/2016 21:46:32    11216 (0x2BD0)
    Looking for bootImage S1200055    SMSPXE    26/10/2016 21:46:32    11216 (0x2BD0)

    Boot image is found, I press F12 and I can see boot.sdi load first followed by the S1200055.wim boot image.

    From an unsuccessful attempt.  Note it's the same device, same MAC address, the only thing I could see is that it's using architecture 0 (which should be standard x86 BIOS) -  architecture 9 is I believe X64/X86 UEFI, although it's a standard BIOS machine in this case.  It writes the same entries below several times with reply attempts and exits with PXE-E53 error stating no boot filename received.

    00:15:5D:0A:41:39, 095A957F-A189-45D8-9FCE-80964337902E: found optional advertisement S12200AC    SMSPXE    26/10/2016 21:55:24    11216 (0x2BD0)
    Getting boot action for unknown machine: item key: 2046820352    SMSPXE    26/10/2016 21:55:24    11216 (0x2BD0)
    Prioritizing local MP https://SCCM1.AD.LOCAL.    SMSPXE    26/10/2016 21:55:24    11216 (0x2BD0)
    In SSL, but with no client cert    SMSPXE    26/10/2016 21:55:24    11216 (0x2BD0)
    Request using architecture 0.    SMSPXE    26/10/2016 21:55:24    11216 (0x2BD0)
    In SSL, but with no client cert    SMSPXE    26/10/2016 21:55:24    11216 (0x2BD0)
    Client boot action reply: <ClientIDReply><Identification Unknown="0" ItemKey="2046820352" ServerName=""><Machine><ClientID/><NetbiosName/></Machine></Identification><PXEBootAction LastPXEAdvertisementID="" LastPXEAdvertisementTime="" OfferID="" OfferIDTime="" PkgID="" PackageVersion="" PackagePath="" BootImageID="" Mandatory=""/></ClientIDReply>
        SMSPXE    26/10/2016 21:55:25    11216 (0x2BD0)
    Request retry.    SMSPXE    26/10/2016 21:55:25    11216 (0x2BD0)
    In SSL, but with no client cert    SMSPXE    26/10/2016 21:55:25    11216 (0x2BD0)
    Client boot action reply: <ClientIDReply><Identification Unknown="0" ItemKey="2046820352" ServerName=""><Machine><ClientID>cb5a4ee1-5dc2-40b4-97ea-1a3628523267</ClientID><NetbiosName/></Machine></Identification><PXEBootAction LastPXEAdvertisementID="" LastPXEAdvertisementTime="" OfferID="S12200AC" OfferIDTime="09/09/2016 11:37:00" PkgID="S12000E1" PackageVersion="" PackagePath="http://SCCM1.AD.LOCAL/SMS_DP_SMSPKG$/S1200055" BootImageID="S1200055" Mandatory="0"/></ClientIDReply>
        SMSPXE    26/10/2016 21:55:25    11216 (0x2BD0)
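
To compare replies like the ones above without reading the XML by eye, a small parser can pull out the fields that matter. This is a minimal sketch using only the Python standard library; the function name `summarize_reply` is made up for illustration, and the two sample strings are the empty and populated replies from the failed attempt, trimmed to the attributes the sketch reads.

```python
import xml.etree.ElementTree as ET

# Replies copied from the log excerpt above, trimmed to the attributes used here.
EMPTY_REPLY = (
    '<ClientIDReply><Identification Unknown="0" ItemKey="2046820352" ServerName="">'
    '<Machine><ClientID/><NetbiosName/></Machine></Identification>'
    '<PXEBootAction OfferID="" BootImageID="" Mandatory=""/></ClientIDReply>'
)
FULL_REPLY = (
    '<ClientIDReply><Identification Unknown="0" ItemKey="2046820352" ServerName="">'
    '<Machine><ClientID>cb5a4ee1-5dc2-40b4-97ea-1a3628523267</ClientID>'
    '<NetbiosName/></Machine></Identification>'
    '<PXEBootAction OfferID="S12200AC" BootImageID="S1200055" Mandatory="0"/>'
    '</ClientIDReply>'
)


def summarize_reply(xml_text):
    """Return the fields of a ClientIDReply that show whether a boot action was offered."""
    root = ET.fromstring(xml_text)
    action = root.find("PXEBootAction")
    offer_id = action.get("OfferID", "")
    boot_image_id = action.get("BootImageID", "")
    return {
        "item_key": root.find("Identification").get("ItemKey"),
        "client_id": root.findtext("Identification/Machine/ClientID") or "",
        "offer_id": offer_id,
        "boot_image_id": boot_image_id,
        # A reply is only useful to the client if both IDs are populated.
        "has_boot_action": bool(offer_id and boot_image_id),
    }


for reply in (EMPTY_REPLY, FULL_REPLY):
    print(summarize_reply(reply))
```

Running this over every reply in a log makes it easier to see whether the empty replies line up with the failed boots.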

    So possibly the DHCP client request is fetching the incorrect architecture and never finds the correct one, so it doesn't boot, whereas on the occasions when it finds the right one, it boots?  Very strange behaviour really.

    I should also add that eventually with a failed attempt the SMSPXE.log shows this at least 10-15 times:

    00:15:5D:0A:41:39, 095A957F-A189-45D8-9FCE-80964337902E: Not serviced.

    • Edited by Zawtowers Wednesday, October 26, 2016 9:22 PM
    Wednesday, October 26, 2016 9:15 PM
  • Can you please enable verbose logging on the PXE server, re-start WDS and then re-do those two pass/fail tests?

    I have a suspicion that you have a duplicate MAC address problem, but I need to see the verbose logs to confirm that.

    Wednesday, October 26, 2016 9:53 PM
  • I'll try that first thing tomorrow.  Although I did mention it is the same device I attempted to boot from, hence the same MAC address.

    Attempting to boot from other devices brings back their correct MAC addresses, but with the same errors in the log as an unsuccessful one above.

    Wednesday, October 26, 2016 9:57 PM
  • Yes, but it looks like you have other machines listed in the database that have the same MAC address or the same SMBIOS ID.

    Again, I'm just guessing right now based on incomplete clues from the short log file that you posted. A verbose log would give us more clues.

    You can also search your database to confirm that. The System MAC Address Array table lists the MAC addresses reported by the client. Every MAC Address there should be related to only a single Item Key. The System Auxiliary Information table lists the SMBIOS GUID reported by the clients. Every SMBIOS GUID in that table should be related to only a single Item Key.
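
The uniqueness check described above (each MAC address or SMBIOS GUID should belong to exactly one Item Key) can be sketched as follows. This assumes the rows have already been exported from the database as (item key, identifier) pairs; the function name and the sample rows are hypothetical and do not come from this site's database.

```python
from collections import defaultdict


def find_shared_identifiers(rows):
    """Given (item_key, identifier) pairs, return the identifiers that are
    related to more than one item key, with the sorted list of those keys."""
    owners = defaultdict(set)
    for item_key, identifier in rows:
        # Normalise case and whitespace so the same MAC or GUID always matches.
        owners[identifier.strip().upper()].add(item_key)
    return {ident: sorted(keys) for ident, keys in owners.items() if len(keys) > 1}


# Hypothetical export: two records share one MAC address.
sample_rows = [
    (101, "AA:BB:CC:00:00:01"),
    (102, "aa:bb:cc:00:00:01"),
    (103, "AA:BB:CC:00:00:02"),
]

print(find_shared_identifiers(sample_rows))
```

An empty result means every identifier maps to a single Item Key. The same function works for SMBIOS GUIDs, since it only compares strings.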


    Wednesday, October 26, 2016 10:10 PM
  • A search via the report to find by MAC address showed no results in terms of computer name when I performed it (which I'd expect as it should be an unknown for deployment.)

    Using another device this morning shows the following event in SMSPXE.log, numerous times:

    A0:D3:C1:9D:1F:F5, CAABC3A8-7CB7-11E3-9D10-67BB030F3053: Not serviced.

    The PXE boot attempts to get an IP address but doesn't appear to go anywhere. Verbose logging is now on, but that is all that seems to appear.  The fact it does report back to SMSPXE.log should say it's communicating with the SCCM distribution point.
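
To see how often each device is being turned away, the "Not serviced" entries can be counted per MAC address and SMBIOS GUID. This is a rough sketch that assumes the log lines start with the `MAC, GUID: Not serviced.` pattern shown in this thread; the function name is made up, and the sample lines reuse entries quoted above.

```python
import re
from collections import Counter

# Matches lines such as "A0:D3:C1:9D:1F:F5, CAABC3A8-...: Not serviced."
NOT_SERVICED = re.compile(
    r"^\s*((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}),\s*([0-9A-Fa-f-]{36}): Not serviced\."
)


def count_not_serviced(lines):
    """Count 'Not serviced' entries per (MAC address, SMBIOS GUID) pair."""
    counts = Counter()
    for line in lines:
        match = NOT_SERVICED.match(line)
        if match:
            counts[(match.group(1).upper(), match.group(2).upper())] += 1
    return counts


sample_lines = [
    "A0:D3:C1:9D:1F:F5, CAABC3A8-7CB7-11E3-9D10-67BB030F3053: Not serviced.",
    "A0:D3:C1:9D:1F:F5, CAABC3A8-7CB7-11E3-9D10-67BB030F3053: Not serviced.",
    "00:15:5D:0A:41:39, 095A957F-A189-45D8-9FCE-80964337902E: found optional advertisement S12200AC",
    "00:15:5D:0A:41:39, 095A957F-A189-45D8-9FCE-80964337902E: Not serviced.",
]

for (mac, guid), count in count_not_serviced(sample_lines).items():
    print(mac, guid, count)
```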

    Thursday, October 27, 2016 7:52 AM
  • So, some progress on this.

    It turned out that there may have been an error when initially ticking the box that allows the distribution point to respond to PXE requests.

    I unticked the box, left it for a few minutes, and noticed in SMSPXE.log that it wasn't responding to requests (good).  I then ticked the box again, and PXE appears to be working, but, it seems, only for one IP range and not another.

    So: 10.2.xxx.xxx - working, and 10.1.xxx.xxx - not working.

    I will check to see why this may be in terms of the network connectivity - the SCCM server and DHCP server are within the 10.1.xxx.xxx range....

    Thursday, October 27, 2016 8:27 AM
  • Hi,

        "I will check to see why this may be in terms of the network connectivity - the SCCM server and DHCP server are within the 10.1.xxx.xxx range...."

        Did you mean that the clients on the IP range 10.1.xxx.xxx are in the same subnet/vlan as your DHCP & PXE server? If you move a machine from 10.1.xxx.xxx to 10.2.xxx.xxx, will it start to work? If so, I think you will have to check whether there are any network-related issues which may lead to this error.

    Best regards,

    Jimmy




    Thursday, October 27, 2016 1:53 PM
  • Yes, exactly as you said.  So if a client is on the 10.1.xxx.xxx subnet/vlan (same as DHCP and PXE server) it shows a few dots then aborts out with PXE-E53.

    The same device connected to the 10.2.xxx.xxx subnet/vlan works - first time, every time, whether the BIOS is set to UEFI or legacy as well, so the IP helper etc is clearly doing what it's supposed to do.  Those clients weren't PXEing until I unticked / reticked the "respond to PXE requests" as I mentioned.

    I suspect there may be some network activity on the other vLAN - I'm hoping to resolve it with them.  It seems that 1 out of every 5 or 6 attempts works but the rest fail.

    In the meantime, at least I can hook up a device to the other vLAN or use USB boot media.

    Thursday, October 27, 2016 4:15 PM

  • I assume that DHCP and SCCM DP aren't on the same server? According to this post http://blog.coretech.dk/rja/dhcp-guide/ you should configure the IP helpers on the routers to point to both the DHCP server and the PXE-enabled SCCM DP.
    Thursday, October 27, 2016 4:48 PM
  • Can you please post the new SMSPXE log file?

    Thursday, October 27, 2016 8:12 PM
  • DHCP and the SCCM DP are on different servers, but within the same subnet and vLAN.

    The IP helpers are configured, as one vLAN range connects every time, but the other does not.

    I enabled the verbose logging for Deployment-Services-Diagnostics, and noticeably some machines appear to be getting 169.254.xxx addresses (the self-assigned APIPA range), presumably those on the vLAN I'm having an issue with:

    [WDSServer] [WDSTFTP][UDP][Ep=10.1.10.180:64076][0x000000A9790D8470] Created  - this is the SCCM DP

    [WDSServer] [WDSTFTP][UDP][Ep=169.254.95.120:64076][0x000000A9790D8E00] Created - this is the client

    Not getting the correct IP is obviously not good. 
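
Spotting these clients in a long diagnostics log can be automated. The sketch below assumes the `[Ep=address:port]` format shown in the two lines above and uses Python's standard `ipaddress` module to flag endpoints in the 169.254.0.0/16 link-local range; the function name is made up for illustration.

```python
import ipaddress
import re

# Matches the "[Ep=10.1.10.180:64076]" endpoint field in the WDS TFTP lines.
ENDPOINT = re.compile(r"\[Ep=(\d{1,3}(?:\.\d{1,3}){3}):(\d+)\]")


def link_local_endpoints(lines):
    """Return (ip, port) for every logged endpoint with a link-local address."""
    found = []
    for line in lines:
        for ip_text, port in ENDPOINT.findall(line):
            try:
                address = ipaddress.ip_address(ip_text)
            except ValueError:
                continue  # not a valid IPv4 address, skip it
            if address.is_link_local:
                found.append((ip_text, int(port)))
    return found


sample_lines = [
    "[WDSServer] [WDSTFTP][UDP][Ep=10.1.10.180:64076][0x000000A9790D8470] Created",
    "[WDSServer] [WDSTFTP][UDP][Ep=169.254.95.120:64076][0x000000A9790D8E00] Created",
]

print(link_local_endpoints(sample_lines))
```

Any address this returns belongs to a client that did not complete DHCP, which narrows the problem to the DHCP exchange rather than the TFTP download.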

    Friday, October 28, 2016 11:05 AM