Ephemeral port exhaustion - Event ID 4231

    Question

  • Hi All

    A couple of my servers have started getting Event ID 4231

    “A request to allocate an ephemeral port number from the global TCP port space has failed due to all such ports being in use.”

    Server 1: Physical Dell PowerEdge R720 used as a Veeam repository. When event 4231 occurs, Veeam backups stop working. Sometimes I can’t RDP to the server.

    Server 2: Physical Dell R210 II used as a remote branch DC/DNS/DHCP/print server. When event 4231 occurs, the DC does weird things, such as the print server no longer working.

    Both servers are 2012 R2, and it started happening over the past few months.

    In both cases a swift reboot resolves the issue, but it seems to recur every 2 days or so. So far (before I realized it was to do with port exhaustion) I have tried re-joining the server to the domain, trying a different NIC, updating the NIC drivers and firmware, and most recently the registry keys from this Server Fault forum post. Nothing seems to have worked.

    I have run netstat -anob on both servers and uploaded the results to Pastebin: Server1, Server2

    Is there anything obvious you can see from the logs? For these commands to be effective, should they be run right at the point when event 4231 happens? I ran them a couple of hours after the event, but with the server still in an error state (not rebooted).
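    A minimal sketch of how a saved netstat capture could be summarized, assuming the usual five-column TCP rows (Proto, Local Address, Foreign Address, State, PID) produced by the -o switch. The function name and the sample rows are hypothetical, not taken from the uploaded logs. It counts TCP rows per state and per owning PID, which helps show whether one process is holding most of the ports.

```python
from collections import Counter


def summarize_netstat(text):
    """Count TCP rows by state and by owning PID in `netstat -ano` style output."""
    by_state = Counter()
    by_pid = Counter()
    for line in text.splitlines():
        fields = line.split()
        # TCP rows have five columns: proto, local, foreign, state, PID.
        # Header rows, UDP rows and the [process.exe] lines added by -b are skipped.
        if len(fields) == 5 and fields[0] == "TCP" and fields[4].isdigit():
            by_state[fields[3]] += 1
            by_pid[int(fields[4])] += 1
    return by_state, by_pid


# Hypothetical sample rows, for illustration only.
sample = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:3389           0.0.0.0:0              LISTENING       1204
  TCP    10.0.0.5:49201         10.0.0.9:3260          SYN_SENT        4
  TCP    10.0.0.5:49202         10.0.0.9:3260          SYN_SENT        4
  TCP    10.0.0.5:49203         10.0.0.20:445          ESTABLISHED     4
 [svchost.exe]
  UDP    0.0.0.0:123            *:*                                    980
"""

if __name__ == "__main__":
    states, pids = summarize_netstat(sample)
    print("rows per state:", states.most_common())
    print("rows per PID:  ", pids.most_common())
```

    A capture taken while the server is still in the error state should still be useful, since the question is which PID and state dominate the table at that moment.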

    Any help would be massively appreciated. Starting to pull my hair out!

    Tuesday, May 16, 2017 4:49 PM

All replies

  • Hi Tee-Eff,

    >> Sometimes I can’t RDP to the server. 

    The RDP port is 3389. From the log you posted, we can see that port 3389 appears in many entries with the same PID, 3428, such as VeeamNFSSvc.exe.

    You could verify the process owner by looking up PID 3428 on your machine.

    >> When event 4231 occurs Veeam backups stop working.

    Which ports are being used by Veeam backups?

    You could look for these ports in the log to verify which processes are using them.

    Best Regards,

    Candy



    Wednesday, May 17, 2017 6:31 AM
    Moderator
  • Probably caused by the combination of the April patch and one iSCSI path being down. It looks like it tries to reconnect the LUN until all ports are exhausted. This is clearly a bug.
    Thursday, May 18, 2017 12:59 AM
  • Hi Emmanuel

    I think you might have smashed it! Those 2 servers had iSCSI paths that were indeed down/trying to connect to a redundant target. I have removed these now. Would you say this fixes it? Can you point me in the direction of the April patch?

    Thanks

    Thursday, May 18, 2017 9:11 AM
  • Yes, it will fix the issue. When the server tries to reconnect, it uses ephemeral ports for each attempt (multiple times) until it has used all ports. The issue was discovered in our lab, and we are holding the patch back from production until Microsoft releases a fix.
    • Marked as answer by Tee-Eff Thursday, May 18, 2017 1:40 PM
    Thursday, May 18, 2017 1:28 PM
  • We have the same problem, 2012R2 file server cluster with the latest patches.

    It also has iSCSI, we will disable it and see if it helps.

    Thursday, May 18, 2017 1:29 PM
  • Awesome. Thank you so much for sharing that. Massively appreciated! 
    Thursday, May 18, 2017 1:40 PM
  • Is there any solution that works with iSCSI? I have backups on iSCSI from a QNAP.

    I have one server running a Firebird database, and if iSCSI is connected, clients are disconnected every 15 minutes. When I disconnect iSCSI from W2012R2, everything works fine. I tried the registry modification from this site http://deploymentresearch.com/Research/Post/532/Fix-for-Windows-10-exhausted-pool-of-TCP-IP-ports but nothing changed. I still have the problem.

    I haven't tried it on the other servers. I have three other servers with the same problem.

    Tuesday, May 23, 2017 6:44 AM
  • What April patch/update are you referring to, do you have the KB# ?

    Thanks

    Tuesday, May 23, 2017 1:23 PM
  • I don't know, because the last update was May 2017 and the previous one was December. I don't see an April update in the list; maybe it is included in the May patch.
    Wednesday, May 24, 2017 2:08 PM
  • This is still an issue. When will MS release the fix for this?

    Shutting down the iSCSI service didn't help.

    Wednesday, May 31, 2017 11:07 AM
  • We also have this issue and require the iSCSI service. Can you please give us an update on the patch release? Thanks!
    Friday, June 2, 2017 11:27 AM
  • Hi.

    We have similar problems.
    Our cluster on Windows Server 2012 R2 keeps breaking down.
    It is fully updated.
    We get errors such as 4227 and 4231, after which the cluster stops working.
    How can I get a fix?


    Thursday, June 22, 2017 9:29 AM
  • I'm having the exact same issue with a cluster. Every day or two everything completely locks up and I have to reboot the whole thing, with the VMs unable to migrate to other hosts in the process. It's extremely disruptive.

    Tomorrow I'll try manually removing the updates that last installed to see if it clears the issue. 

    Thursday, June 29, 2017 1:35 AM
  • I noticed that the problem occurs if any iSCSI connection has the status "Reconnecting". Then, after a couple of hours, the server stops responding.

    If all iSCSI connections have the status "Connected", everything works fine.

    Thursday, June 29, 2017 7:49 AM
  • I have been fighting this issue on a 7 node Server 2016 Hyper-V cluster all week.  Exactly what everyone is describing here.

    After spending the $500 to open an incident with Microsoft and getting nowhere for 3 days, I pointed the Microsoft Tech towards event id 4231 and he said it is definitely the issue.

    He told me it is global to Server 2012 R2 and Server 2016 clusters. They are working on it. The fix will not be available until the July update preview, which he said we will not see until the third week of July.

    The workaround is kind of ugly. He told me you need to remove all the cumulative updates and rollups on the servers dating back to March.

    In my case, I got lucky. We upgraded our cluster to Server 2016 a few weeks ago, so we only have KB4022715 to remove. Sadly, when we removed it from one of the nodes, the node started blue screening on boot, and we have yet to solve that issue. We might have to resort to a server reload. It might take all weekend to get our cluster stable.

    Hopefully some of you will find this information useful.




    Friday, June 30, 2017 7:30 PM
  • Thanks Dave. This really helps. We are having the same issue and currently trying to resolve it. I hope Microsoft can release the patch soon!
    Wednesday, July 19, 2017 4:15 AM
  • We have a 4 node cluster on 2012 R2 here. Fortunately this issue only seems to have hit one server. For what it's worth, if I run netstat -aqo it shows me thousands of ephemeral ports with status "BOUND" for the system process. However the iSCSI connections all have status "connected" so maybe not exactly the same issue. Looking forward to testing the patch, should be released soon I guess.
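    A rough sketch of how the BOUND entries from such a capture could be measured against the dynamic port pool. It assumes BOUND rows have the same five-column layout as other TCP rows, and it assumes the default Windows dynamic range (start 49152, 16384 ports); the range actually configured on a server can be checked with "netsh int ipv4 show dynamicport tcp". The function name and sample rows are hypothetical.

```python
DYNAMIC_START = 49152  # assumed default start of the dynamic (ephemeral) range
DYNAMIC_COUNT = 16384  # assumed default number of dynamic ports


def bound_dynamic_ports(text, start=DYNAMIC_START, count=DYNAMIC_COUNT):
    """Return the set of local ports in the dynamic range that are in state BOUND."""
    ports = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[0] == "TCP" and fields[3] == "BOUND":
            # The local address looks like 0.0.0.0:49670 or [::]:49670.
            port = int(fields[1].rsplit(":", 1)[1])
            if start <= port < start + count:
                ports.add(port)
    return ports


# Hypothetical sample rows, for illustration only.
sample = """\
  TCP    0.0.0.0:49670          0.0.0.0:0              BOUND           4
  TCP    0.0.0.0:49671          0.0.0.0:0              BOUND           4
  TCP    0.0.0.0:5985           0.0.0.0:0              LISTENING       4
  TCP    10.0.0.5:49700         10.0.0.9:3260          ESTABLISHED     4
"""

if __name__ == "__main__":
    used = bound_dynamic_ports(sample)
    share = 100.0 * len(used) / DYNAMIC_COUNT
    print(f"{len(used)} of {DYNAMIC_COUNT} dynamic ports are BOUND ({share:.2f}%)")
```

    Comparing this count across captures taken a few hours apart would show whether the pool is steadily draining toward event 4231.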
    Friday, July 21, 2017 12:13 PM
  • I am experiencing the identical issue in a 3 node 2012 R2 Hyper-V cluster. Rebooting the server clears the problem, but it is not a real solution. The cluster connects to a shared Oracle SAN, but I do not have an iSCSI connection in the "reconnecting..." state.

    It is July 21st today, and I am looking for the patch. I've put a hold on this Hyper-V deployment for this customer because of this problem.

    Friday, July 21, 2017 3:17 PM
  • Looks like the patch is out but in preview. Has anyone tested it yet?

    https://support.microsoft.com/en-us/help/4025335/windows-8-1-windows-server-2012-r2-update-kb4025335

    There might be some problems:

    https://social.technet.microsoft.com/Forums/office/en-US/9c8e637e-d42a-479e-a703-110986281ee9/kb4025335-kills-certificate-based-computer-authentication?forum=winserverNAP

    Wednesday, July 26, 2017 6:15 AM
  • Hi RuoT,

    Not only in the preview Update... At least for Windows Server 2016:
    https://support.microsoft.com/en-us/help/4025334/windows-10-update-kb4025334

    Looks like only for Windows Server 2012 R2 it is still in the "Preview" Status...

    Cheers


    Wednesday, July 26, 2017 6:24 AM
  • I'd like to chime in on how I solved the problem, without a patch, after looking into this in detail.  I did not find any iSCSI connections in the 'reconnecting....' status, however, the Microsoft iSCSI initiator program keeps past connections that have connected in the favorites tab.  Upon doing a "netstat -aqo" command, I found that the OS was still trying to connect to a previous iSCSI connection even though it is not part of the discovery portal configuration anymore.  Apparently, it was kept in the "Favorites" tab of the iSCSI initiator program.  iSCSI actually made multiple connections to the same SAN just because they were in the Favorites tab.  I removed all the irrelevant connections, after that, the problem was solved.  I have not installed the July Preview to see if it does fix the problem though.
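    One way such leftover connection attempts could be spotted in a saved netstat capture, sketched under some assumptions: the targets listen on the standard iSCSI port 3260, and the stale attempts appear as ordinary five-column TCP rows with the target as the foreign address. The function name, addresses and states in the sample are hypothetical.

```python
from collections import Counter

ISCSI_PORT = 3260  # standard iSCSI target port; adjust if the SAN uses another


def iscsi_attempts(text, port=ISCSI_PORT):
    """Count TCP rows per (remote endpoint, state) whose remote port is the iSCSI port."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[0] == "TCP":
            remote, state = fields[2], fields[3]
            # rsplit keeps IPv6 endpoints such as [fe80::1]:3260 intact.
            if remote.rsplit(":", 1)[-1] == str(port):
                counts[(remote, state)] += 1
    return counts


# Hypothetical sample rows, for illustration only.
sample = """\
  TCP    10.0.0.5:49201         10.0.0.9:3260          SYN_SENT        4
  TCP    10.0.0.5:49202         10.0.0.9:3260          SYN_SENT        4
  TCP    10.0.0.5:49203         10.0.0.9:3260          SYN_SENT        4
  TCP    10.0.0.5:49300         10.0.0.8:3260          ESTABLISHED     4
  TCP    10.0.0.5:49301         10.0.0.20:445          ESTABLISHED     4
"""

if __name__ == "__main__":
    for (remote, state), n in iscsi_attempts(sample).most_common():
        print(f"{remote:<22} {state:<12} {n}")
```

    A target address that shows up here but is no longer listed under the discovery portals would be a candidate for removal from the Favorites tab.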
    Wednesday, July 26, 2017 2:16 PM
  • I found more info after trawling Spiceworks, which identifies the actual updates that cause the iSCSI issue.

    https://support.microsoft.com/en-us/help/4019215/windows-8-update-kb4019215

    This issue is caused by a locking issue on Windows Server 2012 R2 and Windows Server 2016 RS1 computers, causing connectivity issues to the iSCSI targets. The issue can occur after installing any of the following updates:

    Windows Server 2012 R2

    Release date      KB            Article title
    May 16, 2017      KB 4015553    April 18, 2017 - KB4015553 (Preview of Monthly Rollup)
    May 9, 2017       KB 4019215    May 9, 2017 - KB4019215 (Monthly Rollup)
    May 9, 2017       KB 4019213    May 9, 2017 - KB4019213 (Security-only update)
    April 18, 2017    KB 4015553    April 18, 2017 - KB4015553 (Preview of Monthly Rollup)
    April 11, 2017    KB 4015550    April 11, 2017 - KB4015550 (Monthly Rollup)
    April 11, 2017    KB 4015547    April 11, 2017 - KB4015547 (Security-only update)
    March 21, 2017    KB 4012219    March 2017 Preview of Monthly Quality Rollup for Windows 8.1 and Windows Server 2012 R2

    Windows Server 2016 RTM (RS1)

    Release date      KB            Article title
    May 16, 2017      KB 4023680    May 26, 2017 - KB4023680 (OS Build 14393.1230)
    May 9, 2017       KB 4019472    May 9, 2017 - KB4019472 (OS Build 14393.1198)
    April 11, 2017    KB 4015217    April 11, 2017 - KB4015217 (OS Build 14393.1066 and 14393.1083)

    Friday, July 28, 2017 5:46 PM
  • You guys are a godsend! 

    3 MONTHS trying to diagnose why our File Server, Print Server, SQL Server, IIS Server and one of our DCs kept falling over every few days and no longer provided their respective services.

    Logs indeed showed this Event ID and iSCSI Initiator showed a 'Reconnecting...' status to a backup array that was decommissioned. When it uses all Ephemeral ports the server will stop allowing new connections, correct?

    Corrected just now, and hope that it's much more stable moving into the future. I rebuilt all the DCs to try and fix this issue thinking it was a 2012 R2 -> 2016 upgrade thing.

    Tuesday, August 8, 2017 12:17 AM
  • For future readers of this thread. The fix appears to be in the August 2017 and later Non-Security Rollups.

    https://support.microsoft.com/en-us/help/4034681



    Monday, September 18, 2017 4:07 PM
  • What is the KB article number, so I can check whether we have installed it? This issue started 24 hours ago for me.

    thanks

    Friday, June 1, 2018 11:55 AM