Exchange 2007, port 25 randomly unresponsive and Mail Transport svc cannot be restarted

  • Question

  • Hello,

    In short: on a fully patched Exchange Server 2007, in a domain at the AD 2003 functional level where everything else works (AD, DNS, PKI, etc.), we have noticed since mid-July that port 25 randomly stops responding.

    The server (all roles) has enough disk space (more than 20 GB free on the system partition as well as on the partition where Exchange is installed), and CPU and memory usage look normal, with no peak before, during or after the problem occurs.

    First symptom: the POP3 pull software we've been using flawlessly for 10 years (Pytheas Mailgate) displays this message: No connection could be made because the target machine actively refused it.

    A telnet to port 25 from a machine on the LAN, but also from the Windows 2008 server itself (so it's not a routing, ISP or firewall issue), is refused: unable to communicate with the host on port 25.

    A telnet to another port, for instance 587, works.

    A netstat indicates that port 25 is listening.
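
    For anyone who wants to repeat the telnet checks above without a telnet client, here is a minimal Python sketch. It only tests whether a TCP connection can be opened; the helper name `port_accepts_connections` and the address `127.0.0.1` (which assumes the script runs on the Exchange server itself) are illustrative choices, not part of the original setup.

```python
import socket


def port_accepts_connections(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers active refusals (ConnectionRefusedError) and timeouts alike.
        return False


if __name__ == "__main__":
    # Assumption: run on the server itself, hence the loopback address.
    for port in (25, 587):
        if port_accepts_connections("127.0.0.1", port):
            print(f"port {port}: connection accepted")
        else:
            print(f"port {port}: refused or timed out")
```

    A check like this does not tell a refused connection apart from a timeout, so it complements netstat rather than replacing it.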

    All the usual Exchange services are started; none is stopped.

    Access from Outlook and OWA is fully operational except, of course, for sending and receiving email.

    The POP3 pull software (Pytheas Mailgate) works properly on port 110, since it can download any available mail.

    The vendor's analysis ruled out the software as the cause of this issue.

    Logs enabled in verbose mode (transport, protocols, etc.) don't provide any relevant information; it's as if everything stopped recording around the time the problem occurred!

    The Event Viewer doesn't show many events: one message related to the Mail Attendant mailbox, but following the explanation given on many forums, a check with ADSI Edit showed that the CN is not empty and the information is the same in both locations.

    Another message, event ID 9327 from OALGen, turns up just once.

    A scan with Microsoft's malware tools didn't find anything (though that is no panacea, and few anti-malware products will install on servers).

    By the way, no AV is installed on this server.

    I've uninstalled the anti-spam agents and GFI Mail Essentials (an old version) because some posts pointed in this direction, but to no avail...

    I simply noticed that the problem started around mid-July. Could it be an update issue, perhaps a .NET Framework one as mentioned in one of the many posts I've come across? I'm not really convinced!

    One more important point: when I try to restart the Mail Transport service, it fails with error 1053, if I remember correctly, and the only option left is to restart the server.

    So far, this is the only way to recover a fully functional server... :-(

    I'm now out of clues and ideas for a solution!

    If anyone has an idea I would really appreciate it, as the server now holds for 12 hours at most, and sometimes only 6, 8 or 10 hours, before I have to reboot it!

    Kind regards

    Philippe



    Wednesday, July 25, 2018 7:43 AM

Answers

  • Hi Phil49000,

    On Windows Server 2008 through 2016, a transport issue occurs after installing the July 10th update; it has since been fixed. However, Windows Server 2003 is too old, and I am not sure whether it is affected by this patch.

    Regards,

    Kyle Xu


    Please remember to mark the replies as answers if they helped. If you have feedback for TechNet Subscriber Support, contact tnsf@microsoft.com.

    Click here to learn more. Visit the dedicated forum to share, explore and talk to experts about Microsoft Teams.

    Thanks! We've noticed the same behaviour on Server 2008 R2 with Exchange 2010 deployed. Your answer helped us find the root cause.

    • Marked as answer by Phil49000 Wednesday, August 1, 2018 9:06 AM
    Wednesday, August 1, 2018 8:40 AM

All replies

  • Hello,

    In short: on an up-to-date Exchange 2007 server, in a domain at the AD 2003 functional level where everything works properly (AD, DNS, PKI, etc.), we have noticed since mid-July that traffic on port 25 randomly stops being processed.

    The server (it holds all roles) has more than 20 GB free on both the system partition and the partition where Exchange is installed. CPU usage is normal, with no peaks before, during or after the incident, and memory is also sufficient before, during and after.

    First symptoms: a POP3 pull program that has worked without failure for 10 years, Pytheas Mailgate, can no longer connect to port 25.

    Message displayed in the software: No connection could be made because the target machine actively refused it.

    At the same time, a telnet to port 25 from the LAN, but also from the server itself (so it is not a routing, ISP or firewall problem), is rejected: "could not open a connection to the host on port 25".

    A telnet to another port, 587 for example, works.

    A netstat shows that port 25 is listening.

    All services are started correctly.

    Access from OWA works (apart from sending and receiving email, of course).

    The Pytheas POP3 pull software retrieves messages over POP without trouble.

    The vendor's analysis rules out any malfunction on their side.

    Verbose logging for protocols, transport, etc. gives nothing, no usable error; it is as if everything stopped at the moment the outage occurs!

    The Event Viewer shows a few rare events, notably one related to the Mail Attendant mailbox (I followed the procedure, everything is OK in AD) and event ID 9327 OALGen, but they appear only once.

    Restarting the Exchange Transport service takes a long time and ends with a time-out and a message saying it could not be restarted. I then have to restart the server itself, and everything is back to normal for 5, 6, 8 or 10 hours, the longest run now sadly being only about 12 hours on average...

    A scan with Microsoft's analysis tools found nothing, so the malware lead can more or less be ruled out (it is no panacea either, but not many such tools run on Windows Server, 2008 in this case).

    I even disabled GFI Mail Essentials to see whether it could be the cause, and did the same for Microsoft's anti-spam agent, but to no effect...

    I noticed that the problem appeared more or less with a series of updates around July 12. In the countless forums I have combed through I came across a post about a .NET Framework update, but in the meantime there have been two more waves of updates, which changed nothing either...

    I have nothing left in reserve to get us out of this mess; if anyone has a lead I would be infinitely grateful!

    Philippe

    Wednesday, July 25, 2018 6:59 AM
  • Same here
    Wednesday, July 25, 2018 6:41 PM
  • Hi Phil49000,

    I'd like to confirm: did you recently install an update for Windows Server or Exchange?

    If you did, please try uninstalling it.

    Meanwhile, as Exchange 2007 has reached the end of its support lifecycle, we recommend that you migrate all users to Exchange 2013 as soon as possible.

    Regards,

    Kyle Xu


    Thursday, July 26, 2018 8:34 AM
  • Hi Kyle Xu,

    Of course, ideally we would migrate our organization to Exchange 2013. That is scheduled, but we'll have to wait a little for it.

    In parallel to this post, I've decided to look for a solution from a new angle.

    Since the Mail Transport service fails to restart when I try manually and returns error code 1053, I searched the forums for this code.

    Most of the answers suggested uninstalling and then reinstalling the Hub Transport role, which I did this morning.

    I'm now waiting for the dreaded 12-hour horizon to find out whether this operation worked...

    I hope it will, because I don't have any other solution at the moment!

    Regards

    Philippe

    Thursday, July 26, 2018 9:19 AM
  • Hello all,

    I'm really desperate: less than 6 hours after uninstalling and reinstalling the Hub Transport role, the problem is back!

    I have no idea what I can do now! :-(

    I still can't stop or restart the Microsoft Exchange Transport service: still the same error 1053, Windows couldn't stop the service!

    The Event Viewer is so clean that I have nothing left to go on! :-(

    Philippe

    Thursday, July 26, 2018 1:58 PM
  • Hello again

    I'm now trying to uninstall Microsoft .NET Framework 3.5 SP1, my last resort... :-(

    Philippe

    Thursday, July 26, 2018 2:10 PM
  • Hello,

    Alas, I still have to reboot the server every 12 hours or so when I'm lucky, and after barely 6 hours when I'm not!

    I had spotted a few updates installed around July 12 and tried to uninstall them, but as they're security updates the system keeps reinstalling them automatically and rebooting itself, even when the Windows Update service is disabled...

    The updates are: KB4293756, KB4291391, KB4338422, KB4340583, KB4339854, KB4339503, KB4339291, KB4339093 and KB4295656.

    Don't know if they're involved in the issue however anyway...
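
    To see which of these updates are actually present at a given moment (they keep coming back after being uninstalled), a small Python sketch like the one below could help. It assumes a text listing of installed updates is available, for example the output of `wmic qfe get HotFixID` saved to a file; the function name `installed_suspects` and the sample listing are hypothetical.

```python
import re

# KB numbers quoted in the post above.
SUSPECT_KBS = {
    "KB4293756", "KB4291391", "KB4338422", "KB4340583", "KB4339854",
    "KB4339503", "KB4339291", "KB4339093", "KB4295656",
}


def installed_suspects(hotfix_listing):
    """Return the suspect KBs that appear in a text listing of installed updates."""
    found = {
        kb.upper()
        for kb in re.findall(r"KB\d+", hotfix_listing, flags=re.IGNORECASE)
    }
    return sorted(found & SUSPECT_KBS)


if __name__ == "__main__":
    # Hypothetical sample listing, for illustration only.
    sample = "HotFixID\nKB4295656\nKB4012345\nkb4338422\n"
    print(installed_suspects(sample))
```

    Comparing the result before and after each reboot would show whether an uninstalled update has been put back.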

    So far, what I can say is that the blocked port 25 is not the cause of the problem but probably a consequence of something else. Everything seems to center on the Microsoft Exchange Transport service, which cannot be restarted when the problem occurs. Again, is that the cause or the consequence?

    When I try to restart it, after a very long wait I keep getting error 1053, stating that the system couldn't restart the service fast enough.

    I also noticed recently, having never paid attention to it before, that the Application log of the Event Viewer contains a great many information events, exactly one per minute, with ID 4016: MSExchange System Attendant Mailbox.

    The message says only: Connection to the Mailbox of Mail attendant mailbox

    Apart from that, as I said before, the Event Viewer is so clean that I no longer see any failure or alert!

    Thanks very much for any clue, as I'm now desperate!

    Philippe


    • Edited by Phil49000 Saturday, July 28, 2018 11:59 AM
    Saturday, July 28, 2018 11:58 AM
  • Hello,

    Since I can't do anything else, I'm analyzing every event available in the Event Viewer when the problem occurs.

    Just before the issue I noticed a series of 4 identical events.

    In the Application log: event 16022, MSExchangeTransport (sorry, this is a translation from the French operating system):

    The description of event ID 16022 in the source MSExchangeTransport cannot be found.
    The component that triggered this event isn't installed on the local computer or the installation is damaged.
    You can either install or repair the component on the local computer.

    If the event comes from another computer, the display information must be recorded with the event.
    The following information was included with the event:
    The message resource is present but the message cannot be found in the string or message table.

    This is pure gobbledygook to me; if anyone could translate it into human language I would appreciate it!

    Could it be a clue to my problem?!

    Please note that it's always preceded and followed by a run of the same event (every minute): 4016 MSExchange System Attendant Mailbox:

    Connection to the reception mailbox of the System attendant mailbox.
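
    One way to separate the once-a-minute 4016 noise from anything unusual is to count event IDs in the minutes leading up to each outage. The Python sketch below assumes the events have already been exported as (timestamp, event ID) pairs; the function name `events_before`, the 10-minute window and the sample timestamps are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def events_before(events, outage, window_minutes=10):
    """Count event IDs logged in the window leading up to the outage."""
    start = outage - timedelta(minutes=window_minutes)
    return Counter(eid for ts, eid in events if start <= ts <= outage)


if __name__ == "__main__":
    # Hypothetical timestamps, for illustration only.
    outage = datetime(2018, 1, 1, 12, 0)
    events = [(outage - timedelta(minutes=m), 4016) for m in range(1, 6)]
    events += [(outage - timedelta(minutes=2), 16022)] * 4
    events.append((outage - timedelta(hours=3), 9327))
    for event_id, count in events_before(events, outage).most_common():
        print(event_id, count)
```

    An ID that shows up in this window before every outage but rarely elsewhere, as 16022 seems to here, is the one worth chasing.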

    Regards,

    Philippe



    • Edited by Phil49000 Saturday, July 28, 2018 3:27 PM
    Saturday, July 28, 2018 3:24 PM
  • Hello

    I have the same problem with Exchange 2010.
    The workaround I applied to get out of the jam:
    add a new receive connector on port 26 and, in the firewall,
    redirect all traffic from port 25 to port 26 of the Exchange server.

    It is only a temporary solution until you find the problem.



    • Edited by Gperez2804 Sunday, July 29, 2018 5:10 PM
    Sunday, July 29, 2018 5:03 PM
  • Hello,

    I noticed that the maximum uptime is 12 hours. I don't really know why, but there is certainly an explanation...

    I've just tried another suggestion (nothing to lose now): stopping the Transport service, renaming the queue folder, and then restarting the service, which recreates the folder on startup.

    I don't know if it'll work, but it was worth a try...

    I know this is a fix usually applied to queue issues in general: mail stuck, or something else going wrong with the queue.

    If it fails again, I'll try your solution as a last resort!

    Did you delete or simply disable the previous connector on port 25?

    How long have you been using this solution, and have you noticed any side effects?

    Does OWA work properly, along with everything else expected from Exchange?

    Is the port forwarding the only thing to change?

    Thanks so much anyway for your answer. It is the first one, in a world with so many would-be gurus where nobody has ever heard of the problem or shares what they know, and I'm pretty sure we're not the only ones experiencing this issue.

    kind regards,

    Philippe

    Monday, July 30, 2018 7:14 AM
  • Hi Phil49000,

    On Windows Server 2008 through 2016, a transport issue occurs after installing the July 10th update; it has since been fixed. However, Windows Server 2003 is too old, and I am not sure whether it is affected by this patch.

    Regards,

    Kyle Xu


    • Proposed as answer by John Seerden Wednesday, August 1, 2018 8:36 AM
    Monday, July 30, 2018 9:40 AM
  • Hello Kyle

    Well, as a matter of fact, no: as I said in my first post, the AD functional level of the forest is Windows 2003, but the Exchange server runs 64-bit Windows Server 2008!

    So I've just installed the update, and I'm keeping my fingers crossed for the next 12 hours!

    You can be sure that I'll let you know the result ASAP!

    This is my first glimmer of hope...

    Kind regards,

    Philippe


    • Edited by Phil49000 Monday, July 30, 2018 9:58 AM
    Monday, July 30, 2018 9:57 AM
  • Hi.

    Do not delete anything and do not disable the default connector. Simply add the new connector, enable the firewall rule and restart the server.
    I have not noticed any side effects.

    My problem occurred approximately every 6 hours; now it has been working correctly for more than 24 hours.

    Best regards.

    Gonzalo.



    • Edited by Gperez2804 Monday, July 30, 2018 12:57 PM
    Monday, July 30, 2018 12:52 PM
  • Hello Gonzalo,

    As you may have seen in my previous post, I'm trying Kyle's suggestion of the patch first.

    If it fails, then I'll implement your solution and do as you explained.

    Thanks again for your valuable feedback.

    I'll let you know which one succeeds.

    Regards

    Philippe

    Monday, July 30, 2018 1:41 PM
  • Hello all,

    I dare not believe that it... works!

    It's been about 16 hours, so I'm past the 12 hours I couldn't get beyond for a week!

    I'm waiting for the 24-hour mark, and then a week and a month, before I breathe a sigh of relief!

    I'm so happy, and yet so cautious after so many attempts!

    I'll let you know at each milestone whether it's still OK.

    Kind regards,

    Philippe

    Tuesday, July 31, 2018 6:07 PM
  • Hello all,

    This morning I can see that my server has been up for... 28 hours without any reboot!

    The first time in almost three weeks!

    I can even dream of going on holiday this month without having to figure out how to keep remote access available to reboot or monitor the server, even though scheduled tasks could have helped me with that...

    I guess things should now be OK, and this is most likely proof that, as far as we're concerned, update KB4295656 was responsible for port 25 becoming randomly unresponsive!

    Thank you, Kyle, for this first-hand information. The issue would in all likelihood have been corrected later anyway, but it would have meant many more days of torture trying to work out the origin of such a strange and metronomic symptom!

    Kind regards

    Philippe

    Wednesday, August 1, 2018 5:42 AM
  • Hi Phil49000,

    I am very happy to see that your problem has been solved. Please feel free to mark the reply as an answer so it can help more people.

    Regards,

    Kyle Xu


    Wednesday, August 1, 2018 6:44 AM