Header checksum errors

  • Question

  • A Wireshark capture taken on the Unified Messaging server while the pilot number is called shows numerous header checksum errors:  
    Header checksum: 0x0000 [incorrect, should be 0xb530]. They appear on various SIP transmissions as well as RTP transmissions, for example:

    SIP Status: 100 Trying
    TCP sip > 49342 [ACK] Seq=766 Ack=1436 Win=64455 Len=0
    RTP PT=GSM 06.10, SSRC=0xCE240CF7, Seq=1572, Time=2872544416, Mark
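
For readers wondering where the "should be" value comes from: Wireshark recomputes the IPv4 header checksum and compares it with the field in the packet. A value of 0x0000 on packets sent by the capturing host is often a sign of checksum offloading, where the network adapter fills in the field after the capture point, so it is not necessarily an error on the wire. The sketch below shows the standard ones'-complement calculation; the function name and the sample header (with private addresses) are illustrative assumptions, not data from this capture.

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """Ones'-complement checksum over the 16-bit words of an IPv4 header.

    When computing a checksum to send, the checksum field (bytes 10-11)
    must be zero in `header`.
    """
    if len(header) % 2:
        header += b"\x00"  # pad to a whole number of 16-bit words
    words = struct.unpack("!%dH" % (len(header) // 2), header)
    total = sum(words)
    # Fold any carry bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical 20-byte header with the checksum field zeroed,
# as an offloading host would hand it to the capture driver.
sample = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = ipv4_header_checksum(sample)
print(hex(checksum))

# A receiver verifies by summing the header with the checksum in place;
# a valid header yields 0.
filled = sample[:10] + struct.pack("!H", checksum) + sample[12:]
print(ipv4_header_checksum(filled))
```

If the same packets captured on the receiving side or on a mirror port show valid checksums, offloading is the likely explanation.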

    I recently installed Unified Messaging for Exchange 2010.  Everything worked well, except that the next morning, when I called the pilot number for voicemail, the call simply failed.  I tried again about 5 minutes later and it worked. It's as if a process timed out and needed to restart.

    Are these checksum errors normal and is there something on the Unified Messaging server that might be timing out overnight like an ASP process?

    Thanks...Chris

    • Edited by rosenstc Thursday, March 3, 2011 11:15 PM more info
    Thursday, March 3, 2011 5:20 PM

Answers

  • The problem appears to be that the UM server establishes two connections, one on port 5060 (umservice.exe) and one on 5065 or 5067 (umworkerprocess.exe).  Since the two servers (UM and SIP) are on different networks, the traffic passes through a firewall, and the firewall times out these established connections after they have been idle for 60 minutes. Neither server sends any keepalive traffic, so the connections are torn down. Fortunately, the firewall has a dead connection detection feature. The firewall will send a SYN ACK to each server to determine whether the timer should be reset and the connection kept active. This seems to be working for us.

    The solution is to create a policy map on the firewall that uses the DCD (Dead Connection Detection) feature.  This will prevent the connections from getting torn down by the firewall as long as the hosts respond.
    • Edited by rosenstc Wednesday, March 30, 2011 8:33 PM fix
    • Marked as answer by rosenstc Wednesday, March 30, 2011 8:34 PM
    Wednesday, March 16, 2011 11:29 PM
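
The interaction between the idle timeout and DCD described in the answer above can be illustrated with a toy model. This is only a sketch of the decision logic, not firewall configuration: the class, function, and field names are hypothetical, the 60-minute value is the one quoted in the answer, and a real DCD implementation also has retry intervals and retry limits that are omitted here.

```python
from dataclasses import dataclass

IDLE_TIMEOUT = 60 * 60  # seconds; the 60-minute idle timeout from the answer


@dataclass
class TrackedConnection:
    """One firewall connection-table entry (hypothetical model)."""
    name: str
    last_activity: float      # time of the last packet seen, in seconds
    client_alive: bool = True  # would this end answer a probe?
    server_alive: bool = True
    is_open: bool = True


def check_idle(conn: TrackedConnection, now: float, dcd_enabled: bool) -> bool:
    """Apply the idle timer at time `now`; return True if the entry survives."""
    if not conn.is_open or now - conn.last_activity < IDLE_TIMEOUT:
        return conn.is_open
    if dcd_enabled and conn.client_alive and conn.server_alive:
        # Both hosts answered the probe: reset the idle timer.
        conn.last_activity = now
    else:
        # No DCD, or a host did not answer: remove the entry.
        conn.is_open = False
    return conn.is_open


# Without DCD, an idle SIP connection is dropped at the 60-minute mark.
plain = TrackedConnection("um-5060", last_activity=0.0)
print(check_idle(plain, now=3600, dcd_enabled=False))

# With DCD, the same idle connection survives while both hosts respond.
probed = TrackedConnection("um-5060", last_activity=0.0)
print(check_idle(probed, now=3600, dcd_enabled=True))
```

The model also shows why the morning call failed without DCD: the servers still believed the connection existed, but the firewall had already discarded its entry, so the INVITE was retransmitted without a reply.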

All replies

  • Chris,

    To further investigate this issue we need some more information:

    • Is there anything logged in the event log?
    • What other SIP messages does Wireshark show?

    Regards,

    Johan

     


    Exchange-blog: www.johanveldhuis.nl
    Sunday, March 6, 2011 7:55 PM
    • I'm not seeing anything logged to indicate an issue.

    The VoIP gateway in our case is a 3CX SIP server.  When the voicemail pilot number fails to answer after a period of inactivity, Wireshark shows five or six SIP/SDP [TCP Retransmission] entries for the INVITE request from the 3CX SIP server to the UM server.

    If you then wait a few minutes and try again, it works.  I do see that the 3CX server responds to the UM server when it does a PBX ping every 20 seconds or so (SIP OPTIONS).  The 3CX server does appear to respond with a 200 OK, although the header checksum in that packet is incorrect.  So, perhaps the UM server is ignoring the 200 OK from the 3CX server because it no longer thinks it is registered?

    Monday, March 7, 2011 11:16 PM
  • I don't believe this is at all related to header checksum errors. 

    When I use Microsoft Network Monitor, I can see the conversation for the UMWorkerProcess.exe process.  When a call to the pilot number fails, I don't see the SIP INVITE or any LDAP requests.  When it is working, I see all of that.

    It's as if UMWorkerProcess.exe stops responding.  If I kill the UMWorkerProcess.exe process in Task Manager, it is restarted automatically and calls work again.  If I don't kill it, calls start working again on their own after a few minutes; I believe the service detects that the process is not responding and starts a new UMWorkerProcess.exe.

    Is there any reason why the UMWorkerProcess.exe would stop responding?   Any suggestion on how to prevent this?

    • Edited by rosenstc Friday, March 11, 2011 9:46 PM typo
    Friday, March 11, 2011 9:29 PM