SFB on-prem with UM in O365: calls to Autoattendant fail intermittently

  • Question

  • We have a system with a SIP trunk to an SBC, which then distributes calls round robin to 3 FE servers (collocated Mediation). Enterprise Voice (EV) uses O365 UM with an Autoattendant in O365. The rest of the deployment is Kemp HLBs for the FEs and an HLB on both sides of the Edge pool (2 servers). The Kemps are all HA.

    We are seeing external calls (i.e. PSTN: cell, home phone) drop 6-20 seconds after the first ring (usually about 10 seconds), with dead air until the call drops. The problem is intermittent: it can hit every other call or every 20th call, and sometimes a few in a row. All other calls succeed and connect 2-6 seconds after the initial ring (with dead air during that time as well).

    I have done CLS logging and what I see is:

    ms-diagnostics: 22;source="FE1.dom.local";reason="Call failed to establish due to a media connectivity failure when both endpoints are internal";component="MediationServer";Exception="Proxy side ICE connectivity check failed.";ICEWarningFlags="BaseAddress="10.X.X.20:53916";LocalSite="10.X.X.76:61163";RemoteSite="10.27.46.15:11361";MediaEpBlob="ICEWarn=0x44003a0,ICEWarnEx=0x0,LocalMR=72.XXX.XX.64:58789,RemoteMR=207.46.5.102:51199,PortRange=49152:57500,LocalMRTCPPort=57332,RemoteMRTCPPort=51199,LocalLocation=2,RemoteLocation=2,FederationType=0,StunVer=0,CsntRqOut=0,CsntRqIn=0,CsntRspOut=0,CsntRspIn=0,Interfaces=0x2,BaseInterface=0x2,IceRole=1,RtpRtcpMux=0,AllocationTimeInMs=209,FirstHopRTTInMs=1,TransportBytesSent=15931,TransportPktsSent=467,IceConnCheckStatus=5,PrelimConnChecksSucceeded=0,IceInitTS=3694544818521,ContactSrvMs=206,AllocFinMs=259,FinalAnsRcvMs=2023,ReinviteSntMs=12465,BlobGenTime=3694544830986,MediaDllVersion=6.0.8953.234,BlobVer=1""
    ms-diagnostics-public: 22;reason="Call failed to establish due to a media connectivity failure when both endpoints are internal";component="MediationServer";Exception="Proxy side ICE connectivity check failed."
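
    For anyone comparing several of these headers, here is a minimal Python sketch that pulls the key=value pairs out of the MediaEpBlob part so the fields can be lined up across failed and successful calls. The function name `parse_media_ep_blob` is made up for illustration, the sample string is a shortened copy of the header above, and treating `FinalAnsRcvMs` and `ReinviteSntMs` as millisecond offsets from the same starting point is an assumption, not something the log itself confirms.

    ```python
    import re


    def parse_media_ep_blob(diag_line: str) -> dict:
        """Return the key=value pairs found inside MediaEpBlob="..." (hypothetical helper)."""
        match = re.search(r'MediaEpBlob="([^"]*)"', diag_line)
        if match is None:
            return {}
        fields = {}
        for pair in match.group(1).split(","):
            key, sep, value = pair.partition("=")
            if sep:  # skip fragments that are not key=value
                fields[key.strip()] = value.strip()
        return fields


    # Shortened copy of the ms-diagnostics header quoted above.
    sample = (
        'ms-diagnostics: 22;source="FE1.dom.local";'
        'MediaEpBlob="ICEWarn=0x44003a0,ICEWarnEx=0x0,'
        'LocalMR=72.XXX.XX.64:58789,RemoteMR=207.46.5.102:51199,'
        'IceConnCheckStatus=5,PrelimConnChecksSucceeded=0,'
        'FinalAnsRcvMs=2023,ReinviteSntMs=12465,BlobVer=1"'
    )

    blob = parse_media_ep_blob(sample)

    # Assumption: both values are millisecond offsets from the same start point.
    gap_ms = int(blob["ReinviteSntMs"]) - int(blob["FinalAnsRcvMs"])

    print("LocalMR: ", blob["LocalMR"])
    print("RemoteMR:", blob["RemoteMR"])
    print("IceConnCheckStatus:", blob["IceConnCheckStatus"])
    print("Gap between final answer and re-INVITE (ms):", gap_ms)
    ```

    If that reading of the two fields is right, the gap in this header is 10442 ms, which would line up with the roughly 10 second drop described above.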

    I've looked for anything relevant to that error, but I can't work out which internal peers are having the issue.

    I have taken all but 1 FE out of the SBC routing and still have the issue.

    Any input?

    Monday, January 30, 2017 12:57 AM

Answers

  • I disabled each Edge server one at a time in the HLB and the issue was still present in both cases.

    On a whim, I decided to bounce the HLBs.

    The problem seems to have gone away now!

    Continuing testing to make sure.

    • Marked as answer by BiggJake Wednesday, February 1, 2017 3:02 PM
    Wednesday, February 1, 2017 3:02 PM

All replies

  • Comparing the CLS logs (VoiceMail option): for a successful call, the call flow shows an INVITE to the UM AA after the initial OK and ACK, whereas the failed call shows a BYE after however long the call was 'delayed' before dropping.

    It could be the Edge server, but any thoughts on a port issue, an HLB issue, etc.?

    Monday, January 30, 2017 3:09 PM
  • One of your edges might be missing an internal NIC route to the Mediation server or potentially the PBX. As the internal NIC on the edge has no gateway specified, you must have static routes to your server LAN / phone LAN.

    Log in to both edges and try to ping the Mediation server, the FE and the PBX for good measure. See if any of these fail.
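
    If ICMP is filtered, or you want to repeat the check from each edge without typing every target, a rough Python sketch along these lines can stand in for ping. Note that it tests TCP connects rather than ICMP, so it is a different check from the ping suggested above. The helper names `tcp_reachable` and `check_targets` are made up here, and the hosts and port in the usage comment are placeholders to replace with the listening ports actually configured in your topology.

    ```python
    import socket


    def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
        """Return True if a TCP connection to host:port succeeds within the timeout."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Covers refused connections, timeouts, routing and DNS failures.
            return False


    def check_targets(targets, timeout: float = 3.0) -> dict:
        """Map 'host:port' to a reachability flag for each (host, port) pair."""
        return {
            f"{host}:{port}": tcp_reachable(host, port, timeout)
            for host, port in targets
        }


    # Example usage (placeholder names and port, adjust to your own servers):
    #   results = check_targets([("fe1.example.local", 5061),
    #                            ("sbc.example.local", 5061)])
    #   for target, ok in results.items():
    #       print(target, "reachable" if ok else "NOT reachable")
    ```

    A failure from one edge but not the other would point at a missing static route on that edge; a success only shows the signalling port is reachable and says nothing about the media port range.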

    Monday, January 30, 2017 9:01 PM
  • Thanks.

    The SBC, FE HLB (collocated Mediation), Internal Edge HLB are all on the same subnet and all pings are successful.

    Monday, January 30, 2017 9:19 PM
  • Try to narrow down the issue to see which edge might be the cause. Can you exclude each edge server from the load balancer (or shut them down individually) and then test some calls to see if the issue seems to stick to one of them? 


    Tuesday, January 31, 2017 6:52 PM