Outbound audio dropping

    Question

  • This problem has started happening again. When an external user calls in, the outbound audio drops at some point during the call. The external user cannot hear anything, while the internal user can still hear them. This happens with both regular calls AND conference calls.

    For the past couple of days I've been trying to troubleshoot this thoroughly. I'll try to summarize what I've found.

    Environment

    Collocated front-end and mediation server. There are two NICs on this server, external and internal. External=11.1.1.11 and has a gateway specified. Internal=10.1.1.82 and has no gateway specified. In the frontend settings in topology builder the Primary IP = 10.1.1.82. The PSTN IP = 11.1.1.11

    1 Edge server with all of the edge roles. This has two NICs, one external and one internal. The external one = 11.1.1.4 and has a gateway specified. The internal one = 10.1.1.98 and has no gateway specified.

    1 Reverse proxy. This has two NICs, one external and one internal. External one = 11.1.1.5 and has a gateway specified. Internal one = 10.1.1.69 and has no gateway specified.

    There's also an Exchange server with UM enabled. This has auto attendant configured

    SIP Trunk provider = Intelepeer

    External firewall = SonicWall 2040. This is on an old version, SonicOS 2.0.x

    Symptoms and errors

    1. In external-to-internal calls the outbound audio drops. This does not always happen. Sometimes it happens within 10 minutes, sometimes 50 minutes.

    2. Same as above but for conference calls.

    3. The above problems happen whether the user is connected to the VPN or not.

    4. When connected to the Lync client without VPN, the Lync client frequently says "Limited functionality due to server connectivity issues" and then says "Repaired connection". The call doesn't drop when this happens, and the audio doesn't immediately drop either. In other words, this can happen several times during a call, and the one-way audio problem may or may not follow.

    5. On the front end server I see events 41024, 41025, and 41026 from source LS Data MCU frequently throughout the day. The error message in 41024 is "Lost connection to the web conferencing edge server", 41026 is "Lost connection to all Web Conferencing Edge Services", and 41025 states that the connection has been restored.

    6. On my client, at the beginning of the call, I see "icewarn=0x9" in the log file. According to the ICE Warnings Flags Decoder tool that means "TURN server is unreachable".
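
Assuming icewarn is a bitmask (which is what a flags decoder implies), 0x9 is two flags set together rather than a single code. A minimal sketch for splitting a logged value into its individual bits follows; the function name is made up for illustration, and the meaning of each bit still has to be looked up in the decoder tool.

```python
def set_flags(value):
    """Return the individual bit flags that are set in a bitmask value."""
    flags = []
    bit = 1
    while bit <= value:
        if value & bit:
            flags.append(bit)
        bit <<= 1
    return flags


# 0x9 is binary 1001, so two separate flags are set.
print([hex(f) for f in set_flags(0x9)])  # ['0x1', '0x8']
```

Each of the two bits can then be checked against the decoder separately.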

    I have used the Lync Logging Tool and Wireshark on the frontend, edge, and my computer. I don't see anything obvious when looking at the logs, but I'm no expert.

    Any help and suggestions are much appreciated.


    Makolyte, Software Developer + System/Network Admin


    • Edited by Makolyte Thursday, October 31, 2013 9:33 PM
    Thursday, October 31, 2013 9:32 PM

All replies

  • It looks like a connectivity problem, but if that is the case you should see drops in the Wireshark captures. If the connection to the Lync server is lost you get a limited-functionality warning, but since the call is Client <> Client it stays connected (as long as there is connectivity between them).

    The ICE warning normally points to closed ports: http://dusk1911.wordpress.com/2012/05/28/ice-warning-messages/ But it is strange that the connections drop at random times; with closed ports I would expect this to occur every time and more frequently. I would try to sniff all endpoints (it is a lot of work, but you would have all the data you need):

    • FE
    • Edge
    • Client (internally)
    • Client (externally)
    • Firewall (most of the time they can tcpdump)

    You could also use the Snooping tool and collect all the files generated there, but the Wireshark solution would be preferred since it sounds like a network issue (if I read your story correctly, a connectivity problem between FE and Edge).

    There are ping tools for this (http://www.colasoft.com/ping_tool/); set up a ping test between the FE and the Edge.
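
If a dedicated ping tool isn't at hand, a rough equivalent can be scripted. The sketch below is only an illustration: it uses TCP connection attempts rather than ICMP ping, all function names are made up, and the host and port in the usage note are assumptions to be replaced with whatever the FE actually uses to reach the Edge.

```python
import socket
import time
from datetime import datetime


def probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def outages(samples):
    """Collapse (timestamp, ok) samples into (first_fail, last_fail) windows."""
    windows = []
    start = last_fail = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            last_fail = ts
        elif start is not None:
            windows.append((start, last_fail))
            start = None
    if start is not None:
        windows.append((start, last_fail))
    return windows


def monitor(host, port, count, interval=5.0):
    """Probe host:port `count` times, sleeping `interval` seconds after each."""
    samples = []
    for _ in range(count):
        samples.append((datetime.now(), probe(host, port)))
        time.sleep(interval)
    return samples
```

Run from the FE, something like `outages(monitor("10.1.1.98", 8057, count=720))` would cover about an hour against the Edge's internal NIC. Port 8057 is an assumption here; use the port your deployment actually has open between the two servers. Any outage windows can then be lined up against the timestamps of the 41024/41026 events.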

    Good luck.

    Friday, November 01, 2013 6:50 AM
  • That's a good idea. I would have Wireshark capturing on all endpoints, wait for the problem to happen, and then see on which endpoint the data stopped flowing.
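
Once the captures exist, finding where the data stopped can be reduced to comparing the time of the last outbound audio packet seen at each capture point. The sketch below assumes those timestamps have already been pulled out of each capture by hand, that the clocks on the machines are reasonably in sync, and that the capture points are listed in the direction the audio travels. The function name and the sample numbers are hypothetical.

```python
def first_silent_hop(last_seen, path, gap=5.0):
    """Return the first capture point whose last packet is more than `gap`
    seconds older than the previous point's last packet, or None."""
    for prev, cur in zip(path, path[1:]):
        if last_seen[prev] - last_seen[cur] > gap:
            return cur
    return None


# Hypothetical values: seconds since the start of the capture.
last_seen = {"client": 1000.0, "fe": 1000.2, "firewall": 640.0}
print(first_silent_hop(last_seen, ["client", "fe", "firewall"]))  # firewall
```

A result of None means every capture point saw packets until roughly the same moment, which would push the suspicion to something beyond the last capture point.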

    One thing I forgot to mention: I did several test calls, and the problem happened more than 90% of the time when "Firewall SPI" was enabled on my home router. I disabled that setting and made 2 calls, one lasting over an hour, with no problems. Two tests isn't much of a sample size, but the fact that it didn't fail two times in a row seems significant.
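
To put a rough number on that: if calls really failed about 90% of the time and each call were independent of the last (both assumptions), two clean calls in a row would be about a 1-in-100 event. A small sketch of the arithmetic, with a made-up function name:

```python
def chance_of_clean_streak(failure_rate, calls):
    """Probability that `calls` independent calls in a row all succeed."""
    return (1.0 - failure_rate) ** calls


# Two clean calls in a row at a 90% failure rate.
print(round(chance_of_clean_streak(0.9, 2), 4))  # 0.01
```

So two successes are weak evidence on their own, but they would be unlikely if nothing had changed.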

    I'll try what you suggested, running Wireshark on all endpoints, and I'll post the results here when I get them.


    Makolyte, Software Developer + System/Network Admin

    Friday, November 01, 2013 2:11 PM
  • I took Wireshark captures on my client, the FE, edge, and looked at the log in the firewall.

    On my client I see traffic continually flow between my IP and the VPN server IP. I was connected to the VPN when taking the captures, so traffic was routed to that server. I believe that's why I'm not seeing traffic between me and the Edge server.

    On the frontend I'm seeing traffic continually flow between the PSTN IP (11.1.1.11) and the Intelepeer IP. I can also see my IP on the VPN going to the internal NIC of the frontend (and vice versa).

    When I contacted Intelepeer they told me that, on their side, the traffic coming from us drops off, whereas the traffic going out to us keeps flowing. This is consistent with the one-way audio dropping to the PSTN caller.

    This means traffic is leaving the frontend server BUT not being received at the other end. This traffic would hit two things on my side before leaving our network: Windows Firewall and the external SonicWall firewall. However, when I look at the SonicWall logs I don't see anything referring to the traffic between 11.1.1.11 and the Intelepeer IP. It seems to me that I would see dropped packets in the log if it were blocking them. The other thing is Windows Firewall, but according to the firewall settings outbound traffic is not filtered by default (outbound traffic is only blocked IF you explicitly block it, unlike inbound traffic, which you have to explicitly allow).

    I thought of another possible point of failure. The frontend server is a virtual server on Hyper-V. That means the traffic is physically being routed through the host server's NIC. Could the firewall on the host possibly be blocking or have anything to do with it?


    Makolyte, Software Developer + System/Network Admin

    Saturday, November 02, 2013 10:37 PM
  • It sure could have something to do with that. To sum it up, you don't see any incoming traffic from your FE on your firewall. Do you have any other Hyper-V machines that experience the same problems? Next things to try:

    • Migrate the FE to a different Hyper-V machine if you have any.
    • Log the traffic on the physical port of the Hyper-V machine (machine to switch).
    • Check the configuration of the Hyper-V machine and your FE virtual machine (firewall, traffic shaping, QoS).

    I have no experience with Hyper-V, to be honest, so I wouldn't know what else to try next, but from your story it seems that the problem comes from there.

    Sunday, November 03, 2013 2:34 PM
  • The SonicWall log only shows problems; it doesn't show all traffic. So far I'm not seeing the SonicWall dropping packets between us and Intelepeer. There are reports for that in the SonicWall, but I can't access those because the license has expired and the model is EOL (I'd love to replace this).

    What i'm going to do now is start looking at all possible points of failure between the server and Intelepeer in this order:

    1. Windows Firewall on the Lync server
    2. Hyper-V host server’s NIC and Windows Firewall
    3. SonicWall
    4. Cable modem
    5. Our ISP

    I'm going to use http://www.measurementlab.net/ to see if the ISP is doing something to the traffic.

    I'll post any results I get here.


    Makolyte, Software Developer + System/Network Admin

    Sunday, November 03, 2013 6:41 PM
  • Here's an update

    I've confirmed that packets are leaving the Lync server. I can see them in the Packet Trace tool on the SonicWall. I see the UDP packets being sent by the SonicWall even when I can no longer hear the audio.

    I've contacted the ISP and they say there are no problems.

    The next step is to focus on the Sonicwall. I'm going to update the firmware, and then turn off all transformation settings + enable Consistent NAT.

    If that still doesn't work then I'm going to try another firewall.

    UPDATE 2

    I upgraded from the PRO 2040 to a TZ210, which has the latest SonicOS Enhanced firmware on it. The one-way audio problem now only seems to happen after a long period of time, so it's most likely a timing issue of some kind. I'm not seeing packets being dropped on my side at all. It could be that Intelepeer is blocking packets at their end after a timer expires.
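
One way to test the timer theory is to note how many minutes each test call runs before the audio drops and see whether the values cluster. The sketch below is only an illustration: the function name, the 2-minute tolerance, and the sample values are all made up.

```python
def looks_like_fixed_timer(minutes_to_drop, tolerance=2.0):
    """True if all drop times fall within `tolerance` minutes of each other."""
    if len(minutes_to_drop) < 2:
        return False  # not enough calls to say anything
    return max(minutes_to_drop) - min(minutes_to_drop) <= tolerance


# Hypothetical measurements from three test calls.
print(looks_like_fixed_timer([61.5, 60.8, 62.0]))  # True
# The 10- and 50-minute drops seen before the firewall swap would not pass.
print(looks_like_fixed_timer([10, 50]))  # False
```

Tightly clustered drop times would point at a session or NAT timer somewhere on the path; scattered ones would not.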

    Other issues fixed:

    1. Several services had an incorrect SIP Domain. It was missing the .com at the end. This resulted in them failing. Only the Audio Test Service was reporting errors though. I had to use ADSIEdit to update the domain to include the .com. Now that's working.

    2. While in ADSIEdit fixing the other issue I noticed a bunch of records pointing to a server that's not even used, created in 2008. Someone had "attempted" to install OCS 2007 and it looks like that left a ton of orphaned records in there. I deleted them all. Since then the constant "Lost connection to all Web Conferencing Edge Services" errors have gone away. They were most likely caused by the server trying to connect using the orphaned OCS 2007 records.

    3. Port 444 wasn't opened in the firewall.

    4. Reverse proxy wasn't forwarding port 80.

    These issues seem to play a role in the intermittent problems we have. I'm not sure if they have anything to do with the one-way audio problem, but it sure helps having a non-buggy environment.


    Makolyte, Software Developer + System/Network Admin


    • Edited by Makolyte Monday, November 11, 2013 3:16 PM
    Wednesday, November 06, 2013 5:26 PM