Resilient SIP trunking

  • Question

  • I'm testing SIP trunking with Global Crossing and all is going well; the interop test has just completed. However, at this site I need to set up a resilient service, which involves two SE servers, a primary and a backup. What I want is for SE1 to route calls to SBC1 and for SE2 to also route calls to SBC1, but it seems Lync Topology Builder won't let me do this. I can only associate a single SE server with a destination SIP gateway (SBC1); it doesn't offer the option to associate SE2 with SBC1.

    This means I need to ask GC for a second SBC for a resilient service, which you could argue is required anyhow if you want true resilience.

    So, before I ask for that am I missing anything?

    Jed

    Thursday, May 5, 2011 5:21 PM

Answers

  • "...but it seems Lync topology builder won't let me do this..."

    This happens if you use an IP address as the gateway name. Create two DNS A records, SBC1.yourdomain and SBC2.yourdomain, both pointing to the gateway's IP address, and use the FQDNs when creating the gateways in TB.

     

    Drago


    http://www.lynclog.com
    • Marked as answer by JedE Saturday, May 7, 2011 10:23 AM
    Thursday, May 5, 2011 7:48 PM

All replies

  • Great one Drago, all sorted now :)

    I notice this is in the EV planning doc now too:

    "To solve multiple Mediation Servers interacting with the same gateway peer entity, you need to configure multiple virtual gateways. Each gateway would be associated with a different FQDN, which DNS would resolve to the same IP address."
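    The virtual-gateway idea quoted above can be sketched as a small model. This is only an illustration, not a Lync API: the host names, the 192.0.2.10 address, and the associate() helper are all hypothetical, and the one-name-per-Mediation-Server rule is modelled from the behaviour described in this thread.

```python
# Hypothetical model, not a Lync API: a gateway name can be tied to only
# one Mediation Server, so two DNS names are used for one physical SBC.
dns_a_records = {
    "sbc1.example.com": "192.0.2.10",
    "sbc2.example.com": "192.0.2.10",  # same SBC, second "virtual" name
}

gateway_to_mediation = {}


def associate(gateway_fqdn, mediation_server):
    """Bind a gateway name to one Mediation Server, rejecting reuse."""
    if gateway_fqdn in gateway_to_mediation:
        raise ValueError(f"{gateway_fqdn} is already associated")
    gateway_to_mediation[gateway_fqdn] = mediation_server


associate("sbc1.example.com", "se1.example.com")
associate("sbc2.example.com", "se2.example.com")

# Both virtual gateways resolve to the same physical address.
peer_ips = {dns_a_records[name] for name in gateway_to_mediation}
```

    Trying to associate sbc1.example.com a second time raises an error in this model, which mirrors why a second FQDN is needed for SE2.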

    This brings me onto the next relevant point.

    I now have two SIP trunks configured (although not yet both live, so I can't test) and I'm wondering what happens with respect to outbound calls and failover/load balancing. For the outbound routes I have added both SIP trunks as Associated Gateways, but I'm not sure how Lync selects the associated gateway for routing: is it round-robin load balanced, failover, or something else? If it's failover, what determines a failure, e.g. TCP timeout, SIP rejection, etc.?

    My gateways are each associated with different colocated Mediation Servers, so I'm not sure if that has an impact too. Without the live trunks I can't test this, so I'm hoping somebody has already been through this.

    What I would really like is an outbound load-balanced setup, so that I make use of all the concurrent calls available on each SIP trunk. I could then purchase two SIP trunks and share the concurrent call charges across both, rather than having one sit idle whilst still paying for all the concurrent call costs.

    The next step after that will be working out what Global Crossing do with their inbound load balancing of calls.

    Jed

    Friday, May 6, 2011 10:53 AM
  • Very, very interesting topic!

    Lync sends a SIP OPTIONS request every minute or so. If the gateway/SIP trunk does not reply in a timely manner, the destination is marked internally as Down. Even when it is marked Down, if we have only one gateway the next outbound call will be sent to that gateway anyway, and it will either be established (if the issue has been resolved) or fail (if it has not). With two or more gateways, the call will be offered to the next gateway that responded properly to OPTIONS. Meanwhile, Lync keeps checking the first gateway's availability by periodically sending OPTIONS requests; once it replies promptly with 200 OK, that gateway is marked as available again.
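    As a rough sketch of the behaviour described above (not Lync's actual implementation), the OPTIONS bookkeeping and gateway choice could be modelled like this. The Gateway class and both function names are hypothetical, and treating the list order as the preference order is an assumption that this thread does not settle.

```python
class Gateway:
    """Hypothetical stand-in for a gateway/SIP trunk entry."""

    def __init__(self, name):
        self.name = name
        self.up = True  # assumed reachable until an OPTIONS probe fails


def record_options_result(gateway, got_200_ok):
    """Mark the gateway up or down from the latest OPTIONS probe."""
    gateway.up = got_200_ok


def pick_gateway(gateways):
    """Return the first gateway marked up.

    If every gateway is marked down, the first one is still tried,
    matching the single-gateway case described in the post.
    """
    for gateway in gateways:
        if gateway.up:
            return gateway
    return gateways[0]


trunks = [Gateway("sbc1"), Gateway("sbc2")]
record_options_result(trunks[0], False)  # sbc1 missed its OPTIONS reply
chosen = pick_gateway(trunks)            # the call is offered to sbc2
```

    Once a later probe to sbc1 gets a 200 OK, calling record_options_result(trunks[0], True) makes it eligible again.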

    To achieve real resiliency with SIP trunking, one must have two independent ITSPs, because having two or more gateways connected to the same or different provider(s) does not mean anything if your WAN goes down (happened to me last week).

    On the provider's side there are a few options, but they vary by provider. Mine, for example, would accept inbound from two different originating IPs but send to only one. I know another provider in Florida whose softswitch monitors gateway availability in the same manner as Lync does and routes the call to an alternative destination if the primary does not respond, while accepting from at least two originating IPs. This is, more or less, full resiliency on the provider's side.

     

    Drago


    http://www.lynclog.com
    Friday, May 6, 2011 9:59 PM
  • OK, so you think the Associated Gateway list is a failover pair, so that the top of the list is the primary, and in the event of an OPTIONS failure the second gateway is tried, and so on down the list. This means I don't have the opportunity to load balance my outbound traffic.

    I would assume that when I reach my concurrent call limit on the trunk, the OPTIONS responses will still be 200 OK, so calls will still be routed down the primary trunk but rejected due to the call limit. In this scenario, is Lync clever enough to try the second trunk? I don't know enough about SIP failure codes to know if there is a sensible failure code Lync could act upon in this scenario.

    What I would ideally like to happen is for the calls to overflow onto the second SIP trunk. Hmm, I suppose CAC can't be used here, can it? Maybe I could split the two SEs into separate sites and use CAC to overflow into the second site when the limit is reached.

    As for real resilience, in this scenario GC provide the IP VPN and we have specced dual POPs so in theory we should have full resilience .... it's just the SBC and gateway end that needs to be specced up as resilient.

    Jed

    Saturday, May 7, 2011 10:23 AM
  • Jed,

    You could try the following: instead of an A record for each gateway, create a single name, SBC1, with A records for the IP addresses of both SBC1 and SBC2. In theory, DNS will reply in round-robin manner, so each successive call would be sent via a different gateway. I am not sure, though, whether Lync caches the DNS result or makes a DNS query every time sbc1.domain.local is sought.

    Drago


    http://www.lynclog.com
    Saturday, May 7, 2011 1:32 PM
  • I tried this with two A records for SBC1 pointing to two IPs, and no luck. I had to bounce the services to get any DNS changes to kick in, but it never tried to connect to the second SBC IP address. I determined this from a Wireshark trace: there were no SYN packets heading out to it, so I don't think it even tries the second DNS entry.
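    One way to check what the server's own resolver returns for the shared name, independently of Lync, is a small script like the one below. It is a diagnostic sketch only: the resolve_ipv4 name is hypothetical, port 5060 is assumed as the SIP port and may differ on your trunk, and the result says nothing about how Lync itself caches or picks addresses.

```python
import socket


def resolve_ipv4(host, port=5060, resolver=socket.getaddrinfo):
    """Return the distinct IPv4 addresses the resolver gives for host, in order.

    The resolver argument defaults to socket.getaddrinfo and exists only
    so the lookup can be swapped out when no DNS is available.
    """
    infos = resolver(host, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

    Calling resolve_ipv4("sbc1.domain.local") on the SE server should list both SBC addresses if both A records are visible there; if only one comes back, the problem is in DNS rather than in Lync.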

    One good thing is that I could see failover working well from the SE1 server to the SE2 server at dial time. If the SE1 gateway is down, it tries for 5-10 seconds and then moves on to the SE2 gateway within the same call, so there is no need to redial. A possible downside is that you get a ring tone even when it can't get a TCP connection to the SBC, so the failure is hidden from the end user, which probably makes debugging a little harder.

    So the testing I have performed so far doesn't help me work out what happens when the session limit is reached for a SIP trunk ... anybody have a clue on this one?

    Jed

    Monday, May 9, 2011 10:13 AM
  • So, now we know DNS is cached. See this article on how to disable client-side caching: http://support.microsoft.com/kb/318803. I'm not sure this is a good idea, though. One thing bothers me: you are talking about resiliency and cost balancing, which are two different things.

     

    Drago


    http://www.lynclog.com
    Monday, May 9, 2011 11:37 AM
  • Hi Jed,

    I too had this question regarding outbound gateway selection behaviour in Lync - will Lync use the first GW specified in a route definition until/unless that GW is unavailable, or will calls be routed in a round-robin/load balancing fashion across all available GWs in the list?

    This blog post suggests that the answer is the latter, i.e. that calls are load balanced across all available gateways in the list:

     http://blogs.pointbridge.com/Blogs/Crockett_keenan/Pages/Post.aspx?_ID=12

     I just wondered if this is in keeping with what you have observed in production?

    Thanks,

    Garry

    Monday, January 9, 2012 5:01 PM
  • Hi Garry,

    With the setup I deployed last year I don't believe I saw round-robin behaviour. However, in my config the second SIP trunk was via a separate SE server (backup pool), which is the equivalent of a different pool. I think this means that trunk is only used if the primary pool route is down, hence no round robin for this configuration.

    I'm sure the link you provided is accurate, so if you have two SIP trunks in the same pool providing access to the same destinations, then you would get round robin.

    Jed


    Sunday, January 22, 2012 2:53 PM