MDM VPN not connecting over Mobile Operator network but does over WiFi

  • Question

  • Hi guys,

    I'm experiencing the following problem:

    Enrollment of devices goes fine, but the VPN does not connect if I'm using a WAN connection. Through WiFi, everything goes fine and the device connects immediately. When I switch a device that is already connected through WiFi over to WAN, the connection re-establishes perfectly.

    As you can see in the following log file, the connection times out on UDP port 4500.

    LOG-I:05-11.12:46:08- IPsecVPNPM: STATE: [StateDNSResolve].
    LOG-I:05-11.12:46:08- IPsecVPNPM: STATE: [StateTunnelSetup].
    LOG-I:05-11.12:46:08- IKE SA Lifetime: 26280 seconds.
    LOG-I:05-11.12:46:08- IPSec SA Lifetime: 21600 seconds.
    LOG-I:05-11.12:46:09- IPsecVPNPM: Interface change Trigger.
    LOG-I:05-11.12:46:09- IPsecVPNPM: commit succeeded.
    LOG-I:05-11.12:46:09- IPsecVPNPM: STATE: [StateWaitForTunnel].
    LOG-I:05-11.12:46:46-
    LOG-I:05-11.12:46:46- IKEv2 SA [Initiator] negotiation failed:
    LOG-I:05-11.12:46:46-
    LOG-I:05-11.12:46:46-   Local IKE peer  10.79.32.95:4500 ID (null)
    LOG-I:05-11.12:46:46-   Remote IKE peer XX.XX.85.85:4500 ID (null)
    LOG-I:05-11.12:46:46-
    LOG-I:05-11.12:46:46-   Message: Timed out (65540)
    LOG-I:05-11.12:46:46- IKE SA negotiations: 1 done, 0 successful, 1 failed
    LOG-I:05-11.12:46:46-
    LOG-I:05-11.12:46:46- IPsec SA [Initiator] negotiation failed:
    LOG-I:05-11.12:46:46-
    LOG-I:05-11.12:46:46-   Local IKE peer  10.79.32.95:4500 ID (null)
    LOG-I:05-11.12:46:46-   Remote IKE peer XXX.XXX.85.85:4500 ID (null)
    LOG-I:05-11.12:46:46-
    LOG-I:05-11.12:46:46-   Message: Timed out (65540)
    LOG-I:05-11.12:46:46- IPsec SA negotiations: 1 done, 0 successful, 1 failed
    LOG-E:05-11.12:46:46- IPsecVPNPM: Internal Error.
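
For anyone who wants to pull the key events out of a client log like the one above, here is a minimal Python sketch. It assumes every line follows the `LOG-<level>:<date>.<time>- <message>` shape seen in this post; the names `parse_line` and `summarize` are made up for illustration and are not part of any MDM tool.

```python
import re

# Assumed line shape, taken from the log quoted above:
#   LOG-<level>:<date>.<HH:MM:SS>- <message>
LINE_RE = re.compile(
    r"^LOG-(?P<level>[A-Z]):(?P<date>\d\d-\d\d)\.(?P<time>\d\d:\d\d:\d\d)-(?P<msg>.*)$"
)
STATE_RE = re.compile(r"STATE: \[(\w+)\]")


def parse_line(line):
    """Split one log line into level, date, time and message (None if no match)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["msg"] = record["msg"].strip()
    return record


def summarize(log_text):
    """Return (states, messages): the state transitions and 'Message:' lines, with timestamps."""
    states, messages = [], []
    for raw in log_text.splitlines():
        record = parse_line(raw)
        if record is None:
            continue  # skip blank or foreign lines
        state = STATE_RE.search(record["msg"])
        if state:
            states.append((record["time"], state.group(1)))
        if record["msg"].startswith("Message:"):
            messages.append((record["time"], record["msg"][len("Message:"):].strip()))
    return states, messages
```

Run against the log above, this would show the client entering `StateWaitForTunnel` at 12:46:09 and the first `Timed out (65540)` message at 12:46:46, which makes the length of the wait easy to read off.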

    I've tried everything, but I just cannot seem to find the problem. Network Monitor shows everything is OK, and the port filtration tests all pass. If I check the event log on my gateway server, it reports that the connection with the device is established. I've tried multiple GPRS vendors (Vodafone/KPN), emulators at different sites (in-house works; remote does not), etc. Device management through WiFi works perfectly, as do software distribution, remote wipe, etc.

    I've also placed a call with Microsoft, but I also want to share my misery with you guys....

    Thanks for all the help
    Sander, the Netherlands.

    Wednesday, June 3, 2009 11:10 AM

Answers

  • Sander,

    If an external Wi-Fi node doesn't work and your mobile operator doesn't work, but a local Wi-Fi connection does, then it's most likely your ISP. That being said, you should test as many different MO SIM cards as you can get your hands on, and try as many external WLAN connections as possible, just to be sure.

    Quote:

        It just seems that the 4500 UDP packet is never received by the mobile device.... Network monitor says it's sent out, which then must activate the vpn connection, but it just never does....


    Since the server is sending out reply UDP 4500 Packets, we can assume that the device is hitting the server. So it's the return traffic causing you issues.
    Very hard to troubleshoot the ISP, but I have an idea. Hopefully your showcase environment is outside the environment we are troubleshooting. If this is the case, why not try this in reverse? Connect a device to the local Wi-Fi node and then try to establish a VPN connection to your showcase environment. If the ISP is mangling the 4500 traffic somehow, then this should not work.
    This doesn't help you much but at least it gives you some good ammunition to make the ISP take notice.

    Cheers Wayne
    Airloom
    Friday, June 5, 2009 11:31 PM
    Moderator
  • We've managed to get it to work by rerouting an SDSL line to the GW server, though it will have to be reconfigured once KPN Netherlands fixes the issue.
    Tuesday, June 30, 2009 2:33 PM

All replies

  • I have seen this issue. I assume you are running SP1 of SCMDM which solves some VPN related issues?

    A common reason for this issue is port filtering on the mobile network. Are you sure the APNs you are using allow the proper ports to pass through?

    There's most likely nothing wrong in your SCMDM deployment since it works over WiFi, so it seems to be a device-side or operator-related issue. Which devices are you using? Have you tested different devices? Are you running the latest ROM on the devices?
    Wednesday, June 3, 2009 2:57 PM
  • Hi,

    I'm using it with:

    Sony Xperia X1
    HTC Touch HD WM6.1 & WM6.5

    Both of those devices work in my own environment. The VPN has been tested on multiple connections (via WiFi with and without emulators, and remotely through emulators, with and without the mobile operator network), so the APN is also not the issue.... I'm teaming up with a Cisco engineer tomorrow to see if the core router drops some traffic or something....

    It just seems that the 4500 UDP packet is never received by the mobile device.... Network Monitor says it's sent out, which should then activate the VPN connection, but it just never does....


    It is indeed SCMDM SP1.
    Wednesday, June 3, 2009 3:12 PM
  • Try running through the MDM VPN troubleshooting steps.  http://technet.microsoft.com/en-us/library/dd252860.aspx

     

    I recommend installing the MDM VPN Diagnostics client on the device and performing a Port Filtration Test. This should tell you whether there is a simple ISP issue or firewall issue.

    One major trick that is missed by many is the VPN outbound ports. Don't rely on the inbound VPN rules allowing the traffic back out. Most firewalls process the incoming state correctly, but others don't. Make sure you specify the traffic leaving the MDM gateway.

    Make sure you are not trying to NAT the traffic on the way in, as this is not supported.

    Make sure you allow the ESP and AH protocols. Some firewalls require you to specify the protocol being used.

     

    Cheers Wayne

    Airloom

    Thursday, June 4, 2009 8:24 AM
    Moderator
  • The gateway server isn't situated behind any firewall at the moment, which is the strangest thing of all. Devices can connect through WiFi (which is also situated outside the corporate firewall, but within the same subnet as the gateway server's external IP address), but they just cannot get through over the internet. I troubleshot the Cisco 1811 router today with my network experts, but that device is also doing solely its job: routing.... The only cause I can think of now is the PAT that is done between the ISP's cloud and that of the mobile operator.... The mobile operator does its job fine; it works in other environments I've installed at customers' sites.

    I've run the BPA analyzer; its post-deployment scan says that the "certificate chain is not valid", but:

    - Via the VPN diagnostics tool on the device, everything is OK (configuration, certificates and the report)
    - Via Pkiview.msc, everything is OK: the CRLs and AIAs are all fine for the offline root and the issuing intermediate
    - The certificate used for the gateway server shows up as valid (with the chain in it)

    I cannot imagine this is the issue, as the connection comes up when I connect the devices through WiFi....

    Please advise.
    Thursday, June 4, 2009 6:59 PM
  • Are you using an internal WLAN AP, or does it apply for all WLANs? I mean - we can probably assume that the WLAN traffic routes along a different path than the GPRS traffic even if it ends up at the same IP/interface.

    But is there really any PAT between the mobile operator and your ISP? Do you have a "proper" public IP on your gateway server? NAT and/or PAT in front of the gateway is not a good thing. The mobile device may have a NATed IP - that's ok, but it might cause issues if it applies to the server side.
    Thursday, June 4, 2009 7:10 PM
  • The WiFi AP is situated on the external side of the firewall of the customer's network, so it has an external IP address (217.XXX.XX.XX). The gateway server also has an external IP assigned (217.XXX.XX.XX), so they are in the same subnet and are therefore not routed but switched to each other.
    By PAT, I mean the Port Address Translation that is done between the cloud of my mobile operator and the cloud of the ISP (mobile operators assign devices an internal IP range through SNAT), so:

    Device IP = 10.X.X.X | Mobile operator's cloud IP = 64.XX.XX.XX | Gateway server IP = 217.XX.XX.XX, so traffic goes as:

    10.X.X.X:4500 => 64.XX.XX.XX:6992 (e.g.) => 217.XX.XX.XX:4500
    217.XX.XX.XX:4500 => 64.XX.XX.XX:6992 => 10.X.X.X:4500

    Between my mobile device and the mobile operator's cloud there is SNAT and PAT, but that is correct and supported (check your Base IP in the VPN diagnostics).
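
The translation described above can be pictured with a small Python sketch of a source-side PAT table. This is only an illustration of the general technique, not the operator's actual equipment: the class name `PatTable`, the addresses (taken from documentation ranges) and the starting port 6992 (borrowed from the example above) are all assumptions.

```python
class PatTable:
    """Toy model of source NAT with port translation (PAT) at the operator edge."""

    def __init__(self, public_ip, first_port=6992):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out = {}   # (inner_ip, inner_port) -> public_port
        self.back = {}  # public_port -> (inner_ip, inner_port)

    def outbound(self, src_ip, src_port):
        """Rewrite the source of a packet leaving the operator network."""
        key = (src_ip, src_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.out[key])

    def inbound(self, dst_port):
        """Find the inner host for a reply; None means no mapping, so the reply is dropped."""
        return self.back.get(dst_port)


# Hypothetical addresses: device 10.0.0.5, operator public IP 198.51.100.1.
nat = PatTable("198.51.100.1")
seen_by_gateway = nat.outbound("10.0.0.5", 4500)   # what the gateway sees as the source
reply_target = nat.inbound(seen_by_gateway[1])     # where the gateway's reply ends up
```

The point of the sketch is that the gateway's reply only reaches the device if it is addressed to the translated port and the mapping still exists; a reply sent anywhere else, or altered on the way back, has no entry to match.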

    Quote(Link: http://social.technet.microsoft.com/Forums/en-US/SCMDM/thread/62620f30-a866-4bd5-8712-947ec2621399/):

    The mdm solution does support SNAT (Source NAT), as most carriers NAT the devices at source. MDM uses Negotiation of NAT-Traversal in the IKE, to support SNAT. It does not support DNAT (Destination NAT), or any destination translation of the IPSEC traffic. IPSEC uses AH (Authentication Headers) to check the traffic has not been modified.


    Am I just clueless? :)

    Thursday, June 4, 2009 8:51 PM
  • If the WiFi AP is on the same subnet as the gateway, and no routing occurs, there might also be fewer firewalls to traverse?

    Wayne is correct that SNAT = ok and DNAT != ok. I'm not sure what the proper term for the scenario above is, but PAT is a form of NAT, and as you describe it your mobile operator is translating the traffic. If we assume that there are no firewalls blocking port 4500 - have you checked that protocol 50 (if I remember correctly) isn't blocked along the way? (I admit I have no idea how protocol 50 works with NAT/PAT, but I'm sure someone else knows that stuff better than me.)
    Thursday, June 4, 2009 9:59 PM
  • Protocol 50 is ESP (Encapsulating Security Payload), which is the encryption part of IPsec. This protocol is happy behind a NAT. Protocol 51, AH (Authentication Header), is a different matter... it doesn't allow any translation of the traffic. That's its sole purpose in life... it signs and numbers each packet so that you can guarantee it's not been tampered with. To compensate for NAT translation, NAT-T was invented. NAT Traversal uses UDP port 4500 to negotiate, and protects the original IPsec-encoded packet by encapsulating it in another layer of UDP and IP headers. Simple hey! NOT. Well, if it was easy it wouldn't be so secure. ;-)
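
As a rough illustration of how IKEv2 peers notice a NAT in the path (which is what leads to the UDP 4500 encapsulation described above), here is a small Python sketch. Per the IKEv2 specification (RFC 4306, later RFC 7296, section 3.10.1), the NAT_DETECTION_SOURCE_IP notification carries a SHA-1 digest of the two SPIs, the sender's IP address and its port, and the receiver recomputes that digest from the address it actually sees. The SPI value, the addresses and the function name `nat_detection_hash` are invented for the example.

```python
import hashlib
import ipaddress
import struct


def nat_detection_hash(spi_i, spi_r, ip, port):
    """SHA-1 over SPIi | SPIr | IP address | port (port in network byte order)."""
    data = spi_i + spi_r + ipaddress.ip_address(ip).packed + struct.pack("!H", port)
    return hashlib.sha1(data).digest()


# Hypothetical first message: the responder SPI is still zero.
spi_i = bytes.fromhex("0102030405060708")
spi_r = bytes(8)

# What the device puts in the payload (its private address and port)...
sent = nat_detection_hash(spi_i, spi_r, "10.0.0.5", 4500)
# ...versus what the gateway computes from the packet's observed source.
seen = nat_detection_hash(spi_i, spi_r, "198.51.100.1", 6992)

nat_in_path = sent != seen  # a mismatch tells the peers a NAT rewrote the source
```

When the digests differ, both sides know the source was translated and keep using UDP encapsulation on port 4500. A device on the same switch as the gateway would produce matching digests.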

     

    The difference between your Wi-Fi devices and your mobile operator devices is probably NAT-T related. Your Wi-Fi devices are on the same LAN/switch, so they don't need to use NAT-T. Your mobile operator devices are in the cloud, so they will need NAT-T. As your Wi-Fi devices are not subject to SNAT (Source NAT), they wouldn't have to use NAT-T (UDP 4500).

     

    I think the issue is between your MO and your ISP. You seem to be Port translating the traffic on the way in and the way out, which is not supported. It should still work, but it's not supported.

    What happened when you performed the Port Filtration Test? Did it pass?

    Try connecting a Wi-Fi Device to a public Access point (subject to NAT) to see if the VPN comes up.

     

    Check the MDM Firewall Rules with your ISP. Notice the IPSEC rules are the only ones that have bi-directional listed.

     

    Cheers Wayne

    Airloom

     

    Friday, June 5, 2009 12:20 AM
    Moderator
  • Hi Sander,

    I believe others have confirmed SCMDM working with KPN in NL on their <internet> APN.. But Vodafone and the <office.vodafone.nl> APN I'm not so sure on unless you have good contacts there.. Which APNs are you using?

    Mazzzl,

    |\\arco..
    Friday, June 5, 2009 4:15 AM
    Answerer
  • Quote:

        Hi Sander,

        I believe others have confirmed SCMDM working with KPN in NL on their <internet> APN.. But Vodafone and the <office.vodafone.nl> APN I'm not so sure on unless you have good contacts there.. Which APNs are you using?

        Mazzzl,

        |\\arco..

    It should work, as I'm using it with Vodafone in my showcase environment. I just think the problem does not lie in the mobile operator's network (APNs), but in the ISP's cloud. It's just almost impossible to convince them of that....

    Quote:

        What happened when you performed the Port Filtration Test ? Did it pass ?

        Try connecting a Wi-Fi Device to a public Access point (subject to NAT) to see if the VPN comes up.

        Check the MDM Firewall Rules with your ISP. Notice the IPSEC rules are the only ones that have bi-directional listed.

        Cheers Wayne

        Airloom

    The port filtration tests are all OK. Connecting through an external WiFi AP doesn't work either. The ISP (KPN Epacity, I think it was) says they do not firewall anything, and as stated before, the gateway server is placed publicly at the moment, with no firewall near it.

    Friday, June 5, 2009 8:19 AM
  • I've created a network diagram of the situation.

    The DM gateway server must go back behind the firewall (Smoothwall DNS proxy) once I manage to fix the problem; until then I want to rule out all possible local issues such as firewalls :)
    Friday, June 5, 2009 8:37 AM
  • Hmm.. it's a bit tricky... You could try to run Wireshark and compare a successful connection through WLAN with an unsuccessful one through GPRS. (They would be different due to no NATing over WLAN, but still...)
    Friday, June 5, 2009 10:27 AM
  • Quote:

        Hmm.. it's a bit tricky... You could try to run Wireshark and compare a successfull connection through WLAN with an unsuccessful trough GPRS. (They would be different due to no NATing over WLAN, but still...)

    Yup, did that yesterday. No difference in traffic, unfortunately. The WLAN also NATs: wireless devices get a 192.168.200.x address (so it's also using UDP 4500).


    Friday, June 5, 2009 11:07 AM
  • Quote:

        Hopefully your showcase environment is outside the environment we are troubleshooting. If this is the case, why not try this in reverse? Connect a device to the local Wi-Fi node and then try to establish a VPN connection to your showcase environment. If the ISP is mangling the 4500 traffic somehow, then this should not work.
        This doesn't help you much but at least it gives you some good ammunition to make the ISP take notice.

        Cheers Wayne
        Airloom

    Excellent plan, I'm gonna try it right away.
    Saturday, June 6, 2009 10:08 AM
  • OK, I've tried connecting to my own environment with my emulator at the customer's site (which works against that site itself). The situation is exactly the same! Thanks Wayne, it was an excellent tip; I never thought of it.

    LOG-I:06-06.12:14:52- IPsecVPNPM: entering event loop.
    LOG-I:06-06.12:14:52- IPsecVPNPM: STATE: [StateStart].
    LOG-I:06-06.12:14:53- IPsecVPNPM: PM created successfully.
    LOG-I:06-06.12:14:53- IPsecVPNPM: STATE: [StateDisconnectVNICs].
    LOG-I:06-06.12:14:54- IPsecVPNPM: STATE: [StateDisconnectVNICs].
    LOG-I:06-06.12:14:54- IPsecVPNPM: STATE: [StateSetInitialRules].
    LOG-I:06-06.12:14:54- IPsecVPNPM: Interface change Trigger.
    LOG-I:06-06.12:14:55- IPsecVPNPM: commit succeeded.
    LOG-I:06-06.12:14:55- IPsecVPNPM: STATE: [StateConnMgrEstablish].
    LOG-N:06-06.12:14:55- IPsecVPNPM: Using default connections.

    LOG-I:06-06.12:14:55- IPsecVPNPM: STATE: [StateDNSResolve].
    LOG-I:06-06.12:14:55- IPsecVPNPM: STATE: [StateTunnelSetup].
    LOG-I:06-06.12:14:55- IKE SA Lifetime: 26280 seconds.
    LOG-I:06-06.12:14:55- IPSec SA Lifetime: 21600 seconds.
    LOG-I:06-06.12:14:55- IPsecVPNPM: commit succeeded.
    LOG-I:06-06.12:14:55- IPsecVPNPM: STATE: [StateWaitForTunnel].
    LOG-I:06-06.12:15:38-
    LOG-I:06-06.12:15:38- IKEv2 SA [Initiator] negotiation failed:
    LOG-I:06-06.12:15:38-
    LOG-I:06-06.12:15:38-   Local IKE peer  192.168.200.96:4500 ID (null)
    LOG-I:06-06.12:15:38-   Remote IKE peer 85.234.235.216:4500 ID (null)
    LOG-I:06-06.12:15:38-
    LOG-I:06-06.12:15:38-   Message: Timed out (65540)
    LOG-I:06-06.12:15:38- IKE SA negotiations: 1 done, 0 successful, 1 failed
    LOG-I:06-06.12:15:38-
    LOG-I:06-06.12:15:38- IPsec SA [Initiator] negotiation failed:
    LOG-I:06-06.12:15:38-
    LOG-I:06-06.12:15:38-   Local IKE peer  192.168.200.96:4500 ID (null)
    LOG-I:06-06.12:15:38-   Remote IKE peer XX.XXX.XXX.216:4500 ID (null)
    LOG-I:06-06.12:15:38-
    LOG-I:06-06.12:15:38-   Message: Timed out (65540)
    LOG-I:06-06.12:15:38- IPsec SA negotiations: 1 done, 0 successful, 1 failed
    LOG-E:06-06.12:15:38- IPsecVPNPM: Internal Error.

    LOG-I:06-06.12:15:38- Phase-I negotiation failed
    LOG-I:06-06.12:15:38-   Message: Invalid argument (65538)
    LOG-I:06-06.12:15:38- IPsecVPNPM: Tunnel down: Mobike was not operational
    LOG-I:06-06.12:15:38- IPsecVPNPM: STATE: [StatePurgeRules].
    LOG-I:06-06.12:15:38- IPsecVPNPM: into UnAttended Mode.
    LOG-I:06-06.12:15:38- IPsecVPNPM: Deleting the Tunnel
    LOG-I:06-06.12:15:38- IPsecVPNPM: commit succeeded.
    LOG-I:06-06.12:15:38- IPsecVPNPM: STATE: [StateDefaultPolicy].
    LOG-I:06-06.12:15:38- IPsecVPNPM: STATE: [StateDefaultRun].
    LOG-I:06-06.12:15:38- IPSEC VPN TUNNEL RETRY DELAY: 19 seconds.
    LOG-I:06-06.12:15:38- IPsecVPNPM: Set the CESetUserNotification
    LOG-I:06-06.12:15:38- IPsecVPNPM: out of UnAttended Mode.
    LOG-I:06-06.12:16:15- IPsecVPNPM: retry timeout fired.
    LOG-I:06-06.12:16:15- IPsecVPNPM: STATE: [StateDefaultRun].
    LOG-I:06-06.12:16:15- IPsecVPNPM: STATE: [StatePurgeRules].
    LOG-I:06-06.12:16:15- IPsecVPNPM: Clear CESetUserNotification.
    LOG-I:06-06.12:16:15- IPsecVPNPM: into UnAttended Mode.
    LOG-I:06-06.12:16:15- IPsecVPNPM: Deleting the Tunnel
    LOG-I:06-06.12:16:15- IPsecVPNPM: commit succeeded.
    LOG-I:06-06.12:16:15- IPsecVPNPM: STATE: [StateConnMgrEstablish].
    LOG-N:06-06.12:16:15- IPsecVPNPM: Using default connections.



    This must mean that the ISP is messing with the packets. I'm gonna contact them ASAP.

    Saturday, June 6, 2009 10:19 AM
  • Sander,

    It's great to hear you can replicate the issue on the outbound traffic. Let us know what they end up saying !

    Cheers Wayne
    Airloom
    Tuesday, June 9, 2009 12:13 AM
    Moderator
  • Well guys,

    You'll never guess it, but at another customer's site that also has KPN business, I'm experiencing EXACTLY the same problem. KPN is still saying the problem is not on their side, even when I tell them that the connection works instantaneously on another carrier's WAN line.... I'm really getting frustrated with this. The problem is also not solved yet at the first customer's site.... The advantage I've got here is that the router placed before the firewalls is a managed one from KPN. So I'll let them figure it out.

    I'll keep you posted (again).

    Friday, August 28, 2009 9:34 PM
  • Hi guys,

    I've made some progress. Can anyone say anything useful about the information I've received?

    Quote:

        It seems that the VPN client cannot use fragmented IP packets (which are necessary with PPPoE); our advice is to reduce the MTU sizes. When PPPoE is used, lower the MTUs to 1450 bytes (or lower).
        Because KPN uses PPPoE, there is extra overhead in the frames. When the MTU size is too large, the frames will be divided over multiple frames, known as interleaving. IPSEC classifies interleaved frames as non-trusted traffic. As an alternative we can offer a numbered link, instead of PPPoE.

     

    Can this be the cause (PPPoE)?

    Thanks.
    Friday, September 4, 2009 8:17 AM
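
To make the MTU advice quoted above concrete, here is a small arithmetic sketch in Python. What the ISP calls interleaving is normally called IP fragmentation. PPPoE adds 8 bytes of overhead on top of Ethernet's 1500-byte payload, leaving an MTU of 1492. The helper name `fragment_sizes` and the packet sizes are illustrative assumptions (real IKE packet sizes depend on the certificates involved), and a 20-byte IPv4 header without options is assumed.

```python
def fragment_sizes(payload_len, mtu, ip_header=20):
    """Return the IPv4 payload carried by each fragment when a packet crosses a link with this MTU."""
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    sizes = []
    remaining = payload_len
    while remaining > per_fragment:
        sizes.append(per_fragment)
        remaining -= per_fragment
    sizes.append(remaining)
    return sizes


ETHERNET_MTU = 1500
PPPOE_MTU = ETHERNET_MTU - 8  # 1492 after PPPoE/PPP overhead

full_size = fragment_sizes(1500 - 20, ETHERNET_MTU)  # one packet on plain Ethernet
over_pppoe = fragment_sizes(1500 - 20, PPPOE_MTU)    # the same packet over the PPPoE link
lowered = fragment_sizes(1450 - 20, PPPOE_MTU)       # packet built for an MTU of 1450
```

Under these assumptions a full 1500-byte packet is split in two on the PPPoE link, while a packet built for an MTU of 1450 passes through whole, which is the reasoning behind the suggestion to lower the MTU.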
  • This is definitely a point of interest. PPPoE is sort of a "fake dial-up", and is most commonly seen on DSL connections for home users. I try to avoid it if I can.

    Wikipedia has the basic info about PPPoE and MTU :) But, yes, the summary above gives the explanation.

    It is possible to change the MTU on a Windows Server, but the box that needs its MTU re-configured here is probably the DSL router. Unfortunately there's no guarantee that will work with IPSec, and if it does probably not optimally.

    I haven't heard the term numbered link, but I assume this is a non PPPoE link. (Using PPPoA, fiber, or something instead, but that doesn't matter.) If your ISP recommends this I would probably go for it.
    Friday, September 4, 2009 8:43 AM
  • I've seen the same problem with 2 cellular providers we use. The symptom we saw was that Mobile VPN would not connect over a GPRS connection. Over a higher-speed connection (EDGE, 3G or HSDPA) the problem was not seen.

    We found, after extensive troubleshooting with the cellular providers, that the packets were fragmented because GPRS was unable to handle such large packets, and so only part of the authentication request was sent to the MDM gateway and then discarded as incomplete (or something along those lines).

    The solution, as I understood it, was for the cellular providers to re-route the traffic to a router that reassembled the fragmented packets before forwarding them to the MDM gateway.

    I'm not exactly sure on the details, I've got the information second hand, but just wanted to post it here so other people would have a place to look if stumped like we were.
    Tuesday, December 29, 2009 12:41 PM
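
The failure mode described in the last reply (part of a fragmented request arrives and is discarded as incomplete) can be sketched in a few lines of Python. This is a simplified model of IP reassembly, not the gateway's actual code: the function name `reassemble` and the tuple layout are assumptions, and overlapping fragments are simply treated as an error.

```python
def reassemble(fragments):
    """Rebuild a datagram from (offset_bytes, more_fragments, data) tuples.

    Returns the payload as bytes, or None if any piece is missing.
    """
    expected = 0
    payload = bytearray()
    for offset, more_fragments, data in sorted(fragments, key=lambda f: f[0]):
        if offset != expected:
            return None  # hole (or overlap): the datagram cannot be rebuilt
        payload += data
        expected += len(data)
        if not more_fragments:
            return bytes(payload)  # last fragment reached with no holes
    return None  # the final fragment never arrived


# Hypothetical request split into two fragments.
first = (0, True, b"A" * 16)
last = (16, False, b"B" * 4)

complete = reassemble([last, first])  # arrival order does not matter
incomplete = reassemble([first])      # second fragment lost on the way
```

If a link or middlebox drops one fragment, the receiver never sees a complete request and the negotiation simply times out, which matches the symptom in the logs earlier in the thread.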