Connectivity errors in SQL jobs

  • Question

  • Hi All,

    We are seeing intermittent communication link failures in SQL jobs, and each time we have to re-run the affected jobs.
    Some of our team suspect it is a TCP Chimney Offload issue.
    What is the concept behind TCP Chimney Offload, and how is it related to job failures?
    Can anybody explain in simple terms what it means and why it might cause connection drops?

    Thanks,
    -Sam
    Friday, November 27, 2015 8:43 PM

Answers

  • So it sounds like the issue occurs when connecting to other servers. What you need to investigate is whether the errors always occur when connecting to a specific server, or whether they happen across the board.

    Note that when it comes to TCP Chimney, it is my understanding that this matters on the target server, not the originating one.

    But if it happens across the board, the cause could be something else in your network, such as an overloaded switch. Such a problem would certainly be intermittent by nature, as the component only gets overloaded some of the time.

    Saturday, November 28, 2015 10:39 AM
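
One way to act on the "specific server or across the board" question is to tally failures per target server. The sketch below is only an illustration: the `(job, server)` records, the server names, and the 80% threshold are all hypothetical, and the data would have to be collected by hand from the job history and the package connection managers.

```python
from collections import Counter


def failures_by_server(failures):
    """Count failures per target server.

    `failures` is an iterable of (job_name, target_server) pairs.
    """
    return Counter(server for _job, server in failures)


def looks_server_specific(counts, threshold=0.8):
    """Return True if one server accounts for at least `threshold` of all failures."""
    total = sum(counts.values())
    if total == 0:
        return False
    _server, top = counts.most_common(1)[0]
    return top / total >= threshold


# Hypothetical sample data, not taken from the thread.
sample = [
    ("LoadSales", "SRV-A"),
    ("LoadStock", "SRV-A"),
    ("LoadSales", "SRV-A"),
    ("LoadHR", "SRV-A"),
    ("LoadHR", "SRV-B"),
]

counts = failures_by_server(sample)
print(counts.most_common())
print("server-specific:", looks_server_specific(counts))
```

If one server dominates, that server (and its NIC settings) is the place to look first. An even spread points more towards shared network components.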

All replies

  • TCP Chimney (sometimes referred to as TCP Offloading)

    This feature is designed to move network processing tasks, such as packet segmentation and reassembly, from the computer's CPU to a network adapter that supports TCP Chimney Offload. This reduces the workload on the host CPU, which lets the host OS run faster and also speeds up the processing of network traffic.

    http://blogs.technet.com/b/onthewire/archive/2014/01/21/tcp-offloading-chimney-amp-rss-what-is-it-and-should-i-disable-it.aspx


    Friday, November 27, 2015 9:29 PM
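
On Windows, the current global TCP settings, including the Chimney Offload state, can be listed with `netsh int tcp show global`. The sketch below parses that output to report whether the feature is on. The sample text and the exact label `Chimney Offload State` are assumptions about the output format, which can differ between Windows versions, so treat this as a starting point rather than a finished tool.

```python
import re


def parse_tcp_globals(netsh_output):
    """Parse 'name : value' lines from `netsh int tcp show global` output."""
    settings = {}
    for line in netsh_output.splitlines():
        # Setting lines are assumed to look like "Chimney Offload State   : disabled".
        match = re.match(r"^\s*(.+?)\s*:\s*(\S.*?)\s*$", line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def chimney_enabled(netsh_output):
    """Return True if Chimney Offload is reported as enabled or automatic."""
    state = parse_tcp_globals(netsh_output).get("Chimney Offload State", "")
    return state.lower() in {"enabled", "automatic"}


# Assumed sample output, shortened for illustration.
SAMPLE = """\
TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Chimney Offload State               : disabled
"""

print(parse_tcp_globals(SAMPLE))
print("chimney enabled:", chimney_enabled(SAMPLE))
```

On a real Windows host, the text could be captured with `subprocess.run(["netsh", "int", "tcp", "show", "global"], capture_output=True, text=True).stdout` and passed to `chimney_enabled`.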
  • But how does a job fail because of this TCP Chimney feature? We are seeing communication link failures in the job history, and I see people suggesting to disable this feature on all NICs. I want to understand what can cause the communication link to fail, ideally with one concrete example of how this feature could lead to such a failure. Can anyone explain?
    Friday, November 27, 2015 9:43 PM
  • Anyone please?
    Friday, November 27, 2015 9:57 PM
  • May I guess that the job is accessing a remote SQL Server instance one way or another? That is, it either runs a job on the remote server, or runs a job on the local server that includes a query to a linked server. It's a little more intriguing if you get the error on a job that only accesses the local SQL Server instance.

    Does this only happen to jobs? Does it happen to jobs that run at any time of day, or only to jobs that run in the wee hours of the night (when the network may be clogged by backups)?

    Friday, November 27, 2015 10:14 PM
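
The time-of-day question above can be checked by bucketing failure timestamps by hour. This is a minimal sketch under stated assumptions: the timestamps are hypothetical, the `YYYY-MM-DD HH:MM:SS` format is chosen for illustration, and the 00:00 to 05:59 "night" window is an arbitrary guess at when backups might run.

```python
from collections import Counter
from datetime import datetime


def failures_by_hour(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count failures per hour of day (0-23) from timestamp strings."""
    return Counter(datetime.strptime(ts, fmt).hour for ts in timestamps)


def night_share(hour_counts, night_hours=range(0, 6)):
    """Return the fraction of failures that fall inside `night_hours`."""
    total = sum(hour_counts.values())
    if total == 0:
        return 0.0
    at_night = sum(hour_counts[h] for h in night_hours)
    return at_night / total


# Hypothetical failure times, not taken from the thread.
sample = [
    "2015-11-25 01:15:00",
    "2015-11-26 01:40:12",
    "2015-11-26 02:05:30",
    "2015-11-27 14:20:00",
]

hours = failures_by_hour(sample)
print(sorted(hours.items()))
print("share at night:", night_share(hours))
```

A high night share would support the backup-window theory. Failures spread across the day would make it less likely.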
  • Basically, the job step calls an SSIS package, which in turn has some connection managers defined inside it.

    It happens with other jobs as well. Maybe I can check the job schedules.

    Again, the issue is intermittent.

    Saturday, November 28, 2015 8:49 AM
  • Okay, then we will engage the network team to capture some traces of the client and server communication.
    Saturday, November 28, 2015 3:59 PM
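
Alongside packet traces, a simple TCP connect probe run on a schedule can give the network team timestamps to correlate against. The sketch below is an assumption-laden illustration: the host name is a placeholder and port 1433 is only the default SQL Server port. It only tests whether a connection can be established, so it will not catch drops on connections that are already open, which is where traces are still needed.

```python
import socket
import time


def probe(host, port, timeout=3.0):
    """Try one TCP connection; return (ok, elapsed_seconds, error_text)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        return False, time.monotonic() - start, str(exc)


def log_line(host, port, result):
    """Format one probe result as a timestamped log line."""
    ok, elapsed, error = result
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    status = "OK" if ok else "FAIL " + error
    return f"{stamp} {host}:{port} {elapsed:.3f}s {status}"


if __name__ == "__main__":
    # Placeholder target; replace with the real server name and port.
    target_host, target_port = "sqlserver.example.local", 1433
    print(log_line(target_host, target_port, probe(target_host, target_port, timeout=1.0)))
```

Run from the server that hosts the jobs, once a minute or so, the resulting log shows whether connection attempts fail at the same times the jobs do.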