Thursday, March 15, 2012 6:14 PM
I have a baffling problem on only one of the Windows Server 2008 R2 installations of an SSIS package which contains the latest download of the codeplex SCD component (http://dimensionmergescd.codeplex.com/). I do not see this problem on any of the Windows 7 development boxes or on either of the Windows Server 2008 R2 QA and UAT installations. On the working servers, desktops this same package runs without fail.
Here is the symptom on the problem Windows Server 2008 R2 box--
The SQL Agent job executes the SSIS package from the database. The package runs through validation, pre-execution--then executes the sql tasks that gather the source and existing dim data. So far, so good. It appears that when the SCD task is ready to do the merge in the 3rd party component, it hangs. this causes all the sql spids associated with the running job/ssis package to fall into a "sleeping" state--waiting on some action. The job eventually has to be terminated. No errors are recorded in the SSIS logs, SQL Server or the Windows Event logger.
I've tried running the package via dtexec in a command window on the problem box with the same results; i've tried running the job under an ssis proxy account using my login (I'm in the dba/windows admin role--so a goddess across multiple servers, including the problem child) with the same results.
Now for the kicker--I can successfully run this package from the integrated services object connection.
On all the other server (and development) installations, the package runs in all 3 ways; through the command line, through the sql agent job and through the integrated services object connection.
I've looked at so many things--the assemblies, the versions of the framework, the permission settings on the directories where the component dll lives, my permissions, the service account permissions, windows firewall, etc. with no obvious culprit(s).
Can anyone direct me on what the likely problem source might be--or how best to further analyze the problem?
All suggestions, think-throughs, questions, etc. will be very gratefully accepted!:-)
Thursday, March 15, 2012 8:13 PMLooks like SCD task merging large volume of data on that specific environments where its hanging, might not on other environments - I would suggest you to start SQL Trace and see what causing this delay- any blocking, wait stats.
Thursday, March 15, 2012 10:19 PM
Thank you for suggesting SQL Trace. I will see what this brings.
Thursday, March 15, 2012 11:25 PMModerator
What differences are there in the hardware and installed SQL between the working boxes and the one that doesn't work?
Have you turned on the first item in the Logging tab of that component? Do you have Package Logging enabled for your package?
Talk to me now on
- Proposed As Answer by Eileen ZhaoMicrosoft Contingent Staff, Moderator Friday, March 16, 2012 8:05 AM
Friday, March 16, 2012 3:43 PM
The successful runs of the package are happening on VMs--the other unsuccessful runs are happening on a physical box--so there's a world of difference there. Our server group is looking into that aspect. Is there anything specific that you think is most worthy of review?
I do have logging turned on; routed to an xml output file.
There is one setting that I am interested in knowing more about. The MaxConcurrentExecutables on this package is set to -1. It is my understanding that this setting doesn't mean limitless, but means as many executables as possible based on the load of the cpu--still maintaining the performance integrity of the box. This SCD component does utilize threading--based on what I see recorded in the SSIS log. Since this is a production box with a LOT of use, compared to the near idle others--I am thinking that the threshold of max executables package setting may be a factor -- if separate threads are considered separate executables.
Also, I am not familiar enough with the internals of the Windows 2008 R2 op sys to know if there a similar, configurable threshold setting available there to review.
The one fly in the ointment with this theory is that I am still able to run the package through DTExecUI with successful results--so that begs the question, what is different about how DTExecUI? If DTExecUI runs under 32b and SQL Agent/dtexec command line... runs under 64b, is it possible that more threads might become available under DTExecUI?
Let me know your thoughts on this.
- Edited by plditallo Friday, March 16, 2012 4:20 PM
Friday, March 16, 2012 8:02 PMModerator
There's nothing that I know of that's "special" between SSIS or DMSCD execution within a VM or on physical.
Have you tried that "per second informational" logging on on the DMSCD? Can you see those messages on the VM runs? Can you see any of those messages on the physical runs?
The MCE setting of -1 means... number of procs +2. I believe they mean cores. This setting applies to the Control Flow. If your VM/phys has 4 cores, then a maximum of six Tasks will be concurrently executed. Regardless of machine load, regardless of number of threads used within each task.
There is an EngineThreads property on each Data Flow Task. That governs how many sources and transforms are run simultaneously. Here's a good recap.
The DMSCD does NOT respect either - you set it's threading behaviour inside its editor independently.
Again, none of those settings takes other activity on the machine into account.
I missed that distinction between DTExecUI and DTExec in your original post. You're saying that you can execute the package on the physical machine with DTExecUI but not with DTExec? If so, then I'd expect the opposite of what you describe may be happening - DTExec from the command line may be referencing the 32bit version. Check this Registry key SOFTWARE\Microsoft\MSDTS\Setup\DtsPath to ensure it's the same on the VMs and phys.
Talk to me now on
- Marked As Answer by plditallo Friday, March 16, 2012 8:36 PM
Friday, March 16, 2012 8:27 PM
For anyone tracking this thread, I thought I would publish my findings ("solution" might be too strong a word).
In the process of problem solving, I decided to focus on the one consistent difference between the production box and the others: contention for processing resources. I cross referenced the symptom(s) with what direct/indirect cpu/memory management options were available to reduce the call for resources to a low enough level to either see the package run through completion successfully or fail with a meaningful error. I manually increased the number of MaxConcurrentExecutables, then executed the package again--this time adding under Advanced, an output text file. The package failed leaving this footprint in the output file:
Started: 2:03:50 PM
Error: 2012-03-16 14:03:53.05
Source: Load Demographics Dimension Merge Slowly Changing Dimension
Description: Internal error (Arithmetic operation resulted in an overflow.) in PreExecute determining thread count.
DTExec: The package execution returned DTSER_FAILURE (1).
This error is a by product of the SCD component's attempt to calculate the resources available to launch the needed threads for processing. I then jumped over to the codeplex website to search the error to discover that there is a known bug when a server has multiple cores.
Since DTExecUI runs under 32b AND was the only case in which the package ran successfully, I decided to reset the MaxConcurrentExecutables back to -1 and execute the package again, this time setting the sql job agent to execute using 32b. Viola! The package ran successfully.
For more details on DMSCD, etc. read Todd's second post on memory management.
Friday, March 16, 2012 10:01 PMModerator
For future reference, here's the known bug report.
But a question for you that I asked Martin Bobo on that thread - how many cores does your box have?
Talk to me now on
Monday, March 19, 2012 2:26 PM
I've asked our server gurus what the physical box has. Here's the answer:
4 quad cores; 16 x 3 processors with hyperthreading.