SQL - Run time

  • Question

  • We have a query that another team uses to extract data into their system. It pulls a huge volume of data, around 300 million records, and they get timeout errors most of the time. How can we modify this query to improve its run time?




    sami


    • Edited by arc2013, Wednesday, October 23, 2019, 14:23
    Tuesday, October 22, 2019, 20:49

Answers

  • Actually I modified the first SELECT to something like the below. I pulled the records into a temp table using the date parameter, so the join in the second query only runs against the rows in the temp table. This helped. But I am not sure why SQL would do a complete table scan for the join instead of applying the WHERE clause filter (I mean in the original query that was causing the issue).

    declare @start int
    select @start = 20191020

    IF OBJECT_ID('tempdb..#Temp') IS NOT NULL
    BEGIN
        DROP TABLE #Temp
    END

    select *
      into #Temp
      from Table1 p
     where p.dateid = @start

    select p.[attribute]
         , cast(DATEDIFF(s, '1970-01-01', p.[DateTime]) as bigint) * 1000 as [DateTime]
         , p.[MIN]
         , p.[MAX]
         , p.[AVG]
         , p.[SUM]
         , p.[COUNT]
      from #Temp p
      join stagingdata c with (NOLOCK)
        on p.Aid = c.CIName
       and p.cl = c.clname
    --where p.DATEID = @start


    sami

    • Marked as answer by arc2013, Wednesday, November 6, 2019, 17:55
    Wednesday, October 23, 2019, 14:35
  • Actually I modified the first SELECT to something like the below. I pulled the records into a temp table using the date parameter, so the join in the second query only runs against the rows in the temp table. This helped. But I am not sure why SQL would do a complete table scan for the join instead of applying the WHERE clause filter (I mean in the original query that was causing the issue).

    Because it estimates that scanning the table is cheaper than using the index.

    A non-clustered index is good if the condition hits only a few rows. But if many rows are hit, it is more expensive to read via the index than to scan, since pages need to be accessed many times.

    In the example, @start is a variable, so SQL Server needs to make a blind guess based on the density of the DateID column. That is, how many distinct values are there?
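    One common way to avoid that blind guess is OPTION (RECOMPILE), which compiles the statement after the variable's value is known, so the optimizer can use the column's statistics histogram instead of an average-density estimate. A minimal sketch, assuming the Table1/DateID names from the thread:

    DECLARE @start int = 20191020;

    -- With RECOMPILE, the plan for this statement is built after
    -- @start has a value, so the optimizer can estimate the row
    -- count for DateID = 20191020 from the histogram rather than
    -- guessing from column density.
    SELECT *
    INTO   #Temp
    FROM   Table1 AS p
    WHERE  p.DateID = @start
    OPTION (RECOMPILE);

    The trade-off is a compilation on every execution, which is usually acceptable for a large, infrequent extract like this one.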


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    • Marked as answer by arc2013, Wednesday, November 6, 2019, 17:55
    Wednesday, October 23, 2019, 22:00

All replies

  • Did you try to extract data by using two SELECT statements without UNION ALL?

    A Fan of SSIS, SSRS and SSAS

    Tuesday, October 22, 2019, 20:58
  • Both queries are taking a long time.

    sami

    Tuesday, October 22, 2019, 21:59
  • Removing all those dangerous NOLOCK hints could be a start.

    Do they have any reason to have a timeout in the first place? Retrieving 300 million rows takes some time. If they are running with the default timeout of 30 seconds they should increase it. This is a client-side setting.

    As for whether the query itself can be improved - no idea, since I don't know the tables, the execution plan, etc. You could capture the actual execution plan and upload it to http://www.brentozar.com/pastetheplan.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Tuesday, October 22, 2019, 22:01
  • Create an index on #temp on Aid and cl (include all the columns you specify in the SELECT), and

    have an index on stagingdata on CIName and CLName.

    It would be good if you can add a WHERE clause as well....
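    That suggestion could look like the following; a sketch assuming the column names shown earlier in the thread (Aid, cl on #temp; CIName, CLName on stagingdata):

    -- Index on the temp table covering the join keys; INCLUDE the
    -- columns the SELECT returns so the join needs no extra lookups.
    CREATE NONCLUSTERED INDEX IX_Temp_Aid_cl
        ON #Temp (Aid, cl)
        INCLUDE ([attribute], [DateTime], [MIN], [MAX], [AVG], [SUM], [COUNT]);

    -- Matching index on the permanent table's join keys.
    CREATE NONCLUSTERED INDEX IX_stagingdata_CIName_CLName
        ON stagingdata (CIName, CLName);

    Whether building these indexes pays off depends on how many rows the date filter leaves in #Temp; for a very large temp table the index build itself can dominate.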


    Best Regards, Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Wednesday, October 23, 2019, 14:46
    Contributor