none
Very complex/compund SQL view... Help making this more efficient, please?

    Question

  • I have some complicated tables.   Tables which I have no control over, nor can they be changed.  Thus, I have derived views to compile the data into a single table, so then I can copy this table into cloud services very quickly.  Getting the table from my local server to the cloud works great.  But, the views are so complex, it can take up to three minutes to generate the view.  I am looking for some help making this more efficient.

    This View is to top level.  It contains several sub-views that parse and arrange data. At the top level, it takes about 2m 44s to generate.

    SELECT     TOP (100) PERCENT dbo.Documents.DocumentID, dbo.VIEW_SUB_PDM_DWGREVISION_LATEST.DwgRevision, dbo.VIEW_SUB_PDM_DESCRIPTION_LATEST.Description, dbo.VIEW_SUB_PDM_PROJECTNAME_LATEST.ProjectName, 
                      dbo.VIEW_SUB_PDM_PARTNUMBER_LATEST.PartNumber, dbo.VIEW_SUB_PDM_PARTNUMBER_LATEST.CurrentStatusID, CASE WHEN dbo.Documents.CurrentStatusID <> 10 THEN 1 ELSE 0 END AS Pending, dbo.Documents.LatestRevisionNo, 
                      dbo.VIEW_SUB_PDM_PARTNUMBER_LATEST.Filename, dbo.VIEW_SUB_PDM_RvTbl_DESCRIPTION_LATEST.RvTbl_Description, dbo.VIEW_SUB_PDM_RvTbl_DwgDate_LATEST.RvTbl_DwgDate, dbo.VIEW_SUB_PDM_RvTbl_Approved_LATEST.RvTbl_Approved, 
                      dbo.VIEW_SUB_PDM_RvTbl_Revision_LATEST.RvTbl_Revision, dbo.DocumentsInProjects.ProjectID
    FROM        dbo.VIEW_SUB_PDM_RvTbl_Revision_LATEST RIGHT OUTER JOIN
                      dbo.Documents INNER JOIN
                      dbo.VIEW_SUB_PDM_PARTNUMBER_LATEST ON dbo.Documents.DocumentID = dbo.VIEW_SUB_PDM_PARTNUMBER_LATEST.DocumentID INNER JOIN
                      dbo.DocumentsInProjects ON dbo.Documents.DocumentID = dbo.DocumentsInProjects.DocumentID ON dbo.VIEW_SUB_PDM_RvTbl_Revision_LATEST.DocumentID = dbo.Documents.DocumentID LEFT OUTER JOIN
                      dbo.VIEW_SUB_PDM_RvTbl_Approved_LATEST ON dbo.Documents.DocumentID = dbo.VIEW_SUB_PDM_RvTbl_Approved_LATEST.DocumentID LEFT OUTER JOIN
                      dbo.VIEW_SUB_PDM_RvTbl_DwgDate_LATEST ON dbo.Documents.DocumentID = dbo.VIEW_SUB_PDM_RvTbl_DwgDate_LATEST.DocumentID LEFT OUTER JOIN
                      dbo.VIEW_SUB_PDM_RvTbl_DESCRIPTION_LATEST ON dbo.Documents.DocumentID = dbo.VIEW_SUB_PDM_RvTbl_DESCRIPTION_LATEST.DocumentID LEFT OUTER JOIN
                      dbo.VIEW_SUB_PDM_DWGREVISION_LATEST ON dbo.Documents.DocumentID = dbo.VIEW_SUB_PDM_DWGREVISION_LATEST.DocumentID LEFT OUTER JOIN
                      dbo.VIEW_SUB_PDM_DESCRIPTION_LATEST ON dbo.Documents.DocumentID = dbo.VIEW_SUB_PDM_DESCRIPTION_LATEST.DocumentID LEFT OUTER JOIN
                      dbo.VIEW_SUB_PDM_PROJECTNAME_LATEST ON dbo.Documents.DocumentID = dbo.VIEW_SUB_PDM_PROJECTNAME_LATEST.DocumentID
    ORDER BY dbo.Documents.DocumentID DESC

    Each of the sub-views that has the suffix _LATEST are all very similar.  These tables are not just simple columns with values.  Some columns contain index codes that have to be collected together, and then the last of that index is the "latest" which is desired for the resulting view.  All of the sub-views take about 2m30s to generate.

    SELECT     TOP (100) PERCENT DocumentID, VariableID, ExtensionID, Description, RevisionNo, CurrentStatusID, Date, TransitionNr, RevNr, LatestRevisionNo, Filename, Deleted
    FROM        (SELECT     dbo.VariableValue.DocumentID, dbo.VariableValue.VariableID, dbo.Documents.ExtensionID, dbo.VariableValue.ValueText AS Description, dbo.VariableValue.RevisionNo, dbo.Documents.CurrentStatusID, dbo.TransitionHistory.Date, 
                                         dbo.TransitionHistory.TransitionNr, dbo.TransitionHistory.RevNr, dbo.Documents.LatestRevisionNo, dbo.Documents.Filename, dbo.Documents.Deleted, ROW_NUMBER() OVER (PARTITION BY dbo.VariableValue.DocumentID
                       ORDER BY dbo.VariableValue.RevisionNo DESC, dbo.TransitionHistory.TransitionNr DESC) AS Seq
    FROM        dbo.Documents INNER JOIN
                      dbo.VariableValue ON dbo.Documents.DocumentID = dbo.VariableValue.DocumentID INNER JOIN
                      dbo.TransitionHistory ON dbo.Documents.DocumentID = dbo.TransitionHistory.DocumentID
    WHERE     (dbo.Documents.Deleted = 0) AND (dbo.Documents.ExtensionID = 3) AND (dbo.Documents.CurrentStatusID <> 0) AND (dbo.VariableValue.VariableID = 47)) t
    WHERE     Seq = 1
    ORDER BY DocumentID DESC, RevisionNo DESC, TransitionNr DESC

    Here is another... 

    SELECT     TOP (100) PERCENT DocumentID, VariableID, ExtensionID, ProjectName, RevisionNo, CurrentStatusID, Date, TransitionNr, RevNr, LatestRevisionNo, Filename, Deleted
    FROM        (SELECT     dbo.VariableValue.DocumentID, dbo.VariableValue.VariableID, dbo.Documents.ExtensionID, dbo.VariableValue.ValueText AS ProjectName, dbo.VariableValue.RevisionNo, dbo.Documents.CurrentStatusID, dbo.TransitionHistory.Date, 
                                         dbo.TransitionHistory.TransitionNr, dbo.TransitionHistory.RevNr, dbo.Documents.LatestRevisionNo, dbo.Documents.Filename, dbo.Documents.Deleted, ROW_NUMBER() OVER (PARTITION BY dbo.VariableValue.DocumentID
                       ORDER BY dbo.VariableValue.RevisionNo DESC, dbo.TransitionHistory.TransitionNr DESC) AS Seq
    FROM        dbo.Documents INNER JOIN
                      dbo.VariableValue ON dbo.Documents.DocumentID = dbo.VariableValue.DocumentID INNER JOIN
                      dbo.TransitionHistory ON dbo.Documents.DocumentID = dbo.TransitionHistory.DocumentID
    WHERE     (dbo.Documents.Deleted = 0) AND (dbo.Documents.ExtensionID = 3) AND (dbo.Documents.CurrentStatusID <> 0) AND (dbo.VariableValue.VariableID = 45)) t
    WHERE     Seq = 1
    ORDER BY DocumentID DESC, RevisionNo DESC, TransitionNr DESC

    The total records in the results is around 650.  The  longest table [dbo.Documents] is 8,300 rows.  




    • Edited by martirl3 Thursday, May 9, 2019 6:31 PM
    Thursday, May 9, 2019 6:30 PM

All replies

  • Ok, here is some more info on this.  This is the graphical representation of this at top level.  It looks rather busy, but simply, each table represents a single row for each DocumentID, keyed on DocumentID.  Each "sub" table runs in SQL in a fraction of a second.  Pulling them together is over 3 minutes in some cases.  It doesnt seem that it should take that long.  Please help!!!

    Wednesday, May 15, 2019 4:27 PM
  • Bump!
    Monday, May 20, 2019 1:19 PM
  • You would need to look at the query plan to determine the best way to approach tuning your query.

    However, what you describe of many views are likely joining the same table(s) over and over and over.  You are likely much better off writing a whole new query which just does what you need, instead.

    Monday, May 20, 2019 1:29 PM
  • Hi,

    You may also consider creating a stored procedure where you declare the main table and then you fill it up gradually joining with 2-3 other tables.

    Do you need two 'FROM' in above query, maybe a comma after 'Seq' works too?

    Monday, May 20, 2019 1:52 PM
  • You would need to look at the query plan to determine the best way to approach tuning your query.

    However, what you describe of many views are likely joining the same table(s) over and over and over.  You are likely much better off writing a whole new query which just does what you need, instead.


    The nature of the tables I'm pulling from, I'm not certain can be done with a single statement.  Perhaps its my inexperience.  If I were to see the pattern formed by writing the two "sub" tables in the example I provided above, I could easily incorporate the others.  Is this something you might have a little time to look into?  It would be so helpful!

    Thank you.

    Monday, May 20, 2019 2:01 PM
  • Hi,

    You may also consider creating a stored procedure where you declare the main table and then you fill it up gradually joining with 2-3 other tables.

    Do you need two 'FROM' in above query, maybe a comma after 'Seq' works too?

    Stored procedures are a new concept for me.  I would be willing to have a look into it.  However, I do need to ask, if my PoweraApp needs to read the entire table to complete its filter, would the overall time to populate this procedure be a time saver?  I'm not sure if PA considers a stored procedure differently, or can even use them.  I do seem to remember seeing a reference to stored procedures, but I'm not sure if I saw that in PA, or MS Flow.




    • Edited by martirl3 Monday, May 20, 2019 2:16 PM
    Monday, May 20, 2019 2:04 PM
  • Hi,

    Stored procedures are much more powerful than views. However in the above case maybe you can get away with views. The reason I would use stored procedure is mainly because as I build the main table I can check every step and verify I am getting the correct result set. Anyway, it will be something along the line:

    create procedure myproc as

    begin 

    create table #mainTable(
    DocumentID varchar(50) null,
    DwgRevision varchar(50) null,
    Description varchar(50) null,
    etc
    )

    insert #mainTable (DocumentID)
    select DocumentID
    from dbo.Documents

    update #mainTable
    set DwgRevision = b.DwgRevision
    from VIEW_SUB_PDM_DWGREVISION_LATEST b
    where #mainTable.DocumentID = b.DocumentID

    update #mainTable
    set Description = b.Description
    from VIEW_SUB_PDM_DESCRIPTION_LATEST b
    where #mainTable.DocumentID = b.DocumentID

    etc

    you may consider joining multiple tables and use single update.

    at end you do:

    select *
    from #mainTable
    order by DocumentID

    end

    You can also use function, then you will call it as:
    select *
    from dbo.myFunc
    right now it will be instead:

    exec myProc()

    As for performance it will be fast since tables are small.

    Monday, May 20, 2019 3:22 PM
  • This is a better question for the "Transact-SQL" forum about query tuning.

    I would suggest you start with your query plan. Please see:

    https://www.sqlshack.com/using-the-sql-execution-plan-for-query-performance-tuning/

    Monday, May 20, 2019 4:29 PM
  • I did an execution plan.  I'm not sure what it really shows... except, i'm finding a stark difference in performance in running my view from a SELECT * FROM as compared to the Show first 1000 rows.  Running it as the first (SELECT * FROM) yeilds full results in about 6 seconds, which is very acceptable.  They other mechod (right button on view in studio, pick "Select Top 1000 Rows" takes in excess of 3 minutes.

    Any thougts on this?  

    


    • Edited by martirl3 Monday, May 20, 2019 7:22 PM
    Monday, May 20, 2019 7:15 PM
  • Bump
    Wednesday, May 22, 2019 1:45 PM
  • Please save your execution plan and upload it to a file share.  Then someone can look into it.

    See:

    https://docs.microsoft.com/en-us/sql/relational-databases/performance/save-an-execution-plan-in-xml-format?view=sql-server-2017

    Wednesday, May 22, 2019 4:50 PM
  • Ok, I think this link will get you the .dsqlplan file.

    https://drive.google.com/open?id=1unXdjmNTTY1_lvPjDfCgICnHc9kXSNTp

    • Edited by martirl3 Wednesday, May 22, 2019 6:57 PM
    Wednesday, May 22, 2019 6:30 PM
  • Can a moderator please move this thread to the Transact-SQL forum for more help on query tuning.


    Thursday, May 23, 2019 12:04 PM
  • I can see several opportunities for improvement with your query.

    It looks like you are calling several views which are all doing the same thing, Documents, VariablesValue and TransitionHistory. This is going to be very inefficient and you should rewrite your query to no use the views, so I only queries the documents table once.

    Your join to get data from dbo.TransitionHistory is running for every document returned.  You should look at why this is happening. 

    Your views are using ROW_NUMBER which requires a sort and is inefficient when you get many rows involvled.  You should rethink how this is working.

    Thursday, May 23, 2019 12:13 PM
  • I can also tell you are running SQL Server 2014 RTM version.  You really need to upgrade to the current service pack 3 and the current CU before trying to tune this query at all.

    https://support.microsoft.com/en-us/help/4022619/sql-server-2014-service-pack-3-release-information

    https://support.microsoft.com/en-us/help/4491539/cumulative-update-3-for-sql-server-2014-sp3

    Thursday, May 23, 2019 12:20 PM
  • I can see several opportunities for improvement with your query.

    It looks like you are calling several views which are all doing the same thing, Documents, VariablesValue and TransitionHistory. This is going to be very inefficient and you should rewrite your query to no use the views, so I only queries the documents table once.

    Your join to get data from dbo.TransitionHistory is running for every document returned.  You should look at why this is happening. 

    Your views are using ROW_NUMBER which requires a sort and is inefficient when you get many rows involvled.  You should rethink how this is working.

    I am all for rewriting as you say, however, someone much smarter than me is required.  It took a LONG time to figure out how to extract the LATEST data for each property, so I could collect all of those properties in a typical columnar form.  They way the VariableValue table is arranged makes it very difficult to extract a single property (VariableID).  It took someone smarter than me to figure out what I currently have.  I do not know if there is a way to do this all in one view.  

    The other unanswered question (in another post) is why does this view only take 6 seconds to generate if I dump it into a Query, as opposed to doing a "Show first 1000 rows" which takes 2-3min?  Which run method is it using when I call the view in Powerapps?  

    I also need the latest transaction code for every row, which is the reason for transaction history being invoked.  perhaps there is a cleaner way to do that as well.  Again, I'm just not smart enough.  I need some advice on that.

    The only thing I do know... is that the data result is correct.  That part is not a struggle.  I just need it structured better and my knowledge of SQL is quite deficient.  

    Here is a sampling of VariableValue table.  Each value of VariableID represents a different property for a DocumentID, which its value is found in ValueText.  All historical values of this property are maintained in this table.  The individual views sort by RevisionNo to get the latest for that particular property.  RevisionNo is not consistent between properties either.  Thus, VariableID=49 may have its latest revision a different number than VariableID=45.  So, you cannot key all VariableID from it.  In my limitied mind, this is all very complicated.

    Thanks for the replied!  It is so much appreciated.


    • Edited by martirl3 Thursday, May 23, 2019 2:13 PM
    Thursday, May 23, 2019 1:48 PM
  • I can also tell you are running SQL Server 2014 RTM version.  You really need to upgrade to the current service pack 3 and the current CU before trying to tune this query at all.

    https://support.microsoft.com/en-us/help/4022619/sql-server-2014-service-pack-3-release-information

    https://support.microsoft.com/en-us/help/4491539/cumulative-update-3-for-sql-server-2014-sp3

    I have approached my IT folks to get this done.  Do you think this alone will improve the performance??

    Thursday, May 23, 2019 1:49 PM
  • The Variable is not the biggest part of your query.  48% of your query is getting the TransitionHistory.   The query as written is running for every DocumentId.  So 1000 documents = 1000 lookups to TransitionHistory.  That is the part you should focus on.

    Can you post one of the views you are calling.

    In order to make the Variable lookup faster, I would personally add a IsCurrent column and a trigger to set it for each row. Then you would not need to figure it out every query.

    Friday, May 24, 2019 11:24 AM
  • It will definitely help.  There have been many query performance issues resolved since the RTM version was released.

    Friday, May 24, 2019 11:25 AM
  • The Variable is not the biggest part of your query.  48% of your query is getting the TransitionHistory.   The query as written is running for every DocumentId.  So 1000 documents = 1000 lookups to TransitionHistory.  That is the part you should focus on.

    Can you post one of the views you are calling.

    In order to make the Variable lookup faster, I would personally add a IsCurrent column and a trigger to set it for each row. Then you would not need to figure it out every query.

    I like that idea, but unfortunately, I cannot touch the tables.  I have been warned by the software company that if I touch the tables in any way, that I'm losing all "warranty" support.  The apps that generate the tables I am not the owner of.  Its a delicate situation. 

    I definetely only need to look at transition history for the results of the other conditions.  I'm not sure how to address moving the order around, because the graphical view cannot be generated due to the complexities.  Without that, I am just not able to wrap my head around the code.  I have to have the visual, or I'm toast.

    I posted the top level, and a couple the sub level view code in my first post.  Is that what you are asking?


    • Edited by martirl3 Friday, May 24, 2019 1:53 PM
    Friday, May 24, 2019 1:42 PM
  • It will definitely help.  There have been many query performance issues resolved since the RTM version was released.

    This was done yesterday.

    The view as shown just took 2min to generate.  That might be a little quicker than average, but only one datapoint.  

    • Edited by martirl3 Friday, May 24, 2019 1:51 PM
    Friday, May 24, 2019 1:42 PM
  • Would it be possible to combine my SUB statements all into a single statement (see original post)?  I'm not afraid of having a long script, but only if it would make a difference.  I would not know how to do it either.  Any thoughts on that?
    Tuesday, June 11, 2019 3:54 PM