locked
Bug or feature when querying SQL-server data? RRS feed

  • Question

  • I’ve experienced some “surprises” when working on SQL-server-data:

    1. Normally by sorting a table before removing duplicates I can control which rows will be kept (the first ones). But with query-folding on SQL-server-data it actually didn’t keep the sort-order applied in the previous step. So I had to stop the folding process by applying Table.Buffer on the sort-result. Is this a bug that will be sorted out in the future or just a “side-effect” of query-folding that one has to keep in mind?

    2. Another disturbing “side-effect” of query folding: When creating an index column in one query that will be referenced by other (different) queries, the indizes returned in the different branched/referencing queries will be different: Only the referencing query that runs last will be in sync with the referenced query, the queries that run before return different indizes. So if one wants to use them as keys to merge some of the referencing queries under each other, this wouldn’t work, because they don’t point to the same rows.

    This is probably due the fact that every query will trigger the update of previous/referenced queries. Thereby the referencing query that runs last will be in sync with the referenced query but not the previous ones. So this is probably not a bug, but it would be great if we would have an option to turn off multiples refreshes of referenced queries (or to be able to split these queries to different workbooks with the possibility to reference data in the data model or to export data to csv-files in order to freeze intermediate results – or any other method that would allow to reference more than 1,1 Mio datasets as intermediate results).

    Are there any plans that go in this direction? (actually, this highly rated Power BI-feature-request would do this job as well:  https://support.powerbi.com/forums/265200-power-bi/suggestions/8984308-ability-to-connect-excel-to-power-bi-data-model-an, or this: https://support.powerbi.com/forums/265200-power-bi/suggestions/9609906-export-powerbi-desktop-to-csv-excel


    Imke Feldmann TheBIccountant.com

    Saturday, November 7, 2015 12:43 PM

Answers

  • Hi Imke,

    You are right that ordering, if done before certain operations, may be lost. This is often because we can't enforce the order in the folded query or because when run locally it may constraint the operators in some ways that would make the operation inefficient or impossible to enforce. For example if you have two tables and you do an inner join, how are we to ensure order specified in both tables? If you see ordering change depending if the operation is folded or not and it takes you by surprise, it's likely that you've taken a dependency on ordering that happen to be preserved by coincidence of how the operations ran or how the algorithms were implemented.

    As a rule of thumbs, I would say, if order is not specified as the last step, don't expect it to stick, except in obvious cases such as selecting the top rows. SQL typically has a similar constraint in that ordering can only be specified in the outer SELECT.

    I'm not sure I understand the issue with index columns. Are you enforcing an order before applying the index column? If not, the rows might come back in a different order from the data source and the index might end up mapping to different rows. Also if the order changes after a refresh, it's likely that you haven't specified an order, in which case a refresh might result in a change of order. Even if we prevented previous queries from updating, you would still experience the problem on the next refresh or with other queries that build on them.

    Tristan

    Tuesday, November 10, 2015 10:27 PM

All replies

  • Hi Imke,

    You are right that ordering, if done before certain operations, may be lost. This is often because we can't enforce the order in the folded query or because when run locally it may constraint the operators in some ways that would make the operation inefficient or impossible to enforce. For example if you have two tables and you do an inner join, how are we to ensure order specified in both tables? If you see ordering change depending if the operation is folded or not and it takes you by surprise, it's likely that you've taken a dependency on ordering that happen to be preserved by coincidence of how the operations ran or how the algorithms were implemented.

    As a rule of thumbs, I would say, if order is not specified as the last step, don't expect it to stick, except in obvious cases such as selecting the top rows. SQL typically has a similar constraint in that ordering can only be specified in the outer SELECT.

    I'm not sure I understand the issue with index columns. Are you enforcing an order before applying the index column? If not, the rows might come back in a different order from the data source and the index might end up mapping to different rows. Also if the order changes after a refresh, it's likely that you haven't specified an order, in which case a refresh might result in a change of order. Even if we prevented previous queries from updating, you would still experience the problem on the next refresh or with other queries that build on them.

    Tristan

    Tuesday, November 10, 2015 10:27 PM
  • Hi Tristan,

    thank you very much for your answer.

    Yes, it dawned to me already, that a better understanding of the source I’m working on would be helpful here J

    Re the index-issue you’re right - I had no sorting applied. Luckily found a field in the source data that I could use instead.

    Still having performance issues with the “unbreakable refresh chain”. Using export to table and re-import as a workaround but this means one set of queries need to be refreshed before all others, which is not ideal.


    Imke Feldmann TheBIccountant.com


    Wednesday, November 11, 2015 4:35 AM