locked
Performance while grouping data RRS feed

  • Question

  • Hi there!

    I have big orders table (1M+) in such format:

    o_id, c_id, o_date, host

    I want to group table by c_id, get max o_date and based on o_date get corresponding host.

    I tried 2 approaches:

    (A) create query c_id, max_o_date and then join with original order table

    (B) group by c_id, max_o_date, all_rows and then expand and filter by max_o_date

    1. Is there any other (realy fast) solution?

    2. I use custom SQL-query. Should I use TableBuffer and if so, where to put it?



    Thursday, May 14, 2015 7:32 PM

Answers

  • I would recommend starting by using the UI to do operations (rather than specifying custom SQL). If it is slow, then you could try some hand-optimized SQL. But in many cases it will be just as fast as hand-written SQL, and you'll get the added benefit that it will be easy to modify via the UI.

    Table.Buffer is useful in situations where you don't want folding to take place. For example, if you know that it would be faster to read a given source into memory at a given point and do subsequent operations on that in-memory copy instead of folding back to the original source, you could insert a step to buffer the previous results.

    Hope that helps.

    Ehren

    • Proposed as answer by Michael Amadi Wednesday, May 27, 2015 11:59 AM
    • Marked as answer by Michael Amadi Thursday, June 4, 2015 11:14 AM
    Tuesday, May 26, 2015 5:25 PM

All replies

  • Hi Paul. Are you using Native SQL queries, or just connecting/merging/grouping via the PQ UI?

    Ehren


    Monday, May 18, 2015 9:42 PM
  • Hi Ehren,

    Thanks for reply.

    I am using Native SQL quries and that's why Query Folding does not take place.

    I know that in PQ a lot of things can be done in a different ways.

    So I hope there is some guideline about which ways (preffereing dedicated functions, caching, etc) can give me maximum performance.

    -- Paul

    Tuesday, May 19, 2015 6:17 AM
  • I would recommend starting by using the UI to do operations (rather than specifying custom SQL). If it is slow, then you could try some hand-optimized SQL. But in many cases it will be just as fast as hand-written SQL, and you'll get the added benefit that it will be easy to modify via the UI.

    Table.Buffer is useful in situations where you don't want folding to take place. For example, if you know that it would be faster to read a given source into memory at a given point and do subsequent operations on that in-memory copy instead of folding back to the original source, you could insert a step to buffer the previous results.

    Hope that helps.

    Ehren

    • Proposed as answer by Michael Amadi Wednesday, May 27, 2015 11:59 AM
    • Marked as answer by Michael Amadi Thursday, June 4, 2015 11:14 AM
    Tuesday, May 26, 2015 5:25 PM