none
Question Regarding Summarizing a Field

    Question

  • Hello!

    I am very much a novice with SQL, I have currently been working with SQL Server(2008) for about 5 months now. Before December of last year I didn't even know what SELECT was, so I still have much to learn!

    I had a question specifically relating to summarizing one specific field for my result set from a SELECT statement. It's a very basic query where I'm joining on a few tables and have some conditions applied in the WHERE clause. Now, one of these tables is an, "Activity" table. So this table shows a timestamp for when each record was modified. Basically I want to show only the most recent date modified, not every date modified for my primary ID.

    I've tried a few things but I'm stumped. I've tried learning ROLLUP and CUBE but I'm not sure if it's applicable here, I was playing around with it but I don't quite understand it. It seems as if none of my fields with ROLLUP were being summarized at all, there were still duplicates for the specified field, so I know I'm not understanding it correctly.

    So to clarify for this specific inquiry, I want only want one record returned from the "Activity" table for where it's joined on my other tables(so one record per column in the relevant table). Thank you so much for your time, maybe in due time I'll become as proficient as you folks!

    Friday, April 25, 2014 3:37 AM

Answers

  • As I understand what you need to do is something like below

    SELECT Columns...
    FROM Othertables t
    ...
    INNER JOIN (SELECT *,ROW_NUMBER() OVER (PARTITION BY JoiningCol1,JoiningCol2,.. ORDER BY Timestamp DESC) AS Seq
    FROM Activity
    )a
    ON a.JoiningCol1 = t.RelatedCol
    AND a.JoiningCol2 = ...
    AND ...
    AND a.Seq=1

    In above code JoiningCol1,JoiningCol2,... etc represents the columns in Activity table on which you need to join.


    Please Mark This As Answer if it helps to solve the issue Visakh ---------------------------- http://visakhm.blogspot.com/ https://www.facebook.com/VmBlogs

    Friday, April 25, 2014 6:02 AM

All replies

  • Looking at your requirement, we may not require using ROLLUP and CUBE as those are used in SQL SERVER analysis services. However we can achieve the summarization here as well.

    We can use simple join statement in TSQL to aggregate or summarize information. Refer below example,

    SELECT LOGIN_NAME, ACTIVITY_ID, MAX(ACTIVITY_DATE) FROM

    LOGIN L INNER JOIN ACTIVITY A ON L.LOGIN_ID=A.ACTIVITY_LOGIN_ID

    WHERE L.LOGIN_NAME = 'ABC'

    GROUP BY L.LOGIN_NAME, A.ACTIVITY_ID

    The above query will simply return those login users, associated activities and the latest activity update date.

    If you can provide table definition as well your expected output then we may help you better.


    Regards, RSingh

    Friday, April 25, 2014 5:15 AM
  • As I understand what you need to do is something like below

    SELECT Columns...
    FROM Othertables t
    ...
    INNER JOIN (SELECT *,ROW_NUMBER() OVER (PARTITION BY JoiningCol1,JoiningCol2,.. ORDER BY Timestamp DESC) AS Seq
    FROM Activity
    )a
    ON a.JoiningCol1 = t.RelatedCol
    AND a.JoiningCol2 = ...
    AND ...
    AND a.Seq=1

    In above code JoiningCol1,JoiningCol2,... etc represents the columns in Activity table on which you need to join.


    Please Mark This As Answer if it helps to solve the issue Visakh ---------------------------- http://visakhm.blogspot.com/ https://www.facebook.com/VmBlogs

    Friday, April 25, 2014 6:02 AM
  • Thank you both very much for providing useful information, I really appreciate it.. I believe this query will actually perform the action I'm trying to execute! However I am a little bit concerned that a literal representation from your recommend query wouldn't work. I think that's because I'm not sure if I fully understand ROW_NUMBER() entirely, and I didn't elaborate very clearly my definiton. My question is:

    If I use ROW_NUMBER() as you suggest, would that mean every record in the entire table would be labeled with a unique value(1,2,3,4), or would it actually label each a.JoiningCol1 = t.RelatedCol column with a unique value(1,2,1,2,3,1,2)? I hope that's understandable, for my purpose I would need to display the last logged activity for each joined record. When I am of able to I will try your recommendation out, thank you again for your time.


    Saturday, April 26, 2014 3:38 AM
  • If I use ROW_NUMBER() as you suggest, would that mean every record in the entire table would be labeled with a unique value(1,2,3,4), or would it actually label each a.JoiningCol1 = t.RelatedCol column with a unique value(1,2,1,2,3,1,2)? I hope that's understandable, for my purpose I would need to display the last logged activity for each joined record. When I am of able to I will try your recommendation out, thank you again for your time.


    If you don't use any partition with row_number() then it is likely to generate unique values(1,2,3,4). However if you are using partition on field/fields then the ordering will be within partitions i.e each partition will have unique ids base on the ordering of the field/fields. If you are ordering by activity_date asc/desc the unique id will be appear accordingly. Hope this clarify.

    Regards, RSingh

    Saturday, April 26, 2014 4:39 AM
  • Thank you both very much for providing useful information, I really appreciate it.. I believe this query will actually perform the action I'm trying to execute! However I am a little bit concerned that a literal representation from your recommend query wouldn't work. I think that's because I'm not sure if I fully understand ROW_NUMBER() entirely, and I didn't elaborate very clearly my definiton. My question is:

    If I use ROW_NUMBER() as you suggest, would that mean every record in the entire table would be labeled with a unique value(1,2,3,4), or would it actually label each a.JoiningCol1 = t.RelatedCol column with a unique value(1,2,1,2,3,1,2)? I hope that's understandable, for my purpose I would need to display the last logged activity for each joined record. When I am of able to I will try your recommendation out, thank you again for your time.


    See the way I've used ROW_NUMBER function. It has a PARTITION BY section which will form vertical groups of data within the table based on Join column values. Then within each group it nubers records sequentially based on descending order of timestamp value. And the filter Seq=1 will ensure we take only latest record from each group which is exactly what you were asking for.

    Please Mark This As Answer if it helps to solve the issue Visakh ---------------------------- http://visakhm.blogspot.com/ https://www.facebook.com/VmBlogs

    Saturday, April 26, 2014 4:57 AM
  • Thank you both again for further clarification, you guys really clarified my inquiry. The logic is sound, this is exactly what I need. Again, thank you both and I hope when I ask another question in the future I get feedback as receptive as you both.
    Saturday, April 26, 2014 9:29 PM