none
AVG function in a period based on condition RRS feed

  • Question

  • Hi, i have a table like this:

    TIMESTAMP FLOW
    18/07/2019 100
    17/07/2019 110
    16/07/2019 120
    15/07/2019 120
    14/07/2019 110
    13/07/2019 13
    12/07/2019 120
    11/07/2019 120
    10/07/2019 100
    09/07/2019 12
    08/07/2019 10
    07/07/2019 100
    06/07/2019 120
    05/07/2019 3
    04/07/2019 5
    03/07/2019 6
    02/07/2019 4
    01/07/2019 3
    30/06/2019 300
    29/06/2019 200
    28/06/2019 200


    I need to do the average of the flow between a the period from TODAY() and another one in the past that follow this rule: FLOW<=20 FOR a period>=5gg. In the table above it would be between 18/07/2019 and  05/07/2019.

    How can i write an SQL statement that can do that calculation for me?

    Thanks a lot for your help

    Thursday, July 18, 2019 1:05 PM

Answers

  • Could this be what you are looking for? The INSERT statement is only there to incldue the test data. The first CTE finds all days which is the last of five consecutive days with flow < 20. The next CTE picks the most recent of these days.

    The solution assumes that you have SQL 2012 or later.

    SET DATEFORMAT dmy
    CREATE TABLE tbl (dt date NOT NULL PRIMARY KEY, flow int NOT NULL)
    INSERT tbl (dt, flow) VALUES
    ('18/07/2019',    100  ),
    ('17/07/2019',    110  ),
    ('16/07/2019',    120  ),
    ('15/07/2019',    120  ),
    ('14/07/2019',    110  ),
    ('13/07/2019',    13   ),
    ('12/07/2019',    120  ),
    ('11/07/2019',    120  ),
    ('10/07/2019',    100  ),
    ('09/07/2019',    12   ),
    ('08/07/2019',    10   ),
    ('07/07/2019',    100  ),
    ('06/07/2019',    120  ),
    ('05/07/2019',    3    ),
    ('04/07/2019',    5    ),
    ('03/07/2019',    6    ),
    ('02/07/2019',    4    ),
    ('01/07/2019',    3    ),
    ('30/06/2019',    300  ),
    ('29/06/2019',    200  ),
    ('28/06/2019',    200  )
    go
    SELECT * FROM tbl
    go
    ; WITH fivedaysmax AS (
       SELECT dt, flow, MAX(flow) OVER (ORDER BY dt
                                        ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS fivedaysmax
       FROM   tbl
    ), mostrecentmax AS (
       SELECT TOP 1 (dt)
       FROM   fivedaysmax
       ORDER  BY fivedaysmax
    )
    SELECT avg(flow)
    FROM   tbl
    WHERE  dt BETWEEN (SELECT dt FROM mostrecentmax) AND convert(date, sysdatetime()) go
    DROP TABLE tbl


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se


    Monday, July 22, 2019 12:37 PM
    Moderator

All replies

  • SELECT AVG(Flow) FROM tbl WHERE TIMESTAMP BETWEEN '20190705' AND CAST(GETDATE() AS DATE)

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, July 18, 2019 1:10 PM
    Moderator
  • Thanks for your reply. The problem here is that 20190705 is not given. The table can change over time and the only method to discover the date is apply a condition where flow is <20 for at least a period of 5 days.

    I found a solution for a similar case where flow is a speed and the problem was "id, timestamp and numeric variable (speed). I need to identify time periods (start and end timestamps) where the average value of the speed is less than a threshold (say 2), but where the time period (end timestamp - start timestamp) is at least a minimum duration (say 5 hours or more). "

    id datetime velocity
    203 18/07/2019  02:51:00 13,8990
    203 17/07/2019  02:51:01 13,8990
    203 16/07/2019  02:51:02 13,5000
    203 15/07/2019  02:51:03 13,6990
    203 18/07/2019  02:51:04 13,6990

    #standardSQL
    CREATE TEMPORARY FUNCTION IdentifyTimeRanges(
      items ARRAY<STRUCT<ts INT64, speed FLOAT64, datetime TIMESTAMP>>, 
      min_length INT64, threshold FLOAT64, max_speed FLOAT64
    )
    RETURNS ARRAY<STRUCT<start_event TIMESTAMP, end_event TIMESTAMP, average_speed FLOAT64, duration_hrs FLOAT64>>
    LANGUAGE js AS """
      var result = [];
      var initial = 0;
      var candidate = items[initial].ts;
      var len = 0;
      var sum = 0;
      for (i = 0; i < items.length; i++) {
        len++;
        sum += items[i].speed
    
        if (items[i].ts - candidate < min_length) {
          if (items[i].speed > max_speed) {
            initial = i + 1;
            candidate = items[initial].ts;
            len = 0;
            sum = 0;
          }     
          continue;
        }
    
        if (sum / len > threshold || items[i].speed > max_speed) {
          avg_speed = (sum - items[i].speed) / (len - 1);
          if (avg_speed <= threshold && items[i - 1].ts - items[initial].ts >= min_length) {
            var o = [];
            o.start_event = items[initial].datetime;
            o.average_speed = avg_speed.toFixed(3);
            o.end_event = items[i - 1].datetime;
            o.duration_hrs = ((items[i - 1].ts - items[initial].ts)/60/60).toFixed(3)
            result.push(o)
          }
          initial = i;
          candidate = items[initial].ts;
          len = 1;
          sum = items[initial].speed;
        }
    
      };
    
      return result;
    """;
    
    WITH data AS (
      SELECT id, PARSE_TIMESTAMP('%m/%d/%y %H:%M', datetime) AS datetime, speed
      FROM `yourTable`
    ), compact_data AS (
      SELECT id, ARRAY_AGG(STRUCT<ts INT64, speed FLOAT64, datetime TIMESTAMP>(UNIX_SECONDS(datetime), speed, datetime) ORDER BY UNIX_SECONDS(datetime)) AS points
      FROM data
      GROUP BY id
    )
    SELECT 
      id, start_event, end_event, average_speed, duration_hrs
    FROM compact_data, UNNEST(IdentifyTimeRanges(points, 5*60*60, 2, 3.1)) AS segment
    ORDER BY id, start_even
    I need a similar solution for my case...


    • Edited by Marvolo93 Thursday, July 18, 2019 1:41 PM
    Thursday, July 18, 2019 1:40 PM
  • FLOW<=20

    It also returns min date 20190701 and not 20190705, trying to understand the logic 


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, July 18, 2019 1:49 PM
    Moderator
  • I need to do the average of the flow between a the period from TODAY() and another one in the past that follow this rule: FLOW<=20 FOR a period>=5gg. In the table above it would be between 18/07/2019 and  05/07/2019.

    5gg? Colour me stupid, but what does that mean?


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Friday, July 19, 2019 9:25 PM
    Moderator
  • Sorry, my bad.

    the condition that must be respected is that the flow is less than 20 for at least five days in a row. when this condition is verified I want to take the most recent date.

    In this case i have that flow in less than 20 from 1/7/2019 to 05/07/2019, so i take the most recent date:05/07/2019.

    Monday, July 22, 2019 9:22 AM
  • Could this be what you are looking for? The INSERT statement is only there to incldue the test data. The first CTE finds all days which is the last of five consecutive days with flow < 20. The next CTE picks the most recent of these days.

    The solution assumes that you have SQL 2012 or later.

    SET DATEFORMAT dmy
    CREATE TABLE tbl (dt date NOT NULL PRIMARY KEY, flow int NOT NULL)
    INSERT tbl (dt, flow) VALUES
    ('18/07/2019',    100  ),
    ('17/07/2019',    110  ),
    ('16/07/2019',    120  ),
    ('15/07/2019',    120  ),
    ('14/07/2019',    110  ),
    ('13/07/2019',    13   ),
    ('12/07/2019',    120  ),
    ('11/07/2019',    120  ),
    ('10/07/2019',    100  ),
    ('09/07/2019',    12   ),
    ('08/07/2019',    10   ),
    ('07/07/2019',    100  ),
    ('06/07/2019',    120  ),
    ('05/07/2019',    3    ),
    ('04/07/2019',    5    ),
    ('03/07/2019',    6    ),
    ('02/07/2019',    4    ),
    ('01/07/2019',    3    ),
    ('30/06/2019',    300  ),
    ('29/06/2019',    200  ),
    ('28/06/2019',    200  )
    go
    SELECT * FROM tbl
    go
    ; WITH fivedaysmax AS (
       SELECT dt, flow, MAX(flow) OVER (ORDER BY dt
                                        ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS fivedaysmax
       FROM   tbl
    ), mostrecentmax AS (
       SELECT TOP 1 (dt)
       FROM   fivedaysmax
       ORDER  BY fivedaysmax
    )
    SELECT avg(flow)
    FROM   tbl
    WHERE  dt BETWEEN (SELECT dt FROM mostrecentmax) AND convert(date, sysdatetime()) go
    DROP TABLE tbl


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se


    Monday, July 22, 2019 12:37 PM
    Moderator
  • Could this be what you are looking for? The INSERT statement is only there to incldue the test data. The first CTE finds all days which is the last of five consecutive days with flow < 20. The next CTE picks the most recent of these days.

    The solution assumes that you have SQL 2012 or later.

    SET DATEFORMAT dmy
    CREATE TABLE tbl (dt date NOT NULL PRIMARY KEY, flow int NOT NULL)
    INSERT tbl (dt, flow) VALUES
    ('18/07/2019',    100  ),
    ('17/07/2019',    110  ),
    ('16/07/2019',    120  ),
    ('15/07/2019',    120  ),
    ('14/07/2019',    110  ),
    ('13/07/2019',    13   ),
    ('12/07/2019',    120  ),
    ('11/07/2019',    120  ),
    ('10/07/2019',    100  ),
    ('09/07/2019',    12   ),
    ('08/07/2019',    10   ),
    ('07/07/2019',    100  ),
    ('06/07/2019',    120  ),
    ('05/07/2019',    3    ),
    ('04/07/2019',    5    ),
    ('03/07/2019',    6    ),
    ('02/07/2019',    4    ),
    ('01/07/2019',    3    ),
    ('30/06/2019',    300  ),
    ('29/06/2019',    200  ),
    ('28/06/2019',    200  )
    go
    SELECT * FROM tbl
    go
    ; WITH fivedaysmax AS (
       SELECT dt, flow, MAX(flow) OVER (ORDER BY dt
                                        ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS fivedaysmax
       FROM   tbl
    ), mostrecentmax AS (
       SELECT TOP 1 (dt)
       FROM   fivedaysmax
       ORDER  BY fivedaysmax
    )
    SELECT avg(flow)
    FROM   tbl
    WHERE  dt BETWEEN (SELECT dt FROM mostrecentmax) AND convert(date, sysdatetime()) go
    DROP TABLE tbl


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se


    Thank you so much!

    It works! 

    Monday, July 22, 2019 2:37 PM