# AVG function in a period based on condition

• ### Question

• Hi, i have a table like this:

 TIMESTAMP FLOW 18/07/2019 100 17/07/2019 110 16/07/2019 120 15/07/2019 120 14/07/2019 110 13/07/2019 13 12/07/2019 120 11/07/2019 120 10/07/2019 100 09/07/2019 12 08/07/2019 10 07/07/2019 100 06/07/2019 120 05/07/2019 3 04/07/2019 5 03/07/2019 6 02/07/2019 4 01/07/2019 3 30/06/2019 300 29/06/2019 200 28/06/2019 200

I need to do the average of the flow between a the period from TODAY() and another one in the past that follow this rule: FLOW<=20 FOR a period>=5gg. In the table above it would be between 18/07/2019 and  05/07/2019.

How can i write an SQL statement that can do that calculation for me?

Thanks a lot for your help

Thursday, July 18, 2019 1:05 PM

• Could this be what you are looking for? The INSERT statement is only there to incldue the test data. The first CTE finds all days which is the last of five consecutive days with flow < 20. The next CTE picks the most recent of these days.

The solution assumes that you have SQL 2012 or later.

SET DATEFORMAT dmy
CREATE TABLE tbl (dt date NOT NULL PRIMARY KEY, flow int NOT NULL)
INSERT tbl (dt, flow) VALUES
('18/07/2019',    100  ),
('17/07/2019',    110  ),
('16/07/2019',    120  ),
('15/07/2019',    120  ),
('14/07/2019',    110  ),
('13/07/2019',    13   ),
('12/07/2019',    120  ),
('11/07/2019',    120  ),
('10/07/2019',    100  ),
('09/07/2019',    12   ),
('08/07/2019',    10   ),
('07/07/2019',    100  ),
('06/07/2019',    120  ),
('05/07/2019',    3    ),
('04/07/2019',    5    ),
('03/07/2019',    6    ),
('02/07/2019',    4    ),
('01/07/2019',    3    ),
('30/06/2019',    300  ),
('29/06/2019',    200  ),
('28/06/2019',    200  )
go
SELECT * FROM tbl
go
; WITH fivedaysmax AS (
SELECT dt, flow, MAX(flow) OVER (ORDER BY dt
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS fivedaysmax
FROM   tbl
), mostrecentmax AS (
SELECT TOP 1 (dt)
FROM   fivedaysmax
ORDER  BY fivedaysmax
)
SELECT avg(flow)
FROM   tbl
WHERE  dt BETWEEN (SELECT dt FROM mostrecentmax) AND convert(date, sysdatetime()) go
DROP TABLE tbl

Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Monday, July 22, 2019 12:37 PM

### All replies

• SELECT AVG(Flow) FROM tbl WHERE TIMESTAMP BETWEEN '20190705' AND CAST(GETDATE() AS DATE)

Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

MS SQL optimization: MS SQL Development and Optimization
MS SQL Consulting: Large scale of database and data cleansing
Remote DBA Services: Improves MS SQL Database Performance
SQL Server Integration Services: Business Intelligence

Thursday, July 18, 2019 1:10 PM
• Thanks for your reply. The problem here is that 20190705 is not given. The table can change over time and the only method to discover the date is apply a condition where flow is <20 for at least a period of 5 days.

I found a solution for a similar case where flow is a speed and the problem was "id, timestamp and numeric variable (speed). I need to identify time periods (start and end timestamps) where the average value of the speed is less than a threshold (say 2), but where the time period (end timestamp - start timestamp) is at least a minimum duration (say 5 hours or more). "

 id datetime velocity 203 18/07/2019  02:51:00 13,8990 203 17/07/2019  02:51:01 13,8990 203 16/07/2019  02:51:02 13,5000 203 15/07/2019  02:51:03 13,6990 203 18/07/2019  02:51:04 13,6990

```#standardSQL
CREATE TEMPORARY FUNCTION IdentifyTimeRanges(
items ARRAY<STRUCT<ts INT64, speed FLOAT64, datetime TIMESTAMP>>,
min_length INT64, threshold FLOAT64, max_speed FLOAT64
)
RETURNS ARRAY<STRUCT<start_event TIMESTAMP, end_event TIMESTAMP, average_speed FLOAT64, duration_hrs FLOAT64>>
LANGUAGE js AS """
var result = [];
var initial = 0;
var candidate = items[initial].ts;
var len = 0;
var sum = 0;
for (i = 0; i < items.length; i++) {
len++;
sum += items[i].speed

if (items[i].ts - candidate < min_length) {
if (items[i].speed > max_speed) {
initial = i + 1;
candidate = items[initial].ts;
len = 0;
sum = 0;
}
continue;
}

if (sum / len > threshold || items[i].speed > max_speed) {
avg_speed = (sum - items[i].speed) / (len - 1);
if (avg_speed <= threshold && items[i - 1].ts - items[initial].ts >= min_length) {
var o = [];
o.start_event = items[initial].datetime;
o.average_speed = avg_speed.toFixed(3);
o.end_event = items[i - 1].datetime;
o.duration_hrs = ((items[i - 1].ts - items[initial].ts)/60/60).toFixed(3)
result.push(o)
}
initial = i;
candidate = items[initial].ts;
len = 1;
sum = items[initial].speed;
}

};

return result;
""";

WITH data AS (
SELECT id, PARSE_TIMESTAMP('%m/%d/%y %H:%M', datetime) AS datetime, speed
FROM `yourTable`
), compact_data AS (
SELECT id, ARRAY_AGG(STRUCT<ts INT64, speed FLOAT64, datetime TIMESTAMP>(UNIX_SECONDS(datetime), speed, datetime) ORDER BY UNIX_SECONDS(datetime)) AS points
FROM data
GROUP BY id
)
SELECT
id, start_event, end_event, average_speed, duration_hrs
FROM compact_data, UNNEST(IdentifyTimeRanges(points, 5*60*60, 2, 3.1)) AS segment
ORDER BY id, start_even```
I need a similar solution for my case...

• Edited by Thursday, July 18, 2019 1:41 PM
Thursday, July 18, 2019 1:40 PM
• FLOW<=20

It also returns min date 20190701 and not 20190705, trying to understand the logic

Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

MS SQL optimization: MS SQL Development and Optimization
MS SQL Consulting: Large scale of database and data cleansing
Remote DBA Services: Improves MS SQL Database Performance
SQL Server Integration Services: Business Intelligence

Thursday, July 18, 2019 1:49 PM
• I need to do the average of the flow between a the period from TODAY() and another one in the past that follow this rule: FLOW<=20 FOR a period>=5gg. In the table above it would be between 18/07/2019 and  05/07/2019.

5gg? Colour me stupid, but what does that mean?

Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Friday, July 19, 2019 9:25 PM

the condition that must be respected is that the flow is less than 20 for at least five days in a row. when this condition is verified I want to take the most recent date.

In this case i have that flow in less than 20 from 1/7/2019 to 05/07/2019, so i take the most recent date:05/07/2019.

Monday, July 22, 2019 9:22 AM
• Could this be what you are looking for? The INSERT statement is only there to incldue the test data. The first CTE finds all days which is the last of five consecutive days with flow < 20. The next CTE picks the most recent of these days.

The solution assumes that you have SQL 2012 or later.

SET DATEFORMAT dmy
CREATE TABLE tbl (dt date NOT NULL PRIMARY KEY, flow int NOT NULL)
INSERT tbl (dt, flow) VALUES
('18/07/2019',    100  ),
('17/07/2019',    110  ),
('16/07/2019',    120  ),
('15/07/2019',    120  ),
('14/07/2019',    110  ),
('13/07/2019',    13   ),
('12/07/2019',    120  ),
('11/07/2019',    120  ),
('10/07/2019',    100  ),
('09/07/2019',    12   ),
('08/07/2019',    10   ),
('07/07/2019',    100  ),
('06/07/2019',    120  ),
('05/07/2019',    3    ),
('04/07/2019',    5    ),
('03/07/2019',    6    ),
('02/07/2019',    4    ),
('01/07/2019',    3    ),
('30/06/2019',    300  ),
('29/06/2019',    200  ),
('28/06/2019',    200  )
go
SELECT * FROM tbl
go
; WITH fivedaysmax AS (
SELECT dt, flow, MAX(flow) OVER (ORDER BY dt
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS fivedaysmax
FROM   tbl
), mostrecentmax AS (
SELECT TOP 1 (dt)
FROM   fivedaysmax
ORDER  BY fivedaysmax
)
SELECT avg(flow)
FROM   tbl
WHERE  dt BETWEEN (SELECT dt FROM mostrecentmax) AND convert(date, sysdatetime()) go
DROP TABLE tbl

Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Monday, July 22, 2019 12:37 PM
• Could this be what you are looking for? The INSERT statement is only there to incldue the test data. The first CTE finds all days which is the last of five consecutive days with flow < 20. The next CTE picks the most recent of these days.

The solution assumes that you have SQL 2012 or later.

SET DATEFORMAT dmy
CREATE TABLE tbl (dt date NOT NULL PRIMARY KEY, flow int NOT NULL)
INSERT tbl (dt, flow) VALUES
('18/07/2019',    100  ),
('17/07/2019',    110  ),
('16/07/2019',    120  ),
('15/07/2019',    120  ),
('14/07/2019',    110  ),
('13/07/2019',    13   ),
('12/07/2019',    120  ),
('11/07/2019',    120  ),
('10/07/2019',    100  ),
('09/07/2019',    12   ),
('08/07/2019',    10   ),
('07/07/2019',    100  ),
('06/07/2019',    120  ),
('05/07/2019',    3    ),
('04/07/2019',    5    ),
('03/07/2019',    6    ),
('02/07/2019',    4    ),
('01/07/2019',    3    ),
('30/06/2019',    300  ),
('29/06/2019',    200  ),
('28/06/2019',    200  )
go
SELECT * FROM tbl
go
; WITH fivedaysmax AS (
SELECT dt, flow, MAX(flow) OVER (ORDER BY dt
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS fivedaysmax
FROM   tbl
), mostrecentmax AS (
SELECT TOP 1 (dt)
FROM   fivedaysmax
ORDER  BY fivedaysmax
)
SELECT avg(flow)
FROM   tbl
WHERE  dt BETWEEN (SELECT dt FROM mostrecentmax) AND convert(date, sysdatetime()) go
DROP TABLE tbl

Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Thank you so much!

It works!

Monday, July 22, 2019 2:37 PM