Answered by:
Random Number being changed to the same number for all rows

Question
-
hello all,
first of all I want to mention that this problem start happening after 2.14 instalation of Power Query.
I'm creating new column in my query and setting all values to random numbers. The problem is that next step (doesn't matter what it do itself) somehow changes all those random numbers to the same random number for all lines.
here's code sample:
....
InsertedCustom = Table.AddColumn(RemovedColumns3, "Random Number", each Number.Random()),
InsertedCustom2 = Table.AddColumn(InsertedCustom, "Analyst Full Name", each [Analyst First Name]&" "&[Analyst Last Name]),....
when I'm checking step by step, InsertedCustom creates new column and all values are being set randomly for all lines BUT when going to the next step (InsertedCustom2) all values in the column "Random Number" are being changed to the same number (no matter that InsertedCustom2 itself should not be doing anything to column named "Random Number") . I even tried moving InsertedCustom step to the bottom of my code but it didn't help and still facing this issue.
p.s. updated my PQ to 2.15 - didn't help
- Edited by MrMiegas Tuesday, September 2, 2014 1:47 PM
Tuesday, September 2, 2014 1:40 PM
Answers
-
A possible workaround might be to use List.Random with an index column e.g.
(1) Insert an index column (Add Column-->Insert Index Column)
(2) Add a custom column (Add Column-->Insert Custom Column) with the following formula:
= List.Random(Table.RowCount(<previous step name>)){[Index]}
The formula forces regeneration of the random numbers.
- Edited by Ehren - MSFTMicrosoft employee Monday, September 8, 2014 5:13 PM Missing paren
- Proposed as answer by Ehren - MSFTMicrosoft employee Monday, September 8, 2014 5:14 PM
- Marked as answer by Ehren - MSFTMicrosoft employee Tuesday, September 9, 2014 11:00 PM
Saturday, September 6, 2014 10:13 PM
All replies
-
This is a known issue. Power Query assumes that functions are idempotent (given the same arguments, they produce the same result), which isn't true for Number.Random. Our optimization pipeline is turning Number.Random() into a constant in the scenario you encountered, which results in all rows having the same value.
There's been some discussion of how to fix this, but that won't help you in the near term. Can you describe a bit more about your scenario and what you're trying to accomplish? Perhaps we can help you find a workaround.
Ehren
Friday, September 5, 2014 6:03 PM -
A possible workaround might be to use List.Random with an index column e.g.
(1) Insert an index column (Add Column-->Insert Index Column)
(2) Add a custom column (Add Column-->Insert Custom Column) with the following formula:
= List.Random(Table.RowCount(<previous step name>)){[Index]}
The formula forces regeneration of the random numbers.
- Edited by Ehren - MSFTMicrosoft employee Monday, September 8, 2014 5:13 PM Missing paren
- Proposed as answer by Ehren - MSFTMicrosoft employee Monday, September 8, 2014 5:14 PM
- Marked as answer by Ehren - MSFTMicrosoft employee Tuesday, September 9, 2014 11:00 PM
Saturday, September 6, 2014 10:13 PM -
thanks!!! it solved my problem. :)Tuesday, September 16, 2014 7:38 AM
-
Agree -- using List.Random is the right way to go - however - I'd simply insert '1' as the argument then expand the list -- this scales to +250k rows with not much delay.
"List.Random(Table.RowCount(<previous step name>)){[Index]}" previous example had trouble scaling.
Sample code:
AddedCustom1 = Table.AddColumn(GroupedRows, "Custom.1", each List.Random(1)),
ExpandedCustom.2 = Table.ExpandListColumn(AddedCustom1, "Custom.1"),
aabc
Friday, July 17, 2015 10:22 PM -
Following Ehren's workaround, I just found out a simplified version.
After you use Number.Random, add a step that creates the Index column (Add Column-->Insert Index Column).
Tuesday, March 8, 2016 6:16 PM -
An even simpler solution is to return the table to Excel and then add a randbetween column to that. If you want a new random sample every refresh then this will do that, if you want it fixed then you can take a paste values copy against a primary key and then merge that back into your main query result.
In my case I wanted a random whole number between I and 10 for each record generated. I guess the list method could be tweaked to do that but the method I used seems a lot simpler.
We all await a new function from MS to do this in PQ out of the box.
androo2351
- Edited by androo2351 Tuesday, February 21, 2017 10:52 AM
Tuesday, February 21, 2017 10:52 AM -
Edit: I removed my previous edit as it was incorrect.
My suggestion would be to use Number.RandomBetween in PQ and use some number column to enforce recalculation (just a neutral [SomeNumber]-[SomeNumber]). Note that Number.RandomBetween in PQ generates numbers with decimals, so in order to have whole numbers distributed evenly, you need to round random numbers between 0.5 and 10.5,
= Table.AddColumn(PreviousStep, "Random1to10", each Number.Round(Number.RandomBetween(0.5+[SomeNumber]-[SomeNumber],10.5),0))
- Edited by MarcelBeug Tuesday, February 21, 2017 11:57 AM
Tuesday, February 21, 2017 11:12 AM -
An even simpler solution is to return the table to Excel and then add a randbetween column to that. If you want a new random sample every refresh then this will do that, if you want it fixed then you can take a paste values copy against a primary key and then merge that back into your main query result.
In my case I wanted a random whole number between I and 10 for each record generated. I guess the list method could be tweaked to do that but the method I used seems a lot simpler.
We all await a new function from MS to do this in PQ out of the box.
androo2351
Excel is not a universal solution, since an Excel table is not the sole target for a PQ transformation. The result of the transformation can be loaded to the Data Model. If you're working in Power BI, there is no Excel to return anything to. Furthermore, if you have more than 1M rows of data, it won't fit into Excel.
The subject of using RandBetween belongs in a different thread - not this one.
- Edited by Colin Banfield Tuesday, February 21, 2017 2:15 PM
Tuesday, February 21, 2017 2:06 PM -
Well, there are very few universal solutions in life, but I think the Excel approach will work for most situations. Personally I've yet to come across a real dataset with more than, or for that matter, anything like, 1m rows.
androo2351
Wednesday, March 8, 2017 3:09 AM -
This didn't work for me :(
Tuesday, September 5, 2017 12:37 PM -
What worked for me is to buffer the step where the random number is generated like this:
let Source = #table({"A"}, List.Transform({1..10}, each {_})), #"Added Custom" = Table.Buffer(Table.AddColumn(Source, "RandomNumber", each Number.Random())), #"Changed Type" = Table.TransformColumnTypes(#"Added Custom",{{"RandomNumber", type number}}) in #"Changed Type"
Make sure that no further steps are involved in the formula that contains the "Number.Random", as this would trigger the transformation to unique values already.
Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com
Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!Monday, January 1, 2018 12:18 PM -
Hello,
Interesting post, but when I load the following into the Data Model, the random numbers become the same value, any suggestions?
let
Source = #table({"A"}, List.Transform({1..1000000}, each {_})),
#"Added Custom" = Table.Buffer(Table.AddColumn(Source, "RandomNumber", each Number.Round( Number.RandomBetween(1, 30))))
in
#"Added Custom"Wednesday, February 21, 2018 10:20 AM -
Hi wil1john,
First, note that in the query editor, the random numbers are generated even after removing Table.Buffer from #"Added Custom." In fact, Table.Buffer slows down the recalculation of the random numbers. There are two reasons why buffering makes no difference in this case:
1) The table is constructed literally, i.e. it is not coming from an external source. Therefore, the entire table is already "fixed" in memory, so buffering is not required.
2) Wrapping Table.Buffer after the random numbers are already generated (in the inner function) doesn't make sense. If the original source was external, then it would make sense to buffer the table in a step before adding the random number column.In any case, as you indicated, after loading to the data model, the random numbers from the query editor are all fixed to the same value. I am not sure why this is happening.
The technique that I described in a previous post works, but only to a point. Modifying the code to work in your case, we end up with:
let Source = #table({"A"}, List.Transform({1..10000}, each {_})), AddedIndex = Table.AddIndexColumn(Source, "Index", 0, 1), RandBetweenList = List.Transform(List.Random(Table.RowCount(Source)), each Number.Round(_ * (30-1) + 1)), AddedCustom = Table.AddColumn(AddedIndex, "RandomNumber", each RandBetweenList{[Index]}), RemovedIndex = Table.RemoveColumns(AddedCustom,{"Index"}) in RemovedIndex
The technique works up to 10,000 rows order of magnitude. Note that with just these many rows, you will get a lot of duplicated "random" numbers because there are only 30 values to allocate. Loading to the data model with 100,000 rows is very, very slow, and loading 1,000,000 rows is out of the question. I believe that the random numbers are being recalculated over and over as each row is loaded, which would explain the slow loading process.
The moral of the story is that you use random numbers in Power Query if you need to perform a calculation based on a random value, or if you need a random sample of your source data(sort on the random column and select the top n rows). Otherwise, you are better off creating a random number calculated column in the data model using DAX.
Wednesday, February 21, 2018 3:38 PM -
Hello Colin,
Thank you for taking the time to respond to the post and your thoughtful response. You raised some interesting points and I’ve learnt a great deal.
Wednesday, February 21, 2018 10:15 PM -
Many thanks, I was pulling my hair out. Greatly appreciated.
Mark
Friday, February 23, 2018 10:09 AM -
A bit like doing it in Excel, which more people will be familiar with, as I suggested then, but without the 1m rows limitation.
androo2351
- Edited by androo2351 Saturday, February 24, 2018 11:27 AM
Saturday, February 24, 2018 11:26 AM -
This doesn't work in Power BI in the query window. It gives the following error:
Expression.Error: There is an unknown identifier. Did you use the [field] shorthand for a _[field] outside of an 'each' expression?
I would be interested in how to fix it, but right now, I can use the RAND() function in DAX with the results from PQ to get random results per record.
Monday, March 19, 2018 11:45 PM -
What's the code that you are using in the query?Tuesday, March 20, 2018 12:48 AM