Introduction

Let's face it, our data is not always as clean as we would like it to be. There are many cases where we would like to transform our data into something more friendly to end users or standardize known data issues. For example, a customer might be referenced in our data as:

  • Customer Name
  • Customer Name, Inc
  • Cust

Something like a VLOOKUP function would come in handy to transform such data. This article provides several different methods of recreating VLOOKUP functionality with Power BI. 

Solution Setup

To demonstrate lookups in Power BI, we will use US state names and abbreviations. To follow along with the example, create two comma-separated value (CSV) files such as:

StatePopulations.csv

Rank,State Name,Population
1,California,"37,253,956"
2,Texas,"25,145,561"
3,New York,"19,378,102"
4,Florida,"18,801,310"
5,Illinois,"12,830,632"
6,Pennsylvania,"12,702,379"
7,Ohio,"11,536,504"
8,Michigan,"9,883,640"
9,Georgia,"9,687,653"
10,North Carolina,"9,535,483"
11,New Jersey,"8,791,894"
12,Virginia,"8,001,024"
13,Washington,"6,724,540"
14,Massachusetts,"6,547,629"
15,Indiana,"6,483,802"
16,Arizona,"6,392,017"
17,Tennessee,"6,346,105"
18,Missouri,"5,988,927"
19,Maryland,"5,773,552"
20,Wisconsin,"5,686,986"
21,Minnesota,"5,303,925"
22,Colorado,"5,029,196"
23,Alabama,"4,779,736"
24,South Carolina,"4,625,364"
25,Louisiana,"4,533,372"
26,Kentucky,"4,339,367"
27,Oregon,"3,831,074"
28,Oklahoma,"3,751,351"
29,Connecticut,"3,574,097"
30,Iowa,"3,046,355"
31,Mississippi,"2,967,297"
32,Arkansas,"2,915,918"
33,Kansas,"2,853,118"
34,Utah,"2,763,885"
35,Nevada,"2,700,551"
36,New Mexico,"2,059,179"
37,West Virginia,"1,852,994"
38,Nebraska,"1,826,341"
39,Idaho,"1,567,582"
40,Hawaii,"1,360,301"
41,Maine,"1,328,361"
42,New Hampshire,"1,316,470"
43,Rhode Island,"1,052,567"
44,Montana,"989,415"
45,Delaware,"897,934"
46,South Dakota,"814,180"
47,Alaska,"710,231"
48,North Dakota,"672,591"
49,Vermont,"625,741"
50,"Washington, D. C.","601,723"
51,Wyoming,"563,626"

StateAbbreviations.csv

State Name,State
ALABAMA,AL
ALASKA,AK
ARIZONA,AZ
ARKANSAS,AR
CALIFORNIA,CA
COLORADO,CO
CONNECTICUT,CT
DELAWARE,DE
FLORIDA,FL
GEORGIA,GA
HAWAII,HI
IDAHO,ID
ILLINOIS,IL
INDIANA,IN
IOWA,IA
KANSAS,KS
KENTUCKY,KY
LOUISIANA,LA
MAINE,ME
MARYLAND,MD
MASSACHUSETTS,MA
MICHIGAN,MI
MINNESOTA,MN
MISSISSIPPI,MS
MISSOURI,MO
MONTANA,MT
NEBRASKA,NE
NEVADA,NV
NEW HAMPSHIRE,NH
NEW JERSEY,NJ
NEW MEXICO,NM
NEW YORK,NY
NORTH CAROLINA,NC
NORTH DAKOTA,ND
OHIO,OH
OKLAHOMA,OK
OREGON,OR
PENNSYLVANIA,PA
RHODE ISLAND,RI
SOUTH CAROLINA,SC
SOUTH DAKOTA,SD
TENNESSEE,TN
TEXAS,TX
UTAH,UT
VERMONT,VT
VIRGINIA,VA
WASHINGTON,WA
WEST VIRGINIA,WV
WISCONSIN,WI
WYOMING,WY

Create a query to import each of these as a table into your Power BI Desktop data model making sure to promote the first row as headers in each query.

No Code

The simplest way to solve our lookup issue doesn't involve any code but simply uses the relational capabilities of Power BI Desktop. Once both of these CSV files are imported as tables into the data model, we simply use the relationships tab to relate the two "State Name" columns to one another.

vlookup1.png

We can create any visualization we wish and substitute the "State" column from "StateAbbreviations" instead of the "State Name" from "StatePopulations".

vlookup2.png

First, one should note that when you related the two tables, Power BI figured out the relationships between the "State Name" columns even though they had different capitalization. Cool. However, we immediately see a problem. The first column in our visualization is labeled "(Blank)". This is because our population table includes "Washington DC" but our abbreviations table does not. We could fix this by putting in a lookup, but if you are using this for an "exception" list, you may only have a small number of replacement values compared to "good" values. Bottom line, you really do not have much control over the lookups when using relationships. Additional problems include.

First, if a lookup is inadvertently duplicated in StateAbbreviations, the entire thing breaks and it is error city. This is not a problem for our simple example, but imagine if there are a couple thousand replacement records and it could easily happen, especially if you do not control the data source. Could it be fixed by adding a "remove duplicates" step to your query. Sure, but what if you sometimes need those other replacement values? Bottom line, when you have "fuzzy" matching scenarios, using only relationships is not going to get the job done. 

Second, you effectively need to import all of the data into the model (assuming you have a data source that does not support DirectQuery). If your lookup table is millions of rows, this can get unwieldy and increase the overall size of your Desktop file tremendously.

DAX

We can build upon our relationship defined above by using DAX and the RELATED function in column formulas to gain more control.

In StatePopulations table, create a new custom column with the following formula:

State = RELATED(StateAbbreviations[State])

This provides the same result as our report visualization but now we have it in our data model in our fact table, which is useful for things like Sort By.

This also provides us the ability to wrap some logic around this lookup. For example, we can fix "Blank" values by returning the original lookup value

State2 = IF(COUNTROWS(RELATEDTABLE(StateAbbreviations)) = 0,[State Name],RELATED(StateAbbreviations[State]))

This returns the original "State Name" instead of blank for values like Washington DC.

If we have some known variations, we can use a formula like this one:

State3 = IF([State Name] = "Washington, D. C.","DC",RELATED(StateAbbreviations[State]))

This version provides a specific translation but uses the state abbreviations for everything else. 

We can even get crazy and combine these:

State4 = IF([State Name] = "Washington, D. C.","DC",IF(COUNTROWS(RELATEDTABLE(StateAbbreviations)) = 0,[State Name],RELATED(StateAbbreviations[State])))

This provides a specific translation for "Washington DC", failing that determines if an appropriate replacement exists and if not uses the original value. Whew!!

You can add as much logic as necessary but will get messy quickly.

You could create an alternate lookup table an "Enter Data" query.

One the Home tab of Power BI Desktop, click the Enter Data button and create the following table:

vlookup3.png

 

Now we can create a secondary relationship from StatePopulations to this table like so:vlookup4.png

 

Create a column:

State5 = IF(COUNTROWS(RELATEDTABLE(StateAbbreviations)) = 0,IF(COUNTROWS(RELATEDTABLE(AlternateLookup)) = 0,[State Name],RELATED(AlternateLookup[Replacement])),RELATED(StateAbbreviations[State]))

We now have a central table to control exceptions.

These DAX techniques help us gain more control over our lookups. We have solved the "(Blank)" issue. However, we are still relying on the 1-to-Many or 1-to-1 relationship to work and we still must import all of the lookups into our model.

We can eliminate our reliance on table relationships by using DAX's equivalent of VLOOKUP, LOOKUPVALUE, documented here:

https://msdn.microsoft.com/en-us/library/gg492170.aspx

Note however, that this still essentially requires a 1-to-1 relationship in effect as LOOKUPVALUE returns an error if a search returns multiple rows and the result column for all of those rows does not contain the same value. In addition, we are still left with the requirement to import all of our lookups into our data model.

Power Query "M"

We've probably taken DAX about as far as we can so let's switch to Power BI's other language, Power Query (informally known as "M"). A wonderful thing about Power BI, virtually no end to the different ways you can solve the same problem!

First up, can we essentially replicate the functionality that we have in DAX in Power Query? Sure we can. There's this handy little function called Table.Join that essentially replicates the functionality of a table join in SQL or our relationships in our data model. Create a new "Blank Query", go to "Advanced Editor" mode and paste in the following code:

let
 Source = Csv.Document(File.Contents("C:\temp\powerbi\vlookup\StatePopulations.csv"),[Delimiter=",", Columns=3, Encoding=1252, QuoteStyle=QuoteStyle.None]),
 #"Promoted Headers" = Table.PromoteHeaders(Source),
 #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Rank", Int64.Type}, {"State Name", type text}, {"Population", Int64.Type}}),
 #"Uppercased Text" = Table.TransformColumns(#"Changed Type",{{"State Name", Text.Upper}}),
 #"Merged Queries" = Table.NestedJoin(#"Uppercased Text",{"State Name"},StateAbbreviations,{"State Name"},"NewColumn",JoinKind.LeftOuter),
 #"Expanded NewColumn" = Table.ExpandTableColumn(#"Merged Queries", "NewColumn", {"State"}, {"NewColumn.State"})
in
 #"Expanded NewColumn"

This will essentially transform your "StatePopulations" query into a query that matches up the "State Name" column from both queries and returns a single table with the replacement values in a new column called "State". Nifty. Not so nifty that Power Query automagically corrects for case, hence the "#Uppercased Text" step where we transform the "State Names" column in StatePopulations to all UPPERCASE (not yelling) so that it matches with the values in our "StateAbbreviations" query. In pseudocode, "Power Query < smart as Power BI Desktop".

The end result though is that we have our state abbreviations in our state populations table without having to import all of our translations. More efficient storage-wise I suppose, but what about the non-state per the Constitution "Washington, D. C."? Alas, null. Sadness. Can we fix it? After a fashion, we could add a column called "State" such as:

= if [NewColumn.State] = null then [State Name] else [NewColumn.State]

Can we implement the same kind of "exception" logic that we had in DAX? Sure we can. Add a State1 column with the formula:

if [State Name] = "WASHINGTON, D. C." then "DC" else if[NewColumn.State] = null then [State Name] else [NewColumn.State]

Presto chango, we have our translation for "Washington, D. C.". And hey, looky there, nested if statements work in "M". Good to know!

Thus far, however, we are still sort of, kind of relying on the "cleanliness" of our data. What if someone adds a duplicate value to our StateAbbreviations table? Back to error city. Sadness. Must rectify. In developer terms, our lookups are "fragile". That's pronounced "Frah GEEL lay" from where I'm from. Must be Italian. Basically, we are not accounting for the "edge conditions". Sot what if we get absolutely bananas and get rid of our reliance on "relationships" all together? Can we do that and what would that look like?

Well, as it turns out, we are consummate BI professionals with the sheer awesomeness of Power BI at our disposal, of course we can do that. Let's start with a basic example of a non-relation based lookup and see where it goes. Create a Blank Query. Make sure to name this new query "fnLookup" before you save it!! Go to "Advanced Editor" mode and paste in the following code:

let

fnLookup = (input) =>

let

values = {

{"ALABAMA","AL"},

{"ALASKA","AK"},

{"ARIZONA","AZ"},

{"ARKANSAS","AR"},

{"CALIFORNIA","CA"},

{"COLORADO","CO"},

{"CONNECTICUT","CT"},

{"DELAWARE","DE"},

{"FLORIDA","FL"},

{"GEORGIA","GA"},

{"HAWAII","HI"},

{"IDAHO","ID"},

{"ILLINOIS","IL"},

{"INDIANA","IN"},

{"IOWA","IA"},

{"KANSAS","KS"},

{"KENTUCKY","KY"},

{"LOUISIANA","LA"},

{"MAINE","ME"},

{"MARYLAND","MD"},

{"MASSACHUSETTS","MA"},

{"MICHIGAN","MI"},

{"MINNESOTA","MN"},

{"MISSISSIPPI","MS"},

{"MISSOURI","MO"},

{"MONTANA","MT"},

{"NEBRASKA","NE"},

{"NEVADA","NV"},

{"NEW HAMPSHIRE","NH"},

{"NEW JERSEY","NJ"},

{"NEW MEXICO","NM"},

{"NEW YORK","NY"},

{"NORTH CAROLINA","NC"},

{"NORTH DAKOTA","ND"},

{"OHIO","OH"},

{"OKLAHOMA","OK"},

{"OREGON","OR"},

{"PENNSYLVANIA","PA"},

{"RHODE ISLAND","RI"},

{"SOUTH CAROLINA","SC"},

{"SOUTH DAKOTA","SD"},

{"TENNESSEE","TN"},

{"TEXAS","TX"},

{"UTAH","UT"},

{"VERMONT","VT"},

{"VIRGINIA","VA"},

{"WASHINGTON","WA"},

{"WEST VIRGINIA","WV"},

{"WISCONSIN","WI"},

{"WYOMING","WY"}

},

Result = List.First(List.Select(values, each _{0}=input)){1}

in

Result

in

fnLookup

We can now create a new column "State2" in our "StatePopulations" query with the formula:

=fnLookup([State Name])

Wait a minute, what is this "List" thing showing in our column? Hit the "expand" arrows in the column header. Problem solved. Hey, how about that? A lookup that is NOT dependent on any kind of rigid "relationship" between any tables! But what about "Washington, D. C."? Boo! Error.

Let's fix that. 

Change State2 to a formula of:

=try fnLookup([State Name]) otherwise 0

Then create a new column "State3" with the formula:

=if [State2] = 0 then [State Name] else [State2]

Hmm, error handling in Power Query, nifty. Sure, it might take two columns for the price of one, but useful none-the-less. We can even get down with our fancy selves and change State2's formula to be:

=if [State Name] = "WASHINGTON, D. C." then "DC" else if[State2] = 0 then [State Name] else [State2]

Nested if statements in Power Query...still work.

OK, we have apparently solved the problem of our lookups being "Fra GEEL lay" so why are you looking at me like that? Don't look at me like that, I can feel your look. You are saying to yourself. "Good job Greg, you've solved the 'Fra GEEL lay' problem but now I have my lookups in some namby pamby Power Query "M" code that I have to maintain. Fail!

Fair enough, we don't do "namby pamby" here, we serve up the awesomesauce...

Edit your "fnLookup" query and paste in the following (note that you will likely have to change the path to the CSV file in the Source line):

let fnLookup = (input) =>

let
 Source = Csv.Document(File.Contents("C:\temp\powerbi\vlookup\StateAbbreviations.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
 #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),
 #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type"),
 Record = Table.First(Table.SelectRows(#"Promoted Headers", each [State Name] = input)),
 Result = Record.Field(Record,"State")
in
 Result
in
 fnLookup

You will probably see some errors related to referencing external stuff, yadda, yadda, yadda. To fix that, open up your "StatePopulations" query in Advanced Editor and paste in:

 

let
 Source = Csv.Document(File.Contents("C:\temp\powerbi\vlookup\StatePopulations.csv"),[Delimiter=",", Columns=3, Encoding=1252, QuoteStyle=QuoteStyle.None]),
 #"Promoted Headers" = Table.PromoteHeaders(Source),
 #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Rank", Int64.Type}, {"State Name", type text}, {"Population", Int64.Type}}),
 #"Uppercased Text" = Table.TransformColumns(#"Changed Type",{{"State Name", Text.Upper}}),
 #"Added Custom4" = Table.AddColumn(#"Uppercased Text", "State5", each fnLookup([State Name]))
in
 #"Added Custom4"

This clears out all of the various columns we have been using during testing and replaces them with our clean new function which does not require us to keep that list of lookup and replacement values in the query itself, but rather in some source that we have defined (in this case a CSV file).

Uh oh, back to "Error" for "Washington D.C.", we already know how to fix that (see above) so I won't belabor the point here.

So, let's take stock of where we are. First, using "M" code, we have solved the problem with having to store all of the lookup and replacement values in our data model. Second, we know that we have some logical control over the process of lookups and replacements to account for lookups that do not exist and/or specific exceptions that we want to enforce. Granted, this method is going to put more stress on our source system in terms of data access as each function call is presumably going to hit our data source but such is the price for saving space in our data model and ultimate control over our lookups.

However, make no mistake, we can make yet another refinement to our lookup code. Currently, the query we have for "fnLookup" returns ALL of the data from the lookup source and THEN refines it down to the specific value we are looking for. That's a bit inefficient. We know that we want a single value from the lookup source, can we query the source in such a way that we only return the particular value we are interested in from the source when we query it.

Stop me if you have heard this one before. Of course, we can.

Replace the code in "fnLookup" with the following:

let fnLookup = (input) =>

let
 Source = Csv.Document(File.Contents("C:\temp\powerbi\vlookup\StateAbbreviations.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
 #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),
 #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type"),
 Record = Table.First(Table.SelectRows(#"Promoted Headers", each ([State Name] = input))),
 Result = Record.Field(Record,"State")
in
 Result
in
 fnLookup
 

Basically, now we are pre-filtering the return from our function call so that ostensibly we are not grabbing all of the values and THEN filtering but rather only grabbing the values from the data source that we care about. Does this really make a difference? Tough to say given the "black box" that is Power Query in some regards. Does it make us feel better? Sure. It is left to the reader to determine if there is truly a performance difference.

Conclusion

While it is impossible to cover every conceivable lookup scenario you might face, this article covers the vast majority of the basic techniques and building blocks for doing lookups in Power BI.