PQ Performance in Comparing Two Tables

cr731

Well-known Member
Joined
Sep 17, 2010
Messages
611
I am using Power Query to compare column values in two tables, where each table has a Key column as well as around 20 dimension columns, which I want to check to make sure that the dimensions align between the two tables for each Key. For example

Table1
Key
Dim1
Dim2
Dim3
ABC
xxx
yyy
zzz
DEF
aaa
bbb
ccc

<tbody>
</tbody>

Table2
Key
Dim1
Dim2
Dim3
ABC
hhh
jjj
zzz
DEF
aaa
bbb
ddd

<tbody>
</tbody>


And I want a resulting table that gives me all instances where the dimensions do not match, like this

Key
Dimension
Table1 Value
Table2 Value
ABC
Dim1
xxx
hhh
ABC
Dim2
yyy
jjj
DEF
Dim3
ccc
ddd

<tbody>
</tbody>


To accomplish this, I tried this:

  • Append tables together
  • Unpivot dimensions
  • Remove duplicates to get all unique combinations of Key, Dimension
  • Merge the unique combinations to Table1 and Table2 to get the values in each table, and compare

My query is

Code:
let
    Source = Table.Combine({Table1, Table2}),
    #"Grouped Rows" = Table.Group(#"Source", List.Combine({ ColumnsToKeep_Key, {"Source"} }), {{"Volume", each List.Sum([Volume]), type number}, {"Revenue", each List.Sum([Revenue]), type number}}),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Grouped Rows", {"Key", "Volume", "Revenue"}, "Attribute", "Value"),
    #"Table1 Table" = Table.SelectRows(#"Unpivoted Other Columns", each [Source] = "Table1"),
    #"Table2 Table" = Table.SelectRows(#"Unpivoted Other Columns", each [Source] = "Table2"),
    #"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Value", "Source"}),
    #"Removed Duplicates" = Table.Distinct(#"Removed Columns"),
    #"Merged Table1" = Table.NestedJoin(#"Removed Duplicates", {"Key", "Attribute"}, #"Table1 Table", {"Key", "Attribute"}, "Table1_Table", JoinKind.LeftOuter),
    #"Merged Table2" = Table.NestedJoin(#"Merged Table1", {"Key", "Attribute"}, #"Table2 Table", {"Key", "Attribute"}, "Table2_Table", JoinKind.LeftOuter),
    #"Expanded Table1_Table" = Table.ExpandTableColumn(#"Merged Table2", "Table1_Table", {"Value"}, {"Table1_Table.Value"}),
    #"Expanded Table2_Table" = Table.ExpandTableColumn(#"Expanded Table1_Table", "Table2_Table", {"Value"}, {"Table2_Table.Value"}),
    #"Added Value Check" = Table.AddColumn(#"Expanded Table2_Table", "Value Check", each if [Table1_Table.Value] = [Table2_Table.Value] then "OK" else "Check"),
    #"Filtered Rows" = Table.SelectRows(#"Added Value Check", each [Value Check] <> "OK"),
    #"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows",{"Value Check"}),
in
    #"Removed Columns1"

It works, but is extremely slow (30+ minutes refresh time). Each of Table1 and Table2 have about 100,000 rows, and 20 dimension columns, so the unpivoted version of each would have around 2,000,000 rows.

This seems like it shouldn't be that difficult for PQ to handle, am I doing something wrong here that is hurting performance this much?



Thanks in advance.
 
Last edited:

Excel Facts

How to fill five years of quarters?
Type 1Q-2023 in a cell. Grab the fill handle and drag down or right. After 4Q-2023, Excel will jump to 1Q-2024. Dash can be any character.
Try this:

Table2:

Code:
let
    Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Key", type text}, {"Dim1", type text}, {"Dim2", type text}, {"Dim3", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "TableName", each "Table2")
in
    #"Added Custom"

Table1:

Code:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Key", type  text}, {"Dim1", type text}, {"Dim2", type text}, {"Dim3", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "TableName", each "Table1"),
    #"Appended Query" = Table.Combine({#"Added Custom", Table2}),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Appended Query", {"Key", "TableName"}, "Attribute", "Value"),
    #"Pivoted Column" = Table.Pivot(#"Unpivoted Columns", List.Distinct(#"Unpivoted Columns"[TableName]), "TableName", "Value"),
    #"Added Custom1" = Table.AddColumn(#"Pivoted Column", "ConditionSign", each if [Table1] =[Table2] then 0 else 1),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([ConditionSign] = 1)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"ConditionSign"})
in
    #"Removed Columns"
 
Upvote 0
Hi
It is a little shorter.
Code:
let    
    table1 = Excel.CurrentWorkbook(){[Name="table1"]}[Content],
    table2 = Excel.CurrentWorkbook(){[Name="table2"]}[Content],
    unpTable1 = Table.UnpivotOtherColumns(table1, {"Key"}, "Dimension", "Table1 Value"),
    unpTable2 = Table.UnpivotOtherColumns(table2, {"Key"}, "Dimension", "Table2 Value"),
    joined = Table.Join(unpTable1, {"Key", "Dimension"}, unpTable2, {"Key", "Dimension"}),
    result = Table.SelectRows(joined, each [#"Table1 Value"] <> [#"Table2 Value"])
in
    result
Regards,
 
Last edited:
Upvote 0

Forum statistics

Threads
1,213,484
Messages
6,113,927
Members
448,533
Latest member
thietbibeboiwasaco

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top