Power Query - Remove Duplicates

joslaz

Board Regular
Joined
May 24, 2018
Messages
76
Hello!

I'm just trying to develop a logic / approach in Excel, which cleans up my bad bookings and corrections.

The data records have the following structure (table above)following structure (table above)


If there is a number in column XX, it is a correction. This means that this line, and the line that contains the same number in YY, must be deleted.

How is this to be implemented in Power Query?

greeting
 

Excel Facts

Who is Mr Spreadsheet?
Author John Walkenbach was Mr Spreadsheet until his retirement in June 2019.
You can do it like this,

Code:
let
    // load data
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtT1SszTNbRQ0lEyBGIzKwMDENMEQgORhYWFUqwONoWm6AotcShENhFkHJC0NACrNSJku6GBAUSlMUGVhjhVmmKoNCTCTLCBYMpIKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Day = _t, ID = _t, T1 = _t, T2 = _t, XX = _t, YY = _t]),
    ChangedType = Table.TransformColumnTypes(Source,{{"Day", type date}, {"ID", Int64.Type}, {"T1", type time}, {"T2", type time}, {"XX", Int64.Type}, {"YY", Int64.Type}}),
    
    // my steps
    SelfJoin = Table.NestedJoin( ChangedType, {"YY"}, ChangedType, {"XX"}, "TableJoin", JoinKind.LeftOuter),
    SelectRows = Table.SelectRows( SelfJoin, each [XX] = null and Table.IsEmpty( [TableJoin] ) ),
    RemovedColumns = Table.RemoveColumns(SelectRows,{"TableJoin"})
    
in
    RemovedColumns
 
Upvote 0
Many thanks!

I have followed your approach and this is my result:
let
Quelle = Excel.CurrentWorkbook(){[Name="Source1"]}[Content],
ZusammenführteAbfragen = Table.NestedJoin(Quelle,{"YY"},Quelle,{"XX"},"Geänderter Typ",JoinKind.LeftOuter),
ErweiterteGeänderterTyp = Table.ExpandTableColumn(ZusammenführteAbfragen, "Geänderter Typ", {"XX"}, {"XX.1"}),
GefilterteZeilen = Table.SelectRows(ErweiterteGeänderterTyp, each ([XX.1] = null)),
EntfernteSpalten = Table.RemoveColumns(GefilterteZeilen,{"XX.1"})
in
EntfernteSpalten

Do you have any suggestions, to optimize the code?
How can I adapt your approach in the code above?
Why do you have "Table.IsEmpty( [TableJoin] )"?


Best regards
 
Upvote 0
Hi joslaz

you have Table.IsEmpty because the the current row you are looking at must not be associated with any cancellations

The optimization of your code depends on the layout of your data, does the table you provided match the real layout?
 
Upvote 0
Thanks!

The structure to the original is near the same.

What do you think about a "JoinKind.LeftAnti)"?
 
Upvote 0
I guess leftanti canalso be used in your case, First you would perform a leftAnti join and then remove rows where XX is not empty
 
Upvote 0

Forum statistics

Threads
1,214,424
Messages
6,119,407
Members
448,894
Latest member
spenstar

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top