Duplicate rows based on entire row content

MLExcel

New Member
Joined
Nov 1, 2016
Messages
24
Hi everybody,

I am searching for a solution to remove duplicate rows in Power Query.
In a table of more than 60 columns, I want to remove all rows that are identical across all columns (they are duplicate imports from the upstream process).

So if I have, for example, this table:

ID   Date       Value
a    1.1.2000   10
b    1.1.2000   10
a    1.1.2000   10
a    1.1.1999   10

I want to eliminate row 3 because it is entirely identical to row 1.

In Excel I would concatenate all columns and check the result for duplicates, but that does not seem to be an easy or clean solution in Power Query.

Any help is very highly appreciated.

Best,

MLE
 

Thank you for that. And wow, even with a video.

However, the query in question doesn't offer this option:
(Sorry it's in German)

-- removed inline image ---


In another query the option exists:

-- removed inline image ---


Some more background info on my queries:

The query where I want to remove possible duplicates ("2_Faktura_Daten_Detail"):
let
    Quelle = Excel.CurrentWorkbook(){[Name="_2_Faktura_Daten_Detail"]}[Content],
    #"Geänderter Typ" = Table.TransformColumnTypes(Quelle,{{"Hersteller", Int64.Type}, {"HerstBez", type text}, ..., {"Column1", type any}}),
    #"Angefügte Abfrage" = Table.Combine({#"Geänderter Typ", #"2_Faktura_Daten_Detail_CSV"})
in
    #"Angefügte Abfrage"

The second query ("2_Faktura_Daten_Detail_CSV"):
let
    Quelle = Csv.Document(File.Contents("X:\...\2_Faktura_Daten_Detail.csv"),[Delimiter=" ", Columns=62, Encoding=1200, QuoteStyle=QuoteStyle.None]),
    #"Höher gestufte Header" = Table.PromoteHeaders(Quelle),
    #"Geänderter Typ" = Table.TransformColumnTypes(#"Höher gestufte Header",{{"Hersteller", Int64.Type}, {"HerstBez", type text}, ..., {"", type text}})
in
    #"Geänderter Typ"

The reason behind the queries:
I have a table ("_2_Faktura_Daten_Detail") which contains the relevant data. Each month this table is appended with new data from a CSV file.


Any ideas?
Just use the formula from the video, in your case something like:
Code:
let
    Quelle = Excel.CurrentWorkbook(){[Name="_2_Faktura_Daten_Detail"]}[Content],
    #"Geänderter Typ" = Table.TransformColumnTypes(Quelle,{{"Hersteller", Int64.Type}, {"HerstBez", type text}, ..., {"Column1", type any}}),
    #"Angefügte Abfrage" = Table.Combine({#"Geänderter Typ", #"2_Faktura_Daten_Detail_CSV"}),
    #"Einzigartige Abfrage"  = Table.Distinct(#"Angefügte Abfrage")
in
    #"Einzigartige Abfrage"
Marcel, that almost does it. Thanks a million.

What works: if I do this in the Query Editor, the result is fine; the number of rows is as expected for my test file (in the Query Editor).

What doesn't work: those rows are not all loaded to the table in the worksheet (and to the data model). Some, but not all, rows are missing.

I will have to dig into that a little more before I can describe it in more detail. I will be back asap.

Thanks in the meantime for your help.
One possibility would be case sensitivity in the Query Editor versus case insensitivity in the data model: values that are unique in the Query Editor (e.g. "a" and "A") may still be duplicates in the data model.
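If case sensitivity does turn out to be the culprit, Table.Distinct can be told to compare text case-insensitively by passing a comparer as its second argument. A minimal sketch (the sample data and step names are illustrative, not taken from the queries above):

```m
let
    Quelle = Table.FromRecords({
        [ID = "a", Wert = 10],
        [ID = "A", Wert = 10]
    }),
    // Comparer.OrdinalIgnoreCase makes "a" and "A" compare as equal,
    // so only one of the two rows survives the distinct step.
    Eindeutig = Table.Distinct(Quelle, Comparer.OrdinalIgnoreCase)
in
    Eindeutig
```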
I have now narrowed the error down a bit.

Case sensitivity should not be the issue, since the resulting duplicates are recognized as duplicates by the EXACT function, which is case-sensitive.

Here is what happens:
I start with 75 unique lines; the CSV file to be appended has 48 lines (all unique).

1.) When I run the query once, all 48 lines are imported, so I end up with 123 lines, which is correct.
2.) If I run the query again (the CSV file is unchanged), 19 lines are imported again on top of the existing ones, resulting in 142 lines.

These 19 lines are now duplicates, which should not happen. **

3.) If I run the query again, the result stays at 142 lines!

This makes me think that in the query that collects my existing data from the table (the 142 lines in step 3 above), the duplicates are recognized, but the 19 lines coming from the CSV file are not recognized as already being there. This would explain why the 19 are added twice but not more often when the query is run again.


** I had a look at those duplicates:
Visually, they look identical.
If I concatenate all columns of two duplicate entries, the result
is NOT recognized as identical by conditional formatting (highlight duplicate values), but
IS positively recognized as identical by the EXACT function.


All in all, it looks to me as if some (invisible) cell value is only set when the query writes the data back to the table. That way, the values could differ at the point where Table.Distinct is executed and then be identical in the final table, where they are also recognized as identical when the query runs once more.
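If invisible characters are indeed the cause, one way to rule them out would be to normalize the text columns before deduplicating. A hedged sketch, assuming the mismatch comes from trailing spaces or non-printing characters (the column chosen here, "HerstBez", is just one example; the step would need to be applied to every suspect text column):

```m
// Text.Trim strips leading/trailing spaces, Text.Clean removes
// control characters; both applied before Table.Distinct so that
// visually identical values actually compare as equal.
#"Bereinigt" = Table.TransformColumns(#"Angefügte Abfrage",
    {{"HerstBez", each Text.Clean(Text.Trim(_)), type text}}),
#"Eindeutig" = Table.Distinct(#"Bereinigt")
```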


Here are the two queries:

let
    Quelle = Excel.CurrentWorkbook(){[Name="_Table"]}[Content],
    #"Geänderter Typ" = Table.TransformColumnTypes(Quelle,{{"Hersteller", Int64.Type}, {"HerstBez", type text}, {"Bestellreferenz", type text}, {"Bestelldatum", type date}, {"AuftragsArt", type text}, {"AuftragsArtTxt", type text}, {"Auftrag", Int64.Type}, {"AG_ID", Int64.Type}, {"AG_Titel", type text}, {"AG_Name1", type text}, {"AG_Name2", type text}, {"AG_Strasse", type text}, {"AG_PLZ", Int64.Type}, {"AG_Ort", type text}, {"AG_Land", type text}, {"WE_ID", Int64.Type}, {"WE_Titel", type text}, {"WE_Name1", type text}, {"WE_Name2", type text}, {"WE_Strasse", type text}, {"WE_PLZ", Int64.Type}, {"WE_Ort", type text}, {"WE_Land", type text}, {"Lieferung", Int64.Type}, {"Lieferdatum", type date}, {"LieferPos", Int64.Type}, {"Material", type text}, {"PZN", type text}, {"HerstellerArtNummer", type text}, {"Material Bezeichnung", type text}, {"Charge", type text}, {"Verfallsdatum", type date}, {"Liefermenge", Int64.Type}, {"Einheit", type text}, {"Kategorie", type text}, {"FakturaArt", type text}, {"FakturaArtTxt", type text}, {"FaktPosTyp", type text}, {"FaktPosTypTxt", type text}, {"Faktura", Int64.Type}, {"FakturaDatum", type date}, {"FakturaPos", Int64.Type}, {"FakturaMenge", Int64.Type}, {"Nettopreis", type number}, {"Nettowert", type number}, {"Währung", type text}, {"USt_Satz", Int64.Type}, {"Zahlungsbed. Code", type text}, {"Zahlungsbedingung", type text}, {"RE_ID", Int64.Type}, {"RE_Titel", type text}, {"RE_Name1", type text}, {"RE_Name2", type text}, {"RE_Strasse", type text}, {"RE_PLZ", Int64.Type}, {"RE_Ort", type text}, {"RE_Land", type text}}),
    #"Angefügte Abfrage" = Table.Combine({#"Geänderter Typ", #"2_Faktura_Daten_Detail_CSV"}),
    #"remove duplicates" = Table.Distinct(#"Angefügte Abfrage")
in
    #"remove duplicates"

let
    Quelle = Csv.Document(File.Contents("X:\...\2_Faktura_Daten_Detail.csv"),[Delimiter=" ", Columns=62, Encoding=1200, QuoteStyle=QuoteStyle.None]),
    #"Höher gestufte Header" = Table.PromoteHeaders(Quelle),
    #"Entfernte Spalten" = Table.RemoveColumns(#"Höher gestufte Header",{"ZF1", "ZF2", "ZF3", "ZF4", ""}),
    #"Geänderter Typ" = Table.TransformColumnTypes(#"Entfernte Spalten",{{"Hersteller", Int64.Type}, {"HerstBez", type text}, {"Bestellreferenz", type text}, {"Bestelldatum", type date}, {"AuftragsArt", type text}, {"AuftragsArtTxt", type text}, {"Auftrag", Int64.Type}, {"AG_ID", Int64.Type}, {"AG_Titel", type text}, {"AG_Name1", type text}, {"AG_Name2", type text}, {"AG_Strasse", type text}, {"AG_PLZ", Int64.Type}, {"AG_Ort", type text}, {"AG_Land", type text}, {"WE_ID", Int64.Type}, {"WE_Titel", type text}, {"WE_Name1", type text}, {"WE_Name2", type text}, {"WE_Strasse", type text}, {"WE_PLZ", Int64.Type}, {"WE_Ort", type text}, {"WE_Land", type text}, {"Lieferung", Int64.Type}, {"Lieferdatum", type date}, {"LieferPos", Int64.Type}, {"Material", type text}, {"PZN", type text}, {"HerstellerArtNummer", type text}, {"Material Bezeichnung", type text}, {"Charge", type text}, {"Verfallsdatum", type date}, {"Liefermenge", Int64.Type}, {"Einheit", type text}, {"Kategorie", type text}, {"FakturaArt", type text}, {"FakturaArtTxt", type text}, {"FaktPosTyp", type text}, {"FaktPosTypTxt", type text}, {"Faktura", Int64.Type}, {"FakturaDatum", type date}, {"FakturaPos", Int64.Type}, {"FakturaMenge", Int64.Type}, {"Nettopreis", type number}, {"Nettowert", type number}, {"Währung", type text}, {"USt_Satz", Int64.Type}, {"Zahlungsbed. Code", type text}, {"Zahlungsbedingung", type text}, {"RE_ID", Int64.Type}, {"RE_Titel", type text}, {"RE_Name1", type text}, {"RE_Name2", type text}, {"RE_Strasse", type text}, {"RE_PLZ", Int64.Type}, {"RE_Ort", type text}, {"RE_Land", type text}})
in
    #"Geänderter Typ"
If I understand correctly, you have an Excel table that is both input and output of your first query.
So 48 new records flow from the CSV to Excel (append); this Excel table is imported into Power Query, and when the 48 newly added records are compared with the same 48 records from the CSV, 19 of them are considered different. So it looks as if something changed when the data was written to Excel and imported back into Power Query.

My first suspicion would be the type number fields, as these may cause very small rounding errors, and these errors may differ between Excel and Power Query.
My approach would be to try removing duplicates based on subsets of columns: you can select several columns and then choose Remove Duplicates on the Home tab.
If you repeat this with different sets of columns, you may be able to find the columns with the issue.

Alternatively, you might try and apply rounding to number columns. This might solve the issue.
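The rounding idea could be sketched like this; "Nettopreis" and "Nettowert" are the type number columns from the query above, while the choice of two decimal places and the step names are assumptions:

```m
// Round the number columns before deduplicating, so that tiny
// representation differences no longer make rows look distinct.
#"Gerundet" = Table.TransformColumns(#"Angefügte Abfrage",
    {{"Nettopreis", each Number.Round(_, 2), type number},
     {"Nettowert", each Number.Round(_, 2), type number}}),
#"Eindeutig" = Table.Distinct(#"Gerundet")
```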
It took me a while, but I have now found the issue.
In one column the customer name was "xyz GmbH" in the existing data and "xyz Gmbh" in the new import data. How that came about, I don't know.

I have solved it for the future by converting this column to lower case in the query. Now all duplicates are correctly identified.
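For reference, that lower-casing fix can be done in a single step before the distinct step. A sketch; the column name "AG_Name1" is a guess at which column holds the customer name, so substitute the actual one:

```m
// Normalize the customer-name column to lower case before Table.Distinct,
// so "xyz GmbH" and "xyz Gmbh" dedupe to a single row.
#"Kleingeschrieben" = Table.TransformColumns(#"Angefügte Abfrage",
    {{"AG_Name1", Text.Lower, type text}}),
#"Eindeutig" = Table.Distinct(#"Kleingeschrieben")
```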

Thanks for the support.
