Power Query - Remove duplicates from Folder Download

pjmorris

Well-known Member
Joined
Aug 2, 2012
Messages
1,902
Office Version
2016
Platform
Windows
Hi,

I'm creating a personal account analysis system and using the 'From File, From Folder' process in Power Query. The issue I'm having is to identify where two downloads from a bank or credit card account cover the same period and hence have duplicate entries. In doing so I want to avoid removing genuine duplicate transactions. For example:

Download File 1 includes two transaction on 10th June for £12 at Weatherspoons.
Download File 2 covers the same period by accident and also has the two transactions for £12 at Weatherspoons on 10th June.

A simple removal of duplicates will result in only one transaction, which is incorrect. Including the file download file name will result in two entries, but will also result in two entries if I had only a single transaction that occurs in both downloads. I believe the definition of a duplicate is where:

The date, Amount and Payee are all identical, but the Download File Name is different. I know its counterintuitive, but...

I'd be grateful for any advice you give.

Many thanks.
 

Some videos you may like

Excel Facts

Return population for a City
If you have a list of cities in A2:A100, use Data, Geography. Then =A2.Population and copy down.

Matt Allington

MrExcel MVP
Joined
Dec 18, 2014
Messages
1,221
At a guess.
1 Add a new Merged column using the 3 columns to make a unique key
2 branch the query and do a group by the key, distinct count of the file name
3 filter where count >1. This is a list of the duplicates
4 branch again from point 2, inner join with 3, remove duplicates. This is a list of duplicates, but 1 copy only
5 branch again from 2, left join, and remove matches. These are the non duplicated records
6 append 4 and 5

there may be a better way, but that is what comes to mind.
 

pjmorris

Well-known Member
Joined
Aug 2, 2012
Messages
1,902
Office Version
2016
Platform
Windows
Hi Matt,

Many thanks for this. I have a couple of queries:

1. I can follow the logic, just not sure what you mean by 'branch the query'? Do you mean 'Reference'? (that is what I've assumed.
2. The first four steps seem to remove the duplicates within a download, but these may be valid (buying two rounds at the pub will result in two exactly similar records as the download doesn't show time, only date). Its only those records in the second download that match something in the first that need to be removed.

I'll keep trying with your strategy as that occurred to me, but had no idea how to start.

Many thanks.

Peter
 

pjmorris

Well-known Member
Joined
Aug 2, 2012
Messages
1,902
Office Version
2016
Platform
Windows
Hi Matt,

I believe I've found the solution:

1. Group by Download File, Date, Payee and Amount - this collects any duplicate records on that day into a table.
2. Select Date, Payee and Amount and delete rows of duplicates. This leaves a single entry for each transaction including the table that contains the information about how may duplicates there are on one day.
3. Expand the table and delete any unnecessary columns, which is probably all the new column(s) you select.

Hope this helps someone.

Best Regards

Peter
 

Watch MrExcel Video

Forum statistics

Threads
1,101,991
Messages
5,484,064
Members
407,426
Latest member
Owen Chia

This Week's Hot Topics

  • Finding issue in If elseif else with For each Loop
    Finding issue in If elseif else with For each Loop I have tried this below code but i'm getting in Y column filled with W005. Colud you please...
  • MsgBox Error
    Hi Guys, I have the below error show up when i try and run my macro in File1 but works fine if i copy and paste the same code into file2. [ATTACH...
  • CELL FORMAT - IF CONDITION
    My Cell Format is [B]""0.00" Cr". [/B]But in the cell, it is showing 123.00 for editing. (123 is entry figure). (Data imported from other...
  • Show numbers nearly the same
    Is this possible. I have a number that can change very time eg 0.00001234 Then I have a lot of numbers 0.0000001, 0.0000002, 0.00000004...
  • Please i need your help to create formula
    I need a formula in cell B8 to do this >>if b1=1 then multiply ( cell b8) by 10% ,if b1=2 multiply by 20%,if=3 multiply by 30%. Thank you in...
  • Got error while adding column and filter
    Got error while adding column and filter In column Z has some like "Success" and "Error". I want to add column in AA if the Z cell value is...
Top