Power Query - Remove duplicates from Folder Download

pjmorris

Well-known Member
Joined
Aug 2, 2012
Messages
1,909
Office Version
2016
Platform
Windows
Hi,

I'm creating a personal account analysis system and using the 'From File, From Folder' process in Power Query. The issue I'm having is to identify where two downloads from a bank or credit card account cover the same period and hence have duplicate entries. In doing so I want to avoid removing genuine duplicate transactions. For example:

Download File 1 includes two transaction on 10th June for £12 at Weatherspoons.
Download File 2 covers the same period by accident and also has the two transactions for £12 at Weatherspoons on 10th June.

A simple removal of duplicates will result in only one transaction, which is incorrect. Including the file download file name will result in two entries, but will also result in two entries if I had only a single transaction that occurs in both downloads. I believe the definition of a duplicate is where:

The date, Amount and Payee are all identical, but the Download File Name is different. I know its counterintuitive, but...

I'd be grateful for any advice you give.

Many thanks.
 

Some videos you may like

Excel Facts

Repeat Last Command
Pressing F4 adds dollar signs when editing a formula. When not editing, F4 repeats last command.

Matt Allington

MrExcel MVP
Joined
Dec 18, 2014
Messages
1,227
At a guess.
1 Add a new Merged column using the 3 columns to make a unique key
2 branch the query and do a group by the key, distinct count of the file name
3 filter where count >1. This is a list of the duplicates
4 branch again from point 2, inner join with 3, remove duplicates. This is a list of duplicates, but 1 copy only
5 branch again from 2, left join, and remove matches. These are the non duplicated records
6 append 4 and 5

there may be a better way, but that is what comes to mind.
 

pjmorris

Well-known Member
Joined
Aug 2, 2012
Messages
1,909
Office Version
2016
Platform
Windows
Hi Matt,

Many thanks for this. I have a couple of queries:

1. I can follow the logic, just not sure what you mean by 'branch the query'? Do you mean 'Reference'? (that is what I've assumed.
2. The first four steps seem to remove the duplicates within a download, but these may be valid (buying two rounds at the pub will result in two exactly similar records as the download doesn't show time, only date). Its only those records in the second download that match something in the first that need to be removed.

I'll keep trying with your strategy as that occurred to me, but had no idea how to start.

Many thanks.

Peter
 

pjmorris

Well-known Member
Joined
Aug 2, 2012
Messages
1,909
Office Version
2016
Platform
Windows
Hi Matt,

I believe I've found the solution:

1. Group by Download File, Date, Payee and Amount - this collects any duplicate records on that day into a table.
2. Select Date, Payee and Amount and delete rows of duplicates. This leaves a single entry for each transaction including the table that contains the information about how may duplicates there are on one day.
3. Expand the table and delete any unnecessary columns, which is probably all the new column(s) you select.

Hope this helps someone.

Best Regards

Peter
 

Subscribe on YouTube

Watch MrExcel Video

Forum statistics

Threads
1,106,341
Messages
5,510,745
Members
408,809
Latest member
Matthiasek

This Week's Hot Topics

  • Turn fraction around
    Hello I need to turn a fraction around, for example I have 1/3 but I need to present as 3/1
  • TIme Clock record reformatting to ???
    Hello All, I'd like some help formatting this (Tbl-A)(Loaded via Power Query) [ATTACH type="full" width="511px" alt="PQdata.png"]22252[/ATTACH]...
  • TextBox Match
    hi, I am having a few issues with my code below, what I need it to do is when they enter a value in textbox8 (QTY) either 1,2 or 3 the 3 textboxes...
  • Using Large function based on Multiple Criteria
    Hello, I can't seem to get a Large formula to work based on two criteria's. I can easily get a oldest value based one value, but I'm struggling...
  • Can you check my code please
    Hi, Im going round in circles with a Compil Error End With Without With Here is the code [CODE=rich] Private Sub...
  • Combining 2 pivot tables into 1 chart
    Hello everyone, My question sounds simple but I do not know the answer. I have 2 pivot tables and 2 charts that go with this. However I want to...
Top