Power Query - Remove duplicates from Folder Download
Results 1 to 4 of 4

Thread: Power Query - Remove duplicates from Folder Download
Thanks Thanks: 0 Likes Likes: 0

  1. #1
    Board Regular
    Join Date
    Aug 2012
    Location
    Auckland, NZ
    Posts
    1,528
    Post Thanks / Like
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    Default Power Query - Remove duplicates from Folder Download

    Hi,

    I'm creating a personal account analysis system and using the 'From File, From Folder' process in Power Query. The issue I'm having is to identify where two downloads from a bank or credit card account cover the same period and hence have duplicate entries. In doing so I want to avoid removing genuine duplicate transactions. For example:

    Download File 1 includes two transaction on 10th June for £12 at Weatherspoons.
    Download File 2 covers the same period by accident and also has the two transactions for £12 at Weatherspoons on 10th June.

    A simple removal of duplicates will result in only one transaction, which is incorrect. Including the file download file name will result in two entries, but will also result in two entries if I had only a single transaction that occurs in both downloads. I believe the definition of a duplicate is where:

    The date, Amount and Payee are all identical, but the Download File Name is different. I know its counterintuitive, but...

    I'd be grateful for any advice you give.

    Many thanks.
    Peter

    Excel 2016, Windows 10
    Accuracy in posting formula's to the forum will help enormously, clairvoyancy is a skill I've yet to develop!!

  2. #2
    MrExcel MVP
    Join Date
    Dec 2014
    Location
    Sydney, Australia
    Posts
    1,180
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: Power Query - Remove duplicates from Folder Download

    At a guess.
    1 Add a new Merged column using the 3 columns to make a unique key
    2 branch the query and do a group by the key, distinct count of the file name
    3 filter where count >1. This is a list of the duplicates
    4 branch again from point 2, inner join with 3, remove duplicates. This is a list of duplicates, but 1 copy only
    5 branch again from 2, left join, and remove matches. These are the non duplicated records
    6 append 4 and 5

    there may be a better way, but that is what comes to mind.
    Matt Allington is a professional Power BI Consultant, Power BI Trainer and Microsoft MVP based in Sydney Australia. Matt is author of the Power BI book Supercharge Power BI and the Power Pivot Book Supercharge Excel.
    What is Power BI?

  3. #3
    Board Regular
    Join Date
    Aug 2012
    Location
    Auckland, NZ
    Posts
    1,528
    Post Thanks / Like
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    Default Re: Power Query - Remove duplicates from Folder Download

    Hi Matt,

    Many thanks for this. I have a couple of queries:

    1. I can follow the logic, just not sure what you mean by 'branch the query'? Do you mean 'Reference'? (that is what I've assumed.
    2. The first four steps seem to remove the duplicates within a download, but these may be valid (buying two rounds at the pub will result in two exactly similar records as the download doesn't show time, only date). Its only those records in the second download that match something in the first that need to be removed.

    I'll keep trying with your strategy as that occurred to me, but had no idea how to start.

    Many thanks.

    Peter
    Peter

    Excel 2016, Windows 10
    Accuracy in posting formula's to the forum will help enormously, clairvoyancy is a skill I've yet to develop!!

  4. #4
    Board Regular
    Join Date
    Aug 2012
    Location
    Auckland, NZ
    Posts
    1,528
    Post Thanks / Like
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    Default Re: Power Query - Remove duplicates from Folder Download

    Hi Matt,

    I believe I've found the solution:

    1. Group by Download File, Date, Payee and Amount - this collects any duplicate records on that day into a table.
    2. Select Date, Payee and Amount and delete rows of duplicates. This leaves a single entry for each transaction including the table that contains the information about how may duplicates there are on one day.
    3. Expand the table and delete any unnecessary columns, which is probably all the new column(s) you select.

    Hope this helps someone.

    Best Regards

    Peter
    Peter

    Excel 2016, Windows 10
    Accuracy in posting formula's to the forum will help enormously, clairvoyancy is a skill I've yet to develop!!

Some videos you may like

User Tag List

Tags for this Thread

Like this thread? Share it with others

Like this thread? Share it with others

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •