Merge duplicates conditionally

Tigerexcel

Active Member
Joined
Mar 6, 2020
Messages
281
Office Version
2016
Platform
Windows
I have a file with duplicate records but the critical field (Description) has been updated in some cases and that is the one that needs to be returned.

ProcessAffectsRevisionDescription
Building entryAllEntering the building
Building entryEntering the building, requires ID card
OHSRefer manual
OHSRefer manual
RecruitmentJun-20Refer HR
PhotocopyingOfficeRequires training
PhotocopyingOfficeRequires training. Update imminent

What I need as output would be:
Building entry All Entering the building, requires ID card

so that each process is only listed once but it picks up the Description field with the most information ie has more text in it. As far as I can see no process has more than 2 entries, there are a lot of entries though where there's just one entry. How do I merge these records and remove duplicates?
 

Some videos you may like

Excel Facts

Repeat Last Command
Pressing F4 adds dollar signs when editing a formula. When not editing, F4 repeats last command.

GraH

Active Member
Joined
Mar 22, 2020
Messages
473
Office Version
365, 2016
Platform
Windows
Hi Tigerexcel,

2 formula options:

Book1
ABCDEFGHI
1ProcessAffectsRevisionDescriptionKeep/DeleteLenKeep
2Building entryAllEntering the building
Delete
21
FALSE
3Building entryEntering the building, requires ID card
Keep
39
TRUE
4OHSRefer manual
Delete
12
TRUE
5OHSRefer manual
Keep
12
TRUE
6Recruitmentjun/20Refer HR
Keep
8
TRUE
7PhotocopyingOfficeRequires training
Delete
17
FALSE
8PhotocopyingOfficeRequires training. Update imminent
Keep
34
TRUE
Sheet2
Cell Formulas
RangeFormula
G2:G8G2=LEN(D2)
H2:H8H2=MAXIFS($G$2:$G$8,$A$2:$A$8,A2)=G2
E2:E8E2=IF(COUNTIF($A$2:$A$8,A2)=COUNTIF($A$2:A2,A2),"Keep","Delete")


Then filter and delete rows accordingly.

Same logic can be followed with Power Query, all with UI:
Code:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Process", type text}, {"Affects", type text}, {"Revision", type datetime}, {"Description", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Length", each Text.Length([Description])),
    #"Grouped Rows" = Table.Group(#"Added Custom", {"Process"}, {{"Last", each List.Max([Length]), type number}, {"All", each _, type table [Process=text, Affects=text, Revision=datetime, Description=text, Length=number]}}),
    #"Expanded All" = Table.ExpandTableColumn(#"Grouped Rows", "All", {"Affects", "Revision", "Description", "Length"}, {"Affects", "Revision", "Description", "Length"}),
    #"Added Custom1" = Table.AddColumn(#"Expanded All", "Keep", each [Last]=[Length]),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Keep] = true)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Last", "Length", "Keep"})
in
    #"Removed Columns"
 

Tigerexcel

Active Member
Joined
Mar 6, 2020
Messages
281
Office Version
2016
Platform
Windows
Thanks G., rescued me again. I particularly like that you provide a variety of options from formulae to PQ. I haven't come across the MAXIFS function, thanks for sharing.
 

Watch MrExcel Video

Forum statistics

Threads
1,099,005
Messages
5,465,959
Members
406,456
Latest member
jmishra91

This Week's Hot Topics

Top