Find best partial match(es) for substring(s)

Sam_NY

New Member
Joined
Jul 9, 2018
Messages
8
This might be an impossible task, but I'm in a de-duping project and was hoping to write a formula that could plug in a string from before the @ in an email address, say "harrystruman" and then compare that to a range of values in a column or columns, and find the best match or partial match (or even, say top 10 matches) of any substring within the original string.

So searching through the columns of something like below:

A,B,C
First,Middle,Last
Harry,S,Truman
George,W,Bush
Franklin,D,Roosevelt

The thing is, I don't know how to search for all the substrings in "harrystruman" (there are 78 of them in this example: h, ha, har, harry, harrys, etc.). Each one would need to be compared to column A and columns C the example and then...maybe scored as a % match? Or list full matches of a string (Truman = Truman) somewhere? I guess I'd want to look for 100% match or matching 3 or more characters from the search value.

But honestly, don't even know where to get started. Most promising was MATCH, using wildcards for the search value, but in this example there would need to be 78 different search values.

My instinct is to go try this in python so I could loop through the substrings while iterating through a csv, but is there a way in Excel?

Thanks for any advice.
 

Excel Facts

Create a Pivot Table on a Map
If your data has zip codes, postal codes, or city names, select the data and use Insert, 3D Map. (Found to right of chart icons).
Hi,

This:

A,B,C
First,Middle,Last
Harry,S,Truman
George,W,Bush
Franklin,D,Roosevelt

Is that in a Single Column, or 3 separate columns?
 
Upvote 0
Sorry, yes, three columns as currently formatted. I could concatenate them, though. I guess the problem is I'm trying to look for partial matches between a list of email addresses (htruman1884@...) and standard database name fields (Harry, S, Truman). But I'd like to write a formula that would retrieve the Harry S. Truman row as a partial match based on the fact that the email address has "truman" contained within it, if that makes sense.
 
Upvote 0
This is one way I can think of, not sure if that's what you want.

This solution requires a "Concatenated" helper column D, First name match result in column G, Last name match result in column H, then Final result in column I :


Book1
ABCDEFGHI
1FirstMiddleLastConcatenated HelperFirst Name MatchLast Name MatchMatched Result
2HarrySTrumanHarry S Trumanhtruman1884@... TrumanHarry S Truman
3GeorgeWBushGeorge W BushHarry, S, TrumanHarryTrumanHarry S Truman
4FranklinDRooseveltFranklin D RooseveltharrystrumanHarryTrumanHarry S Truman
5gwbushBushGeorge W Bush
6George, BushGeorgeBushGeorge W Bush
7georgebushGeorgeBushGeorge W Bush
Sheet352
Cell Formulas
RangeFormula
G2=IFERROR(LOOKUP(2,1/SEARCH(A$2:A$4,F2),A$2:A$4),"")
H2=IFERROR(LOOKUP(2,1/SEARCH(C$2:C$4,F2),C$2:C$4),"")
I2=INDEX(D$2:D$4,IFERROR(MATCH(G2,A$2:A$4,0),MATCH(H2,C$2:C$4,0)))
D2=A2&" "&B2&" "&C2


Formulas copied down.
 
Upvote 0

Forum statistics

Threads
1,214,858
Messages
6,121,960
Members
449,057
Latest member
FreeCricketId

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top