Reversed Name Recognition and Match

jdavisz

New Member
Joined
May 8, 2020
Messages
8
Office Version
  1. 2016
Platform
  1. Windows
Hi,

I'm trying to find a formula or UDF code that will allow me to compare name cells from one year (~30,000) records to another year. I am aware of vlookup and fuzzylookup add in, etc., and they all work well to an extent. My data is really ugly with some names being entered as firstname lastname and others as lastname firstname with no way for me to actually know which is which in any particular record with confidence. I'd like to find a way for a formula or code that would recognize John Davis and Davis John as the same person (and return a true or 1 or something like that) while it would not recognize Jane Davis as the same person (and would return false or 0). I found a Similarity UDF that works for many cases, but it does not perform well for simple name reversals (Davis John vs. John Davis). It will score that around a .5 while it will score John M. Davis vs. John F. Davis as .9 or so yet those are two different people. Having a way to filter out simple reversals will help reduce the number I have to manually review. Any insight would be appreciated.

Thanks!
John Davis
 
Hi @jdavisz,
The following cases could exist in your database:
What would be the result? It could be 2 of 4 and 3 of 4, or .5 and .75.
Could you tell us.

varios 08may2020.xlsm
ABC
1YEAR 1YEAR 2
2John F. S. DavisDavis John
3John M Davis FJohn M Davis
Hoja2
 
Upvote 0

Excel Facts

Workdays for a market open Mon, Wed, Friday?
Yes! Use "0101011" for the weekend argument in NETWORKDAYS.INTL or WORKDAY.INTL. The 7 digits start on Monday. 1 means it is a weekend.
On which line of the macro it stops.
You can try a sample of data, maybe 100 or 200 names.
What does your data have, some combined cell, some empty cell, some cell with error: #N/A, #VALUE, etc.
How many records are you testing.

The macro in post #6 replaces the previous one, you only have to run a macro.

HOW TO INSTALL MACROs
------------------------------------
If you are new to macros, they are easy to install and use. To install it, simply press ALT+F11 to go into the VB editor and, once there, click Insert/Module on its menu bar, then copy/paste the above code into the code window that just opened up. That's it.... you are done. To use the macro, go back to the worksheet with your data on it and press ALT+F8, select the macro name (Name_Recognition) from the list that appears and click the Run button. The macro will execute and perform the action(s) you asked for. If you will need to do this again in this same workbook, and if you are using XL2007 or above, make sure you save your file as an "Excel Macro-Enabled Workbook (*.xlsm) and answer the "do you want to enable macros" question as "Yes" or "OK" (depending on the button label for your version of Excel) the next time you open your workbook.
Danté, I am trying the macro on a small subset of 20 names and the error appeared as soon as I clicked run. I do not know how to tell which record threw the error. There are no blanks or N/A or data other than names in my test columns. Does that help?
 
Upvote 0
Does that help?

Unfortunately that doesn't help.

I did a test with 34,000 records and it works.

You can put your test data here, but use XL2BB tool so that I can copy the data and review it. Or you can upload your test file to the cloud to dropbox or google drive.
 
Upvote 0
All,

Thank you. I'll try these approaches and see how they work. I appreciate your time and willingness to help!

John Davis
 
Upvote 0
I have reviewed the code I posted earlier and found a few possible anomalies. The amended code below addresses those anomalies and also ignores "." so that the first row in my sample below now shows as a perfect match. If not a perfect match I am sticking to just providing the number of matched "words" rather than a fraction of matches as I see some difficulties with using fractions - see my comments to Dante below.

@DanteAmor
Hi Dante, You may wish to consider tweaking your code if you think that any of the last 4 rows in my sample are possible as your latest code reports all as perfect matches. In relation to rows 10 and 11, I believe that it relates to providing fractional answers as your code finds two words that match and since column A in those rows contain two words, your fraction equates to 1.

VBA Code:
Sub CheckNamesv2()
  Dim RX As Object, M As Object
  Dim a As Variant, b As Variant, itm As Variant
  Dim i As Long, k As Long
  Dim s As String, t As String
  
  a = Range("A1", Range("B" & Rows.Count).End(xlUp)).Value
  ReDim b(1 To UBound(a), 1 To 1)
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.IgnoreCase = True
  For i = 2 To UBound(a)
    s = "( " & Replace(Replace(a(i, 2), ".", ""), " ", "| ") & ")(?= )"
    t = " " & Replace(Replace(a(i, 1), ".", ""), " ", "  ") & " "
    k = 0
    RX.Pattern = s
    s = Split(Replace(s, ")", "|"), "(")(1)
    Set M = RX.Execute(" " & Replace(a(i, 1), ".", "") & " ")
    For Each itm In M
      If InStr(s, itm & "|") > 0 Then k = k + 1
      s = Replace(s, itm & "|", "", 1, 1, 1)
      t = Replace(t, itm & " ", "", 1, 1, 1)
    Next itm
    If Len(s & t) = 0 Then
      b(i, 1) = "Yes"
    Else
      b(i, 1) = k
    End If
  Next i
  Range("C1").Resize(UBound(b)).Value = b
End Sub


Sample data and results
jdavisz 2020-05-10 1.xlsm
ABC
1YEAR 1YEAR 2
2Randy D BallardBallard Randy D.Yes
3John JohnsJohnson John1
4John DavisJohn DavisYes
5Davis JohnJohn DavisYes
6Robert LanceBob Lance1
7John DavisJon Davison0
8Randy D. BallardBallard Ranald D2
9David John Paul Ewan JonesJohn Jones2
10David JonesDavid John Paul Ewan Jones2
11David JohnJohn David Jones2
12Allan AllanAllan Jones1
13Allan JonesAllan Allan1
Sheet1
 
Upvote 0
Hi Dante, You may wish to consider tweaking your code if you think that any of the last 4 rows in my sample are possible as your latest code reports all as perfect matches.
Hi Peter, excellent comments, I also noticed those combinations and I was planning to subtract a tenth or something, so that the coincidence is not perfect, but I would like to hear the opinion of the OP before making changes and changes, I asked the OP in post #11.
Thanks again for taking the time to review my code (y).
 
Upvote 0
Dante and Peter, I spent the day with family yesterday, but I'm now back to my data. Both of your methods provide tools to help me accomplish what I need. Thank you very much! Dante, I must apologize. I DID have a blank in my data. I fixed that and your code performs as you designed it to. It was my error. Peter, I like the tweak that ignores periods and will run that to see how it matches. Gentlemen, for now I will say thank you and won't ask for more of your time you have already generously given me. I'm new to this forum and am very happy with the help I've received. Thank you.
 
Upvote 0

Forum statistics

Threads
1,214,938
Messages
6,122,346
Members
449,080
Latest member
Armadillos

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top