Reversed Name Recognition and Match

jdavisz

New Member
Joined
May 8, 2020
Messages
8
Office Version
  1. 2016
Platform
  1. Windows
Hi,

I'm trying to find a formula or UDF code that will allow me to compare name cells from one year (~30,000) records to another year. I am aware of vlookup and fuzzylookup add in, etc., and they all work well to an extent. My data is really ugly with some names being entered as firstname lastname and others as lastname firstname with no way for me to actually know which is which in any particular record with confidence. I'd like to find a way for a formula or code that would recognize John Davis and Davis John as the same person (and return a true or 1 or something like that) while it would not recognize Jane Davis as the same person (and would return false or 0). I found a Similarity UDF that works for many cases, but it does not perform well for simple name reversals (Davis John vs. John Davis). It will score that around a .5 while it will score John M. Davis vs. John F. Davis as .9 or so yet those are two different people. Having a way to filter out simple reversals will help reduce the number I have to manually review. Any insight would be appreciated.

Thanks!
John Davis
 

Excel Facts

How to calculate loan payments in Excel?
Use the PMT function: =PMT(5%/12,60,-25000) is for a $25,000 loan, 5% annual interest, 60 month loan.
Hi and welcome to MrExcel.

Assuming your data in this way:
Dante Amor
ABC
1YEAR 1YEAR 2Exists
2John F. S. DavisDavis John4
3John M. DavisJane Davis6
4John DavisAmor
5John F. DavisDante7
6Jane DavisDavis F. JOHN S.2
7Dante
Hoja2


The following macro (Name_Recognition) will review the combinations of up to 4 names, for example:

John F. S. Davis, John F. Davis S., John S. F. Davis, John S. Davis F., John Davis F. S., John Davis S. F. etc (24 combinations)

So the macro takes the name from column B and looks it up in column A (with all the name combinations), if it finds it, in column C puts the row number of column A where it found the name. For example Jane Davis is in row 6 of column A (check the example above). If the name is not found, the cell in column C remains empty.

Run Name_Recognition macro.
VBA Code:
Sub Name_Recognition()
  Dim a As Variant, b As Variant, c As Variant
  Dim i As Long, dic As Object, sItems As Variant
  '
  a = Range("A2", Range("A" & Rows.Count).End(3)).Value2
  b = Range("B2", Range("B" & Rows.Count).End(3)).Value2
  ReDim c(1 To UBound(b), 1 To 1)
  Set dic = CreateObject("Scripting.Dictionary")
  dic.comparemode = vbTextCompare
  '
  For i = 1 To UBound(a)
    sItems = Split(Trim(a(i, 1)), " ")
    Select Case UBound(sItems)
      Case 0
        dic(sItems(0)) = i
      Case 1
        dic(sItems(0) & " " & sItems(1)) = i
        dic(sItems(1) & " " & sItems(0)) = i
      Case 2
        Call comb3(i, sItems, dic)
      Case 3
        Call comb4(i, sItems, dic)
      Case Else
        dic(a(i, 1)) = i
    End Select
  Next
  '
  For i = 1 To UBound(b)
    If dic.exists(b(i, 1)) Then
      c(i, 1) = dic(b(i, 1)) + 1
    End If
  Next
  Range("C2").Resize(UBound(c)).Value = c
End Sub

Sub comb3(i, sItems, dic)
  dic(sItems(0) & " " & sItems(1) & " " & sItems(2)) = i
  dic(sItems(0) & " " & sItems(2) & " " & sItems(1)) = i
  dic(sItems(1) & " " & sItems(0) & " " & sItems(2)) = i
  dic(sItems(1) & " " & sItems(2) & " " & sItems(0)) = i
  dic(sItems(2) & " " & sItems(0) & " " & sItems(1)) = i
  dic(sItems(2) & " " & sItems(1) & " " & sItems(0)) = i
End Sub

Sub comb4(i, sItems, dic)
  dic(sItems(0) & " " & sItems(1) & " " & sItems(2) & " " & sItems(3)) = i
  dic(sItems(0) & " " & sItems(1) & " " & sItems(3) & " " & sItems(2)) = i
  dic(sItems(0) & " " & sItems(2) & " " & sItems(1) & " " & sItems(3)) = i
  dic(sItems(0) & " " & sItems(2) & " " & sItems(3) & " " & sItems(1)) = i
  dic(sItems(0) & " " & sItems(3) & " " & sItems(1) & " " & sItems(2)) = i
  dic(sItems(0) & " " & sItems(3) & " " & sItems(2) & " " & sItems(1)) = i
  dic(sItems(1) & " " & sItems(0) & " " & sItems(2) & " " & sItems(3)) = i
  dic(sItems(1) & " " & sItems(0) & " " & sItems(3) & " " & sItems(2)) = i
  dic(sItems(1) & " " & sItems(2) & " " & sItems(0) & " " & sItems(3)) = i
  dic(sItems(1) & " " & sItems(2) & " " & sItems(3) & " " & sItems(0)) = i
  dic(sItems(1) & " " & sItems(3) & " " & sItems(0) & " " & sItems(2)) = i
  dic(sItems(1) & " " & sItems(3) & " " & sItems(2) & " " & sItems(0)) = i
  dic(sItems(2) & " " & sItems(0) & " " & sItems(1) & " " & sItems(3)) = i
  dic(sItems(2) & " " & sItems(0) & " " & sItems(3) & " " & sItems(1)) = i
  dic(sItems(2) & " " & sItems(1) & " " & sItems(0) & " " & sItems(3)) = i
  dic(sItems(2) & " " & sItems(1) & " " & sItems(3) & " " & sItems(0)) = i
  dic(sItems(2) & " " & sItems(3) & " " & sItems(0) & " " & sItems(1)) = i
  dic(sItems(2) & " " & sItems(3) & " " & sItems(1) & " " & sItems(0)) = i
  dic(sItems(3) & " " & sItems(0) & " " & sItems(1) & " " & sItems(2)) = i
  dic(sItems(3) & " " & sItems(0) & " " & sItems(2) & " " & sItems(1)) = i
  dic(sItems(3) & " " & sItems(1) & " " & sItems(0) & " " & sItems(2)) = i
  dic(sItems(3) & " " & sItems(1) & " " & sItems(2) & " " & sItems(0)) = i
  dic(sItems(3) & " " & sItems(2) & " " & sItems(0) & " " & sItems(1)) = i
  dic(sItems(3) & " " & sItems(2) & " " & sItems(1) & " " & sItems(0)) = i
End Sub

I don't think there are many people with more than 4 names, those you could identify manually, probably.
 
Upvote 0
DanteAmor, Thank you! I will work with this to see how it will help speed my work. I sincerely appreciate your help! Since I am not a power user of excel, I'm going to look up the difference between a macro and a user defined formula. You mentioned this was a macro, so I'll have to do some reading to see how to deploy it, but you have pointed me in a good direction. Thank you.
 
Upvote 0
DanteAmor, I entered the code and ran the macro and it worked as it is was designed. However, it looks in all of the reference column to find a match. I am hoping to find a way to only compare two cells with names in them and have excel tell me if they are the same person even if the are not exact because the first and last names are reversed. An example I've crudely pasted below:

ABC
12017Name2018NameSameName
2Randy D. BallardBallard Randy D.Yes
3Jim JohnsonJan JohnsonNo
4John DavisJohn DavisYes
5Davis JohnJohn DavisYes
6Robert LanceBob LanceNo

I'd like a way for excel (in cell C2) to tell me if the names in A2 and B2 are the same person while ignoring the name order. Then I would copy the formula down column C to have it compare each row's pair of names and make a judgement. Hopefully this makes sense.

I could even make use of a score (as in a the "Similarity" UDF I found on a discussion board) if only it scored reversals as matches or "1". Here is an example that I could make work if there were code or a way to make a UDF that would do this.

ABC
12017Name2018NameSameNameScore
2Randy D. BallardBallard Randy
0.9​
3Jim JohnsonJan Johnson
0.5​
4John DavisJohn Davis
1​
5Davis JohnJohn Davis
1​
7Robert LanceRob Lance
0.5​
6Tim JonesRobert Aveda
0​

Thank you for any insight you may offer.

John Davis
 
Upvote 0
I have used a UDF that works from Levenshtein Similarity and in many cases, it works well. It scores John Davis and John N. Davis well, but not Davis John. I am hoping someone knows how to write a UDF that can read the first and last names and recognize them as the same even if they are reversed. I don't know enough to do this.

Thanks!
John.
 
Upvote 0
With 30,000 names on your sheet I do not recommend a UDF, because you would have 30,000 formulas, each time you modify, order or filter, the 30,000 formulas are recalculated, that would slow your sheet.
I recommend the following macro, compare A2 with B2 and the result in C2 and so on.
The macro reverses the last name and also counts the number of matching names

Try and tell me.

VBA Code:
Sub Name_Recognition()
  Dim a As Variant, b As Variant, s1 As Variant, s2 As Variant
  Dim i As Long, j As Long, k As Long, n As Double, t1 As String, t2 As String
  '
  a = Range("A2", Range("B" & Rows.Count).End(3)).Value2
  ReDim b(1 To UBound(a), 1 To 1)
  For i = 1 To UBound(a, 1)
    If a(i, 1) = a(i, 2) Then
      b(i, 1) = 1
    Else
      s1 = Split(Trim(Replace(Replace(LCase(a(i, 1)), ".", ""), ",", "")), " ")
      s2 = Split(Trim(Replace(Replace(LCase(a(i, 2)), ".", ""), ",", "")), " ")
      For j = 0 To UBound(s1)
        t1 = t1 & s1(j)
      Next
      t2 = s2(UBound(s2))
      For j = 0 To UBound(s2) - 1
        t2 = t2 & s2(j)
      Next
      If t1 = t2 Then
        b(i, 1) = 1
      Else
        n = 0
        For j = 0 To UBound(s2)
          For k = 0 To UBound(s1)
            If s2(j) = s1(k) Then n = n + (1 / (UBound(s1) + 1))
          Next
        Next
        b(i, 1) = n
      End If
    End If
  Next
  Range("C2").Resize(UBound(b)).Value = b
End Sub
 
Upvote 0
DanteAmor, I appreciate your advice and insight. I entered the macro code into a module and placed the cursor in the desired cell in C and ran the macro and got this error:

Run time error "9":
Subscript out of range

What am I doing wrong?

Thanks!
John Davis
 
Upvote 0
On which line of the macro it stops.
You can try a sample of data, maybe 100 or 200 names.
What does your data have, some combined cell, some empty cell, some cell with error: #N/A, #VALUE, etc.
How many records are you testing.

The macro in post #6 replaces the previous one, you only have to run a macro.

HOW TO INSTALL MACROs
------------------------------------
If you are new to macros, they are easy to install and use. To install it, simply press ALT+F11 to go into the VB editor and, once there, click Insert/Module on its menu bar, then copy/paste the above code into the code window that just opened up. That's it.... you are done. To use the macro, go back to the worksheet with your data on it and press ALT+F8, select the macro name (Name_Recognition) from the list that appears and click the Run button. The macro will execute and perform the action(s) you asked for. If you will need to do this again in this same workbook, and if you are using XL2007 or above, make sure you save your file as an "Excel Macro-Enabled Workbook (*.xlsm) and answer the "do you want to enable macros" question as "Yes" or "OK" (depending on the button label for your version of Excel) the next time you open your workbook.
 
Upvote 0
From what you have shown in post #4 it seems that you are just trying to do a line-by-line comparison.
If that is the case then here is another macro that you could try with a copy of your data.
I am also assuming data is in columns A:B with results to go in column C.

If each word/term exactly matches in both columns then "Yes" is returned.
If none of the words/term match then "No" is returned.
If some, but not all, match then the number of matches is returned.

A couple of extra points to note about my code:
  • The parts of the name must match exactly. For example, my code would only give a "2" for "Don A. Lane" and "Don A Lane" because of the "." in one but not the other. However, the code could be altered to ignore periods.
  • My code as it stands could give a False "Yes" if there is an exact repeated name. For example "Allan Allan" and "Allen Jones". I will consider that further if you see it as a real problem for your likely data.
VBA Code:
Sub CheckNames()
  Dim RX As Object
  Dim a As Variant, b As Variant
  Dim i As Long, k As Long
  Dim s As String
 
  a = Range("A1", Range("B" & Rows.Count).End(xlUp)).Value
  ReDim b(1 To UBound(a), 1 To 1)
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.IgnoreCase = True
  For i = 2 To UBound(a)
    RX.Pattern = "( " & Replace(Replace(a(i, 2), " ", "| "), ".", "\.") & ")(?= )"
      s = " " & a(i, 1) & " "
      k = RX.Execute(s).Count
      If k = UBound(Split(a(i, 2))) + 1 Then
        s = RX.Replace(" " & a(i, 1) & " ", "")
        If Trim(s) = vbNullString Then b(i, 1) = "Yes"
      ElseIf k = 0 Then
        b(i, 1) = "No"
      Else
        b(i, 1) = k
      End If
  Next i
  Range("C1").Resize(UBound(b)).Value = b
End Sub

My sample data and results
jdavisz 2020-05-10 1.xlsm
ABC
1YEAR 1YEAR 2
2Randy D. BallardBallard Randy D.Yes
3Jim JohnsonJan Johnson1
4John DavisJohn DavisYes
5Davis JohnJohn DavisYes
6Robert LanceBob Lance1
7John DavisJon DavisonNo
8Randy D. BallardBallard Ranald D.2
Sheet1
 
Upvote 0
Check if you want the comparison in this way, if there are 3 names of 4, then .75, if there are 1 d 3, then .33.
If you only want Yes or No, it's a simple change in the macro.

Dante Amor
ABC
1YEAR 1YEAR 2Exists
2John F. S. DavisDavis John0.50
3John M. DavisJane Davis0.33
4John M Davis FJohn M Davis0.75
5John DavisAmor0.00
6John F. DavisDante0.00
7Jane DavisDavis F. JOHN S.0.50
8Danteamor0.00
Hoja2


I did a test with 34,000 records and it works.
 
Upvote 0

Forum statistics

Threads
1,214,651
Messages
6,120,738
Members
448,988
Latest member
BB_Unlv

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top