Custom Lookup Function

delos001

New Member
Joined
Oct 7, 2014
Messages
25
Hi,

I am new to vba and this is my first function attempt. I can't get this function to work. As a sub, it returns the value I want but when I changed it to a function, I get #VALUE! error.

I want users to be able to type in a formula and get the correct value and fill the formula down across all rows.

Purpose: lookup and return a value from range in table 2 that most closely matches the value in the specified cell in table 1. Its essentially a VLookup but with special code for when the strings being compared aren't exact.

Code summary: lets say Brian M Smith is in A2 in the left table. Names of people are in column B of the right table but the names are different (ie: in the right table, the name is Smith, Brian). The code splits the name Brian M Smith into discrete elements and stores in an array. For each cell in the range of table 2, the name is also split and stored in a separate array. The code then compares each array element individually and gets a score when they match. The function should return the name from the right table that has the highest score.

Here is the code:
Function JDELookup(ByVal lookupValue As Range, _
ByVal lookupArray As Range)

Dim LRarray As Integer 'last row of right table
Dim lookupCell As Range 'each cell in the lookupArray


Dim strLen1 As Integer 'length of string in cell from left table
Dim strLen2 As Integer 'length of string in cell from right table
Dim replLen1 As Integer 'length of string in cell from left table after formatting the string by replacing commas and spaces with single space
Dim replLen2 As Integer 'length of string in cell from right table after formatting the string by replacing commas and spaces with single space
Dim spaceCnt1 As Integer 'number of space found in the formatted string of cell in left table
Dim spaceCnt2 As Integer 'number of space found in the formatted string of cell in right table
Dim componCnt1 As Integer 'component count: counts the number of components in the cell from left table separated by spaces
Dim componCnt2 As Integer 'component count: counts the number of components in the cell from right table separated by spaces
Dim componString1() As String 'the string of each component from the left table
Dim componString2() As String 'the string of each component from the right table


Dim imax As Single 'the highest count of matches between the two strings
Dim ieval As Single 'the count of matches of the two strings currently being evaluated


Dim delValue As String 'the value of the sucessful lookup


lookupValue = LCase(Application.Trim(Replace(Replace(Replace(lookupValue, ",", " "), " ", " "), " ", " ")))
strLen1 = Len(lookupValue)
replLen1 = Len(Replace(lookupValue, " ", ""))
spaceCnt1 = strLen1 - replLen1
componCnt1 = spaceCnt1 + 1
componString1 = Split(lookupValue, " ")

LRarray = lookupArray.Cells(lookupArray.Rows.Count, 1).End(xlUp).Row


imax = 0
For Each lookupCell In Range(lookupArray.Cells(1, 1), lookupArray.Cells(LRarray, 1))
lookupCell = LCase(Application.Trim(Replace(Replace(Replace(lookupCell.Value, ",", " "), " ", " "), " ", " ")))
strLen2 = Len(lookupCell)
replLen2 = Len(Replace(lookupCell, " ", ""))
spaceCnt2 = strLen2 - replLen2
componCnt2 = spaceCnt2 + 1


componString2 = Split(lookupCell, " ")
ieval = 0
If lookupValue <> lookupCell Then
If InStr(1, lookupCell, componString1(0)) Then
For spaceCnt1 = 0 To UBound(componString1)
For spaceCnt2 = 0 To UBound(componString2)
If InStr(1, componString1(spaceCnt1), componString2(spaceCnt2)) Then
If Len(componString1(spaceCnt1)) = 1 Then
ieval = ieval + 0.1
Else
ieval = ieval + 1
End If
End If
Next
Next
If ieval > imax Then
imax = ieval
delValue = lookupCell.Value
End If
End If
Else
delValue = lookupCell.Value
Exit For
End If
Next lookupCell
JDELookup = delValue
End Function

This is stored in module 2 of the project (not in "Sheet1" module).

Can someone help me trouble shoot this?

Thanks,

Jason
 
Last edited:

Some videos you may like

Excel Facts

Copy formula down without changing references
If you have =SUM(F2:F49) in F50; type Alt+' in F51 to copy =SUM(F2:F49) to F51, leaving the formula in edit mode. Change SUM to COUNT.

jasonb75

Well-known Member
Joined
Dec 30, 2008
Messages
10,887
Office Version
  1. 2019
Platform
  1. Windows
Try this

Code:
Function JDELookup(ByVal lookupValue As String, ByVal lookupArray As Range)
Dim lcell As Range 'each cell in the lookupArray
Dim lookupcell As String ' value of lcell
Dim strLen1 As Integer 'length of string in cell from left table
Dim strLen2 As Integer 'length of string in cell from right table
Dim replLen1 As Integer 'length of string in cell from left table after formatting the string by replacing commas and spaces with single space
Dim replLen2 As Integer 'length of string in cell from right table after formatting the string by replacing commas and spaces with single space
Dim spaceCnt1 As Integer 'number of space found in the formatted string of cell in left table
Dim spaceCnt2 As Integer 'number of space found in the formatted string of cell in right table
Dim componCnt1 As Integer 'component count: counts the number of components in the cell from left table separated by spaces
Dim componCnt2 As Integer 'component count: counts the number of components in the cell from right table separated by spaces
Dim componString1() As String 'the string of each component from the left table
Dim componString2() As String 'the string of each component from the right table
Dim imax As Single 'the highest count of matches between the two strings
Dim ieval As Single 'the count of matches of the two strings currently being evaluated
Dim delValue As String 'the value of the sucessful lookup

lookupValue = LCase(Application.Trim(Replace(Replace(Replace(lookupValue, ",", " "), " ", " "), " ", " ")))
strLen1 = Len(lookupValue)
replLen1 = Len(Replace(lookupValue, " ", ""))
spaceCnt1 = strLen1 - replLen1
componCnt1 = spaceCnt1 + 1
componString1 = Split(lookupValue, " ")

imax = 0
For Each lcell In lookupArray
    lookupcell = LCase(Application.Trim(Replace(Replace(Replace(lcell.Value, ",", " "), " ", " "), " ", " ")))
    strLen2 = Len(lookupcell)
    replLen2 = Len(Replace(lookupcell, " ", ""))
    spaceCnt2 = strLen2 - replLen2
    componCnt2 = spaceCnt2 + 1
    componString2 = Split(lookupcell, " ")
    ieval = 0

    If lookupValue <> lookupcell Then
        If InStr(1, lookupcell, componString1(0)) Then
            For spaceCnt1 = 0 To UBound(componString1)
                For spaceCnt2 = 0 To UBound(componString2)
                    If InStr(1, componString1(spaceCnt1), componString2(spaceCnt2)) Then
                        If Len(componString1(spaceCnt1)) = 1 Then
                            ieval = ieval + 0.1
                        Else
                            ieval = ieval + 1
                        End If
                    End If
                Next
            Next
            
            If ieval > imax Then
                imax = ieval
                delValue = lookupcell
            End If
        End If
    Else
        delValue = lookupcell
        Exit For
    End If
Next lcell
    JDELookup = delValue
End Function

If that works, I'm sure it can be cleaned up a bit.
 

delos001

New Member
Joined
Oct 7, 2014
Messages
25
Thanks jasonb75. This is working! I guess it was a problem with how I set up my right table array. I'm not quite sure why yours works while mine didn't but I am happy with the fact that it works.

I do see a couple problems with my code that I will work on to make it more accurate when names are very similar. You mentioned cleaning it up: I am new to vba so this is kind of my best attempt right now. I am not asking for you to clean it up for me, but if you wanted to point me in a direction, I would appreciate it. For example, I feel like I can speed it up by reducing the if statements and handling the arrays better but not sure how to approach or what VBA functionality would be appropriate. I also saw I might have a problem with names that are very similar when using the InStr function. I originally tried using CASE and plain old IF statements, but that didn't work out for me. I also tried putting all calculations in a separate array and getting a "max" value from the array but couldn't figure out how to pull the text string associated with the max value. Searched the internet for quite a while...but multi variable arrays is a little over my head. Some posted some code called fuzzyVlookup and I think he/she used arrays to do it but I can't follow the code...too complicated. Any suggestions would be great. Again, don't do it for me...just tell me what to google. If not, no biggie.....I just ran it through a 16K row range and it took less than a second so I can live with that.

Thank you again for your help!

Jason
 

jasonb75

Well-known Member
Joined
Dec 30, 2008
Messages
10,887
Office Version
  1. 2019
Platform
  1. Windows
I'm not sure of the technical reasons why, but I found the cause of the problem in your code was this line

Code:
lookupCell = LCase(Application.Trim(Replace(Replace(Replace(lookupCell.Value, ",", " "), " ", " "), " ", " ")))

lookupcell is declared as a range, the edited value is a string. There might be a way to make it work with a single variable, but I find it best to separate them.

As for fuzzyLookup, I can't follow that code either, I think that might be a little in excess of what you need though, that would only be necessary if you needed to look for different spellings as well.

I'll continue to help you with this but first need to clarify the layout of your data.

Will the name entered in the left table always be in the same format? (First, middle, last)

In the right table, will it always be one of these formats? (First, middle, last) or (Last, first, middle)

Or are there other possibilities for either table, other than middle name being only an initial, or not included?
 

delos001

New Member
Joined
Oct 7, 2014
Messages
25

ADVERTISEMENT

Unfortunately, the short answer is that there is no consistency with the layouts of the names. I actually wrote this because I am dealing with multiple source system files each with their own layout. So the left table could be LName, FName, MName for one application and could be FName MInitial LName for another application. To make it worse, there are inconsistencies of layouts within the same source file. I have one source file that has some names as LName, FName MInitial and other names listed as LName, FName MName, and yet other names laid out as LName FName MName with no comma separating them. This become especially difficult to handle with dealing with people from Asia Pac region who frequently have multiple components to their last name or first name or both. With no comma to distinguish, I have no idea where last name ends and first begins.

So this has been the challenge and in addition to my less than stellar VB skills, a reason for some rather imprecise code.

An example is below. This could potentially be a left table or right table:
Names
Abernathy,Laura Anne
Bittenbender,Amy L
Chen Ginger
Chen Sunny
da Silva Vale Ferreira,Maria
Flores Romero,Karina Mariana
Hidayati Heidi
Huang Meredith
Ku Jacqueline
Miranda Hernandez,Yazmin Sofia
Mohd Khamis Adila
Noor Shams Munira
Phuah Sze Yee
S,Hithaishi
Ueda,Taeko

<colgroup><col></colgroup><tbody>
</tbody>

You can see that some have commas and some dont and some have multiple components before and after the commas. These would be compared to the R table which for example could have the first name on the list displayed at Laura A Abernathy or Abernathy, Laura A or Abernathy Laura Anne or Laura Anne Abernathy. All the source systems are slightly different.

I hope this helps explain. I think my code is working ok for the challenges that I have with the source data but any suggestions would be great. I am going to add some conditionals to the code you fixed to make sure names aren't returned with only one component match: For example: if componCnt = 2, then delValue must be >=1.1, else error.

Thanks again for your input.
 

jasonb75

Well-known Member
Joined
Dec 30, 2008
Messages
10,887
Office Version
  1. 2019
Platform
  1. Windows
I think that however you do this, you will have a number of wrong results.

Consider that there are several similar names in the list and the function finds multiple 'best' matches, how would go about deciding which to use? Inevitably, what is right for one will be wrong for another.
 

delos001

New Member
Joined
Oct 7, 2014
Messages
25

ADVERTISEMENT

I have to test it on the full data set, but the fake data so far has been promising. Bob M Smith in the LTable looking against Bob Michael Smith vs. Bob Smith gives me Bob Michael Smith with a score of 2.1 vs. 2.0 for Bob Smith. I will have errors when I get Bob Michael Smith and Bob Michaud Smith in the Rtable as both will get 2.1 and the function will return the first of those two whether its right or not. I will also be working with the people who manage the source systems as this error would need to be fixed on that level anyways since Bob M Smith is arbitrary being that there are two Bob M Smiths.

Thanks again for your input.
 

jasonb75

Well-known Member
Joined
Dec 30, 2008
Messages
10,887
Office Version
  1. 2019
Platform
  1. Windows
Working with the sources to try and establish a consistent format is probably the best place to start.

In addition to the existing possible problems, you also have variants of same names to allow for, Bob or Robert, Michael or Mike, etc. You could end up with pages of code for something that should need just a few lines.

My personal preference would be to have all source data in the format of Last, First Middle. All with full names, not initials, that way any confusion in multiple component names, first or last should be eliminated.
 

delos001

New Member
Joined
Oct 7, 2014
Messages
25
I totally agree....fixing the source is the best way. I'm going to do my best with that...but we are so compartmentalized that getting stuff like that done is how I, a novice, decided I could write a code easier than get them to comply. :) I hadn't thought about the name variants. I'm hoping since we do have some standard operating procedures in place even for our vendor systems, that legal names are always used at least for F&L Names. But thanks for the thought on that....now I can at least watch out for it and have the individuals address it on case by case if it exists.

Thanks again for all your input.
 

Watch MrExcel Video

Forum statistics

Threads
1,109,550
Messages
5,529,472
Members
409,884
Latest member
Msinmath
Top