Find The Closest Match


May 21, 2021 - by Bill Jelen

Find The Closest Match

Challenge: People enter data in various ways. If you ask 50 sales reps to record a forecast for General Motors, you will find that there are a dozen ways to spell and/or abbreviate the name of that customer. Combine all the forecasts from all the sales reps, and you will have the same customer spelled a multitude of ways. Column A in Figure 126 shows some of the different ways to enter the names of customers whose official names are listed in column D.

Figure 126. When asked to type customer names, various employees will spell or abbreviate them in various ways.
Figure 126. When asked to type customer names, various employees will spell or abbreviate them in various ways.

Setup: This is a classic problem known as the fuzzy match problem. The problem was first discussed at the MrExcel message board back in fall 2001, with the fuzzy match challenge of the month. At that time, Juan Pablo Gonzalez wrote in with a routine to determine the percentage match between two strings. Damon Ostrander and others later followed up.

The question arose again at the message board in late 2003. Al_B_Cnu adapted the code from the challenge to write complete FuzzyVLOOKUP, FuzzyHLOOKUP, and FuzzyPercent functions. I won’t reprint the 373 lines of code here, but you can examine them in the sample file for this topic (download from www.MrExcel.com/gurufiles.html).

Solution: The FuzzyPercent function compares text from two cells and determines what percentage of the characters in the first cell are in the same sequence in the second cell. In Figure 127, cells A2 and B2 share 11 characters in common. Because cell A2 contains 11 characters total, 73% of the characters match cell B2, and the FuzzyPercent is 73%. Note that if you reverse A2 and B2, the result may be different, as shown in row 3. Here, A3 and B3 share the same 11 characters, but 11 characters is only 50% of the 22 characters found in A3.

Figure 127. The FuzzyPercent user-defined function in this workbook calculates the percentage of the characters that are in the same sequence.
Figure 127. The FuzzyPercent user-defined function in this workbook calculates the percentage of the characters that are in the same sequence.

Note that the algorithm is not perfect. Elvis and lives contains exactly the same characters, but in a completely different order. But is it completely different? Both cells have the characters l-v-s in the same sequence, so it appears to be a 3-out-of-5, or 60%, match.

The problem becomes more complex when you have to find the best matches from two lists. In Figure 128, a two-dimensional table shows how well each item from the forecast list in column A matches up with the official customer list in row 1.

Note that GM Lordstown shares just as many characters in common with General Motors as it does with Abbott Laboratories. To combat this, Al_B_ Cnu offered alternative algorithms for the FuzzyPercent test. You can try out =FuzzyPercent2() in the sample workbook to see if is matches up your data better.

Figure 128. This table highlights the best match in each row using FuzzyPercent.
Figure 128. This table highlights the best match in each row using FuzzyPercent.

To make Juan Pablo Gonzalez’s function easier, Al_B_Cnu wrote FuzzyVLOOKUP and FuzzyHLookup. Like VLOOKUP, the FuzzyVLOOKUP function can return a specific column from a table. It can also return the best match, second-best match, and so on. In Figure 129, FuzzyVLOOKUP returns the three best matches for each forecasted customer. Someone is going to have to go through the choices to figure out which entries are correct matches.

Figure 129. FuzzyVLOOKUP compares every item in the list to find the best, second-best, and third-best matches.
Figure 129. FuzzyVLOOKUP compares every item in the list to find the best, second-best, and third-best matches.

Summary: Custom functions in VBA help solve the fuzzy match problem.

Source: Closest match on the MrExcel Message Board.

Title Photo: Gordon Williams on Unsplash


This article is an excerpt from Excel Gurus Gone Wild.



Bill Jelen is the author / co-author of:
MrExcel 2021 – Unmasking Excel

This is a 5th edition of MrExcel XL. Updates for 2021 include: LAMBDA, LET, Power Query Fuzzy Match, Sort & Filter in Sheet View, Cut-out people, Save object as image, STOCKHISTORY, Wolfram Alpha Data Types, Custom Data Types from Power Query, Weather data types, bilingual spreadsheets, Performance improvements, Unhide multiple worksheets, Action pen, Collapsible task panes, LET function to re-use calculations, store formulas using LAMBDA, Recursive LAMBDA, Branching LAMBDA, Lambda to return a picture, Excel function quick reference.