Fuzzy Matching - new version plus explanation

al_b_cnu

Well-known Member
Joined
Jul 18, 2003
Messages
4,545
It has been a while since I originally posted my Fuzzy matching UDF’s on the board, and several variants have appeared subsequently.

I thought it time to ‘put the record straight’ & post a definitive version which contains slightly more efficient code, and better matching algorithms, so here it is.

Firstly, I must state that the Fuzzy matching algorithms are very CPU hungry, and should be used sparingly. If for instance you require to lookup a match for a string which starts with, contains or ends with a specified value, this can be performed far more efficiently using the MATCH function:
Fuzzy Examples.xls
ABCDE
1Starts WithEndsContains
2BilljelenBill
3Mr Bill Jelen433
4Bill Jelen
5Joe Bloggs
6Fred Smith
MATCH Example


... Continued ...
 
Hi Juke,

Welcome to the board, and thanks for the plaudits.
Given the heavy CPU usage of the function, I'm guessing that it hasn't saved you THAT much time, and you've had to set Calculation to 'Manual', but at least you get to drink lots of coffee ;)
 
Upvote 0

Excel Facts

Can a formula spear through sheets?
Use =SUM(January:December!E7) to sum E7 on all of the sheets from January through December
Hi Juke,

Welcome to the board, and thanks for the plaudits.
Given the heavy CPU usage of the function, I'm guessing that it hasn't saved you THAT much time, and you've had to set Calculation to 'Manual', but at least you get to drink lots of coffee ;)

Hahaha Sandybridge i7 + 8 gig of DDR3 means I'm doing alright. 2 parts of my role mean I'm a sysadmin and a data analyst all in one. Best of both worlds in one skillset. :)

Only 300 or so records this time round across 50 extracted schemas so not too bad. Pasting the results back in as values before going any further helps too!
 
Upvote 0
As a matter of interest, which algorithm did you prefer?
I tend to go for algorithm 2 which matches pairs, triples etc which seems to be the most useful (and gives more consistent results over algorithm 1.

I'm sure you've realised that the default Algorithm 3 is merely an average of results from algorithms 1 & 2.
 
Upvote 0
My whole problem with all vba solutions is they are inefficient in utilizing the full power of the system. I almost never get past the 25% CPU usage because of the single thread construct of vba. I still trying to understand and implement Excel DNA which promised to allow multithreading for vba by tapping into the .Net infrastructure.

Any thoughts or experience?

AMAS

P.S. This might be obvious but Alg 4 is naturally the slowest, but all seem to be using the full 25% power that is allotted to them.
 
Upvote 0
As a matter of interest, which algorithm did you prefer?
I tend to go for algorithm 2 which matches pairs, triples etc which seems to be the most useful (and gives more consistent results over algorithm 1.

I'm sure you've realised that the default Algorithm 3 is merely an average of results from algorithms 1 & 2.
I went with 3 in the end because the string I was matching was a combination of a common reference number with a unique identifier for annual returns for that organisation attached to the end. Algorithm 2 probably would have done the trick.

My whole problem with all vba solutions is they are inefficient in utilizing the full power of the system. I almost never get past the 25% CPU usage because of the single thread construct of vba. I still trying to understand and implement Excel DNA which promised to allow multithreading for vba by tapping into the .Net infrastructure.

Any thoughts or experience?

AMAS

P.S. This might be obvious but Alg 4 is naturally the slowest, but all seem to be using the full 25% power that is allotted to them.

I've noticed Excel seems to limit itself to 25% CPU utilisation as a general rule so regardless of whether you're using out of the box functions or VBA macros it's unlikely you'll go past that thresh-hold.

Never tried this but it may help - with Excel running go into task manager > Processes tab, right click on excel.exe and set priority to high. (realtime usually causes crashes or system instability).

1005201284851am.jpg


EDIT: Can we post images from IMGUR here? Wanted to attach it to my post but I don't have the necessary privileges.
 
Last edited:
Upvote 0
Hi Juke,

Built-in Excel formulas can absorb all 100% CPU if you are using post-2003 office and have clicked on allowing mult-threading. Unfortunately, vba is not naturally multi-threaded and so we have to get creative :)

AMAS
 
Upvote 0
Hi Juke,

Built-in Excel formulas can absorb all 100% CPU if you are using post-2003 office and have clicked on allowing mult-threading. Unfortunately, vba is not naturally multi-threaded and so we have to get creative :)

AMAS

Yeah I have multi-threading enabled but am running 32bit Office on a 64 bit system due to my organisation not having a clue.

Thanks though :)
 
Upvote 0
Hi Juke,

32-bit shouldn't stop you from using 100% CPU. Create a workbook and put a 1000 sumproduct formulas or Index/ match or indirect (something volatile) and and watch that CPU run like there's no tomorrow :). You just won't be able to go over the 2 GB RAM limit.

AMAS
 
Upvote 0
Hi Juke,

32-bit shouldn't stop you from using 100% CPU. Create a workbook and put a 1000 sumproduct formulas or Index/ match or indirect (something volatile) and and watch that CPU run like there's no tomorrow :). You just won't be able to go over the 2 GB RAM limit.

AMAS
Limits to 25% unless I unleash the priority over normal. Not sure how you're getting it to do that!
 
Upvote 0
Dear al_b_cnu,

I am working on a project that requires use of your FuzzyVLookup program, but I cannot get it to work. I am relatively new to VBA so I am probably making a novice mistake.
I made an excel file identical to the one you created in the very first post (to try it on a sample sheet before applying it to my actual project), and I pasted in your formulas, but I always get a "#NAME?" error. I have made sure all the appropriate fields (Column A, Column B, and Cell C2 with value 1) are filled out. Could you, or anyone else, offer me an explanation as to fixing this problem?


Thank you,

Ankush

NewNamesFiles.bat</SPAN>
Copy</SPAN>
=fuzzyvlookup($B3,$A:$A,1,,,C$2)</SPAN>
=fuzzyvlookup($B3,$A:$A,1,,,D$2)</SPAN>
=fuzzyvlookup($B3,$A:$A,1,,,E$2)</SPAN>
the other bat files.bat</SPAN>
Dat</SPAN>
=fuzzyvlookup($B4,$A:$A,1,,,C$2)</SPAN>
=fuzzyvlookup($B4,$A:$A,1,,,D$2)</SPAN>
=fuzzyvlookup($B4,$A:$A,1,,,E$2)</SPAN>
thebatfiles.bat</SPAN>
Example movie</SPAN>
=fuzzyvlookup($B5,$A:$A,1,,,C$2)</SPAN>
=fuzzyvlookup($B5,$A:$A,1,,,D$2)</SPAN>
=fuzzyvlookup($B5,$A:$A,1,,,E$2)</SPAN>
Copy.docx</SPAN>
Other Bat File</SPAN>
=fuzzyvlookup($B6,$A:$A,1,,,C$2)</SPAN>
=fuzzyvlookup($B6,$A:$A,1,,,D$2)</SPAN>
=fuzzyvlookup($B6,$A:$A,1,,,E$2)</SPAN>

<TBODY>
</TBODY>
 
Upvote 0

Forum statistics

Threads
1,215,510
Messages
6,125,234
Members
449,216
Latest member
biglake87

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top