Sorin Sion
New Member
- Joined
- Oct 26, 2011
- Messages
- 3
A common problem in geomarketing (and in other fields) is matching sets of addresses or names from different sources. A fuzzy matching algorithm helps match "dirty" data against some form of "standard" data, based on a similarity score. The length of the strings and the size of the lists being compared greatly influence matching speed, so the core job, scoring pairs of strings, needs a fast algorithm.
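To make the matching step concrete, here is a minimal Python sketch (not the project's VBA code) that matches one "dirty" string against a list of "standard" strings. The names `best_match`, `default_score` and the `threshold` value are hypothetical, and the scoring function is a stand-in: Python's standard `difflib.SequenceMatcher`, not the algorithm described in this post.

```python
from difflib import SequenceMatcher


def default_score(a, b):
    """Stand-in similarity score in [0, 1]; NOT the algorithm from this post."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(dirty, standard, score=default_score, threshold=0.6):
    """Return (best standard entry, its score), or (None, score) below threshold.

    `threshold` is an assumed cut-off chosen for illustration only.
    """
    best, best_score = None, 0.0
    for candidate in standard:
        s = score(dirty, candidate)  # one scoring call per pair of strings
        if s > best_score:
            best, best_score = candidate, s
    if best_score < threshold:
        return None, best_score
    return best, best_score


if __name__ == "__main__":
    standard = ["Oak Avenue 5", "Main Street 12"]
    print(best_match("Main Stret 12", standard))
```

Matching a whole list this way costs one scoring call for every (dirty, standard) pair, which is why the speed of the scoring function dominates the total run time.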
After trying several approaches, I am now reasonably content with the speed of the algorithm I developed, but I am sure there are other ways to tackle this problem and accelerate the matching process further.
The algorithm in its current form computes the frequency of common characters between the two input strings, as well as the frequency of identical tuples (two-character sequences), weights them, and builds a normalized score in the range [0, 1].
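For readers who want to see the idea in code, here is a rough Python sketch of that kind of score. It is not the released VBA source: the function names, the Dice-style normalization, the case folding, and the 0.4/0.6 weights are all assumptions picked for illustration.

```python
from collections import Counter


def _overlap(a, b):
    """Dice-style overlap of two multisets (Counters), in [0, 1]."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    common = sum((a & b).values())  # multiset intersection: shared counts
    return 2.0 * common / total


def fuzzy_score(s1, s2, char_weight=0.4, pair_weight=0.6):
    """Weighted mix of common-character and common-pair frequencies.

    The weights are hypothetical; the original project may use others.
    """
    s1, s2 = s1.lower(), s2.lower()  # assumption: case-insensitive comparison
    chars = _overlap(Counter(s1), Counter(s2))
    # Two-character sequences ("tuples" in the post), e.g. "abc" -> ab, bc
    pairs = _overlap(Counter(zip(s1, s1[1:])), Counter(zip(s2, s2[1:])))
    return (char_weight * chars + pair_weight * pairs) / (char_weight + pair_weight)


if __name__ == "__main__":
    print(fuzzy_score("Main Street", "Main Stret"))
    print(fuzzy_score("Main Street", "Oak Avenue"))
```

In this sketch, identical strings of two or more characters score 1.0 and strings with no characters in common score 0.0. Giving the pairs more weight than the single characters makes the score sensitive to character order, since single-character counts alone cannot tell anagrams apart.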
I released the source code under the GNU Lesser General Public License (LGPL) at http://code.google.com/p/fast-vba-fuzzy-scoring-algorithm/source/browse/trunk/Fuzzy1.
The project's main objective is to build the fastest possible similarity scoring algorithm and migrate its logic into a DLL that can be called from Excel/Access VBA modules.
Please visit the project's page, check the code and, if you are interested, contribute in some way to its development.
Kind regards,
Sorin