Fuzzy Duplicates, must return values "True" or "False"

Ricktables

New Member
Joined
Jul 4, 2012
Messages
9
Hi, I've read as much of this forum as I could, and I have also seen FuzzyVLookup, but it doesn't suit my needs (or I don't know how to use it properly, could be the case too.)

I have a database of 90 000 lines, which contain clients, company names, emails, logos, etc. by column.

I need to find all the duplicates by company name, even if there is a typo, or if the company name is partial ( THING, THING LTD, THING LIMITED, and THING INC all need to turn up). If the company name is a duplicate, I need the function to return "True" in another cell.

I work under Windows 7, with Excel 2010. If I need to give any more info out, I will do so.

Thank you very much for your time.
 

Excel Facts

Format cells as currency
Select range and press Ctrl+Shift+4 to format cells as currency. (Shift 4 is the $ sign).
Hi, I've read as much of this forum as I could, and I have also seen FuzzyVLookup, but it doesn't suit my needs (or I don't know how to use it properly, could be the case too.)

I have a database of 90 000 lines, which contain clients, company names, emails, logos, etc. by column.

I need to find all the duplicates by company name, even if there is a typo, or if the company name is partial ( THING, THING LTD, THING LIMITED, and THING INC all need to turn up). If the company name is a duplicate, I need the function to return "True" in another cell.

I work under Windows 7, with Excel 2010. If I need to give any more info out, I will do so.

Thank you very much for your time.

This procedure will look in a column of company names for variations of "GB Amalgamated Things, Ltd." It uses wild cards on either side of a portion of the company name.
Adapt sheet name and column numbers to your needs:

Code:
Sub lime()
Dim sh As Worksheet, rng As Range, lr As Long, c As Range
Set sh = Sheets(1)
lr = sh.Cells(Rows.Count, 6).End(xlUp).Row 'Find last row in column
Set rng = sh.Range("F2:F" & lr) 'Create a limited range to loop through
For Each c In rng 'Initiate the loop
If LCase(c.Value) Like "*thin*" Or LCase(c.Value) Like "thin*" Then 'match anything with part of name in it.
c.Offset(0, 6) = "True"
End If
Next
End Sub
Code:

Of course, this will also return "Joe's Paint Thinner Co.", "Thinline Clock Co.", etc. but it will be easier to weed those out of the results than to go through 90, 000 records manually. The more letters that can be confidently used with the wild cards (*), the more pure the results. For instance: "*amal*thin*" might be better, but you would need to be confident that all of the records had at least the "amal" in the name. I would personally go for using the least sized string that can reasonably be expected to be in each record.

Using the wild card can be tricky in that if you put an asterisk before the string, VBA will expect there to be characters before the string to match and if there are none, will ignore it. Same thing on the back end of the string. So, when using it in code with the Like Operator, be sure that the configuration of the wildcard will simulate the configuration of the search item.
Good luck!
 
Last edited:
Upvote 0
Hi,

Thanks !

Unfortunatly With your macro I need to modify the Company name for every instance of the macro that I use (about 70 000 times, seeing as not that many company are duplicates but that at times there can be as many as 20 duplicates...).

I thought about replacing the "thin" part with "ActiveCell" but that doesn't give the option to use wildcards. Any ideas ?
 
Upvote 0
Hi,

Thanks !

Unfortunatly With your macro I need to modify the Company name for every instance of the macro that I use (about 70 000 times, seeing as not that many company are duplicates but that at times there can be as many as 20 duplicates...).

I thought about replacing the "thin" part with "ActiveCell" but that doesn't give the option to use wildcards. Any ideas ?

You can use wild cards with ActiveCell

If LCase(c.Value) Like LCase("*" & ActiveCell & "*") Then

But again, be cautious about the leading wild card. If you do not anticipate any text before your ActiveCell value, then do not include the leading wild card.
 
Upvote 0

Forum statistics

Threads
1,214,920
Messages
6,122,276
Members
449,075
Latest member
staticfluids

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top