I'm leading a project to remove "fuzzy" duplicates from a large supplier listing and I could use some more ideas.

TheProvince

New Member
Joined
Jul 29, 2013
Messages
1
Our database of suppliers has 60,000 supplier sites, many of which (we have no idea how many) are duplicated.

The system enforces unique vendor names, but often times departments will request new suppliers be set up even though they already exist in the system, and data entry staff will append the names, or make a typo, etc. and create a duplicate.

Further complicating this is that vendors that have multiple "sites" (addresses, payment terms, operating name etc) should have the same vendor #, but have each site listed in the system. So if a business we deal with is actually "CORPORATION 123 INC", but it has "The Window Store", "Windows are us" etc., there should only be one Vendor name (CORPORATION 123 INC), and that vendor should have multiple sites.

We've come up with a few Heuristics to generate shorter lists that can be reviewed manually, like:
  • =left( pulls out the first ~5 characters, then I use another column to count items that are listed more than once, filter it, then send it to a team member for manual review
  • using a similar method to find Tax ID numbers associated with multiple vendor IDs
  • manually (but very quickly) reviewing the whole listing
  • manually reviewing listings that contain parenthesis (using filter), as those frequently occur in duplicates

Any other ideas would be most appreciated! I could probably convince my boss to spend money on applications, software, etc. if it would be worthwhile.
 

Excel Facts

Formula for Yesterday
Name Manager, New Name. Yesterday =TODAY()-1. OK. Then, use =YESTERDAY in any cell. Tomorrow could be =TODAY()+1.
You could start by trying this on a copy of your data. You will need to change the sheet name to the name of the sheet containing your data, your data MUST have headers and you will need to change the number of the column to the number of the column containing your 'Vendor' name. Both highlighted with red text. This macro will create a separate worksheet for every vendor in the vendor column, you can then sort by any column you like and do a duplicate search or whatever you want. Hope it helps
Code:
Option Explicit


Sub ParseItems()
'Author:    Jerry Beaucaire
'Date:      11/11/2009
'Summary:   Based on selected column, data is filtered to individual sheets
'           Creates sheets and sorts sheets alphabetically in workbook
'           6/10/2010 - added check to abort if only one value in vCol
'           7/22/2010 - added ability to parse numeric values consistently
'           11/16/2011 - changed way Unique values are collected, no Adv Filter


Dim LR As Long, Itm As Long, MyCount As Long, vCol As Long, iCol As Long
Dim ws As Worksheet, MyArr As Variant, vTitles As String, TitleRow As Long


Application.ScreenUpdating = False


'Column to evaluate from, column A = 1, B = 2, etc.
   vCol = 1 [COLOR=#ff0000]'change column number here[/COLOR]
 
'Sheet with data in it
   Set ws = Sheets("Data") [COLOR=#ff0000]'change sheet name here[/COLOR]


'Range where titles are across top of data, as string, data MUST
'have titles in this row, edit to suit your titles locale
    vTitles = "A1:Z1"
    TitleRow = Range(vTitles).Cells(1).Row


'Spot bottom row of data
   LR = ws.Cells(ws.Rows.count, vCol).End(xlUp).Row


'Get a temporary list of unique values from vCol
    iCol = ws.Columns.count
    ws.Cells(1, iCol) = "key"
    
    For Itm = 2 To LR
        On Error Resume Next
        If ws.Cells(Itm, vCol) <> "" And Application.WorksheetFunction _
            .Match(ws.Cells(Itm, vCol), ws.Columns(iCol), 0) = 0 Then
               ws.Cells(ws.Rows.count, iCol).End(xlUp).Offset(1) = ws.Cells(Itm, vCol)
        End If
    Next Itm
'Sort the temporary list
    ws.Columns(iCol).Sort Key1:=ws.Cells(2, iCol), Order1:=xlAscending, _
        Header:=xlYes, OrderCustom:=1, MatchCase:=False, _
        Orientation:=xlTopToBottom, DataOption1:=xlSortNormal


'Put list into an array for looping
    MyArr = Application.WorksheetFunction.Transpose _
        (ws.Columns(iCol).SpecialCells(xlCellTypeConstants))


'clear temporary list
    ws.Columns(iCol).Clear


'Turn on the autofilter
    ws.Range(vTitles).AutoFilter


'Loop through list one value at a time
'The array includes the title cell, so we start at the second value in the array
'In case values are numerical, we convert them to text with ""
    For Itm = 2 To UBound(MyArr)
        ws.Range(vTitles).AutoFilter Field:=vCol, Criteria1:=MyArr(Itm) & ""
    
        If Not Evaluate("=ISREF('" & MyArr(Itm) & "'!A1)") Then    'create sheet if needed
            Worksheets.Add(After:=Worksheets(Worksheets.count)).Name = MyArr(Itm) & ""
        Else                                                      'clear sheet if it exists
            Sheets(MyArr(Itm) & "").Move After:=Sheets(Sheets.count)
            Sheets(MyArr(Itm) & "").Cells.Clear
        End If
    
        ws.Range("A" & TitleRow & ":A" & LR).EntireRow.Copy _
            Sheets(MyArr(Itm) & "").Range("A1")
        
        ws.Range(vTitles).AutoFilter Field:=vCol
        MyCount = MyCount + Sheets(MyArr(Itm) & "").Range("A" & Rows.count) _
                             .End(xlUp).Row - Range(vTitles).Rows.count
        Sheets(MyArr(Itm) & "").Columns.AutoFit
    Next Itm
    
'Cleanup
    ws.AutoFilterMode = False
        ws.Activate
    MsgBox "Rows with data: " & (LR - TitleRow) & vbLf & "Rows copied to other sheets: " _
                & MyCount & vbLf & "Hope they match!!"


Application.ScreenUpdating = True
End Sub
 
Upvote 0

Forum statistics

Threads
1,216,114
Messages
6,128,910
Members
449,478
Latest member
Davenil

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top