TheProvince
New Member
- Joined
- Jul 29, 2013
- Messages
- 1
Our database of suppliers has 60,000 supplier sites, many of which (we have no idea how many) are duplicated.
The system enforces unique vendor names, but often times departments will request new suppliers be set up even though they already exist in the system, and data entry staff will append the names, or make a typo, etc. and create a duplicate.
Further complicating this is that vendors that have multiple "sites" (addresses, payment terms, operating name etc) should have the same vendor #, but have each site listed in the system. So if a business we deal with is actually "CORPORATION 123 INC", but it has "The Window Store", "Windows are us" etc., there should only be one Vendor name (CORPORATION 123 INC), and that vendor should have multiple sites.
We've come up with a few Heuristics to generate shorter lists that can be reviewed manually, like:
Any other ideas would be most appreciated! I could probably convince my boss to spend money on applications, software, etc. if it would be worthwhile.
The system enforces unique vendor names, but often times departments will request new suppliers be set up even though they already exist in the system, and data entry staff will append the names, or make a typo, etc. and create a duplicate.
Further complicating this is that vendors that have multiple "sites" (addresses, payment terms, operating name etc) should have the same vendor #, but have each site listed in the system. So if a business we deal with is actually "CORPORATION 123 INC", but it has "The Window Store", "Windows are us" etc., there should only be one Vendor name (CORPORATION 123 INC), and that vendor should have multiple sites.
We've come up with a few Heuristics to generate shorter lists that can be reviewed manually, like:
- =left( pulls out the first ~5 characters, then I use another column to count items that are listed more than once, filter it, then send it to a team member for manual review
- using a similar method to find Tax ID numbers associated with multiple vendor IDs
- manually (but very quickly) reviewing the whole listing
- manually reviewing listings that contain parenthesis (using filter), as those frequently occur in duplicates
Any other ideas would be most appreciated! I could probably convince my boss to spend money on applications, software, etc. if it would be worthwhile.