Hi - I've a list of unique 8-digit numbers (identities). While I know that each of them is unique, some of them are very similar. For example only one digit might be different, or two digits might be in the opposite order, at any location within the 8-digit sequence.
People will be manually entering data against these 8-digit numbers. Some are not great at data entry, and I'm concerned that they may enter it against the wrong identity.
E.g. one identity is 12345678 and another is 12346578. The inattentive worker may put the data belonging to the former against the ID number for the latter.
Therefore I want to draw up a list of 'look-alikes' - pairs (or perhaps more - there could be clusters of three or four IDs similar to each other) of numbers within the database that I can highlight as needing special attention when entering data against them (and that can be double-checked afterwards).
I downloaded the Excel fuzzy lookup add-on from Microsoft, and set it up to compare the table of ID numbers with a copy of itself, but reporting a relatively low similarity threshold and a high number of matches. But even though I brought the similarity down to 0.25 and the number of matches up to 10 for experimental purposes, it still only ever reported one result for each number - the perfect similarity matches with the same ID numbers in the second copy of the table. Whereas what I'm really interested in are the ones with slightly lower similarity scores in order to pick out the non-identical but look-alike IDs.
I know there are definitely some look-alikes within the database because I've identified a few of them manually.
I'm probably doing something very basic wrong but I can't see what it is. Can anyone please help? Or alternatively suggest a different way of achieving what I want (flagging the look-alikes)?
(Note - the data entry has to be manual. It's first time entry of new data that doesn't yet exist in any database. Therefore it's not simply a question of using vlookup or similar to get it from one place to another.)
Many thanks for any help.
People will be manually entering data against these 8-digit numbers. Some are not great at data entry, and I'm concerned that they may enter it against the wrong identity.
E.g. one identity is 12345678 and another is 12346578. The inattentive worker may put the data belonging to the former against the ID number for the latter.
Therefore I want to draw up a list of 'look-alikes' - pairs (or perhaps more - there could be clusters of three or four IDs similar to each other) of numbers within the database that I can highlight as needing special attention when entering data against them (and that can be double-checked afterwards).
I downloaded the Excel fuzzy lookup add-on from Microsoft, and set it up to compare the table of ID numbers with a copy of itself, but reporting a relatively low similarity threshold and a high number of matches. But even though I brought the similarity down to 0.25 and the number of matches up to 10 for experimental purposes, it still only ever reported one result for each number - the perfect similarity matches with the same ID numbers in the second copy of the table. Whereas what I'm really interested in are the ones with slightly lower similarity scores in order to pick out the non-identical but look-alike IDs.
I know there are definitely some look-alikes within the database because I've identified a few of them manually.
I'm probably doing something very basic wrong but I can't see what it is. Can anyone please help? Or alternatively suggest a different way of achieving what I want (flagging the look-alikes)?
(Note - the data entry has to be manual. It's first time entry of new data that doesn't yet exist in any database. Therefore it's not simply a question of using vlookup or similar to get it from one place to another.)
Many thanks for any help.