gravanoc
Active Member · Joined: Oct 20, 2015 · Messages: 346 · Office Version: 365 · Platform: Windows, Mobile
Besides going through two lists of names one by one, I'm trying to think of some formulas for finding naming discrepancies. To illustrate, here is a name and some variations that may occur:
1) Andrew Baltic (normal/default spelling)
2) Andrew  Baltic  (extra space between the names and after the last name)
3) Andréw Baltic (accent or other diacritical mark)
4) Drew Baltic (first name replaced with a nickname or alias)
5) Andrew Baltik (misspelling of the last name)
6) andrew baltic (lowercase name)
These seem to be the most common, but I may be leaving possibilities out.
Perfection isn't the goal, since sometimes there is only a first name that I can't reliably match with a full name, and sometimes the name is too butchered to match the default (ideal) name. When the default itself is unclear, I typically use the first or most common instance to determine it.
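To make the "first or most common instance" rule concrete, here is a minimal sketch in Python rather than as an Excel formula. The function name `pick_default` and the sample list are made up for illustration, and it assumes the list of variants is not empty.

```python
from collections import Counter

def pick_default(variants):
    """Return the most common spelling; ties go to the spelling seen first."""
    counts = Counter(variants)
    first_seen = {}
    for position, name in enumerate(variants):
        # Remember only the earliest position of each spelling.
        first_seen.setdefault(name, position)
    # Higher count wins; among equal counts, the smaller position wins.
    return max(counts, key=lambda name: (counts[name], -first_seen[name]))

# Hypothetical sample data based on the variations listed above.
names = ["Andrew Baltic", "Drew Baltic", "Andrew Baltic", "Andrew Baltik"]
print(pick_default(names))
```

With the sample list, "Andrew Baltic" appears twice and is chosen. With only two different spellings, the first one in the list would be used.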
So far I've used a few techniques to help find these discrepancies: removing duplicates, sorting, and some formulas. The TRIM function helps with #2, and using LOWER on both the default & positional name is good for #6 (the positional name being the one at the current row as I drag the formula down, which may or may not be a match), but the others are trickier. For #4 & #5, I'm considering some combination of LEFT, MID, & RIGHT on the default & positional name. For #3 I'm not sure what to use other than trying something with UNICODE to identify accent marks. Thanks for any suggestions.
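For comparison, the matching logic for all six cases can be sketched outside Excel. The following is a rough Python illustration, not a formula solution. The nickname table, the function names, and the 0.85 similarity threshold are all assumptions chosen for this example and would need tuning against real data.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical nickname table; extend it with the aliases found in your own lists.
NICKNAMES = {"drew": "andrew"}

def normalize(name):
    """Collapse whitespace (#2), strip diacritics (#3), and lowercase (#6)."""
    collapsed = " ".join(name.split())
    # NFKD splits an accented letter into its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", collapsed)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()

def expand_nickname(name):
    """Replace a known nickname in the first-name position (#4)."""
    first, _, rest = name.partition(" ")
    return f"{NICKNAMES.get(first, first)} {rest}".strip()

def is_probable_match(candidate, default, threshold=0.85):
    """Flag near-identical names, which tolerates small misspellings (#5)."""
    a = expand_nickname(normalize(candidate))
    b = expand_nickname(normalize(default))
    return SequenceMatcher(None, a, b).ratio() >= threshold

default = "Andrew Baltic"
for variant in ["Andrew  Baltic ", "Andréw Baltic", "Drew Baltic",
                "Andrew Baltik", "andrew baltic", "Maria Lopez"]:
    print(repr(variant), is_probable_match(variant, default))
```

Normalizing first means the similarity check only has to absorb real spelling differences. A lower threshold catches more misspellings but also produces more false matches, so flagged pairs are best reviewed by eye.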