nickbarnes
New Member
- Joined
- Aug 2, 2011
- Messages
- 5
Hi all,
I am trying to search two columns of strings to pick out a reference number. The standard for the reference number has been abused over the years so I'm hoping that someone here could help me with a suitable regex to pick out any one of the possibilities so that I can return the official version?
I have a list of all the reference variations but its over 300k items and it takes an age to search it all with the array formula I found.
Here are a couple of examples which should cover any eventuality, the true reference is on the left with every eventuality separated by a pipe:
ABC/012 -> ABC/012 | ABC-012 | ABC_012 | ABC012
ABC/012 -> ABC/12 | ABC-12 | ABC_12 | ABC12
D/EFG/012 -> D/EFG/012 | D-EFG-012 | D_EFG_012 | DEFG012
D/EFG/012 -> D/EFG/12 | D-EFG-12 | D_EFG_12 | DEFG12
Another region includes an additional component to the reference, its not a priority but it would be great if someone is able to include it in the search:
H/IJ/K/012 -> H/IJ/K/012 | H-IJ-K-012 | H_IJ_K_012 | HIJK012
H/IJ/K/012 -> H/IJ/K/12 | H-IJ-K-12 | H_IJ_K_12 | HIJK12
Some of the key features of the reference:
If the above variations can't be turned into a regex, perhaps someone could help me improve my lookup table efficiency?
=IFERROR(INDEX(<OfficialReferenceColumn>,MAX(IF(ISERROR(FIND(<VariationColumns>,UPPER(<TextToSearch1>))),-1,1)*ROW(<VariationColumns>)-1)),IFERROR(INDEX(<OfficialReferenceColumn>,MAX(IF(ISERROR(FIND(<VariationColumns>,UPPER(<TextToSearch2>))),-1,1)*ROW(<VariationColumns>)-1)),""))
I know this is a big ask so thanks to anyone who takes the time to have a look for me. I'd share my workbook but I'm not able to. Sorry.
Thanks again,
Nick
I am trying to search two columns of strings to pick out a reference number. The standard for the reference number has been abused over the years so I'm hoping that someone here could help me with a suitable regex to pick out any one of the possibilities so that I can return the official version?
I have a list of all the reference variations but its over 300k items and it takes an age to search it all with the array formula I found.
Here are a couple of examples which should cover any eventuality, the true reference is on the left with every eventuality separated by a pipe:
ABC/012 -> ABC/012 | ABC-012 | ABC_012 | ABC012
ABC/012 -> ABC/12 | ABC-12 | ABC_12 | ABC12
D/EFG/012 -> D/EFG/012 | D-EFG-012 | D_EFG_012 | DEFG012
D/EFG/012 -> D/EFG/12 | D-EFG-12 | D_EFG_12 | DEFG12
Another region includes an additional component to the reference, its not a priority but it would be great if someone is able to include it in the search:
H/IJ/K/012 -> H/IJ/K/012 | H-IJ-K-012 | H_IJ_K_012 | HIJK012
H/IJ/K/012 -> H/IJ/K/12 | H-IJ-K-12 | H_IJ_K_12 | HIJK12
Some of the key features of the reference:
- There are never more than 3 characters before a /
- There should always be 3 digits at the end, with leading zeroes for single and double digit numbers. Folks have sometimes left these off, hence the extra variation above
- Numbers and letters are never mixed together between two /
- The reference is always between 4 and 10 characters in length, including all numbers, letters and slashes
If the above variations can't be turned into a regex, perhaps someone could help me improve my lookup table efficiency?
=IFERROR(INDEX(<OfficialReferenceColumn>,MAX(IF(ISERROR(FIND(<VariationColumns>,UPPER(<TextToSearch1>))),-1,1)*ROW(<VariationColumns>)-1)),IFERROR(INDEX(<OfficialReferenceColumn>,MAX(IF(ISERROR(FIND(<VariationColumns>,UPPER(<TextToSearch2>))),-1,1)*ROW(<VariationColumns>)-1)),""))
I know this is a big ask so thanks to anyone who takes the time to have a look for me. I'd share my workbook but I'm not able to. Sorry.
Thanks again,
Nick