Find first matching substring in a list and return adjacent value from another list

chris32

New Member
Joined
May 23, 2021
Messages
11
Office Version
  1. 365
Platform
  1. Windows
Hi there,

I am trying to find a way to harmonise the way a product is coded from various ways people may have entered it (i.e., the cut of fish - in parentheses here but note that this is not always the case).
For example, in the sample data below, "Gutted" and "whole gutted" mean the same thing, but "headed & gutted" is a different thing, and "whole" alone is yet a different thing.

TAB "Raw data", column A:
1621826171716.png


I am trying to automate the translation of these multiple possible terms people may use into a standard state code, by means of a formula in column B.
I have created a list of possible ways to name various cuts of fish (i.e. StateName) in a different tab "States translation" (column A), and the corresponding standard code (StateCode) next to it (column B).

TAB "States translation", column C and D:
1621824108573.png


I have made these two lists into arrays that I have named "StateName" and "StateCode" and that I would like to refer to in the tab where my raw data will be.
As you see in the two columns above, different states may include the same terms (e.g., gutted is found in GGT, GHCO, HGT and HGU) and so I have organised my list of possible terms in a hierarchical order, so that the search should end when the first matching StateName is found withing the string in column D.

So for example, for the fifth term in column A, "Broadbill Swordfish (Headed & Gutted)", I'd like the formula to return HGU (rather than any of the codes above).

I have tried various methods (lookup, index, match) unsuccessfully as they either find multiple matches or result in lists of the same length as my arrays.

I have a very manual and inelegant solution to this problem, using IFS, ISNUMBER and SEARCH, but I am sure there must be a much more straightforward way than this:

=IF(D2<>"",IFS(ISNUMBER(SEARCH('States translation'!$C$2,D2)),'States translation'!$D$2,ISNUMBER(SEARCH('States translation'!$C$3,D2)),'States translation'!$D$3,ISNUMBER(SEARCH('States translation'!$C$4,D2)),'States translation'!$D$4,ISNUMBER(SEARCH('States translation'!$C$5,D2)),'States translation'!$D$5, [repeat same arguments with the entire list of 180+ items :eek:] , TRUE,"Check"),"")

Many thanks for your help!

PS: Note that more state names could always be added to my arrays later on if it turn out that people have used names that were not covered in my current lists.
 

Attachments

  • 1621823949407.png
    1621823949407.png
    23.2 KB · Views: 4

Excel Facts

Move date out one month or year
Use =EDATE(A2,1) for one month later. Use EDATE(A2,12) for one year later.

chris32

New Member
Joined
May 23, 2021
Messages
11
Office Version
  1. 365
Platform
  1. Windows
Sorry, I got my column names mixed up (in my file, column A in tab "Raw data" is actually column D too.

Here's how it is organised:

1621827464342.png


1621827311761.png
 

Attachments

  • 1621827274051.png
    1621827274051.png
    119.4 KB · Views: 4
  • 1621827438264.png
    1621827438264.png
    44.2 KB · Views: 3

Peter_SSs

MrExcel MVP, Moderator
Joined
May 28, 2005
Messages
50,509
Office Version
  1. 365
Platform
  1. Windows
Welcome to the MrExcel board!

MrExcel has a tool called “XL2BB” that lets you post samples of your data that will allow us to copy/paste it to our Excel spreadsheets, so we can work with the same copy of data that you are without a mass of manual typing from images like that. Instructions on using this tool can be found here: XL2BB Add-in

Note that there is also a "Test Here” forum on this board. This is a place where you can test using this tool (or any other posting techniques that you want to test) before trying to use those tools in your actual posts.
 

chris32

New Member
Joined
May 23, 2021
Messages
11
Office Version
  1. 365
Platform
  1. Windows
Welcome to the MrExcel board!

MrExcel has a tool called “XL2BB” that lets you post samples of your data that will allow us to copy/paste it to our Excel spreadsheets, so we can work with the same copy of data that you are without a mass of manual typing from images like that. Instructions on using this tool can be found here: XL2BB Add-in

Note that there is also a "Test Here” forum on this board. This is a place where you can test using this tool (or any other posting techniques that you want to test) before trying to use those tools in your actual posts.

Hi Peter,

thank you for the tip. I am not on a personal computer unfortunately, and can't add the add-in.

If it is of any help, here's a text version of the data sample (I have used semicolon as separators so that the data can be copy-pasted and separated into an excel sheet).
Any help is very much appreciated as I realised my inelegant solution won't work because it exceeds the maximum number of characters that excel allows within a cell!

Tab "raw data":
Product ;State
School Shark (Skinless Boneless Fillets) ;<formula with result>
Groper (Headed & Gutted) ;<formula with result>
Trumpeter (Headed & Gutted) ;<formula with result>
Ling (Dressed) ;<formula with result>
Trumpeter (Headed & Gutted) ;<formula with result>
Chilled Ling (Dressed) ;<formula with result>
School Shark (Dressed) ;<formula with result>
Groper (Headed & Gutted) ;<formula with result>
BlueCod (Whole Gutted) ;<formula with result>
Trumpeter (Headed & Gutted) ;<formula with result>
Bluenose (Whole) ;<formula with result>
Rig (Dressed) ;<formula with result>
Chilled Groper (Headed & Gutted) ;<formula with result>
Chilled Ling (Dressed) ;<formula with result>
Chilled Trumpeter (Headed & Gutted) ;<formula with result>
Chilled Ling (Dressed) ;<formula with result>
Chilled Trumpeter (Headed & Gutted) ;<formula with result>
Ling (Dressed) ;<formula with result>
Ling (Dressed) ;<formula with result>
Sandflounder (Whole Gutted) ;<formula with result>
Yellowbelly (Whole Gutted) ;<formula with result>
Chilled Yellowbelly (Whole Gutted) ;<formula with result>
Chilled Broadbill Swordfish (Headed & Gutted) ;<formula with result>
School Shark (Dressed) ;<formula with result>
Trumpeter (Headed & Gutted) ;<formula with result>
Bluenose (Whole) ;<formula with result>

Tab "States translation":
StateName; StateCode
gilled gutted tail-off; GGT
gilled, gutted, tail-off; GGT
gilled gutted tail off; GGT
gilled, gutted, tail off; GGT
gilled and gutted tail off; GGT
gilled and gutted, tail off; GGT
gilled & gutted tail off; GGT
gilled & gutted, tail off; GGT
gilled & gutted tail-off; GGT
gilled & gutted, tail-off; GGT
gilled, gutted, tailed; GGT
gilled gutted tailed; GGT
gilled and gutted, tailed; GGT
gilled & gutted, tailed; GGT
gilled and gutted; GGO
gilled & gutted; GGO
gilled gutted; GGO
gilled gutted tail-on; GGO
gilled, gutted, tail-on; GGO
gilled gutted tail on; GGO
gilled, gutted, tail on; GGO
headed gutted tailed; HGT
headed, gutted, tailed; HGT
headed and gutted, tailed; HGT
headed & gutted, tailed; HGT
headed and gutted tailed; HGT
headed & gutted tailed; HGT
headed gutted tail off; HGT
headed gutted tail-off; HGT
headed, gutted, tail off; HGT
headed, gutted, tail-off; HGT
headed & gutted; HGU
headed &gutted; HGU
headed& gutted; HGU
headed&gutted; HGU
headed and gutted; HGU
headed gutted; HGU
dressed-straight cut; DSC
dressed, straight cut; DSC
dressed straight cut; DSC
dressed - strait cut; DSC
straight cut; DSC
dressed-v cut; DVC
dressed, v cut; DVC
dressed v cut; DVC
dressed - v cut; DVC
v cut; DVC
dressed; DRE
skin-on untrimmed fillet; UTF
skin on untrimmed fillet; UTF
untrimmed fillet; UTF
untrimmed fillet skin on; UTF
untrimmed fillets skin on; UTF
untrimmed fillet skin-on; UTF
untrimmed fillets skin-on; UTF
skin-on trimmed fillet; TRF
skin on trimmed fillet; TRF
trimmed fillet; TRF
trimmed fillet skin on; TRF
trimmed fillets skin on; TRF
trimmed fillet skin-on; TRF
trimmed fillets skin-on; TRF
skin-off untrimmed fillet; USK
skin off untrimmed fillet; USK
untrimmed fillet skin off; USK
untrimmed fillets skin off; USK
untrimmed fillet skin-off; USK
untrimmed fillets skin-off; USK
skinned untrimmed fillet; USK
skinned untrimmed fillet; USK
skinless untrimmed fillet; USK
untrimmed fillet skinless; USK
skin-off trimmed fillet; TSK
skin off trimmed fillet; TSK
trimmed fillet skin off; TSK
trimmed fillets skin off; TSK
trimmed fillet skin-off; TSK
trimmed fillets skin-off; TSK
skinned trimmed fillet; TSK
skinned trimmed fillet; TSK
skinless trimmed fillet; TSK
trimmed fillet skinless; TSK
 

Peter_SSs

MrExcel MVP, Moderator
Joined
May 28, 2005
Messages
50,509
Office Version
  1. 365
Platform
  1. Windows

ADVERTISEMENT

If it is of any help, here's a text version of the data ..
Yes, that helps greatly, thanks. (y)

Is this what you are trying to do?
I am assuming the "Not Found" results arise from not having the entire data from 'States translation'

chris32.xlsm
CD
1Product State
2School Shark (Skinless Boneless Fillets)Not Found
3Groper (Headed & Gutted)HGU
4Trumpeter (Headed & Gutted)HGU
5Ling (Dressed)DRE
6Trumpeter (Headed & Gutted)HGU
7Chilled Ling (Dressed)DRE
8School Shark (Dressed)DRE
9Groper (Headed & Gutted)HGU
10BlueCod (Whole Gutted)Not Found
11Trumpeter (Headed & Gutted)HGU
12Bluenose (Whole)Not Found
13Rig (Dressed)DRE
14Chilled Groper (Headed & Gutted)HGU
15Chilled Ling (Dressed)DRE
16Chilled Trumpeter (Headed & Gutted)HGU
17Chilled Ling (Dressed)DRE
18Chilled Trumpeter (Headed & Gutted)HGU
19Ling (Dressed)DRE
20Ling (Dressed)DRE
21Sandflounder (Whole Gutted)Not Found
22Yellowbelly (Whole Gutted)Not Found
23Chilled Yellowbelly (Whole Gutted)Not Found
24Chilled Broadbill Swordfish (Headed & Gutted)HGU
25School Shark (Dressed)DRE
26Trumpeter (Headed & Gutted)HGU
27Bluenose (Whole)Not Found
Raw data
Cell Formulas
RangeFormula
D2:D27D2=IFNA(VLOOKUP(REPLACE(LEFT(C2,LEN(C2)-1),1,FIND("(",C2),""),'States translation'!C$2:D$83,2,0),"Not Found")
 

Jake975

Board Regular
Joined
Feb 9, 2020
Messages
114
Office Version
  1. 2016
Platform
  1. Windows
Maybe this will help if you modify it to your data and what you need returned.
VBA Code:
   Dim rng As Range, WS1 As Worksheet, WS2 As Worksheet, desWS As Worksheet, fnd As Range
    Set WS1 = Sheets("Temp")
    Set WS2 = Sheets("Data List")
For Each rng In WS2.Range("C2", WS2.Range("C" & WS2.Rows.Count).End(xlUp))
        Set fnd = WS1.Range("A:A").Find(rng, LookIn:=xlValues, lookat:=xlPart)
        If Not fnd Is Nothing Then
            fnd.Offset(, 1) = rng.Offset(, 1)
        End If
    Next rng
 

chris32

New Member
Joined
May 23, 2021
Messages
11
Office Version
  1. 365
Platform
  1. Windows

ADVERTISEMENT

Hi Peter,

thank you for the formula! I have just added $ signs so that it searches in the same array when I drag the formula down.

Excel Formula:
=IFNA(VLOOKUP(REPLACE(LEFT(D35,LEN(D35)-1),1,FIND("(",D35),""),'States translation'!$C$2:$D$200,2,0),"Not Found")

It works in most instances but not every time. I think it is because it relies on finding the closing parenthesis and then the next letter to the left from there, is that right?

Some entries returned a "Not found" when the states were in plural but in singular in my list (e.g., "Chilled Sea Perch (Skinned and Boned Fillets)" returned a "Not found" even if "Skinned and Boned Fillet" is in my list). I could solve this by pluralising terms in my list. However, some entries do not contain parentheses (e.g., "Frozen Tuna Whole") , in which case I've got a #VALUE! error.

One case I don't get is with this single search term "gutted":

"Butterfish (Gutted)" returns a "Not found" even if "Gutted" is in my list. It is at the bottom of my list because there are many other states that contain "gutted" associated with other words, like so:
fillet skin on; FIL
fillets skin on; FIL
fillets skin-on; FIL
fillet skin on; FIL
headed, gutted and finned; HGF
headed, gutted & finned; HGF
headed gutted finned; HGF
headed gutted and finned; HGF
headed gutted & finned; HGF
headed and gutted, finned; HGF
headed & gutted, finned; HGF
headed and gutted finned; HGF
headed & gutted finned; HGF
whole gutted; GUT
Gutted; GUT
whole; GRE
green; GRE

Is there a way to not rely on finding the parenthesis and independently searching for the substring?

Many thanks for your help!


Yes, that helps greatly, thanks. (y)

Is this what you are trying to do?
I am assuming the "Not Found" results arise from not having the entire data from 'States translation'
 

chris32

New Member
Joined
May 23, 2021
Messages
11
Office Version
  1. 365
Platform
  1. Windows
Maybe this will help if you modify it to your data and what you need returned.
VBA Code:
   Dim rng As Range, WS1 As Worksheet, WS2 As Worksheet, desWS As Worksheet, fnd As Range
    Set WS1 = Sheets("Temp")
    Set WS2 = Sheets("Data List")
For Each rng In WS2.Range("C2", WS2.Range("C" & WS2.Rows.Count).End(xlUp))
        Set fnd = WS1.Range("A:A").Find(rng, LookIn:=xlValues, lookat:=xlPart)
        If Not fnd Is Nothing Then
            fnd.Offset(, 1) = rng.Offset(, 1)
        End If
    Next rng
Hi Jake,

thank you. I am not great at VBA but will see if I can get somewhere with this code. Cheers!
 

Peter_SSs

MrExcel MVP, Moderator
Joined
May 28, 2005
Messages
50,509
Office Version
  1. 365
Platform
  1. Windows
I have just added $ signs so that it searches in the same array when I drag the formula down.
The $ signs that you added in from of the column letters will not hurt, but were not needed. Since the formula is being dragged down (not across) it is only the row numbers that need anchoring with $ signs and my formula already had $ signs before the row numbers. All you really needed to do was increase the 83 to 200.


it relies on finding the closing parenthesis and then the next letter to the left from there, is that right?
Correct

Some entries returned a "Not found" when the states were in plural but in singular in my list
We could accommodate that with
Excel Formula:
=IFNA(VLOOKUP(REPLACE(LEFT(D2,LEN(D2)-1),1,FIND("(",D2),""),'States translation'!C$2:D$200,2,0),IFNA(VLOOKUP(REPLACE(LEFT(D2,LEN(D2)-2),1,FIND("(",D2),""),'States translation'!C$2:D$200,2,0),"Not Found"))

One case I don't get is with this single search term "gutted":
Gutted works for me. Check that cell in 'States translation' and make sure that it does not have any trailing or leading spaces. Even just select that cell and type gutted again and see what happens.

Is there a way to not rely on finding the parenthesis and independently searching for the substring?
I don't see how you could logically decide what part of a multi-word string is the substring to look for if there is nothing to identify it?
 

chris32

New Member
Joined
May 23, 2021
Messages
11
Office Version
  1. 365
Platform
  1. Windows
One case I don't get is with this single search term "gutted":

"Butterfish (Gutted)" returns a "Not found" even if "Gutted" is in my list. It is at the bottom of my list because there are many other states that contain "gutted" associated with other words
Oh nevermind about that one Peter, it was my bad - there was an extra space at the end of the word "gutted" in my list - sorry about that!

Still very keen to know if there is a search function that would not rely on finding the parenthesis (had another case where the person entered "xxx ( whole gutted)" with an extra space after the parenthesis and so it did not find the matching state.

Thank you! :)
 

Forum statistics

Threads
1,141,000
Messages
5,703,652
Members
421,308
Latest member
NewBlood

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top