Find and remove list of substrings from larger list of strings

svendfj

Board Regular
Joined
Mar 19, 2011
Messages
75
Office Version
  1. 2016
Platform
  1. Windows
I have three lists:
> LIST 1 is a set of substrings that need to be removed from the main list - LIST 2. The substrings are either a prefix or a suffix (either the beginning or end of one of the strings in LIST 2).
> LIST 2, as mentioned, is the main list that needs to have one of the prefixes/suffixes shown in LIST 1 removed.
> LIST 3 is the result that I am seeking

In other words, for each string in LIST 2, I need to go through LIST 1, and when a LIST 1 substring is contained in the LIST 2 string, then remove that substring. LIST 2 strings generally have one LIST 1 substring, but will not have more than 1 substring.

The larger problem this is part of makes me be especially interested in finding the simplest, most seamless solution. I very much appreciate any suggestions you can please share.

Thanks, Sven

LIST 1
" King Kangaroo"
" bush Bee"
"Newt "
"Hare "
"Otto Owl "


LIST 2
He was surprised when suddenly a King Kangaroo
And sitting astride a twig of a bush Bee
Hare has lost his spectacles
Otto Owl loved to rest quietly whilst no one was watching.
Newt knew too much to be stopped by so small a problem


LIST 3
He was surprised when suddenly a
And sitting astride a twig of a bush
has lost his spectacles
loved to rest quietly whilst no one was watching.
knew too much to be stopped by so small a problem
 

Some videos you may like

Excel Facts

Can Excel fill bagel flavors?
You can teach Excel a new custom list. Type the list in cells, File, Options, Advanced, Edit Custom Lists, Import, OK

Nine Zero

Well-known Member
Joined
Mar 10, 2016
Messages
622
If you have LIST 1 in column A and LIST 2 in column B, then in column C do the following formula

=SUBSTITUTE($B1, $A1, "")

Then drag down
 

jtakw

Well-known Member
Joined
Jun 29, 2014
Messages
5,146
Hi,

Note List 1 and List 2 is Not in the Same order.

And isn't this in some way a backward/reverse/??? of what you requested here?: https://www.mrexcel.com/forum/excel...ok-substrings-list1-larger-strings-list2.html

Here's a solution for your new request, note List 3 is Not in the same order as List 2, but follows List 1:

<b></b><table cellpadding="2.5px" rules="all" style=";background-color: rgb(255,255,255);border: 1px solid;border-collapse: collapse; border-color: rgb(187,187,187)"><colgroup><col width="25px" style="background-color: rgb(218,231,245)" /><col /><col /><col /></colgroup><thead><tr style=" background-color: rgb(218,231,245);text-align: center;color: rgb(22,17,32)"><th></th><th>A</th><th>B</th><th>C</th></tr></thead><tbody><tr ><td style="color: rgb(22,17,32);text-align: center;">1</td><td style=";">LIST 1</td><td style=";">LIST 2</td><td style=";">LIST 3</td></tr><tr ><td style="color: rgb(22,17,32);text-align: center;">2</td><td style=";">King Kangaroo</td><td style=";">He was surprised when suddenly a King Kangaroo</td><td style=";">He was surprised when suddenly a</td></tr><tr ><td style="color: rgb(22,17,32);text-align: center;">3</td><td style=";">bush Bee</td><td style=";">And sitting astride a twig of a bush Bee</td><td style=";">And sitting astride a twig of a</td></tr><tr ><td style="color: rgb(22,17,32);text-align: center;">4</td><td style=";">Newt</td><td style=";">Hare has lost his spectacles</td><td style=";">knew too much to be stopped by so small a problem</td></tr><tr ><td style="color: rgb(22,17,32);text-align: center;">5</td><td style=";">Hare</td><td style=";">Otto Owl loved to rest quietly whilst no one was watching.</td><td style=";">has lost his spectacles</td></tr><tr ><td style="color: rgb(22,17,32);text-align: center;">6</td><td style=";">Otto Owl</td><td style=";">Newt knew too much to be stopped by so small a problem</td><td style=";">loved to rest quietly whilst no one was watching.</td></tr></tbody></table><p style="width:6.4em;font-weight:bold;margin:0;padding:0.2em 0.6em 0.2em 0.5em;border: 1px solid rgb(187,187,187);border-top:none;text-align: center;background-color: rgb(218,231,245);color: rgb(22,17,32)">Sheet131</p><br /><br /><table width="85%" cellpadding="2.5px" rules="all" style=";border: 2px solid black;border-collapse:collapse;padding: 0.4em;background-color: rgb(255,255,255)" ><tr><td style="padding:6px" ><b>Worksheet Formulas</b><table cellpadding="2.5px" width="100%" rules="all" style="border: 1px solid;text-align:center;background-color: rgb(255,255,255);border-collapse: collapse; border-color: rgb(187,187,187)"><thead><tr style=" background-color: rgb(218,231,245);color: rgb(22,17,32)"><th width="10px">Cell</th><th style="text-align:left;padding-left:5px;">Formula</th></tr></thead><tbody><tr><th width="10px" style=" background-color: rgb(218,231,245);color: rgb(22,17,32)">C2</th><td style="text-align:left">=TRIM(<font color="Blue">SUBSTITUTE(<font color="Red">LOOKUP(<font color="Green">2,1/SEARCH(<font color="Purple">A2,B$2:B$6</font>),B$2:B$6</font>),A2,""</font>)</font>)</td></tr></tbody></table></td></tr></table><br />

C2 formula copied down.
 
Last edited:

Peter_SSs

MrExcel MVP, Moderator
Joined
May 28, 2005
Messages
46,165
Office Version
  1. 365
Platform
  1. Windows
Here's a solution for your new request, ..
I'm not so sure - see sample data below with your formula in column D.
(I think you may have missed the space characters in the sample List 1 values. I also assumed that List 2 might be longer than List 1 - but could be wrong with that)

@svendfj
Am I right in thinking if a List 1 value has a space at the beginning, then you only want to remove that text if it occurs at the end of the string & vice-versa if the space is at the end?
That is, " King Kangaroo" should not get removed from my cell B7 string?

If so, you could try this user-defined function. To implement a UDF ..
1. Right click the sheet name tab and choose "View Code".
2. In the Visual Basic window use the menu to Insert|Module
3. Copy and Paste the code below into the main right hand pane that opens at step 2.
4. Close the Visual Basic window.
5. Enter the formula as shown in the screen shot below in column C and copy down.

Code:
Function TrimIt(s As String, PrefSuff As Range) As String
  Static RX As Object
  Dim m As Variant
  Dim c As Range
  Dim p As String
  
  If RX Is Nothing Then
    Set RX = CreateObject("VBScript.RegExp")
    RX.Global = True
    RX.IgnoreCase = True
  End If
  For Each c In PrefSuff
    p = p & "|(" & IIf(Left(c.Value, 1) = " ", "", "^") & c.Value & IIf(Left(c.Value, 1) = " ", "$", "") & ")"
  Next c
  RX.Pattern = Mid(p, 2)
  TrimIt = RX.Replace(s, "")
End Function

Excel Workbook
ABCD
1LIST 1LIST 2LIST 3
2King KangarooHe was surprised when suddenly a King KangarooHe was surprised when suddenly aNewt knew a king kangaroo
3bush BeeAnd sitting astride a twig of a bush BeeAnd sitting astride a twig of aAnd sitting astride a twig of a
4NewtHare has lost his spectacleshas lost his spectaclesknew a king kangaroo
5HareOtto Owl loved to rest quietly whilst no one was watching.loved to rest quietly whilst no one was watching.has lost his spectacles
6Otto OwlNewt knew too much to be stopped by so small a problemknew too much to be stopped by so small a problemloved to rest quietly whilst no one was watching.
7xxxxHe was surprised when suddenly a King Kangaroo jumped outHe was surprised when suddenly a King Kangaroo jumped out#N/A
8yyyyNewt knew a king kangarooknew a#N/A
Remove string



Edit:
LIST 2 strings generally have one LIST 1 substring, but will not have more than 1 substring.
I had missed that bit so my sample data in cell B8 above may not be a valid sample. :)
 
Last edited:

svendfj

Board Regular
Joined
Mar 19, 2011
Messages
75
Office Version
  1. 2016
Platform
  1. Windows

ADVERTISEMENT

Thank you jtakw. This is an addition to the process. I appreciate it.
 

svendfj

Board Regular
Joined
Mar 19, 2011
Messages
75
Office Version
  1. 2016
Platform
  1. Windows
Thank you Peter. I will take a look. The substrings in LIST 1 are basically either a prefix or suffix and are defined to take care of leading or lagging spaces. I appreciate your perspective.
 

Peter_SSs

MrExcel MVP, Moderator
Joined
May 28, 2005
Messages
46,165
Office Version
  1. 365
Platform
  1. Windows

ADVERTISEMENT

The substrings in LIST 1 are basically either a prefix or suffix and are defined to take care of leading or lagging spaces.
That is still not entirely clear to me.

1. Could "Newt " appear at the start of one string and " Newt" at the end of another string? If so, should "Newt" (& the associated space) get removed from both strings?

2. Can you confirm that for "John & Newt have arrived" nothing gets removed?
 

svendfj

Board Regular
Joined
Mar 19, 2011
Messages
75
Office Version
  1. 2016
Platform
  1. Windows
Thanks and drinks all around, after I have resolved the issue that prevented me from responding until now. Peter, your code is perfect. Thank you! The answer to your questions: 1) No, the substrings are either a prefix or suffix. 2) You are correct, but it is fine to not factor this in, due to the nature of the substrings in List 1.

Again, thank you.
 

Peter_SSs

MrExcel MVP, Moderator
Joined
May 28, 2005
Messages
46,165
Office Version
  1. 365
Platform
  1. Windows
Thanks and drinks all around, after I have resolved the issue that prevented me from responding until now. Peter, your code is perfect. Thank you!
Good news and you are welcome.

2) You are correct, but it is fine to not factor this in, ...
My code does already factor it in though. :)
That is why King Kangaroo is not removed in C7 even though the substring " King Kangaroo" (from cell A2) is found within cell B7.
 

Watch MrExcel Video

Forum statistics

Threads
1,108,645
Messages
5,524,054
Members
409,557
Latest member
Excelinho

This Week's Hot Topics

Top