Find and remove list of substrings from larger list of strings

svendfj

Board Regular
Joined
Mar 19, 2011
Messages
99
Office Version
  1. 2016
Platform
  1. Windows
I have three lists:
> LIST 1 is a set of substrings that need to be removed from the main list - LIST 2. The substrings are either a prefix or a suffix (either the beginning or end of one of the strings in LIST 2).
> LIST 2, as mentioned, is the main list that needs to have one of the prefixes/suffixes shown in LIST 1 removed.
> LIST 3 is the result that I am seeking

In other words, for each string in LIST 2, I need to go through LIST 1, and when a LIST 1 substring is contained in the LIST 2 string, then remove that substring. LIST 2 strings generally have one LIST 1 substring, but will not have more than 1 substring.

The larger problem this is part of makes me be especially interested in finding the simplest, most seamless solution. I very much appreciate any suggestions you can please share.

Thanks, Sven

LIST 1
" King Kangaroo"
" bush Bee"
"Newt "
"Hare "
"Otto Owl "


LIST 2
He was surprised when suddenly a King Kangaroo
And sitting astride a twig of a bush Bee
Hare has lost his spectacles
Otto Owl loved to rest quietly whilst no one was watching.
Newt knew too much to be stopped by so small a problem


LIST 3
He was surprised when suddenly a
And sitting astride a twig of a bush
has lost his spectacles
loved to rest quietly whilst no one was watching.
knew too much to be stopped by so small a problem
 

Excel Facts

Create a chart in one keystroke
Select the data and press Alt+F1 to insert a default chart. You can change the default chart to any chart type
If you have LIST 1 in column A and LIST 2 in column B, then in column C do the following formula

=SUBSTITUTE($B1, $A1, "")

Then drag down
 
Upvote 0
Hi,

Note List 1 and List 2 is Not in the Same order.

And isn't this in some way a backward/reverse/??? of what you requested here?: https://www.mrexcel.com/forum/excel...ok-substrings-list1-larger-strings-list2.html

Here's a solution for your new request, note List 3 is Not in the same order as List 2, but follows List 1:


Book1
ABC
1LIST 1LIST 2LIST 3
2King KangarooHe was surprised when suddenly a King KangarooHe was surprised when suddenly a
3bush BeeAnd sitting astride a twig of a bush BeeAnd sitting astride a twig of a
4NewtHare has lost his spectaclesknew too much to be stopped by so small a problem
5HareOtto Owl loved to rest quietly whilst no one was watching.has lost his spectacles
6Otto OwlNewt knew too much to be stopped by so small a problemloved to rest quietly whilst no one was watching.
Sheet131
Cell Formulas
RangeFormula
C2=TRIM(SUBSTITUTE(LOOKUP(2,1/SEARCH(A2,B$2:B$6),B$2:B$6),A2,""))


C2 formula copied down.
 
Last edited:
Upvote 0
Here's a solution for your new request, ..
I'm not so sure - see sample data below with your formula in column D.
(I think you may have missed the space characters in the sample List 1 values. I also assumed that List 2 might be longer than List 1 - but could be wrong with that)

@svendfj
Am I right in thinking if a List 1 value has a space at the beginning, then you only want to remove that text if it occurs at the end of the string & vice-versa if the space is at the end?
That is, " King Kangaroo" should not get removed from my cell B7 string?

If so, you could try this user-defined function. To implement a UDF ..
1. Right click the sheet name tab and choose "View Code".
2. In the Visual Basic window use the menu to Insert|Module
3. Copy and Paste the code below into the main right hand pane that opens at step 2.
4. Close the Visual Basic window.
5. Enter the formula as shown in the screen shot below in column C and copy down.

Code:
Function TrimIt(s As String, PrefSuff As Range) As String
  Static RX As Object
  Dim m As Variant
  Dim c As Range
  Dim p As String
  
  If RX Is Nothing Then
    Set RX = CreateObject("VBScript.RegExp")
    RX.Global = True
    RX.IgnoreCase = True
  End If
  For Each c In PrefSuff
    p = p & "|(" & IIf(Left(c.Value, 1) = " ", "", "^") & c.Value & IIf(Left(c.Value, 1) = " ", "$", "") & ")"
  Next c
  RX.Pattern = Mid(p, 2)
  TrimIt = RX.Replace(s, "")
End Function

Excel Workbook
ABCD
1LIST 1LIST 2LIST 3
2King KangarooHe was surprised when suddenly a King KangarooHe was surprised when suddenly aNewt knew a king kangaroo
3bush BeeAnd sitting astride a twig of a bush BeeAnd sitting astride a twig of aAnd sitting astride a twig of a
4NewtHare has lost his spectacleshas lost his spectaclesknew a king kangaroo
5HareOtto Owl loved to rest quietly whilst no one was watching.loved to rest quietly whilst no one was watching.has lost his spectacles
6Otto OwlNewt knew too much to be stopped by so small a problemknew too much to be stopped by so small a problemloved to rest quietly whilst no one was watching.
7xxxxHe was surprised when suddenly a King Kangaroo jumped outHe was surprised when suddenly a King Kangaroo jumped out#N/A
8yyyyNewt knew a king kangarooknew a#N/A
Remove string



Edit:
LIST 2 strings generally have one LIST 1 substring, but will not have more than 1 substring.
I had missed that bit so my sample data in cell B8 above may not be a valid sample. :)
 
Last edited:
Upvote 0
Thank you jtakw. This is an addition to the process. I appreciate it.
 
Upvote 0
Thank you Peter. I will take a look. The substrings in LIST 1 are basically either a prefix or suffix and are defined to take care of leading or lagging spaces. I appreciate your perspective.
 
Upvote 0
The substrings in LIST 1 are basically either a prefix or suffix and are defined to take care of leading or lagging spaces.
That is still not entirely clear to me.

1. Could "Newt " appear at the start of one string and " Newt" at the end of another string? If so, should "Newt" (& the associated space) get removed from both strings?

2. Can you confirm that for "John & Newt have arrived" nothing gets removed?
 
Upvote 0
Thanks and drinks all around, after I have resolved the issue that prevented me from responding until now. Peter, your code is perfect. Thank you! The answer to your questions: 1) No, the substrings are either a prefix or suffix. 2) You are correct, but it is fine to not factor this in, due to the nature of the substrings in List 1.

Again, thank you.
 
Upvote 0
Thanks and drinks all around, after I have resolved the issue that prevented me from responding until now. Peter, your code is perfect. Thank you!
Good news and you are welcome.

2) You are correct, but it is fine to not factor this in, ...
My code does already factor it in though. :)
That is why King Kangaroo is not removed in C7 even though the substring " King Kangaroo" (from cell A2) is found within cell B7.
 
Upvote 0

Forum statistics

Threads
1,213,531
Messages
6,114,172
Members
448,554
Latest member
Gleisner2

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top