Remove the Characters and the Content between Them Using FILTERXML & SUBSTITUTE

Joined
Jul 22, 2022
Messages
1
Office Version
  1. 365
Platform
  1. Windows
I have a table of input of words such as:
Input
Result
Biology[23], Chemistry, Physics[17], History, [21]Geography[11]Biology, Chemistry, Physics, History, Geography
Mathematics,[27] Science (Terminated), [51]English,[36] History,[79] Music[3]Mathematics, Science, English, History, Music
Accounting (Suspended), Economics[25]Accounting, Economics
Arts, Literature (Begins on 7/7/2022)[6]Arts, Literature
......
May I know what formula should I apply in the Result column so that:
  1. the square brackets along with the contents inside the square brackets can be removed?
  2. the round brackets along with the contents inside the round brackets can be removed?
  3. The formula in the result column should be the same for every row and it uses colon for cell reference (e.g. C3:C987) so that drag and fill of formula can be omitted.
  4. The result should be as shown in the column on the right.
I tried to use the formula below, but it can only remove the first instance of square bracket and its contents:
=IFERROR(FILTERXML("<a><b>"&IFERROR(SUBSTITUTE(C3:C987,MID(LEFT(C3:C987,FIND("]",C3:C987)),FIND("[",C3:C987),LEN(C3:C987)),""),C3:C987)&"</b></a>","//b"),"")

May I know how can I improve the formula? Or are there any better formula?
 

Excel Facts

What does custom number format of ;;; mean?
Three semi-colons will hide the value in the cell. Although most people use white font instead.
Welcome to the MrExcel board!

I'm happy for somebody to prove me wrong, but I doubt you will get your 4 listed points happening with standard worksheet formulas.

If you are happy to employ a user-defined function then you could try this. If you need instruction on how to implement the udf post back & ask.

VBA Code:
Function GetSubjects(r As Range) As Variant
  Dim RX As Object
  Dim a As Variant
  Dim i As Long
  
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.Pattern = "(\[.*?\])|(\(.*?\))"
  a = r.Value
  For i = 1 To UBound(a)
    a(i, 1) = Trim(RX.Replace(a(i, 1), ""))
  Next i
  GetSubjects = a
End Function

Anonymous User 51392.xlsm
CD
1
2InputResult
3Biology[23], Chemistry, Physics[17], History, [21]Geography[11]Biology, Chemistry, Physics, History, Geography
4Mathematics,[27] Science (Terminated), [51]English,[36] History,[79] Music[3]Mathematics, Science , English, History, Music
5Accounting (Suspended), Economics[25]Accounting , Economics
6Arts, Literature (Begins on 7/7/2022)[6]Arts, Literature
7
Sheet1
Cell Formulas
RangeFormula
D3:D6D3=GetSubjects(C3:C6)
Dynamic array formulas.
 
Upvote 0
Hi and Welcome to MrExcel!

the round brackets along with the contents inside the round brackets can be removed?
If only one word appears in parentheses, as your examples are, then you could use this long formula:
Dante Amor
CD
1
2InputResult
3Biology[23], Chemistry, Physics[17], History, [21]Geography[11]Biology, Chemistry, Physics, History, Geography
4Mathematics,[27] Science (Terminated), [51]English,[36] History,[79] Music[3]Mathematics, Science, English, History, Music
5Accounting (Suspended), Economics[25]Accounting, Economics
6Arts, Literature (Begins on 7/7/2022)[6]Arts, Literature
Hoja4
Cell Formulas
RangeFormula
D3:D6D3=TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE( SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE( SUBSTITUTE(SUBSTITUTE(IF(ISNUMBER(SEARCH("(",C3:C6)), REPLACE(C3:C6,SEARCH("(",C3:C6)-1,SEARCH(")",C3:C6)-SEARCH("(",C3:C6)+2,""),C3:C6), "0",""),"1",""),"2",""),"3",""),"4",""),"5",""),"6",""),"7",""),"8",""),"9",""),"[",""),"]",""))
 
Upvote 0
If only one word appears in parentheses
Hi Dante
Just to clarify - I think that you actually meant "If there is only one set of parentheses" - since there is more than one word in the parentheses of row 6 in the sample data.

Also worth noting that for the formula to work, there must not be any digits in the subject names (eg "Biology Level 2")

That being said, the formula may be useful to the OP - though I would not be surprised if the 'one set of parentheses' restriction may be breached with the full real data.
If it is of use, then it can be shortened somewhat by the removal of many of the double-quote marks, a shorter way of dealing with rows that have no parentheses and also FIND is sufficient for use here too instead of SEARCH.

Anonymous User 51392.xlsm
CD
1
2Input
3Biology[23], Chemistry, Physics[17], History, [21]Geography[11]Biology, Chemistry, Physics, History, Geography
4Mathematics,[27] Science (Terminated), [51]English,[36] History,[79] Music[3]Mathematics, Science, English, History, Music
5Accounting (Suspended), Economics[25]Accounting, Economics
6Arts, Literature (Begins on 7/7/2022)[6]Arts, Literature
Sheet1 (2)
Cell Formulas
RangeFormula
D3:D6D3=TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE( SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(REPLACE(C3:C6,FIND("(",C3:C6&"(")-1, FIND(")",C3:C6&")")-FIND("(",C3:C6&"(")+2,""),"[",9),"]",9),0,9),1,9),2,9),3,9),4,9),5,9),6,9),7,9),8,9),9,""))
Dynamic array formulas.
 
Last edited:
Upvote 0
Another non-vba approach, assuming that you have the LAMBDA function:
Though this one does need to be copied down each row, it should handle any number of parentheses or square brackets, anything within those brackets and keeps any numbers outside of the brackets.
It just requires all brackets to be paired and no nested brackets of the same sort.

To implement, In the Name Manger (Formulas ribbon tab) -> New.. -> Name: REPLBRACKETS -> Refers to: Paste the formula below -> OK

Excel Formula:
=LAMBDA(s,IF(ISNUMBER(FIND("(",s)),REPLBRACKETS(REPLACE(s,FIND("(",s)-1,FIND(")",s)-FIND("(",s)+2,"")),IF(ISNUMBER(FIND("[",s)),REPLBRACKETS(REPLACE(s,FIND("[",s),FIND("]",s)-FIND("[",s)+1,"")),s)))
Then use a formula like this in the worksheet

Anonymous User 51392.xlsm
CD
1InputResult
2Biology[23], Chemistry, Physics[17], History, [21]Geography[11]Biology, Chemistry, Physics, History, Geography
3Mathematics,[27] Science (Terminated), [51]English,[36] History,[79] Music[3]Mathematics, Science, English, History, Music
4Accounting (Suspended), Economics[25] (abc)Accounting, Economics
5Arts, Literature (Begins on 7/7/2022)[6]Arts, Literature
6Arts 2nd Level[33.3], Science (Terminated), English (New Courses [a] & [c]), History[0 - Failed]Arts 2nd Level, Science, English, History
Sheet3
Cell Formulas
RangeFormula
D2:D6D2=REPLBRACKETS(C2)
 
Upvote 0
Solution
Hi,

Couldn't it have been easier to use Find and Replace (Ctrl + H)?
 
Upvote 0
I'm sure that it would help the OP if you detailed the steps to do that (easily).
Of course. Press the CTRL + H key and in the REPLACEMENT tab, [*] will be placed, and nothing will be placed when replacing with. Then the procedure will be identical to replacing (*) with nothing.
 
Upvote 0
Of course. Press the CTRL + H key and in the REPLACEMENT tab, [*] will be placed, and nothing will be placed when replacing with. Then the procedure will be identical to replacing (*) with nothing.
Thanks. I hadn't actually tried it am wondering if the Find/Replace behaviour has changed at some point or if my memory is faulty (quite likely)? My recollection is that with Find/Replace the wildcard * was "hungry" and that with [*] it would replace as below below
Biology[23], Chemistry, Physics[17], History, [21]Geography[11]

It certainly looks like an option for the OP if a manual solution suits. It may require the tweak of including a space before the opening parenthesis in the 'Find' to avoid a space before the comma for those items.
 
Upvote 0
My recollection is that with Find/Replace the wildcard * was "hungry" and that with [*] it would replace as below below
Biology[23], Chemistry, Physics[17], History, [21]Geography[11]
In Microsoft 365 If I use [*] to find and replace with nothing results is Biology, Chemistry, Physics, History, Geography
will delete the brackets and what is between the brackets, for each individual brackets.
 
Upvote 0

Forum statistics

Threads
1,214,390
Messages
6,119,235
Members
448,879
Latest member
VanGirl

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top