Substring search

drdisp

New Member
Joined
Apr 3, 2015
Messages
7
Hello guys,

Hope you are doing well.

I am seeking your help regarding an Excel formula.

I will be receiving a large text with a lot of XML tags in A1 cell.

I will have only a couple of XML tags that I will be using, the rest will be useless.

The purpose would be to extract the text between those certain tags.

For example, I only need the data between the following tags:

<to> and to>
<from> and </from>
<address> and </address>

Basically, pasting the following text in A1:

<?xmlversion="1.0"encoding="UTF-8"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
<address>George Bush Street</address>
<uselesstag>some_other_info</uselesstag>
</note>

should return the following info:

Tove
Jani
George Bush Street


Could you please assist with that?

MID and SEARCH functions come to my mind, however, I cannot get them to work for some reason.

Your help is greatly appreciated! :)
 
Last edited:

Excel Facts

Difference between two dates
Secret function! Use =DATEDIF(A2,B2,"Y")&" years"&=DATEDIF(A2,B2,"YM")&" months"&=DATEDIF(A2,B2,"MD")&" days"
<to></to><from></from><address></address>
<!--?xmlversion="1.0"encoding="UTF-8"?--><note><to>Tove</to><from>Jani</from><heading>Reminder</heading>Don't forget me this weekend!<address>George Bush Street</address><uselesstag>some_other_info</uselesstag></note>43515666142169
using search these numbers locate the 3 tags you are interested in
considering only the "to" tags
<to> is 4 characters</to>
so you want character 47 onwards (43+4)
you want to stop at character 50 (51-1)
number of characters wanted = 4 = 51-47
Tove
Jani
George Bush Street
Tove
=MID(E2,47,4)
the 47 and 4 in this formula are clearly replaced with
F2+4 and G2-F2-4
forum screws up text with "<" and ">" in it
trust me my big text cell is exactly as yours

<tbody>
</tbody>
 
Upvote 0
< to > < /to > < from > < /from > < address > < /address >
< ?xmlversion="1.0"encoding="UTF-8"? > < note > < to > Tove < /to > < from > Jani < /from > < heading > Reminder < /heading > < body > Don't forget me this weekend! < /body > < address > George Bush Street < /address > < uselesstag > some_other_info < /uselesstag > < /note >43515666142169
using search these numbers locate the 3 tags you are interested in
considering only the "to" tags
< to > is 4 characters
so you want character 47 onwards (43+4)
you want to stop at character 50 (51-1)
number of characters wanted = 4 = 51-47
Tove
Jani
George Bush Street
Tove
=MID(E2,47,4)
the 47 and 4 in this formula are clearly replaced with
F2+4 and G2-F2-4
I used find and replace to insert blank
characters around the < and >

<colgroup><col span="3"><col><col><col><col span="3"><col><col><col></colgroup><tbody>
</tbody>
 
Upvote 0

Forum statistics

Threads
1,215,453
Messages
6,124,920
Members
449,195
Latest member
Stevenciu

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top