Parse all html in excel

PetrusV

New Member
Joined
May 12, 2022
Messages
16
Office Version
  1. 2021
Platform
  1. Windows
I have data that looks like this:

TopicDefinition
A (Alpha)<p>Alpha, the first letter of the Greek alphabet, as Omega is the last. These letters occur in the text of <ref>Rev 1:8</ref>, <ref>Rev 1:11</ref>; <ref>Rev 21:6</ref>; <ref>Rev 22:13</ref>, and are represented by &#147;Alpha&#148; and &#147;Omega&#148; respectively (omitted in R.V., <ref>Rev 1:11</ref>). They mean &#147;the first and last.&#148; (Compare <ref>Heb 12:2</ref>; <ref>Isa 41:4</ref>; <ref>Isa 44:6</ref>; <ref>Rev 1:11</ref>, <ref>Rev 1:17</ref>; <ref>Rev 2:8</ref>.)</p><p>In the symbols of the early Christian Church these two letters are frequently combined with the cross or with Christ's monogram to denote his divinity.</p>
Aaronites<p>The descendants of Aaron, and therefore priests. Jehoiada, the father of Benaiah, led 3,700 Aaronites as &#147;fighting men&#148; to the support of David at Hebron (<ref>1Ch 12:27</ref>). Eleazar (<ref>Num 3:32</ref>), and at a later period Zadok (<ref>1Ch 27:17</ref>), was their chief.</p>
Abaddon<p>Destruction, the Hebrew name (equivalent to the Greek <i>Apollyon</i>, i.e., destroyer) of &#147;the angel of the bottomless pit&#148; (<ref>Rev 9:11</ref>). It is rendered &#147;destruction&#148; in <ref>Job 28:22</ref>; <ref>Job 31:12</ref>; <ref>Job 26:6</ref>; <ref>Pro 15:11</ref>; <ref>Pro 27:20</ref>. In the last three of these passages the Revised Version retains the word &#147;Abaddon.&#148; We may regard this word as a personification of the idea of destruction, or as sheol, the realm of the dead.</p>
Abagtha<p>One of the seven eunuchs in Ahasuerus's court (<ref>Est 1:10</ref>; <ref>Est 2:21</ref>).</p>

How do I parse the Definition column to preserve/parse all html data?
 

Excel Facts

What does custom number format of ;;; mean?
Three semi-colons will hide the value in the cell. Although most people use white font instead.
Try this macro:
VBA Code:
Sub PTxt()
Dim htDoc As Object, I As Long
'
Set htDoc = CreateObject("HTMLfile") 'late binding alla html obj lib
'
Sheets("mySheet").Select                        '<<< The sheet with the data
For I = 2 To Cells(Rows.Count, 2).End(xlUp).Row
    htDoc.write Cells(I, 2).Value
    On Error Resume Next
    Cells(I, 3) = htDoc.getelementsbyTAGname("p")(0).innertext
    On Error GoTo 0
    If Len(Cells(I, 3)) = 0 Then Cells(I, 3) = Cells(I, 2)
    htDoc.Close
Next I
'
End Sub
Insert the code in a standard module of your vba project; the line marked <<< has to customized according the comment. Then Run Sub PTxt:
-it will scan the info in column B and insert a text version in column C
-if you are confident the the output is correct then remove column B
 
Upvote 0
Solution
@Anthony47 I've spent the entire day trying to find a working VBA script, none worked. Your code not only works flawlessly but it runs fast! You're da man! Thank you so much for this!!!!
 
Upvote 0

Forum statistics

Threads
1,214,808
Messages
6,121,681
Members
449,048
Latest member
81jamesacct

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top