Going through HTML source code

Spurious

Active Member
Joined
Dec 14, 2010
Messages
439
Hello guys,

got a more general question.

I got a HTML source code I want to go through and write certain passages that are within certain <></> brackets. Is instr really my only way of sorting that out, because I run into a few problems.

To be more specific:
It's a source code that has exchange rates and it's divided by month and days, so there is month 1, then 30 days, then month 2 then 30days, rinse and repeat.

I kinda have problems with those months, because he always skips and I havent really found a solution.
 

Excel Facts

Select all contiguous cells
Pressing Ctrl+* (asterisk) will select the "current region" - all contiguous cells in all directions.

Norie

Well-known Member
Joined
Apr 28, 2004
Messages
76,318
Office Version
  1. 365
Platform
  1. Windows
Spurious

Is this a one off or are you getting this on a regular basis?

Even if it is a one-off you could try automating IE, opening the file locally and then extracting the information that way.

That works for 'live' pages so I can't see it being a problem for locally saved HTML, and it might be easier to extract the data.:)

PS That's last comment's kind of biased - never have got a grasp on RegExp in VBA, even though I use it elsewhere.:eek:
 

Spurious

Active Member
Joined
Dec 14, 2010
Messages
439

ADVERTISEMENT

I am not quite sure what you mean Norie :/.

I use RegEx now, but I dont get any output, where is my fault?

Edit: Why is the code not shown properly? What do I have to do?

Code:
Sub QTextFormat2(jQText As String)
Dim FF As Integer

FF = FreeFile
Open "c:\Test.txt" For Output As #FF

    With CreateObject("VBScript.RegExp")
    .Pattern = "<H1 class=stats0>*</H1>|<TH class=month colSpan=7>*</TH></TR>|<TD><STRONG>*</TD>"
    If .test(jQText) Then
        Print #FF, .Execute(jQText)(0).submatches(0)
    End If
    End With

Close #FF
End Sub

As usual, help is greatly appreciated!

Cheers
 
Last edited:

Richard Schollar

MrExcel MVP
Joined
Apr 19, 2005
Messages
23,707
If your pattern includes html tags you will get gnarled output - you need to replace any instances of < with "& lt ;" (no double quotes and no spaces though) and instances of > with "& gt ;" - but just for your html tags eg:

.Pattern = "<div>"<DIV&GT;"< p>
 

Spurious

Active Member
Joined
Dec 14, 2010
Messages
439

ADVERTISEMENT

So trying it again:

I am not quite sure what you mean Norie :/.

I use RegEx now, but I dont get any output, where is my fault?

Code:
Sub QTextFormat2(jQText As String)
Dim FF As Integer

FF = FreeFile
Open "c:\Test.txt" For Output As #FF

    With CreateObject("VBScript.RegExp")
    .Pattern = "<H1 class=stats0>*</H1>|<TH class=month colSpan=7>*</TH></TR>|<TD><STRONG>*</TD>"
    If .test(jQText) Then
        Print #FF, .Execute(jQText)(0).submatches(0)
    End If
    End With

Close #FF
End Sub

As usual, help is greatly appreciated!

Cheers
 

Richard Schollar

MrExcel MVP
Joined
Apr 19, 2005
Messages
23,707
Can you post an example of the actual text file you have (you will need to use the "& lt ;" in place of all the opening <) and indicate which lines (eg by colouring them red) you wish to extract?
 

Spurious

Active Member
Joined
Dec 14, 2010
Messages
439
Can you post an example of the actual text file you have (you will need to use the "& lt ;" in place of all the opening <) and indicate which lines (eg by colouring them red) you wish to extract?

Here we go:

Code:
[COLOR="Red"]<H1 class=stats0>Japanese yen (JPY)</H1>[/COLOR]<!--
<script type="text/javascript" src="AC_OETags.js"></script>
<script type="text/javascript" src="history.js"></script>
<script type="text/javascript">currencyCode='jpy'</script>
<script type="text/javascript" src="flex.js" ></script>
<noscript>
<p>
<object type="application/x-shockwave-flash" data="Fxref.swf" width="100%" height="400">
<param name="movie" value="Fxref.swf" />
<param name="quality" value="high"/>
<param name="allowScriptAccess" value="sameDomain" />
<param name="flashvars" value="cur=jpy"/>
</object>
</p>
</noscript>
-->
<DIV id=flexChart class=flash-replaced><EMBED height=600 type=application/x-shockwave-flash pluginspage=http://www.adobe.com/go/getflashplayer width="100%" src=/stats/exchange/eurofxref/html/FxrefNg.swf?dataFile=jpy.xml.zlib&dsdFile=ecb_exr1.xml&width=100%&height=100%&historyUrl=history.htm&cur=jpy flashvars="">
<DIV class=alt></DIV></DIV><A name=M4></A>
<DIV id=flexChart class=flash-replaced><EMBED height=600 type=application/x-shockwave-flash pluginspage=http://www.adobe.com/go/getflashplayer width="100%" src=/stats/exchange/eurofxref/html/FxrefNg.swf?dataFile=jpy.xml.zlib&dsdFile=ecb_exr1.xml&width=100%&height=100%&historyUrl=history.htm&cur=jpy flashvars="">
<DIV class=alt></DIV></DIV><A name=M4></A>
<H2>Reference rates over last four months - Japanese yen (JPY)</H2>
<DIV class=fxref>
<DIV class="boxleft medium">
<TABLE class=tablecalendar summary="Reference rates: Mar">
<THEAD>
<TR>
[COLOR="#ff0000"]<TH class=month colSpan=7>Mar - 2011</TH>[/COLOR]</TR>
<TR>
<TH class=day abbr=Monday>Mon</TH>
<TH class=day abbr=Tuesday>Tue</TH>
<TH class=day abbr=Wednesday>Wed</TH>
<TH class=day abbr=Thursday>Thu</TH>
<TH class=day abbr=Friday>Fri</TH>
<TH class="day weekend" abbr=Saturday>Sat</TH>
<TH class="day weekend" abbr=Sunday>Sun</TH></TR></THEAD>
<TBODY>
<TR>
<TD> <BR> </TD>
[COLOR="#ff0000"]<TD><STRONG>1</STRONG><BR>113.39</TD>
<TD><STRONG>2</STRONG><BR>113.33</TD>
<TD><STRONG>3</STRONG><BR>113.43</TD>
<TD><STRONG>4</STRONG><BR>115.63</TD>[/COLOR]
<TD class=weekend><STRONG>5</STRONG><BR> </TD>
<TD class=weekend><STRONG>6</STRONG><BR> </TD></TR>
<TR class=dark>
 

Richard Schollar

MrExcel MVP
Joined
Apr 19, 2005
Messages
23,707
Try this:

Code:
Sub GetHtml(ByVal strText As String)
Dim fl As String
Dim mtches As Variant
Dim i As Integer, j As Long
i = FreeFile
fl = "c:\myOutputFile.txt"
With CreateObject("vbscript.regexp")
    .Global = True
    .ignorecase = True
    .Pattern = "(<H1 class=stats0>.*?</H1>)|(<TH class=month colSpan=7>.*?</TH>(?=</TR>))|(<TD><STRONG>.*?</TD>)"
    If .test(strText) Then
        Set mtches = .Execute(strText)
        j = 0
        Open fl For Binary Access Write As #i
            Do Until j = mtches.Count
                Put #i, , CStr(mtches(j))
                j = j + 1
            Loop
        Close #i
    End If
End With
End Sub
 

Forum statistics

Threads
1,148,182
Messages
5,745,210
Members
423,933
Latest member
ankushmukherjee

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top