Extracting web data from a sub web page

99bobster99

Board Regular
Joined
Jul 19, 2004
Messages
95
Hi,

I am using the "CreateObject("InternetExplorer.Application")" method to extract data from a webpage. This webpage is a sub page that I cannot find the data for, even after filtering through all the ".document.all(x)" items? Any ideas on how to obtain this "sub page" data, since there is other data that I need which is on the "main" page? I don't want to have to call up this sub page separately since I'm already seeing it (physically able to view it on the main page but not able to extract it) from the main page? Any ideas?
 
Norie, what would be a better option than Excel or VBA for obtaining the data?

The data I'm looking to retrieve is both the "real-time" last trade and volume data (which is located under the Options header) as well as the "delayed" day low, day high and today's open data (which is located under the Microsoft Corp header) at this URL;

http://www.cnbc.com/id/15837290?q=msft

I'd like to obtain both sets of data ("real-time" data is on the main URL and "delayed" data is on the iframe URL) in a single navigation (if possible) as per Tom's xls file. How do I get both sets of data using Tom's xls file?
 
Upvote 0

Excel Facts

Select all contiguous cells
Pressing Ctrl+* (asterisk) will select the "current region" - all contiguous cells in all directions.
"Why not use a standard Web Query from Data > Import External Data?"

Because the frame's URL changes constantly. So I suppose a web query would not work repeatedly.

Bob. The additional data, that you did not mention before, can be retrieved from the results in my previous reply. I don't know which volume you are after.

<img src="http://home.fuse.net/tstom/07060329.jpg">
 
Upvote 0
I'd like to retrieve the "Last Trade", "Volume" and "Time Stamp" from the upper half of the webpage (located in the upper "real-time" banner, in the region indicated by your upper circled Volume on your bitmap). And at the same time, I'd like to retrieve the "Day Low", "Day High" and "Today's Open" from the bottom half of the webpage (located in the lower "delayed" banner, in the region indicated by your lower circled Volume on your bitmap).
Can I do this using your xls file you sent earlier?
 
Upvote 0
Thank you everyone for your contributions!!

I used John/Kenneth's methods mentioned earlier and am making some headway! Looks like this is the way to go taking the IE "no more than 1 document link" security enforcement into consideration.

Thank you again everyone for your input, I will post my final code once I get it working 100%.
 
Upvote 0
Bob. How often are you going to query this site? Are you trying to import the data into the spreadsheet as often as it updates? How many stocks are you watching?
 
Upvote 0
Hi Tom,

I'd like to query the site on demand manually (vb icon on the spreadsheet). I'd be updating a single symbol at a time.
 
Upvote 0
Hi Bob. I had some fun with this just to see what could be done. Here are two example workbooks using the same code. One shows how to impliment this as a custom worksheet function to get realtime and openhighlow data on demand. The other shows how to automatically update a range of cells with the realtime data. I'll post the code in my next reply.

UDF Example:
<a href="http://home.fuse.net/tstom/0708081250.328272.UDF.zip"><img src="http://home.fuse.net/tstom/zip.gif"width="48"height="48"border="0"></a> <a href="http://home.fuse.net/tstom/0708081250.328272.UDF.zip">0708081250.328272.UDF.zip</a>

Realtime Data Sync Example:
<a href="http://home.fuse.net/tstom/0708081250.328272.RTD.zip"><img src="http://home.fuse.net/tstom/zip.gif"width="48"height="48"border="0"></a> <a href="http://home.fuse.net/tstom/0708081250.328272.RTD.zip">0708081250.328272.RTD.zip</a>
 
Upvote 0
Example code using the webbrowser control and MSHTML object library.

Add a userform. Name it frmDataItem. Add WebBrowser1. (web browser control). There is no code in the userform.

Add a class module named OHLArgs
Code:
Option Explicit

Public Symbol As String

Public OpenText As String
Public LowText As String
Public HighText As String

Public OpenValue As Currency
Public LowValue As Currency
Public HighValue As Date

Public HasError As Boolean

Add a class module named RTDArgs
Code:
Option Explicit

Public Symbol As String

Public LastText As String
Public VolumeText As String
Public TimeText As String

Public LastValue As Currency
Public VolumeValue As Currency
Public TimeValue As Date

Public HasError As Boolean

Add a class module named RTDItem
Code:
Option Explicit

Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)

Private Const UrlPart As String = "http://www.cnbc.com/id/15837290?q="

Public Event UpdateAvailable(Symbol As String, RealTimeData As RTDArgs)
Public Event TimedOut(Symbol As String)

'requires a reference to Microsoft HTML Object Library
Private WithEvents TimeSpan As HTMLSpanElement
Private LastSpan As HTMLSpanElement
Private VolumeSpan As HTMLSpanElement

Private HTMLDoc As HTMLDocument

Private WithEvents WebB As WebBrowser

Private pFrm As frmDataItem
Private pSymbol As String
Private pEnableEvents As Boolean
Private pConnected As Boolean
Private pTimeOutSeconds As Integer
Private pSessionID As String
Private pFrameSource As String
Private pForceRefresh As Boolean
Private pRealTimeData(2) As String
Private pOpenHighLow(2) As String
Private pCancelNavigation As Boolean
Private pLastRefreshTime As Date
Private pCallerIsUDF As Boolean

Friend Function OpenHighLow() As OHLArgs
    If Not pConnected Then Exit Function
    If DateDiff("n", pLastRefreshTime, Now) > 1 Then Me.Refresh
    
    Dim Args As OHLArgs
    Set Args = New OHLArgs
    
    With Args
        .Symbol = pSymbol
        .OpenText = pOpenHighLow(0)
        .HighText = pOpenHighLow(1)
        .LowText = pOpenHighLow(2)
    
        If IsNumeric(pOpenHighLow(0)) Then
            .OpenValue = CCur(pOpenHighLow(0))
        Else
            .HasError = True
        End If
    
        If IsNumeric(pOpenHighLow(1)) Then
            .HighValue = CCur(pOpenHighLow(1))
        Else
            .HasError = True
        End If
        
        If IsNumeric(pOpenHighLow(2)) Then
            .LowValue = CDate(pOpenHighLow(2))
        Else
            .HasError = True
        End If
    End With
    
    Set OpenHighLow = Args
End Function

Friend Property Get RTD() As RTDArgs
    If Not pConnected Then Exit Property
    Dim Args As RTDArgs
    
    Set Args = New RTDArgs
    
    With Args
        .Symbol = pSymbol
        .LastText = pRealTimeData(0)
        .VolumeText = pRealTimeData(1)
        .TimeText = pRealTimeData(2)
    
        If IsNumeric(pRealTimeData(0)) Then
            .LastValue = CCur(pRealTimeData(0))
        Else
            .HasError = True
        End If
    
        If IsNumeric(pRealTimeData(1)) Then
            .VolumeValue = CCur(pRealTimeData(1))
        Else
            .HasError = True
        End If
        
        If IsDate(pRealTimeData(2)) Then
            .TimeValue = CDate(pRealTimeData(2))
        Else
            .HasError = True
        End If
    End With
    
    Set RTD = Args
End Property

Friend Function Connect(Symbol As String, _
   Optional EnableEvents As Boolean, _
   Optional TimeOutSeconds As Integer = 10, _
   Optional CallerIsUDF As Boolean) As Boolean
   
    Dim TimeOutTime As Date
    
    Me.TimeOutSeconds = TimeOutSeconds
    pCallerIsUDF = CallerIsUDF
    
    Me.Symbol = Symbol
    
    If Not pCallerIsUDF Then
        Me.EnableEvents = EnableEvents
    End If
    
    TimeOutTime = Now + TimeSerial(0, 0, TimeOutSeconds)
    Do Until Me.Connected
        Sleep 250
        If Now > TimeOutTime Then Exit Do
    Loop
    
    Connect = Me.Connected
    pLastRefreshTime = Now

End Function

Private Function GetProperties() As Boolean
    Dim Script As HTMLScriptElement, Frame As HTMLObjectElement
    Dim TempFrameSource As String

    On Error Resume Next
    Set HTMLDoc = WebB.Document
        
    For Each Script In HTMLDoc.Scripts
        If InStr(Script.innerHTML, "cnbc_RunQuoteStrip") <> 0 Then
            pSessionID = Replace(Split(Script.innerHTML, ",")(3), "'", "")
            Exit For
        End If
    Next

    For Each Frame In HTMLDoc.all
        TempFrameSource = Frame.src
        If InStr(TempFrameSource, "delayedQuotes&symbol=" & Symbol) <> 0 Then
            pFrameSource = TempFrameSource
            Exit For
        End If
    Next
    
    Set LastSpan = HTMLDoc.getElementById("WSODQ_" & pSymbol & "_LAST_0_" & pSessionID)
    Set VolumeSpan = HTMLDoc.getElementById("WSODQ_" & pSymbol & "_VOLUME_0_" & pSessionID)
    Set TimeSpan = HTMLDoc.getElementById("WSODQ_" & pSymbol & "_LASTTIME_0_" & pSessionID)
    
    If (Not LastSpan Is Nothing) And (Not VolumeSpan Is Nothing) And (Not TimeSpan Is Nothing) Then
        If LastSpan.innerText <> "" And VolumeSpan.innerText <> "" And TimeSpan.innerText <> "" And pFrameSource <> "" Then
            GetProperties = True
            Exit Function
        End If
    End If
End Function

Private Sub Class_Initialize()
    Set pFrm = New frmDataItem
    Set WebB = pFrm.WebBrowser1
End Sub

Private Sub Class_Terminate()
    Set WebB = Nothing
    Unload pFrm
    Set pFrm = Nothing
End Sub

Friend Sub Refresh()
    pForceRefresh = True
    Me.Symbol = pSymbol
    pLastRefreshTime = Now
End Sub

Friend Property Let TimeOutSeconds(ByVal vNewValue As Integer)
    If vNewValue < 5 Then vNewValue = 5
    pTimeOutSeconds = vNewValue
End Property

Friend Property Get TimeOutSeconds() As Integer
    TimeOutSeconds = pTimeOutSeconds
End Property

Friend Property Get EnableEvents() As Boolean
    EnableEvents = pEnableEvents
End Property

Friend Property Let EnableEvents(ByVal vNewValue As Boolean)
    pEnableEvents = vNewValue
End Property

Friend Property Get Symbol() As String
    Symbol = pSymbol
End Property

Friend Property Let Symbol(ByVal vNewValue As String)
    Dim TimeOutTime As Date, doc(1) As HTMLDocument
    
    If pSymbol <> vNewValue Or pForceRefresh Then
        pForceRefresh = False
        pSymbol = vNewValue
'        pFrm.Show vbModeless
        pCancelNavigation = False
        TimeOutTime = Now + TimeSerial(0, 0, Int(TimeOutSeconds))
        WebB.Navigate UrlPart & pSymbol
        Do Until GetProperties
            If Now > TimeOutTime Then Exit Do
            DoEvents
        Loop
        WebB.Stop
        Set doc(0) = New HTMLDocument
        Set doc(1) = doc(0).createDocumentFromUrl(pFrameSource, "")
        Do Until doc(1).ReadyState = "complete"
            DoEvents
        Loop
        pOpenHighLow(0) = doc(1).getElementById("right").Children(0).Children(0).innerText
        pOpenHighLow(1) = doc(1).getElementById("stockQuote").getElementsByTagName("B")(2).innerText
        pOpenHighLow(2) = doc(1).getElementById("stockQuote").getElementsByTagName("B")(4).innerText
    End If
    Set LastSpan = HTMLDoc.getElementById("WSODQ_" & pSymbol & "_LAST_0_" & pSessionID)
    pConnected = True
End Property

Friend Property Get Connected() As Boolean
    Connected = pConnected
End Property

Private Sub TimeSpan_onpropertychange()
    pRealTimeData(0) = LastSpan.innerText
    pRealTimeData(1) = VolumeSpan.innerText
    pRealTimeData(2) = TimeSpan.innerText
    If pEnableEvents Then RaiseEvent UpdateAvailable(pSymbol, Me.RTD)
End Sub

Private Sub WebB_BeforeNavigate2(ByVal pDisp As Object, URL As Variant, Flags As Variant, TargetFrameName As Variant, PostData As Variant, Headers As Variant, Cancel As Boolean)
    If pCallerIsUDF Then
        Cancel = pCancelNavigation
        pCancelNavigation = True
    End If
End Sub


Example usage as a UDF.
Code:
Option Explicit

Private RtdUdf As RTDItem

Function OHL_UDF(Symbol As String) As Variant
    
    Dim v As Variant, OHLTemp() As Variant, OHLArgs As OHLArgs
    
    Application.Volatile False
    
    If RtdUdf Is Nothing Then
        Set RtdUdf = New RTDItem
        RtdUdf.Connect Symbol:=Symbol, CallerIsUDF:=True
    Else
        RtdUdf.Symbol = Symbol
    End If
    
    'Today's Open    Day High    Day Low Last    Volume
    'Time    Last Refresh Time   Status
    
    ReDim OHLTemp(4)
    Set OHLArgs = RtdUdf.OpenHighLow
    
    With OHLArgs
        If .HasError Then
            OHLTemp(0) = .OpenText
            OHLTemp(1) = .HighText
            OHLTemp(2) = .LowText
            OHLTemp(4) = "ERROR"
        Else
            OHLTemp(0) = .OpenValue
            OHLTemp(1) = .HighValue
            OHLTemp(2) = .LowValue
            OHLTemp(4) = "OK"
        End If
    End With
    
    OHLTemp(3) = Now
    OHL_UDF = OHLTemp
End Function

Function RTD_UDF(Symbol As String) As Variant()
    
    Dim v As Variant, RTDTemp() As Variant, RTDArgs As RTDArgs
    
    Application.Volatile False
    
    If RtdUdf Is Nothing Then
        Set RtdUdf = New RTDItem
        RtdUdf.Connect Symbol:=Symbol, CallerIsUDF:=True
    Else
        RtdUdf.Symbol = Symbol
    End If
    
    'Today's Open    Day High    Day Low Last    Volume
    'Time    Last Refresh Time   Status
    
    ReDim RTDTemp(4)
    Set RTDArgs = RtdUdf.RTD
    
    With RTDArgs
        If .HasError Then
            RTDTemp(0) = .LastText
            RTDTemp(1) = .VolumeText
            RTDTemp(2) = .TimeText
            RTDTemp(4) = "ERROR"
        Else
            RTDTemp(0) = .LastValue
            RTDTemp(1) = .VolumeValue
            RTDTemp(2) = .TimeValue
            RTDTemp(4) = "OK"
        End If
    End With
    
    RTDTemp(3) = Now
    RTD_UDF = RTDTemp
End Function

Example usage to get realtime data.
Code:
Option Explicit

Private WithEvents RTD As RTDItem

Sub StartRTD()
    
    Set RTD = New RTDItem
    RTD.Connect Symbol:="msft", EnableEvents:=True, TimeOutSeconds:=10, CallerIsUDF:=False
    If Not RTD.Connected Then
        MsgBox "Connection failed..."
    End If
        
End Sub

Sub StopRTD()
    Set RTD = Nothing
End Sub

Private Sub CommandButton1_Click()
    StartRTD
End Sub

Private Sub CommandButton2_Click()
    StopRTD
End Sub

Private Sub RTD_UpdateAvailable(Symbol As String, RealTimeData As RTDArgs)
    Dim DestRng As Range, RTDArgs As RTDArgs
    
    Set DestRng = [b2:e2]
  
    Set RTDArgs = RTD.RTD
    With RTDArgs
        If Not .HasError Then
            DestRng(1) = .LastValue
            DestRng(2) = .VolumeValue
            DestRng(3) = .TimeValue
            DestRng(4) = "Ok"
        Else
            DestRng(1) = .LastText
            DestRng(2) = .VolumeText
            DestRng(3) = .TimeText
            DestRng(4) = "Error"
        End If
    End With
     
    
End Sub

Private Sub RTD_TimedOut(Symbol As String)
    MsgBox "Timed Out..."
End Sub
 
Upvote 0
Thank you Tom, that works great! That is exactly what I was looking for! Thank you for the insight into how these embedded frames work!

Thank you to everyone who contributed on this thread!
 
Upvote 0

Forum statistics

Threads
1,214,386
Messages
6,119,220
Members
448,876
Latest member
Solitario

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top