Pull Data From Website

mikeypsooner

New Member
Joined
May 16, 2017
Messages
33
I would like to pull the data from the tables on the webpage https://finance.yahoo.com/quote/VTI/holdings?p=VTI

Looking to get the Overall Portfolio Composition (%), Sector Weightings (%), Equity Holdings, Bond Ratings, Top 10 Holdings (14.91% of Total Assets), and Fund Overview in tables similar the one below.

Not to familiar with
getElementsByClassName, getElementsByTagName, etc. I general know how to locate the table information but not sure how to write the code to get getElementsByClassName, getElementsByTagName, etc. When I look at the HTML code I am not sure which one to grab for my specific data to get to a table. Maybe some useful vba and html documents to look at?

Any help is much appreciated.


Option Explicit

Sub GetStockData()

Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim HTMLRows As MSHTML.IHTMLElementCollection
Dim HTMLRow As MSHTML.IHTMLElement
Dim wksDest As Worksheet
Dim r As Long

Set wksDest = Sheet2
wksDest.Cells.Clear

Application.ScreenUpdating = False

IE.navigate "https://finance.yahoo.com/quote/GOOGL?p=GOOGL"
IE.Visible = False

With IE
Do While .Busy Or .readyState <> READYSTATE_COMPLETE
DoEvents
Loop
End With

Set HTMLDoc = IE.document

With HTMLDoc.getElementsByClassName("D(ib) Fw(200) Mend(20px)")(0)
wksDest.Range("A1").Value = .Children(0).innerText
wksDest.Range("B1").Value = .Children(1).innerText
End With

Set HTMLRows = HTMLDoc.getElementsByClassName("D(ib) W(1/2) Bxz(bb)")(0).getElementsByTagName("tr")

r = 3
For Each HTMLRow In HTMLRows
wksDest.Cells(r, "A").Value = HTMLRow.Cells(0).innerText
wksDest.Cells(r, "B").Value = HTMLRow.Cells(1).innerText
r = r + 1
Next HTMLRow

Set HTMLRows = HTMLDoc.getElementsByClassName("D(ib) W(1/2) Bxz(bb)")(1).getElementsByTagName("tr")

r = 3
For Each HTMLRow In HTMLRows
wksDest.Cells(r, "D").Value = HTMLRow.Cells(0).innerText
wksDest.Cells(r, "E").Value = HTMLRow.Cells(1).innerText
r = r + 1
Next HTMLRow

Sheet2.Activate

Application.ScreenUpdating = True

MsgBox "Completed...", vbInformation

Set IE = Nothing
Set HTMLDoc = Nothing
Set HTMLRows = Nothing
Set HTMLRow = Nothing
Set wksDest = Nothing

End Sub
 
i studied the html and that's how the yahoo programmer identified where that value is just look at the source... look at my post #12 ... i dont really know too much about html other than it has reliable patterns to use for parsing

Code:
Sub ParseYahooFinanceTable(html As String)

    Dim index, tableIndex, removeIndex, row, col As Long
    Dim searchValue, parseValue As String
    Dim switch as Boolean

    row = 1
    col = 1
    switch = False
    index  = 1
    tableIndex = 29

    Do
        searchValue = see my image, mr excel cant handle this line
        index = InStr(index, html, searchValue)
        
        If index > 0 Then
            index = index + Len(searchValue)

            removeIndex = see my image, mr excel cant handle this line
            Cells(row, col).Value2 = Mid(html, index, removeIndex - index)

            If switch Then
                row  = row + 1
                tableIndex = tableIndex + 4
                switch = False
            Else
                col = col + 1
                tableIndex = tableIndex + 1
                switch = True
            End If
        Else
            MsgBox "Update your macro"
            Exit Sub
        End If
    Loop While tableIndex < 51
End Sub

ajsDdEq.png


so if you pass the html string into that subroutine, it will write that table into A1:B6 on the active sheet
 
Last edited:
Upvote 0

Excel Facts

Pivot Table Drill Down
Double-click any number in a pivot table to create a new report showing all detail rows that make up that number
hmmm i never used internetexplorer object to navigate the internet... i actually prefer to use httpwebrequest

just google the api to grab the source of the link you went to


I found this... that might work, looks simple

Code:
Dim htmlText As String
With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", "http://yahoo.com", False
    .Send
    htmlText = .ResponseText
End With


https://www.mrexcel.com/forum/excel...de-visual-basic-applications.html#post2752378
 
Last edited:
Upvote 0
I get an access denied so it doesnt work. I have been searching but can't find anything. Not sure why I can pull the entire html beside the HTMLDoc.body.outerHTML
 
Upvote 0
I get an access denied so it doesnt work. I have been searching but can't find anything. Not sure why I can pull the entire html beside the HTMLDoc.body.outerHTML


yeah so just use httpwebrequest...

https://www.codeproject.com/questions/794960/how-do-i-get-http-request-in-excel-vba

scroll down to Solution 1, then you can get rid of all that other internetexplorer HTMLDoc code

this is better i think...

https://coderwall.com/p/pbxsyw/vba-web-requests

btw in my code, the table actually ends at Table Index 55 so have the loop break < 56 instead of < 51
 
Last edited:
Upvote 0
https://coderwall.com/p/pbxsyw/vba-web-requests

<code class="prettyprint prettyprinted" style=""> ' make sure to include the Microsoft WinHTTP Services in the project
' tools -> references -> Microsoft WinHTTP Services, version 5.1
' http://www.808.dk/?code-simplewinhttprequest
' http://msdn.microsoft.com/en-us/library/windows/desktop/aa384106(v=vs.85).aspx
' http://www.neilstuff.com/winhttp/
</code>

Code:
<code class="prettyprint prettyprinted" style="">Function MakeGetRequest(url <code class="prettyprint prettyprinted" style=""> As String</code><code class="prettyprint prettyprinted" style=""></code>) As String
    ' make sure to include the Microsoft WinHTTP Services in the project
    ' tools -> references -> Microsoft WinHTTP Services, version 5.1
    ' http://www.808.dk/?code-simplewinhttprequest
    ' http://msdn.microsoft.com/en-us/library/windows/desktop/aa384106(v=vs.85).aspx
    ' http://www.neilstuff.com/winhttp/

    ' create the request object
    Set req = CreateObject("WinHttp.WinHttpRequest.5.1")

    ' set timeouts
    ' http://msdn.microsoft.com/en-us/library/windows/desktop/aa384061(v=vs.85).aspx
    ' SetTimeouts(resolveTimeout, ConnectTimeout, SendTimeout, ReceiveTimeout)
    req.SetTimeouts 60000, 60000, 60000, 60000

    ' make the request, http verb (method), url, false to force syncronous
    ' open(http method, absolute uri to request, async (true: async, false: sync)
    req.Open "GET", url, False


    ' set WinHttpRequestOption enumerations
    ' http://msdn.microsoft.com/en-us/library/windows/desktop/aa384108(v=vs.85).aspx

    ' set user agent
    req.Option(0) = "Echovoice VBA HTTP Bot v0.1"

    ' set ssl ignore errors
    '   13056: ignore errors
    '   0: break on errors
    req.Option(4) = 13056

    ' set redirects
    req.Option(6) = True

    ' allow http to redirect to https
    req.Option(12) = True

    ' send request
    ' send post data, should be blank for a get request
    req.Send ""

    ' read response and return
    MakeWebRequest = req.ResponseText

End Function</code>

then you could do...

Code:
Call ParseYahooFinanceTable(<code class="prettyprint prettyprinted" style="">MakeGetRequest("https://finance.yahoo.com/quote/VTI/profile?p=VTI"</code><code class="prettyprint prettyprinted" style="">))
</code>
try that
 
Last edited:
Upvote 0
I tried the code but get an error on Call ParseYahooFinanceTable(<code class="prettyprint prettyprinted" style="color: rgb(51, 51, 51); font-size: 12px; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; line-height: 12px;">MakeGetRequest("https://finance.yahoo.com/quote/VTI/profile?p=VTI"</code><code class="prettyprint prettyprinted" style="color: rgb(51, 51, 51); font-size: 12px; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; line-height: 12px;">))

I just did Msgbox MakeGetRequest("https://finance.yahoo.com/quote/VTI/profile?p=VTI")

Not having any luck</code>
 
Upvote 0

Forum statistics

Threads
1,217,278
Messages
6,135,604
Members
449,951
Latest member
willbaggs

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top