Pull Data From Website

mikeypsooner

New Member
Joined
May 16, 2017
Messages
33
I would like to pull the data from the tables on the webpage https://finance.yahoo.com/quote/VTI/holdings?p=VTI

Looking to get the Overall Portfolio Composition (%), Sector Weightings (%), Equity Holdings, Bond Ratings, Top 10 Holdings (14.91% of Total Assets), and Fund Overview in tables similar the one below.

Not to familiar with
getElementsByClassName, getElementsByTagName, etc. I general know how to locate the table information but not sure how to write the code to get getElementsByClassName, getElementsByTagName, etc. When I look at the HTML code I am not sure which one to grab for my specific data to get to a table. Maybe some useful vba and html documents to look at?

Any help is much appreciated.


Option Explicit

Sub GetStockData()

Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim HTMLRows As MSHTML.IHTMLElementCollection
Dim HTMLRow As MSHTML.IHTMLElement
Dim wksDest As Worksheet
Dim r As Long

Set wksDest = Sheet2
wksDest.Cells.Clear

Application.ScreenUpdating = False

IE.navigate "https://finance.yahoo.com/quote/GOOGL?p=GOOGL"
IE.Visible = False

With IE
Do While .Busy Or .readyState <> READYSTATE_COMPLETE
DoEvents
Loop
End With

Set HTMLDoc = IE.document

With HTMLDoc.getElementsByClassName("D(ib) Fw(200) Mend(20px)")(0)
wksDest.Range("A1").Value = .Children(0).innerText
wksDest.Range("B1").Value = .Children(1).innerText
End With

Set HTMLRows = HTMLDoc.getElementsByClassName("D(ib) W(1/2) Bxz(bb)")(0).getElementsByTagName("tr")

r = 3
For Each HTMLRow In HTMLRows
wksDest.Cells(r, "A").Value = HTMLRow.Cells(0).innerText
wksDest.Cells(r, "B").Value = HTMLRow.Cells(1).innerText
r = r + 1
Next HTMLRow

Set HTMLRows = HTMLDoc.getElementsByClassName("D(ib) W(1/2) Bxz(bb)")(1).getElementsByTagName("tr")

r = 3
For Each HTMLRow In HTMLRows
wksDest.Cells(r, "D").Value = HTMLRow.Cells(0).innerText
wksDest.Cells(r, "E").Value = HTMLRow.Cells(1).innerText
r = r + 1
Next HTMLRow

Sheet2.Activate

Application.ScreenUpdating = True

MsgBox "Completed...", vbInformation

Set IE = Nothing
Set HTMLDoc = Nothing
Set HTMLRows = Nothing
Set HTMLRow = Nothing
Set wksDest = Nothing

End Sub
 
dud1yPf.png

FCpkFvh.png


now instead of 2 conditions one item per condition (boolean switch was used)
you have still 2 conditions but 3 items before repeat
so now instead of a boolean switch, just use an integer that you toggle between value 1 - 3

so the table index pattern is +1 +1 +4

your loop is still doing 2 things only like before but it needs to do the 2nd condition twice in a row

Code:
'iNITIALIZE
Dim loopToggle as Integer
loopToggle = 1

...

'this would be at the end of the loop to change the next loop iteration's logic
loopToggle = loopToggle + 1
If loopToggle = 4 Then loopToggle = 1 'now this will loop 1-3

SWITCH = true was for values before

now loopToggle = 1 is for beginning of table row and 2-3 is for values

If switch Then is the same as... if loopToggle > 1
 
Last edited:
Upvote 0

Excel Facts

Is there a shortcut key for strikethrough?
Ctrl+S is used for Save. Ctrl+5 is used for Strikethrough. Why Ctrl+5? When you use hashmarks to count |||| is 4, strike through to mean 5.
the code is almost identical for that 3 column table...

Code:
Sub ParseYahooFinanceFundOperationTable(html As String)
    Dim index, tableIndex, removeIndex, row, col As Long
    Dim searchValue As String
    Dim switch As Integer


    row = 1
    col = 4
    switch = 1
    index = 1
    tableIndex = 70


    Do
        searchValue = IIf(switch > 1, "", "[COLOR=#ff0000]?[/COLOR]span ") & "data-reactid=""" & tableIndex & """>"
        
        index = InStr(index, html, searchValue)
        
        If index > 0 Then
            index = index + Len(searchValue)
            
            removeIndex = InStr(index, html, "[COLOR=#ff0000]?[/COLOR]/")
            
            Cells(row, col).Value2 = Mid(html, index, removeIndex - index)


            If switch = 1 Then
                col = col + 1
                tableIndex = tableIndex + 1
                switch = 2
            Else
                If switch = 2 Then
                    col = col + 1
                    tableIndex = tableIndex + 1
                    switch = 3
                Else
                    row = row + 1
                    col = col - 2
                    tableIndex = tableIndex + 4
                    switch = 1
                End If
            End If
        Else
            MsgBox "Update your macro"
            Exit Sub
        End If
    Loop While tableIndex < 85
    
    Columns.AutoFit
End Sub

i removed the less than characters and replaced with ? because mr excel would remove the text

FqOIO8o.png
 
Last edited:
Upvote 0
Just in case you're still interested in an answer to this question, HTMLDoc.body.outerHTML does in fact pull the entire HTML code.
Not quite; that's only the HTML body tag and its contents.

For the whole HTML, including the html and head tags, use HTMLDoc.body.parentElement.outerHTML.
 
Upvote 0
Not quite; that's only the HTML body tag and its contents.

For the whole HTML, including the html and head tags, use HTMLDoc.body.parentElement.outerHTML.

Strictly speaking, you're right of course. However, we were actually just talking within the context of the body itself. Also, for the whole HTML, I think we could also use HTMLDoc.documentElement.outerHTML.
 
Upvote 0
cerfani,

When I use the statement responseText = MakeGetRequest("https://finance.yahoo.com/quote/VTI/holdings?p=VTI") how do I view the responseText. I am working on a new page but when I try to write it to a cell value it is truncating it because it is too long. How can I view the entire html maybe in a word or text document.
 
Upvote 0
you would need to save the string to a text file and then use the file path to start a process. If you don't specify a process, windows will use the file extension to determine which app to open it with.

Notepad++ is a free notepad app, that is useful for programmers or if you want to look at html code in a text file... all those images i pasted of the html code, i was viewing it in notepad++

notepad++ provides the syntax highlighting ;)
 
Last edited:
Upvote 0
my code

qurl = "https://finance.yahoo.com/quote/SPY/options?p=SPY"
With ActiveSheet.QueryTables.Add(Connection:="URL;" & qurl, Destination:=Range("A1"))
.PostText = True
.BackgroundQuery = True
.TablesOnlyFromHTML = False
.Refresh BackgroundQuery:=False
End With

error message
Run-time error "1004"
Unable to open https://finance.yahoo.com/quote/SPY/options?p=SPY.
could not be found. (HTTP/1.0 404)
 
Upvote 0
cerfani and others,

Can not figure out why the Function MakeGetRequest fails when I use the website below

Code:
responseText = MakeGetRequest("http://portfolios.morningstar.com/fund/summary?t=VTI&region=usa&culture=en_US")

I get a run time error '-2147012866 (80072efe): the connection with the server was terminated abnormally. I can paste the test "https://finance.yahoo.com/quote/VTI" and it works great.

Any thoughts?
 
Upvote 0
in your browser run the developer tools to inspect the network... then navigate to that website... then set the headers in your code to be exactly like your browser

when a browser navigates, it is actually reading the html and it knows how to communicate but you can just do it in your browser and copy what it does.

You can see how to set headers inside the code. Also you should be aware if a website requires your to POST data. Then you need a function for that. GET requests simply ask for the html with no specified parameters.

edit: I just checked, you are doing a GET for this one, setup the headers correctly and the server will respond... you can most likely ignore the cookie header... that is probably google up to no good... not sure but i doubt it is needed, watch out with the acceptencoding too, you can probably ignore that too
 
Last edited:
Upvote 0

Forum statistics

Threads
1,216,028
Messages
6,128,392
Members
449,445
Latest member
JJFabEngineering

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top