Scraping web site...pulling in html source to parse for data

OaklandJim

New Member
Joined
Nov 29, 2018
Messages
28
Office Version
2019
Platform
Windows
Am trying to scrape Morningstar web pages for funds data. In the past I paid up to get help with this as it is beyond my abilities. I'll pay up again if I have to.

Anyway, before Morningstar redesigned their web pages the following code used to pull in the entire html source for a web url.

VBA Code:
Function GetSource(sURL As String) As String
    Dim oXHTTP As Object
    
    Set oXHTTP = CreateObject("MSXML2.XMLHTTP")
    
    On Error GoTo haveError
    oXHTTP.Open "GET", sURL, False
    oXHTTP.send
    GetSource = oXHTTP.responsetext
        
    Set oXHTTP = Nothing
        
Exit Function

haveError:

    Debug.Print "Error in Function GetSource for URL " & sURL
    Err.Clear
    GetSource = ""
    Set oXHTTP = Nothing
End Function
Now it pulls in text that looks a lot different than the source does when I look at it and save it using a browser. Specifically the code pulls in SOME html but it is a much smaller file (55k chars) than the source (~1.8 million chars ) that I see/save using the browser.

In the latter I can see my data and its tags and know how to parse to get it. BUT I cannot get the "full" source code to parse. The "small" source I get does not have my data in it.

Also, I presume that VBA string vars and string functions can handle that many chars?

Here is a typical URL for me. This is for an ETF

Here is one for a mutual fund
 

Rijnsent

Well-known Member
Joined
Oct 17, 2005
Messages
1,110
Office Version
365
Platform
Windows
Hi Jim,

what data are you looking for? If I load the page (VNQ), I see that they load some stuff after loading the initial page, calling on an internal API. So it could be that your scraping can be more efficient by tapping into those. If I check the source of the page (View page source), I also see a more compact version of the page, where also a lot of stuff is not showing. So it seems that they made their side scrape-unfriendly on purpose.
Having said that, you could try this trick to get to their API (python): Google Translate
And to finish it off: I felt intrigued and spent some time building a small tool that can scrape the code through the JSONs, see this dropbox folder and find the file morningstar_scrape.xlsm (the others are for other questions I answered on this forum): MrExcel

Koen
 

OaklandJim

New Member
Joined
Nov 29, 2018
Messages
28
Office Version
2019
Platform
Windows
Koen I really appreciate the thoughtful response! Thanx a lot. I just want to scrape basic info from Morningstar like fund expense, yield, stars rating, medal rating and asset allocation for my portfolio spreadsheet. I skimmed the resource you suggested. Looks great. But, unfortunately I'm getting old and not so up-to-date enuf on modern programming stuff. VBA, php, limited javascript but never worked with an API or Python or JOSN. Anyway, I cannot expect much handholding for such a journey so I conclude that 1. what I want can be done but 2. I am not really up to learning what seems like a lot. I'll suck it up and do by transcribing data for my 50 funds. I'll need a half day at least. Prefer to do quarterly or more often but may have to stick with one update a year. Heheh, if I could program it in two weeks I'd be surprised. Last thought, is there a good clearinghouse for programmers-for-hire who might do this for me for a reasonable cost? Have a great day all.
 

OaklandJim

New Member
Joined
Nov 29, 2018
Messages
28
Office Version
2019
Platform
Windows
Koen I did not want to feel ungrateful so I took a look at the workbook that you put together. Wow. That took a lot of work. I can't tell how good it is but it sure seems sophisticated. What to do with it though? I looked at it and I see some semblance of VBA that I know but a lot is not understandable. Might you add a super simple basic front end on it so I can mess with it? A button even that makes relevant call to do the scraping and output of the array? Otherwise I just won't know what to do with your hard work. Thanks again, really.
 

Rijnsent

Well-known Member
Joined
Oct 17, 2005
Messages
1,110
Office Version
365
Platform
Windows
Koen I did not want to feel ungrateful so I took a look at the workbook that you put together. Wow. That took a lot of work. I can't tell how good it is but it sure seems sophisticated. What to do with it though? I looked at it and I see some semblance of VBA that I know but a lot is not understandable. Might you add a super simple basic front end on it so I can mess with it? A button even that makes relevant call to do the scraping and output of the array? Otherwise I just won't know what to do with your hard work. Thanks again, really.
Hi Jim,
thanks for your kind words. I wrote a big chunck of the code for some other project, so that was basically copy-paste. I could build something with a button if you provide me with a mockup. So if you can make an excel sheet with your user interface, I can fuse that with my code. You can't attach excel files, so use e.g. dropbox to get your file accross.
Thanks,
Koen
 

Forum statistics

Threads
1,077,864
Messages
5,336,846
Members
399,105
Latest member
idleminutes

Some videos you may like

This Week's Hot Topics

Top