Scraping web site...pulling in html source to parse for data

OaklandJim

New Member
Joined
Nov 29, 2018
Messages
46
Office Version
2019
Platform
Windows
Am trying to scrape Morningstar web pages for funds data. In the past I paid up to get help with this as it is beyond my abilities. I'll pay up again if I have to.

Anyway, before Morningstar redesigned their web pages the following code used to pull in the entire html source for a web url.

VBA Code:
Function GetSource(sURL As String) As String
    Dim oXHTTP As Object
    
    Set oXHTTP = CreateObject("MSXML2.XMLHTTP")
    
    On Error GoTo haveError
    oXHTTP.Open "GET", sURL, False
    oXHTTP.send
    GetSource = oXHTTP.responsetext
        
    Set oXHTTP = Nothing
        
Exit Function

haveError:

    Debug.Print "Error in Function GetSource for URL " & sURL
    Err.Clear
    GetSource = ""
    Set oXHTTP = Nothing
End Function
Now it pulls in text that looks a lot different than the source does when I look at it and save it using a browser. Specifically the code pulls in SOME html but it is a much smaller file (55k chars) than the source (~1.8 million chars ) that I see/save using the browser.

In the latter I can see my data and its tags and know how to parse to get it. BUT I cannot get the "full" source code to parse. The "small" source I get does not have my data in it.

Also, I presume that VBA string vars and string functions can handle that many chars?

Here is a typical URL for me. This is for an ETF

Here is one for a mutual fund
 

Rijnsent

Well-known Member
Joined
Oct 17, 2005
Messages
1,151
Office Version
365
Platform
Windows
Hi Jim,

what data are you looking for? If I load the page (VNQ), I see that they load some stuff after loading the initial page, calling on an internal API. So it could be that your scraping can be more efficient by tapping into those. If I check the source of the page (View page source), I also see a more compact version of the page, where also a lot of stuff is not showing. So it seems that they made their side scrape-unfriendly on purpose.
Having said that, you could try this trick to get to their API (python): Google Translate
And to finish it off: I felt intrigued and spent some time building a small tool that can scrape the code through the JSONs, see this dropbox folder and find the file morningstar_scrape.xlsm (the others are for other questions I answered on this forum): MrExcel

Koen
 

OaklandJim

New Member
Joined
Nov 29, 2018
Messages
46
Office Version
2019
Platform
Windows
Koen I really appreciate the thoughtful response! Thanx a lot. I just want to scrape basic info from Morningstar like fund expense, yield, stars rating, medal rating and asset allocation for my portfolio spreadsheet. I skimmed the resource you suggested. Looks great. But, unfortunately I'm getting old and not so up-to-date enuf on modern programming stuff. VBA, php, limited javascript but never worked with an API or Python or JOSN. Anyway, I cannot expect much handholding for such a journey so I conclude that 1. what I want can be done but 2. I am not really up to learning what seems like a lot. I'll suck it up and do by transcribing data for my 50 funds. I'll need a half day at least. Prefer to do quarterly or more often but may have to stick with one update a year. Heheh, if I could program it in two weeks I'd be surprised. Last thought, is there a good clearinghouse for programmers-for-hire who might do this for me for a reasonable cost? Have a great day all.
 

OaklandJim

New Member
Joined
Nov 29, 2018
Messages
46
Office Version
2019
Platform
Windows
Koen I did not want to feel ungrateful so I took a look at the workbook that you put together. Wow. That took a lot of work. I can't tell how good it is but it sure seems sophisticated. What to do with it though? I looked at it and I see some semblance of VBA that I know but a lot is not understandable. Might you add a super simple basic front end on it so I can mess with it? A button even that makes relevant call to do the scraping and output of the array? Otherwise I just won't know what to do with your hard work. Thanks again, really.
 

Rijnsent

Well-known Member
Joined
Oct 17, 2005
Messages
1,151
Office Version
365
Platform
Windows
Koen I did not want to feel ungrateful so I took a look at the workbook that you put together. Wow. That took a lot of work. I can't tell how good it is but it sure seems sophisticated. What to do with it though? I looked at it and I see some semblance of VBA that I know but a lot is not understandable. Might you add a super simple basic front end on it so I can mess with it? A button even that makes relevant call to do the scraping and output of the array? Otherwise I just won't know what to do with your hard work. Thanks again, really.
Hi Jim,
thanks for your kind words. I wrote a big chunck of the code for some other project, so that was basically copy-paste. I could build something with a button if you provide me with a mockup. So if you can make an excel sheet with your user interface, I can fuse that with my code. You can't attach excel files, so use e.g. dropbox to get your file accross.
Thanks,
Koen
 

Forum statistics

Threads
1,082,025
Messages
5,362,731
Members
400,688
Latest member
ConnerR

Some videos you may like

This Week's Hot Topics

  • populate from drop list with multiple tables
    Hi All, i have a drop list that displays data, what i want is when i select one of those from the list to populate text from different tables on...
  • Find list of words from sheet2 in sheet1 before a comma and extract text vba
    Hi Friends, Trying to find the solution on my task. But did not find suitable one to the need. Here is my query and sample file with details...
  • Dynamic Formula entry - VBA code sought
    Hello, really hope one of you experts can help with this - i've spent hours on this and getting no-where. .I have a set of data (more rows than...
  • Listbox Header
    Have a named range called "AccidentsHeader" Within my code I have: [CODE]Private Sub CommandButton1_Click() ListBox1.RowSource =...
  • Complex Heat Map using conditional formatting
    Good day excel world. I have a concern. Below link have a list of countries that carries each country unique data. [URL...
  • Conditional formatting
    Hi good morning, hope you can help me please, I have cells P4:P54 and if this cell is equal to 1 then i want row O to say "Fully Utilised" and to...
Top