Hello Everyone.
I am trying scraping for the first time. I came across this website that gives a great example but I am getting lost with what I would call more advanced programming.
What I am trying to do is extract the person's name, number and address from the website but getting lost in the code and unclear how it should be modified. The sample code uses a list brought in from Excel, but in this case I am trying to hard code with the following id from the HTML source code
listing__row
ContactName1
ContactPhone1
ContactAddress1
I want to output the scraping results to a spreadsheet but was planning to post the results to the sheet as the details are located so was not planning to store the values in memory.
Any help to steer me in the right direction is appreciated.
Scraping a website's HTML in VBA
Sample of HTML Code to be used
I am trying scraping for the first time. I came across this website that gives a great example but I am getting lost with what I would call more advanced programming.
What I am trying to do is extract the person's name, number and address from the website but getting lost in the code and unclear how it should be modified. The sample code uses a list brought in from Excel, but in this case I am trying to hard code with the following id from the HTML source code
listing__row
ContactName1
ContactPhone1
ContactAddress1
I want to output the scraping results to a spreadsheet but was planning to post the results to the sheet as the details are located so was not planning to store the values in memory.
Any help to steer me in the right direction is appreciated.
Scraping a website's HTML in VBA
Sample of HTML Code to be used
HTML:
K Fiedler-Seres (905) 895-3245 832 Arnold Cres Newmarket ON L3Y 2E2 Get directions
Code:
Sub ImportCanada411Data()'to refer to the running copy of Internet Explorer
Dim ie As InternetExplorer
'to refer to the HTML document returned
Dim html As HTMLDocument
'open Internet Explorer in memory, and go to website
Set ie = New InternetExplorer
ie.Visible = False
ie.navigate "http://www.canada411.ca/search/ad/1/-/cZQQstZArnold%20CrescentQQciZNewmarketQQpvZONQQpcZ/?pgLen=10000"
'Wait until IE is done loading page
Do While ie.READYSTATE <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to Canada411 ..."
DoEvents
Loop
'show text of HTML document returned
Set html = ie.document
'close down IE and reset status bar
Set ie = Nothing
Application.StatusBar = ""
Dim Contact As IHTMLElement
Dim Questions As IHTMLElementCollection
Dim Question As IHTMLElement
'Dim RowNumber As Long
'Dim QuestionId As String
'Dim QuestionFields As IHTMLElementCollection
'Dim QuestionField As IHTMLElement
Dim StreetCnt As Integer
Dim ListCnt As Integer
Dim QuestionFieldLinks As IHTMLElementCollection
Set Contact = html.getElementById("listing__row")
'Set Questions = QuestionList.Children
StreetCnt = 2
ListCnt = 2
'For Each Question In Questions
'if this is the tag containing the question details, process it
If Question.className = "question-summary narrow" Then
'first get and store the question id in first column
QuestionId = Replace(Question.ID, "question-summary-", "")
Cells(RowNumber, 1).Value = CLng(QuestionId)
'if this is the question's votes, store it (get rid of any surrounding text)
If QuestionField.className = "votes" Then
votes = Replace(QuestionField.innerText, "votes", "")
votes = Replace(votes, "vote", "")
Cells(RowNumber, 2).Value = Trim(votes)
End If
'likewise for views (getting rid of any text)
If QuestionField.className = "views" Then
views = QuestionField.innerText
views = Replace(views, "views", "")
views = Replace(views, "view", "")
Cells(RowNumber, 3).Value = Trim(views)
End If
'if this is the bit where author's name is ...
If QuestionField.className = "started" Then
'get a list of all elements within, and store the
'text in the second one
Set QuestionFieldLinks = QuestionField.all
Cells(RowNumber, 4).Value = QuestionFieldLinks(2).innerHTML
End If
'Next QuestionField
'go on to next row of worksheet
RowNumber = RowNumber + 1
End If
Next
Set html = Nothing
MsgBox "Done!"
End Sub