Hi -
So I wrote a script to basically scrape through the html on a site, and then draw out certain information to populate my excel sheet. The problem is that it takes AGES!
I'm sure there must be a much more efficient or quicker way to do what I want. I was thinking along the following lines:
1) Just make the html of the page one long string, and then use mid, len, instr type functions to extract the info I want, rather than looping through each child html element? I'm guessing this is the main thing that affects performance?
2) I tried putting the info into an array, rather than straight into the ranges on my sheet, but it didn't really seem to be any quicker? I don't know if it's just the way I was doing it though. Any advice?
So PLEASE HELP!
Here is my code so far:
So I wrote a script to basically scrape through the html on a site, and then draw out certain information to populate my excel sheet. The problem is that it takes AGES!
I'm sure there must be a much more efficient or quicker way to do what I want. I was thinking along the following lines:
1) Just make the html of the page one long string, and then use mid, len, instr type functions to extract the info I want, rather than looping through each child html element? I'm guessing this is the main thing that affects performance?
2) I tried putting the info into an array, rather than straight into the ranges on my sheet, but it didn't really seem to be any quicker? I don't know if it's just the way I was doing it though. Any advice?
So PLEASE HELP!
Here is my code so far:
Code:
Sub ImportAlibabaData()
Dim QuestionList As IHTMLElement
Dim Questions As IHTMLElementCollection
Dim Question As IHTMLElement
Dim RowNumber As Long
Dim QuestionId As String
Dim QuestionFields As IHTMLElementCollection
Dim QuestionField As IHTMLElement
Dim votes As String
Dim views As String
Dim QuestionFieldLinks As IHTMLElementCollection
Dim newtext As String
Dim x As Integer
Dim Getsuppliernumber As String
'to refer to the running copy of Internet Explorer
Dim ie As InternetExplorer
'to refer to the HTML document returned
Dim html As HTMLDocument
'open Internet Explorer in memory, and go to website
Set ie = New InternetExplorer
ie.Visible = False
ie.navigate "http://www.alibaba.com/corporations/CO2_regulator.html"
'Wait until IE is done loading page
Do While ie.READYSTATE <> READYSTATE_COMPLETE
Application.StatusBar = "Searching....."
DoEvents
Loop
'show text of HTML document returned
Set html = ie.document
'close down IE and reset status bar
Set ie = Nothing
Application.StatusBar = ""
'clear old data out and put titles in
Cells.Clear
'trying to find total suppliers
Set QuestionList = html.body
Set Questions = QuestionList.Children
For Each Question In Questions
If InStr(Question.innerText, "se_rst=") <> 0 Then
Getsuppliernumber = Mid(Question.innerText, InStr(Question.innerText, "se_rst=") + 7, 5)
Getsuppliernumber = Left(Getsuppliernumber, InStr(Getsuppliernumber, "|") - 1)
End If
Next
'Start by getting a reference to the HTML element which contains all of the questions (this also initialises the row number in the spreadsheet to 4, the one after the titles):
Set QuestionList = html.getElementById("J-items-content")
Set Questions = QuestionList.Children
x = 0
'Now we'll loop over all of the child elements within this tag, finding each question in turn:
For Each Question In Questions
'if this is the tag containing the question details, process it
If Question.className = "f-icon m-item " Then
x = x + 1
Set QuestionFields = Question.all
For Each QuestionField In QuestionFields
'Company Name
If QuestionField.className = "title ellipsis" Then
Range("A" & x).Value = QuestionField.innerText
End If
'Years as Gold Supplier
For years = 1 To 20
If QuestionField.className = "gs" & years Then
Range("B" & x).Value = years
End If
Next
'Trade Assurance
If QuestionField.className = "ico ico-ta" Then
Range("C" & x).Value = "Yes TA"
End If
'Contact Details Link
If QuestionField.className = "cd" Then
Range("D" & x).Value = QuestionField.getAttribute("href")
End If
'Assessed Supplier
If QuestionField.className = "as J-as" Then
Range("E" & x).Value = "Yes AS"
End If
'Main Products Listed
If QuestionField.className = "value ellipsis ph" Then
Range("F" & x).Value = QuestionField.innerText
End If
'Total Revenue
If QuestionField.hasAttribute("data-reve") = True Then
Range("G" & x).Value = QuestionField.innerText
End If
'Country
If QuestionField.hasAttribute("data-coun") = True Then
Range("H" & x).Value = QuestionField.innerText
End If
'Response Rate
If QuestionField.className = "sts" Then
Range("I" & x).Value = QuestionField.innerText
End If
Next
End If
Next
Set html = Nothing
Application.StatusBar = ""
End Sub