Scraping Website data in VBA soooo slow - any advice?

bensdb

New Member
Joined
Jan 24, 2014
Messages
18
Hi -

So I wrote a script to basically scrape through the html on a site, and then draw out certain information to populate my excel sheet. The problem is that it takes AGES!

I'm sure there must be a much more efficient or quicker way to do what I want. I was thinking along the following lines:

1) Just make the html of the page one long string, and then use mid, len, instr type functions to extract the info I want, rather than looping through each child html element? I'm guessing this is the main thing that affects performance?

2) I tried putting the info into an array, rather than straight into the ranges on my sheet, but it didn't really seem to be any quicker? I don't know if it's just the way I was doing it though. Any advice?

So PLEASE HELP!

Here is my code so far:

Code:
Sub ImportAlibabaData()

Dim QuestionList As IHTMLElement
Dim Questions As IHTMLElementCollection
Dim Question As IHTMLElement
Dim RowNumber As Long
Dim QuestionId As String
Dim QuestionFields As IHTMLElementCollection
Dim QuestionField As IHTMLElement
Dim votes As String
Dim views As String
Dim QuestionFieldLinks As IHTMLElementCollection
Dim newtext As String
Dim x As Integer
Dim Getsuppliernumber As String

'to refer to the running copy of Internet Explorer
Dim ie As InternetExplorer
'to refer to the HTML document returned
Dim html As HTMLDocument
'open Internet Explorer in memory, and go to website
Set ie = New InternetExplorer
ie.Visible = False
ie.navigate "http://www.alibaba.com/corporations/CO2_regulator.html"
'Wait until IE is done loading page
Do While ie.READYSTATE <> READYSTATE_COMPLETE
Application.StatusBar = "Searching....."
DoEvents
Loop
'show text of HTML document returned
Set html = ie.document
'close down IE and reset status bar
Set ie = Nothing
Application.StatusBar = ""
'clear old data out and put titles in
Cells.Clear

'trying to find total suppliers
Set QuestionList = html.body
Set Questions = QuestionList.Children
For Each Question In Questions
    If InStr(Question.innerText, "se_rst=") <> 0 Then
    Getsuppliernumber = Mid(Question.innerText, InStr(Question.innerText, "se_rst=") + 7, 5)
    Getsuppliernumber = Left(Getsuppliernumber, InStr(Getsuppliernumber, "|") - 1)
    End If
Next

'Start by getting a reference to the HTML element which contains all of the questions (this also initialises the row number in the spreadsheet to 4, the one after the titles):
Set QuestionList = html.getElementById("J-items-content")
Set Questions = QuestionList.Children

x = 0
'Now we'll loop over all of the child elements within this tag, finding each question in turn:
For Each Question In Questions
    'if this is the tag containing the question details, process it
    If Question.className = "f-icon m-item  " Then
    x = x + 1
    Set QuestionFields = Question.all
        For Each QuestionField In QuestionFields
        
            'Company Name
            If QuestionField.className = "title ellipsis" Then
                Range("A" & x).Value = QuestionField.innerText
            End If
            
            'Years as Gold Supplier
            For years = 1 To 20
                If QuestionField.className = "gs" & years Then
                    Range("B" & x).Value = years
                End If
            Next
            
            'Trade Assurance
            If QuestionField.className = "ico ico-ta" Then
                Range("C" & x).Value = "Yes TA"
            End If
            
            'Contact Details Link
            If QuestionField.className = "cd" Then
                Range("D" & x).Value = QuestionField.getAttribute("href")
            End If
            
            'Assessed Supplier
            If QuestionField.className = "as J-as" Then
                Range("E" & x).Value = "Yes AS"
            End If
            
            'Main Products Listed
            If QuestionField.className = "value ellipsis ph" Then
                Range("F" & x).Value = QuestionField.innerText
            End If
            
            'Total Revenue
            If QuestionField.hasAttribute("data-reve") = True Then
                Range("G" & x).Value = QuestionField.innerText
            End If
       
            'Country
            If QuestionField.hasAttribute("data-coun") = True Then
                Range("H" & x).Value = QuestionField.innerText
            End If
               
            'Response Rate
            If QuestionField.className = "sts" Then
                Range("I" & x).Value = QuestionField.innerText
            End If     
            
        Next
       
    End If
Next
Set html = Nothing

Application.StatusBar = ""

End Sub
 

Excel Facts

Did you know Excel offers Filter by Selection?
Add the AutoFilter icon to the Quick Access Toolbar. Select a cell containing Apple, click AutoFilter, and you will get all rows with Apple

Forum statistics

Threads
1,215,063
Messages
6,122,927
Members
449,094
Latest member
teemeren

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top