How can I make this web scrape VBA code perform quicker?

Indominus

Board Regular
Joined
Jul 11, 2020
Messages
126
Office Version
  1. 2016
Platform
  1. Windows
Hello. So I have this VBA code that web scrapes data from a work website. It used Internet Explorer. However, when there is a lot of data it can take a few minutes to actually finish performing the task. Any tweaks that I could make to increase the speed or how could I transfer this using Google chrome instead? And if so how? I have seen Chrome web scrape codes perform quicker. Here is the code. It is for work so I cannot post the URL. Thank you in advance!

VBA Code:
Dim Browser As InternetExplorer

Dim Document As HTMLDocument

Dim Table As IHTMLElement

Dim Tables As IHTMLElementCollection

Dim Div As IHTMLElement

Dim Divs As IHTMLElementCollection

Dim H3 As IHTMLElement

Dim TR As IHTMLElement

Dim TRs As IHTMLElementCollection

Dim TD As IHTMLElement

Dim TDs As IHTMLElementCollection

Dim Row As Integer

Dim Column As Integer



Dim ws As Worksheet

Dim URL As String


Set ws = ThisWorkbook.Worksheets("Setup")




Row = 1

Column = 1

Set ws = Sheets("PROCESS")

ws.Cells.Clear



URL = "Work URL"



Set Browser = New InternetExplorerMedium

Browser.navigate URL



'Wait for page to load

Do While Browser.Busy Or Browser.readyState <> READYSTATE_COMPLETE

DoEvents

Loop



'Scan the document

Set Document = Browser.Document

Set Divs = Document.getElementById("secondaryProductivityList").getElementsByTagName("div")

For Each Div In Divs

Set H3 = Div.getElementsByTagName("h3")(0)



If Not Div.className = "floatHeader" And Not H3 Is Nothing Then

ws.Cells(Row, 1).Value = H3.innerText

Row = Row + 1



Set Tables = Div.getElementsByTagName("table")

Set Table = Tables(0)

Set TRs = Table.getElementsByTagName("tr")

For Each TR In TRs

Column = 1

Set TDs = TR.getElementsByTagName("th")

For Each TD In TDs

ws.Cells(Row, Column).Value = TD.innerText

ws.Cells(Row, Column).Font.Bold = True

If TD.getAttribute("colspan") Then

Column = Column + TD.getAttribute("colspan")

Else

Column = Column + 1

End If

Next



Set TDs = TR.getElementsByTagName("td")

For Each TD In TDs

ws.Cells(Row, Column).Value = TD.innerText

Column = Column + 1

Next



Row = Row + 1

Next

End If

Row = Row + 1

Next



Browser.Quit

End Sub
 

Excel Facts

Copy a format multiple times
Select a formatted range. Double-click the Format Painter (left side of Home tab). You can paste formatting multiple times. Esc to stop

diddi

Well-known Member
Joined
May 20, 2004
Messages
3,337
Office Version
  1. 2010
Platform
  1. Windows
try xmlhttp. it usually performs the quickest
 

johnnyL

Well-known Member
Joined
Nov 7, 2011
Messages
843
Office Version
  1. 2013
  2. 2007
Platform
  1. Windows
@ Imdominus, If you group/format your code, it may make it a bit easier, for you, and others ,to follow your code. ;)

VBA Code:
'
    Dim Document    As HTMLDocument
'
    Dim Div         As IHTMLElement
    Dim H3          As IHTMLElement
    Dim Table       As IHTMLElement
    Dim TD          As IHTMLElement
    Dim TR          As IHTMLElement
'
    Dim Divs        As IHTMLElementCollection
    Dim Tables      As IHTMLElementCollection
    Dim TDs         As IHTMLElementCollection
    Dim TRs         As IHTMLElementCollection
'
    Dim Column      As Integer
    Dim Row         As Integer
'
    Dim Browser     As InternetExplorer
'
    Dim URL         As String
'
    Dim ws          As Worksheet
'
    Set ws = ThisWorkbook.Worksheets("Setup")
'
    Row = 1
    Column = 1
'
    Set ws = Sheets("PROCESS")
'
    ws.Cells.Clear
'
    URL = "Work URL"
'
    Set Browser = New InternetExplorerMedium
'
    Browser.navigate URL
'
'   Wait for page to load
    Do While Browser.Busy Or Browser.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
'
'   Scan the document
    Set Document = Browser.Document
'
    Set Divs = Document.getElementById("secondaryProductivityList").getElementsByTagName("div")
'
    For Each Div In Divs
        Set H3 = Div.getElementsByTagName("h3")(0)
'
        If Not Div.className = "floatHeader" And Not H3 Is Nothing Then
            ws.Cells(Row, 1).Value = H3.innerText
            Row = Row + 1
'
            Set Tables = Div.getElementsByTagName("table")
            Set Table = Tables(0)
            Set TRs = Table.getElementsByTagName("tr")
'
            For Each TR In TRs
                Column = 1
'
                Set TDs = TR.getElementsByTagName("th")
'
                For Each TD In TDs
                    ws.Cells(Row, Column).Value = TD.innerText
                    ws.Cells(Row, Column).Font.Bold = True
'
                    If TD.getAttribute("colspan") Then
                        Column = Column + TD.getAttribute("colspan")
                    Else
                        Column = Column + 1
                    End If
                Next
'
                Set TDs = TR.getElementsByTagName("td")
'
                For Each TD In TDs
                    ws.Cells(Row, Column).Value = TD.innerText
                    Column = Column + 1
                Next
'
                Row = Row + 1
            Next
        End If
'
        Row = Row + 1
    Next
'
    Browser.Quit

That being said, you may want look into temporarily loading your scraped data into an array, and when all data is scraped, you could dump the array data to the sheet.

Since you can't post your work URL that you use, possibly post an alternate URL?
 

Indominus

Board Regular
Joined
Jul 11, 2020
Messages
126
Office Version
  1. 2016
Platform
  1. Windows
@ Imdominus, If you group/format your code, it may make it a bit easier, for you, and others ,to follow your code. ;)

VBA Code:
'
    Dim Document    As HTMLDocument
'
    Dim Div         As IHTMLElement
    Dim H3          As IHTMLElement
    Dim Table       As IHTMLElement
    Dim TD          As IHTMLElement
    Dim TR          As IHTMLElement
'
    Dim Divs        As IHTMLElementCollection
    Dim Tables      As IHTMLElementCollection
    Dim TDs         As IHTMLElementCollection
    Dim TRs         As IHTMLElementCollection
'
    Dim Column      As Integer
    Dim Row         As Integer
'
    Dim Browser     As InternetExplorer
'
    Dim URL         As String
'
    Dim ws          As Worksheet
'
    Set ws = ThisWorkbook.Worksheets("Setup")
'
    Row = 1
    Column = 1
'
    Set ws = Sheets("PROCESS")
'
    ws.Cells.Clear
'
    URL = "Work URL"
'
    Set Browser = New InternetExplorerMedium
'
    Browser.navigate URL
'
'   Wait for page to load
    Do While Browser.Busy Or Browser.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
'
'   Scan the document
    Set Document = Browser.Document
'
    Set Divs = Document.getElementById("secondaryProductivityList").getElementsByTagName("div")
'
    For Each Div In Divs
        Set H3 = Div.getElementsByTagName("h3")(0)
'
        If Not Div.className = "floatHeader" And Not H3 Is Nothing Then
            ws.Cells(Row, 1).Value = H3.innerText
            Row = Row + 1
'
            Set Tables = Div.getElementsByTagName("table")
            Set Table = Tables(0)
            Set TRs = Table.getElementsByTagName("tr")
'
            For Each TR In TRs
                Column = 1
'
                Set TDs = TR.getElementsByTagName("th")
'
                For Each TD In TDs
                    ws.Cells(Row, Column).Value = TD.innerText
                    ws.Cells(Row, Column).Font.Bold = True
'
                    If TD.getAttribute("colspan") Then
                        Column = Column + TD.getAttribute("colspan")
                    Else
                        Column = Column + 1
                    End If
                Next
'
                Set TDs = TR.getElementsByTagName("td")
'
                For Each TD In TDs
                    ws.Cells(Row, Column).Value = TD.innerText
                    Column = Column + 1
                Next
'
                Row = Row + 1
            Next
        End If
'
        Row = Row + 1
    Next
'
    Browser.Quit

That being said, you may want look into temporarily loading your scraped data into an array, and when all data is scraped, you could dump the array data to the sheet.

Since you can't post your work URL that you use, possibly post an alternate URL?
How would I load this into an array?
 

Forum statistics

Threads
1,136,954
Messages
5,678,759
Members
419,782
Latest member
gc75150

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top