Excel vba - extracting data from webpage using msxml2.xmlhttp

nkaggarwal1

New Member
Joined
Dec 9, 2018
Messages
14
Hi , The below link helped me to workout a old problem i was facing of copying picture from website to excel.

https://www.mrexcel.com/forum/excel-...namic+url+cell

Need help again.

The Code and modules which was given was very fast , it processed 500 entries in 20-25 seconds . i had another requirement of extracting data from webpage and i tried the below code and it takes around 3-4 seconds for one record , how can i change this to module format USING MSXML2.XMLHTTP

First column of Asin9 array contains my 100 asin's for which i am trying to nextract a particular data from amazon.in site.

Code-
====
Sub Button1_Click()
Dim StartTime As Double
Dim SecondsElapsed As Double
StartTime = Timer
For i = 1 to 100
On Error Resume Next
Dim IE As New InternetExplorer
IE.Visible = False
IE.navigate "https://www.amazon.in/dp/" & Range("Asin9")(i, 1).Value
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = IE.document
Range("Asin9")(i, 13).Value = Doc.getElementById("imgTagWrapperId").innerHTML
Next
SecondsElapsed = Round(Timer - StartTime, 2)
MsgBox "Refresh Completed in" & SecondsElapsed & " seconds", vbInformation
End Sub


Kindly advise!!

Thanks,

Nishant.
 

Excel Facts

Can a formula spear through sheets?
Use =SUM(January:December!E7) to sum E7 on all of the sheets from January through December
I haven't been able to fully test the following code, since I don't have the specific values contained in the first column of your data. Nevertheless, see if it provides you with the desired result...

Code:
Option Explicit

Sub Button1_Click()


    Dim xmlReq As Object
    Dim htmlDoc As Object
    Dim htmlElement As Object
    Dim url As String
    Dim i As Long
    
    On Error GoTo errHandler
    
    Set xmlReq = CreateObject("MSXML2.XMLHTTP")
    Set htmlDoc = CreateObject("HTMLFile")
    
    For i = 1 To Range("Asin9").Rows.Count
        If Len(Range("Asin9")(i, 1)) > 0 Then
            url = "https://www.amazon.in/dp/" & Range("Asin9")(i, 1).Value
            With xmlReq
                .Open "GET", url, False
                .send
            End With
            If xmlReq.Status <> 200 Then
                Range("Asin9")(i, 13).Value = "Error " & xmlReq.Status & ":  " & xmlReq.statusText
            Else
                htmlDoc.body.innerHTML = xmlReq.responseText
                Set htmlElement = htmlDoc.getElementById("imgTagWrapperId")
                If htmlElement Is Nothing Then
                    Range("Asin9")(i, 13).Value = "Item not found"
                Else
                    Range("Asin9")(i, 13).Value = htmlElement.innerHTML
                End If
            End If
        End If
    Next i
        
exitHandler:
    Set xmlReq = Nothing
    Set htmlDoc = Nothing
    Exit Sub
    
errHandler:
    MsgBox "Error " & Err.Number & ": " & Err.Description, vbCritical, "Error"
    Resume exitHandler


End Sub

Hope this helps!
 
Upvote 0
Hi Domenic , Thanks a lot , This worked fine and the speed has improved from 3-4 seconds to ~2.5 seconds but i could see one issue , when i run this for multiple entries 5/10/100/500 , the entries in between randomly misses out . so for example it processes around for 300 out of 500 or 70 out of 100 . I understand this can be because of network speed or my system CPU I/O but i get confused when i run the one for images as it never gives any misses. That is also on XMLHTTP

I made few changes in code which you gave to just gather timinings and few other details.

Sample entry -
B077PW9V3J

Sub Button1_Click()
Dim xmlReq As Object
Dim htmlDoc As Object
Dim htmlElement As Object
Dim htmlElement1 As Object
Dim url As String
Dim i As Long
Application.ScreenUpdating = False
Dim StartTime As Double
Dim SecondsElapsed As Double
Dim rRange As Range
Dim rCell As Range
StartTime = Timer
'On Error GoTo errHandler
Set ws = Worksheets("Sheet1")

Set xmlReq = CreateObject("MSXML2.XMLHTTP")
Set htmlDoc = CreateObject("HTMLFile")

With ws
LastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
Set rRange = .Range("A5:A" & LastRow)
End With

For Each rCell In rRange
If Len(rCell) > 0 Then
url = "https://www.amazon.in/dp/" & rCell.Value
With xmlReq
.Open "GET", url, False
.send
End With
If xmlReq.Status <> 200 Then
rCell.Offset(, 2).Value = "Error " & xmlReq.Status & ": " & xmlReq.statusText
Else
htmlDoc.body.innerHTML = xmlReq.responseText
Set htmlElement = htmlDoc.getElementById("imgTagWrapperId")
Set htmlElement1 = htmlDoc.getElementById("productTitle")
If htmlElement Is Nothing Then
rCell.Offset(, 2).Value = "Item not found"
Else
rCell.Offset(, 2).Value = htmlElement.innerHTML
End If
If htmlElement1 Is Nothing Then
rCell.Offset(, 1).Value = "Item not found"
Else
rCell.Offset(, 1).Value = htmlElement1.innerText
End If
End If
End If
Next rCell
Application.ScreenUpdating = True
SecondsElapsed = Round(Timer - StartTime, 2)
MsgBox "Refresh Completed in" & SecondsElapsed & " seconds", vbInformation

exitHandler:
Set xmlReq = Nothing
Set htmlDoc = Nothing
Exit Sub

errHandler:
MsgBox "Error " & Err.Number & ": " & Err.Description, vbCritical, "Error"
Resume exitHandler

End Sub
 
Upvote 0
When you say that "the entries in between randomly misses out" what do you mean exactly? Do you get "Error 404: Not Found", or maybe "Item not found", or something else?
 
Upvote 0
Hi Dominic, it says item not found, i made the same entry in all 500 rows and it gave data for randomly around 250-300 entries and for other it gave item not found.
 
Upvote 0
That means that the page source for the entry does not contain an element whose ID is imgTagWrapperId. If you manually check the page source for the entry, do you see an element with that ID?
 
Upvote 0
Hi Domenic, thanks, actually that should not be the case as I copied same entry in all 500 fields, while processing the script it gave item not found for many same entries randomly. I feel that's the system cpu issues but I never got that while processing the script for getting the images.
 
Upvote 0
Can you provide us with a sample entry that returns "Item not found" ?
 
Upvote 0
It's probably because the website contains dynamic content. So, when a response is received, the content hasn't fully loaded. Therefore, I would suggest that you stick with using Internet Explorer.
 
Last edited:
Upvote 0

Forum statistics

Threads
1,213,504
Messages
6,114,016
Members
448,543
Latest member
MartinLarkin

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top