Excel vba - extracting data from webpage using msxml2.xmlhttp

nkaggarwal1

New Member
Joined
Dec 9, 2018
Messages
14
Hi , The below link helped me to workout a old problem i was facing of copying picture from website to excel.

https://www.mrexcel.com/forum/excel-...namic+url+cell

Need help again.

The Code and modules which was given was very fast , it processed 500 entries in 20-25 seconds . i had another requirement of extracting data from webpage and i tried the below code and it takes around 3-4 seconds for one record , how can i change this to module format USING MSXML2.XMLHTTP

First column of Asin9 array contains my 100 asin's for which i am trying to nextract a particular data from amazon.in site.

Code-
====
Sub Button1_Click()
Dim StartTime As Double
Dim SecondsElapsed As Double
StartTime = Timer
For i = 1 to 100
On Error Resume Next
Dim IE As New InternetExplorer
IE.Visible = False
IE.navigate "https://www.amazon.in/dp/" & Range("Asin9")(i, 1).Value
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = IE.document
Range("Asin9")(i, 13).Value = Doc.getElementById("imgTagWrapperId").innerHTML
Next
SecondsElapsed = Round(Timer - StartTime, 2)
MsgBox "Refresh Completed in" & SecondsElapsed & " seconds", vbInformation
End Sub


Kindly advise!!

Thanks,

Nishant.
 

Some videos you may like

Excel Facts

Ambidextrous Undo
Undo last command with Ctrl+Z or Alt+Backspace. If you use the Undo icon in the QAT, open the drop-down arrow to undo up to 100 steps.

Domenic

MrExcel MVP
Joined
Mar 10, 2004
Messages
19,298
I haven't been able to fully test the following code, since I don't have the specific values contained in the first column of your data. Nevertheless, see if it provides you with the desired result...

Code:
Option Explicit

Sub Button1_Click()


    Dim xmlReq As Object
    Dim htmlDoc As Object
    Dim htmlElement As Object
    Dim url As String
    Dim i As Long
    
    On Error GoTo errHandler
    
    Set xmlReq = CreateObject("MSXML2.XMLHTTP")
    Set htmlDoc = CreateObject("HTMLFile")
    
    For i = 1 To Range("Asin9").Rows.Count
        If Len(Range("Asin9")(i, 1)) > 0 Then
            url = "https://www.amazon.in/dp/" & Range("Asin9")(i, 1).Value
            With xmlReq
                .Open "GET", url, False
                .send
            End With
            If xmlReq.Status <> 200 Then
                Range("Asin9")(i, 13).Value = "Error " & xmlReq.Status & ":  " & xmlReq.statusText
            Else
                htmlDoc.body.innerHTML = xmlReq.responseText
                Set htmlElement = htmlDoc.getElementById("imgTagWrapperId")
                If htmlElement Is Nothing Then
                    Range("Asin9")(i, 13).Value = "Item not found"
                Else
                    Range("Asin9")(i, 13).Value = htmlElement.innerHTML
                End If
            End If
        End If
    Next i
        
exitHandler:
    Set xmlReq = Nothing
    Set htmlDoc = Nothing
    Exit Sub
    
errHandler:
    MsgBox "Error " & Err.Number & ": " & Err.Description, vbCritical, "Error"
    Resume exitHandler


End Sub

Hope this helps!
 

nkaggarwal1

New Member
Joined
Dec 9, 2018
Messages
14
Hi Domenic , Thanks a lot , This worked fine and the speed has improved from 3-4 seconds to ~2.5 seconds but i could see one issue , when i run this for multiple entries 5/10/100/500 , the entries in between randomly misses out . so for example it processes around for 300 out of 500 or 70 out of 100 . I understand this can be because of network speed or my system CPU I/O but i get confused when i run the one for images as it never gives any misses. That is also on XMLHTTP

I made few changes in code which you gave to just gather timinings and few other details.

Sample entry -
B077PW9V3J

Sub Button1_Click()
Dim xmlReq As Object
Dim htmlDoc As Object
Dim htmlElement As Object
Dim htmlElement1 As Object
Dim url As String
Dim i As Long
Application.ScreenUpdating = False
Dim StartTime As Double
Dim SecondsElapsed As Double
Dim rRange As Range
Dim rCell As Range
StartTime = Timer
'On Error GoTo errHandler
Set ws = Worksheets("Sheet1")

Set xmlReq = CreateObject("MSXML2.XMLHTTP")
Set htmlDoc = CreateObject("HTMLFile")

With ws
LastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
Set rRange = .Range("A5:A" & LastRow)
End With

For Each rCell In rRange
If Len(rCell) > 0 Then
url = "https://www.amazon.in/dp/" & rCell.Value
With xmlReq
.Open "GET", url, False
.send
End With
If xmlReq.Status <> 200 Then
rCell.Offset(, 2).Value = "Error " & xmlReq.Status & ": " & xmlReq.statusText
Else
htmlDoc.body.innerHTML = xmlReq.responseText
Set htmlElement = htmlDoc.getElementById("imgTagWrapperId")
Set htmlElement1 = htmlDoc.getElementById("productTitle")
If htmlElement Is Nothing Then
rCell.Offset(, 2).Value = "Item not found"
Else
rCell.Offset(, 2).Value = htmlElement.innerHTML
End If
If htmlElement1 Is Nothing Then
rCell.Offset(, 1).Value = "Item not found"
Else
rCell.Offset(, 1).Value = htmlElement1.innerText
End If
End If
End If
Next rCell
Application.ScreenUpdating = True
SecondsElapsed = Round(Timer - StartTime, 2)
MsgBox "Refresh Completed in" & SecondsElapsed & " seconds", vbInformation

exitHandler:
Set xmlReq = Nothing
Set htmlDoc = Nothing
Exit Sub

errHandler:
MsgBox "Error " & Err.Number & ": " & Err.Description, vbCritical, "Error"
Resume exitHandler

End Sub
 

Domenic

MrExcel MVP
Joined
Mar 10, 2004
Messages
19,298
When you say that "the entries in between randomly misses out" what do you mean exactly? Do you get "Error 404: Not Found", or maybe "Item not found", or something else?
 

nkaggarwal1

New Member
Joined
Dec 9, 2018
Messages
14

ADVERTISEMENT

Hi Dominic, it says item not found, i made the same entry in all 500 rows and it gave data for randomly around 250-300 entries and for other it gave item not found.
 

Domenic

MrExcel MVP
Joined
Mar 10, 2004
Messages
19,298
That means that the page source for the entry does not contain an element whose ID is imgTagWrapperId. If you manually check the page source for the entry, do you see an element with that ID?
 

nkaggarwal1

New Member
Joined
Dec 9, 2018
Messages
14

ADVERTISEMENT

Hi Domenic, thanks, actually that should not be the case as I copied same entry in all 500 fields, while processing the script it gave item not found for many same entries randomly. I feel that's the system cpu issues but I never got that while processing the script for getting the images.
 

Domenic

MrExcel MVP
Joined
Mar 10, 2004
Messages
19,298
Can you provide us with a sample entry that returns "Item not found" ?
 

nkaggarwal1

New Member
Joined
Dec 9, 2018
Messages
14
Sample entry is B077PW9V3J, just put the same entry in 500 rows in column A and try the script
 

Domenic

MrExcel MVP
Joined
Mar 10, 2004
Messages
19,298
It's probably because the website contains dynamic content. So, when a response is received, the content hasn't fully loaded. Therefore, I would suggest that you stick with using Internet Explorer.
 
Last edited:

Watch MrExcel Video

Forum statistics

Threads
1,108,655
Messages
5,524,136
Members
409,562
Latest member
meeranaskar

This Week's Hot Topics

Top