XML http: Trying to get outertext!

jamescooper

Well-known Member
Joined
Sep 8, 2014
Messages
780
URL in E1 is: https://www.tesco.com/groceries/en-GB/shop/fresh-food/all?include-children=true&page=1&count=48

I am trying to get the outertext, which should for example return:

Tesco British Salted Block Butter 250G↵Write a review↵↵Sorry, this product is currently unavailable↵↵Rest of ↵Block Butter↵shelf

Any ideas please?

Code:
Public Sub Data_Pull_Products_and_Prices_2()


Dim http As Object, html As New HTMLDocument, topics As Object, topics2 As Object, titleElem As Object, topic As HTMLHtmlElement
Dim i As Integer
Dim j As Integer
Dim rngURL As Range
Dim LastRow As Long
Dim LastRow2 As Long


Application.ScreenUpdating = False


Set http = CreateObject("MSXML2.XMLHTTP")


For Each rngURL In Worksheets("Sheet1").Range("E1", Worksheets("Sheet1").Range("E" & Rows.Count).End(xlUp))


http.Open "GET", rngURL, False
http.send
html.body.innerHTML = http.responseText
'Do While http.readyState <> 4
DoEvents
'Loop


Set topics = html.getElementsByClassName("product-details--wrapper")
LastRow = Cells(Rows.Count, "A").End(xlUp).Row
i = LastRow + 1


For Each topic In topics
Set titleElem = topic.getElementsByTagName("div")(0)
Sheets(1).Cells(i, 1).Value = titleElem.getElementsByTagName("a")(0).outerText
LastRow = Cells(Rows.Count, "A").End(xlUp).Row
i = 1 + LastRow


Next


Next


End Sub
 

Excel Facts

How to calculate loan payments in Excel?
Use the PMT function: =PMT(5%/12,60,-25000) is for a $25,000 loan, 5% annual interest, 60 month loan.

strooman

Active Member
Joined
Oct 29, 2013
Messages
314
Try this code and report back. The results goes in columns A, B, C, D.
I commented out the lines
Application.ScreenUpdating = false
Application.ScreenUpdating = true

Code:
Sub Data_Pull_Products_and_Prices_2()
Dim http, topic, topics, htmlAs, htmlA, topics2, titleElem As Object
Dim html As New HTMLDocument
Dim i, j As Integer
Dim rngURL As Range
Dim LastRow, LastRow2 As Long

'Application.ScreenUpdating = False

Set http = CreateObject("MSXML2.XMLHTTP")

For Each rngURL In Worksheets("Sheet1").Range("E1", Worksheets("Sheet1").Range("E" & Rows.Count).End(xlUp))
    http.Open "GET", rngURL, False
    http.send
    html.body.innerHTML = http.responseText
    Do While http.readyState <> 4
        DoEvents
    Loop
        
    Set topics = html.getElementsByClassName("product-details--wrapper")
    LastRow = Cells(Rows.Count, "A").End(xlUp).Row
    i = LastRow + 1
    
    For Each topic In topics
        Set htmlAs = topic.getElementsByTagName("a")
        j = 1
        For Each htmlA In htmlAs
            Sheets(1).Cells(i, j).Value = htmlA.outerText
            j = j + 1
        Next htmlA
        i = i + 1
    Next topic
Next
'Application.ScreenUpdating = true
End Sub

P.s. I'm not a very regular board member so perhaps others can help you too in case I'm not available.
 
Last edited:

jamescooper

Well-known Member
Joined
Sep 8, 2014
Messages
780

ADVERTISEMENT

Thanks for your efforts, that is interesting because for the URL I gave above it doesn't seem to pull the correct data; but it is along the right lines?

So the url is: https://www.tesco.com/groceries/en-GB/shop/fresh-food/all?page=1&count=48

I need to pull, the item, the price, the price per unit/kg and if no price, then return "[FONT=&quot]Sorry, this product is currently unavailable".

All in one pull is ideal.[/FONT]
 

strooman

Active Member
Joined
Oct 29, 2013
Messages
314
OK, I thought you wanted this data like you say in your first post:
"Tesco British Salted Block Butter 250G↵Write a review↵↵Sorry, this product is currently unavailable↵↵Rest of ↵Block Butter↵shelf"

But now I understand from your last post that you want:
- The item (which is the product)
- The Price
- The price per unit/kg
- When there's no price you want to display: "Sorry, this product is currently unavailable".

I changed the code completely. because we have to figure out if Price and UnitPrice is available.
Pay extra attention to the References you have to make.

Also take notice that the classname from the price is not always the same:
"price-per-sellable-unit price-per-sellable-unit--price price-per-sellable-unit--price-per-item"
"price-per-sellable-unit price-per-sellable-unit--price price-per-sellable-unit--price-per-weight"

We have to tackle that in the code.


Code:
Option Explicit

Sub Data_Pull_Products_and_Prices_1()
'****************************************************************************
'Set references to:
'Microsoft Internet Controls
'Microsoft HTML Object Library
'Microsoft XML, v6.0
'Hit Alt+F11 to go to the Visual Basic Editor and then hit:
'Tools | References
'****************************************************************************
Dim XMLPage As New MSXML2.XMLHTTP60
Dim htmlDoc As New MSHTML.HTMLDocument
Dim LastRow
Dim htmlH3s As MSHTML.IHTMLElementCollection
Dim htmlH3 As MSHTML.IHTMLElement
Dim htmlDivs As MSHTML.IHTMLElementCollection
Dim htmlDiv As MSHTML.IHTMLElement
Dim arrItems_1, arrItems_2, arrItems_3 As Variant
Dim strProduct, strPrice, strUnitPrice As String
Dim rngURL As Range

'Create URL and sent request
For Each rngURL In Worksheets("Sheet1").Range("E1", Worksheets("Sheet1").Range("E" & Rows.Count).End(xlUp))
    XMLPage.Open "GET", rngURL, False
    XMLPage.send
    
    'Get the source (code) of the webpage
    htmlDoc.body.innerHTML = XMLPage.responseText
    
    LastRow = Cells(Rows.Count, "A").End(xlUp).Row + 1
    
    'Set reference to all h3-elements in htmlDoc.body.innerHTML
    Set htmlH3s = htmlDoc.getElementsByTagName("h3")
    
    'Loop through all h3-elements to get the product name
    For Each htmlH3 In htmlH3s
        If htmlH3.className = "sc-dnqmqq kMWCPS" Then
            strProduct = strProduct & "|" & htmlH3.innerText
        End If
    Next
    
    'Set reference to all div-elements in htmlDoc.body.innerHTML
    Set htmlDivs = htmlDoc.getElementsByTagName("div")
    
    'Loop through all div-elements to get the Price AND UnitPrice
    For Each htmlDiv In htmlDivs
        
        'Check if there is a Price and a UnitPrice available
        If htmlDiv.className = "product-info-message-section unavailable-messages" Then
            strPrice = strPrice & "|" & "Sorry, this product is currently unavailable"
            strUnitPrice = strUnitPrice & "|" & "Sorry, this product is currently unavailable"
        End If
        
        'If Yes . . . get it.
        If htmlDiv.className = _
            "price-per-sellable-unit price-per-sellable-unit--price price-per-sellable-unit--price-per-item" Or _
            htmlDiv.className = _
            "price-per-sellable-unit price-per-sellable-unit--price price-per-sellable-unit--price-per-weight" Then
            strPrice = strPrice & "|" & htmlDiv.innerText
        End If
        If htmlDiv.className = "price-per-quantity-weight" Then
            strUnitPrice = strUnitPrice & "|" & htmlDiv.innerText
        End If
    Next
    
    'Store all results in an Array
    arrItems_1 = Split(Mid(strProduct, 2), "|")
    arrItems_2 = Split(Mid(strPrice, 2), "|")
    arrItems_3 = Split(Mid(strUnitPrice, 2), "|")
    
    'Insert the results directly into Sheet1
    Cells(LastRow, 1).Resize(UBound(arrItems_1) + 1) = Application.Transpose(arrItems_1)
    Cells(LastRow, 2).Resize(UBound(arrItems_2) + 1) = Application.Transpose(arrItems_2)
    Cells(LastRow, 3).Resize(UBound(arrItems_3) + 1) = Application.Transpose(arrItems_3)
Next
End Sub
 

jamescooper

Well-known Member
Joined
Sep 8, 2014
Messages
780

ADVERTISEMENT

This is fantastic, thanks a lot; where did you learn all that?

I have been learning slowly.
 

strooman

Active Member
Joined
Oct 29, 2013
Messages
314
This is fantastic, thanks a lot; where did you learn all that?

I have been learning slowly.

Thanks for the feedback.

Learning by trial and (a lot of) error, visit forums, read the solutions from the real excel guru's and learn from their advices.
And most of all, like David Bowie always said: "steal from great inspirations"
 

jamescooper

Well-known Member
Joined
Sep 8, 2014
Messages
780
I noticed a slight error in the code, minor but still worth mentioning. The "next" should be about the array section to avoid duplication of the data.

Thanks.

Code:
Option Explicit


Sub Data_Pull_Products_and_Prices_1()
'****************************************************************************
'Set references to:
'Microsoft Internet Controls
'Microsoft HTML Object Library
'Microsoft XML, v6.0
'Hit Alt+F11 to go to the Visual Basic Editor and then hit:
'Tools | References
'****************************************************************************
Dim XMLPage As New MSXML2.XMLHTTP60
Dim htmlDoc As New MSHTML.HTMLDocument
Dim LastRow
Dim htmlH3s As MSHTML.IHTMLElementCollection
Dim htmlH3 As MSHTML.IHTMLElement
Dim htmlDivs As MSHTML.IHTMLElementCollection
Dim htmlDiv As MSHTML.IHTMLElement
Dim arrItems_1, arrItems_2, arrItems_3 As Variant
Dim strProduct, strPrice, strUnitPrice As String
Dim rngURL As Range


'Create URL and sent request
For Each rngURL In Worksheets("Sheet1").Range("D1", Worksheets("Sheet1").Range("D" & Rows.Count).End(xlUp))
    XMLPage.Open "GET", rngURL, False
    XMLPage.send
    DoEvents
    
    'Get the source (code) of the webpage
    htmlDoc.body.innerHTML = XMLPage.responseText
    
    LastRow = Cells(Rows.Count, "A").End(xlUp).Row + 1
    
    'Set reference to all h3-elements in htmlDoc.body.innerHTML
    Set htmlH3s = htmlDoc.getElementsByTagName("h3")
    
    'Loop through all h3-elements to get the product name
    For Each htmlH3 In htmlH3s
        If htmlH3.className = "sc-dnqmqq kMWCPS" Then
            strProduct = strProduct & "|" & htmlH3.innerText
        End If
    Next
    
    'Set reference to all div-elements in htmlDoc.body.innerHTML
    Set htmlDivs = htmlDoc.getElementsByTagName("div")
    
    'Loop through all div-elements to get the Price AND UnitPrice
    For Each htmlDiv In htmlDivs
        
        'Check if there is a Price and a UnitPrice available
        If htmlDiv.className = "product-info-message-section unavailable-messages" Then
            strPrice = strPrice & "|" & "Sorry, this product is currently unavailable"
            strUnitPrice = strUnitPrice & "|" & "Sorry, this product is currently unavailable"
        End If
        
        'If Yes . . . get it.
        If htmlDiv.className = _
            "price-per-sellable-unit price-per-sellable-unit--price price-per-sellable-unit--price-per-item" Or _
            htmlDiv.className = _
            "price-per-sellable-unit price-per-sellable-unit--price price-per-sellable-unit--price-per-weight" Then
            strPrice = strPrice & "|" & htmlDiv.innerText
        End If
        If htmlDiv.className = "price-per-quantity-weight" Then
            strUnitPrice = strUnitPrice & "|" & htmlDiv.innerText
        End If
[COLOR=#b22222]    Next[/COLOR]
[COLOR=#b22222]    [/COLOR]
[COLOR=#b22222]    Next[/COLOR]
    
    'Store all results in an Array
    arrItems_1 = Split(Mid(strProduct, 2), "|")
    arrItems_2 = Split(Mid(strPrice, 2), "|")
    arrItems_3 = Split(Mid(strUnitPrice, 2), "|")
    
    'Insert the results directly into Sheet1
    Cells(LastRow, 1).Resize(UBound(arrItems_1) + 1) = Application.Transpose(arrItems_1)
    Cells(LastRow, 2).Resize(UBound(arrItems_2) + 1) = Application.Transpose(arrItems_2)
    Cells(LastRow, 3).Resize(UBound(arrItems_3) + 1) = Application.Transpose(arrItems_3)


End Sub
 

Watch MrExcel Video

Forum statistics

Threads
1,130,237
Messages
5,641,026
Members
417,188
Latest member
DrTees

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top