Need Help on WEB automation

Sharid

Well-known Member
Joined
Apr 22, 2007
Messages
1,064
Office Version
  1. 2016
Platform
  1. Windows
I need a bit of help on web automation, my whole code only extracts the urls into column A, but I can not workout how to do the others in the remaining columns. appriciate if someone could have a look. I need the bits that are commented out in green in the code

VBA Code:
    '<h3 class="s-item__title s-item__title--has-tags">5" Android 8.1 Unlocked Cheap Mobile Smart Phone Quad Core Dual SIM WiFi 3G GPS</h3>
    '  <div class="s-item__subtitle">2020 NEW YEAR GIFT,1st 50% OFF,UK Stock</div>
    '<span class="SECONDARY_INFO">Brand new</span>
    '<span class="s-item__price">£31.85<span class="DEFAULT"> to </span>£38.58</span>
    '<span class="POSITIVE">FAST &amp; FREE</span>
    '<span class="BOLD">Free postage</span>
    '<span class="BOLD NEGATIVE">205+ sold</span>
    
   
    For Each link In HTMLDoc.getElementsByTagName("a")
            If link.getAttribute("class") = "vip" Then
                Cells(i, 1).Value = link.getAttribute("href")
                i = i + 1
            End If
        Next link
      
 If pageNumber >= 3 Then Exit Do 'the first 3 pages
        
        Set nextPageElement = HTMLDoc.getElementsByClassName("gspr next")(0)
        
        If nextPageElement Is Nothing Then Exit Do
        
        nextPageElement.Click 'next web page
        Do While IE.Busy Or IE.readyState <> 4
            DoEvents
        Loop
        Application.Wait Now + TimeSerial(0, 0, 5)
        Set HTMLDoc = IE.document
        pageNumber = pageNumber + 1
    Loop
IE.Quit
1582467012507.png

1582466943599.png
 

Excel Facts

Using Function Arguments with nested formulas
If writing INDEX in Func. Arguments, type MATCH(. Use the mouse to click inside MATCH in the formula bar. Dialog switches to MATCH.
Hi. It's a bit tricky to follow because you haven't posted the relevant code (i.e., the code for the entire subroutine) - importantly, I don't know how you've dimensioned the HTML element variables, and that will impact on whether or not the whole thing will work.

What I really don't understand is how you're even managing to get the URLs - the code above goes through each fo the code's <a> tags and filters out any links that don't use the class vip. Looking at the screen capture above, the <a> tag's class is s-Item__link.

In any event, there are a few ways of doing what you want. One way would be to:
  1. pull out all the <span> tags with the correseponding class attribute (e.g., "BOLD", "POSITIVE") by using the .getElementsByClassName method like you did in the code to find and click the button.
  2. then just use the .innerText property to get the text.
 
Upvote 0
Firstly thanks for replying Dan_W I've been stuck on this for days

This is my results
1582469034359.png

Full CODE
Search comes from Sheet 2 A1 and B1 , A1 has the ebay type e.g. .co.uk or .hk and B1 has the item e.g Phones the ebay MUST end with /sch/
1582469191807.png

VBA Code:
Private Sub CommandButton5_Click()

' NEW EBAY CODE BUTTON
  Dim IE As Object
    Dim HTMLDoc As Object
    Dim nextPageElement As Object
    Dim div As Object
    Dim link As Object
    Dim url As String
    Dim pageNumber As Long
   Dim i As Long

  ' Takes Url from Sheet2 A1 seach from Keyword from B1 and places IE
url = Sheets("Sheet2").Range("A1").Value & Replace(Worksheets("Sheet2").Range("B1").Value, " ", "+")
    Set IE = CreateObject("InternetExplorer.Application")
    
    With IE
        .Visible = True
        .navigate url
        Do While .Busy Or .readyState <> 4
            DoEvents
        Loop
    End With

    Application.Wait Now + TimeSerial(0, 0, 5)
    Set HTMLDoc = IE.document
    pageNumber = 1
    i = 2
   Do
   
    ' Gets ebay links from the web for products typed in Sheet2 B1
    For Each link In HTMLDoc.getElementsByTagName("a")
            If link.getAttribute("class") = "vip" Then
                Cells(i, 1).Value = link.getAttribute("href")
                i = i + 1
            End If
        Next link

        If pageNumber >= 1 Then Exit Do 'the first 5 pages
        
        Set nextPageElement = HTMLDoc.getElementsByClassName("gspr next")(0)
        
        If nextPageElement Is Nothing Then Exit Do
        
        nextPageElement.Click 'next web page
        Do While IE.Busy Or IE.readyState <> 4
            DoEvents
        Loop
        Application.Wait Now + TimeSerial(0, 0, 5)
        Set HTMLDoc = IE.document
        pageNumber = pageNumber + 1
    Loop
IE.Quit
    Set IE = Nothing
    Set HTMLDoc = Nothing
    Set nextPageElement = Nothing
    Set div = Nothing
    Set link = Nothing

 MsgBox "All Done"

End Sub
 
Upvote 0
Sorry Dann_W I forgot to add this bit.

You are right the on this "Looking at the screen capture above, the <a> tag's class is s-Item__link " But this is in FIREFOX

As I am using IE it is as shown below and I don't know why. Also the HTML I posted above is from FireFox.

As you can see the class is "vip" if I change it to "s-Item__link" I get no results (URLs) I'm totally confused on why it different
1582469716912.png
 
Upvote 0
Thank for the further detail - that was very helpful. Especially your point about how different class names were used in different browsers. So I went into HTML code of the website, and confirmed that was indeed the case class="vip", but I also noticed that "SECONDARY" should be lvsubtitle. I think that the getelementsbyClassName method would be the better approach. Try:
VBA Code:
    For Each lnk In HTMLDoc.getelementsByclassName("vip")
            Cells(i, 1).Value = lnk.href  
            Cells(i, 2).Value = lnk.innertext  
            i = i + 1
    Next lnk

You can access the "SECONDARY" data from HTMLDoc.getelementsByclassName("lvsubtitle") and the pricelist from HTMLDoc.getelementsByclassName("bold"). Once you've converted them into their constituent elements, it is just a matter of getting the .innertext property.

That said, it is slightly trickier than set out above because it appears that the number of SECONDARY elements vary from item to item, so you'll need to think how to structure your scraping. As you can see in the screen capture below, your approach of going through each item and printing the results onto a spreadsheet is going to result in misaligned data.

Hope that helps.

webscraper.PNG
 
Upvote 0
Dan_W

Thanks for helping out, I tested what you did and you are right it is more complicated when there are secondary elements. I came out with this as a work around, which pulls the same data and can split them, as see in the pictures below. Only problem is it puts everything into row 2 due to range and I can not get it to loop.

Could you have another look at this, as i understand this a bit better and IF this works then I can add more to it

VBA Code:
Range("A2").Offset.Value = doc.getElementsByClassName("vip")(0).href ' URL
Range("B2").Offset.Value = doc.getElementsByClassName("lvtitle")(0).innerText ' TITLE
Range("C2").Offset.Value = doc.getElementsByClassName("hotness-signal red")(0).innerText ' SOLD AMOUNT
Range("D2").Offset.Value = doc.getElementsByClassName("prRange")(0).innerText 'CURRENT PRICE
Range("E2").Offset.Value = doc.getElementsByClassName("bfsp")(0).innerText ' SHIPPING TYPE E.G FREE bfsp
Range("F2").Offset.Value = doc.getElementsByClassName("lvsubtitle")(0).innerText 'SUB TITLE
Range("G2").Offset.Value = doc.getElementsByClassName("FnFl fnf-green")(0).innerText

Item on ebay
1582541193398.png


How It is extracted
1582541268916.png
 
Upvote 0
Hi

Could anyone help on this, I'm still a bit stuck
 
Upvote 0
I keep getting this error message and can not work it out, It pull off about 16 records and then the error message.

1582631592177.png


1582631626469.png


Full code

VBA Code:
Private Sub CommandButton1_Click()
'NEW EBAY CODE BUTTON
Dim IE As Object
Dim url As String
Dim pageNumber As Long
Dim div As Object
Dim HTMLDoc As Object
Dim nextPageElement As Object
Dim link As Object
Dim i As Integer

Set IE = CreateObject("InternetExplorer.Application")
    With IE

  .Visible = True
 
        .navigate Sheets("Sheet2").Range("A1").Value & Replace(Worksheets("Sheet2").Range("B1").Value, " ", "+")
 
    Do
    DoEvents
        Loop Until IE.readyState = READYSTATE_COMPLETE
        
Dim doc As HTMLDocument
    Set doc = IE.document
        While IE.readyState <> 4
    Wend
End With

Do

For i = 0 To 500
    Range("A2").Offset(i).Value = doc.getElementsByClassName("vip")(i).href ' URL
    Range("B2").Offset(i).Value = doc.getElementsByClassName("lvtitle")(i).innerText ' TITLE
    Range("C2").Offset(i).Value = doc.getElementsByClassName("hotness-signal red")(i).innerText ' SOLD AMOUNT
    Range("D2").Offset(i).Value = doc.getElementsByClassName("prRange")(i).innerText 'CURRENT PRICE
    Range("E2").Offset(i).Value = doc.getElementsByClassName("bfsp")(i).innerText ' SHIPPING TYPE E.G FREE bfsp
    Range("F2").Offset(i).Value = doc.getElementsByClassName("lvsubtitle")(i).innerText 'SUB TITLE
    Range("G2").Offset(i).Value = doc.getElementsByClassName("FnFl fnf-green")(i).innerText

' Range("B3").Offset(i, 1).Value = doc.getElementById("ResultSetItems").getElementByTagName("h3")(i).innerText
' Range("B3").Offset(i).Value = doc.getElementById("vip").getElementByClassName("h3")(i).innerText

Next i

        If pageNumber >= 1 Then Exit Do 'the first 5 pages
        
        Set nextPageElement = HTMLDoc.getElementsByClassName("gspr next")(0)
        
        If nextPageElement Is Nothing Then Exit Do
        
        nextPageElement.Click 'next web page
        Do While IE.Busy Or IE.readyState <> 4
            DoEvents
        Loop
        Application.Wait Now + TimeSerial(0, 0, 5)
        Set HTMLDoc = IE.document
        pageNumber = pageNumber + 1
    Loop
IE.Quit
    Set IE = Nothing
    Set HTMLDoc = Nothing
    Set nextPageElement = Nothing
    Set div = Nothing
    Set link = Nothing

 MsgBox "All Done"

End Sub

Also
I was using the ZERO in the code to pick the right innertext for a page Range("B2").Offset(i).Value = doc.getElementsByClassName("lvtitle")(0).innerText ' TITLE
I'm sure this will be an issue for that page, can it be done as Range("B2").Offset(i).Value = doc.getElementsByClassName("lvtitle")(i, 0).innerText ' TITLE
 
Upvote 0
By using i integer as both the row offset, and the index for the class items, you are assuming that there are 501 items (your code starts on 0) on the webpage with that class name - the reason it stops at 16 is because there are likely only 16 items that have that class name.

Looking at the code for the web page again, haven't they changed the HTML code? I don't see in the HTML code most (?) of the class names you have in your code. Is it up-to-date?
 
Upvote 0
Thanks Dan_W for coming back on this,

In regards to this, "the reason it stops at 16 is because there are likely only 16 items that have that class name. " this was my suspicion as well and so, I was looking at running the code as an IF statement, so If there is an element then extract it Else place the word "Nil" into the cell. So it will look like the image below. This would stop the data going into the the FIRST BLANK cell and if, there was no more of that element to extract then it would keep placing the word "Nil" in the cell

But I could not get the code to work.
1584279589943.png


As for the HTML the classes will show different in IE than they do in Google Chrome and Firefox, not sure what browser you are using. Also for ebay.co.uk the classes are different than .com

Url Class
1584280161749.png


Title
1584280234562.png


Sold Amount
1584280289049.png


Current Price
1584280345241.png


Condition
1584280441176.png


And So on, These will not be the same in other browser. Can you let me know which ones you think are wrong and I will check.
 
Upvote 0

Forum statistics

Threads
1,214,652
Messages
6,120,746
Members
448,989
Latest member
mariah3

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top