Extracting tables from a website using VBA

viktor4e

Hi everyone,

I would like to extract the two tables, "Longevity" and "Sillage", from the page linked below into Excel using VBA:

Burberry Brit for Men Burberry cologne - a fragrance for men 2004


Currently, I am able to do so using the following code:

Longevity = doc.getElementsByTagName("tbody")(0).innerText
Sillage = doc.getElementsByTagName("tbody")(2).innerText

The issue is that each table is extracted into a single cell, along with the description words. Is it possible to get each value (numbers only) into a separate cell?

Much appreciated in advance!
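One possible way to split the numbers out, as a minimal sketch only: scan the text returned by innerText and collect each run of digits, then write one number per cell. The names DigitRuns and WriteNumbersAcross are hypothetical, and the sample string is made up for illustration; it is not the real page content. Note that any digits inside a description label would be picked up too, and a decimal value would be split at the decimal point.

```vba
' Collect every run of consecutive digits in a string, in order of appearance.
Function DigitRuns(ByVal s As String) As Collection
    Dim result As New Collection
    Dim i As Long
    Dim ch As String
    Dim current As String

    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        If ch Like "[0-9]" Then
            current = current & ch          ' still inside a number
        ElseIf Len(current) > 0 Then
            result.Add current              ' a non-digit ends the number
            current = ""
        End If
    Next i
    If Len(current) > 0 Then result.Add current  ' number at the very end

    Set DigitRuns = result
End Function

' Hypothetical usage: write each number into its own cell across row 1.
Sub WriteNumbersAcross()
    Dim nums As Collection
    Dim k As Long

    ' Made-up sample text; in the real macro this would be the tbody innerText.
    Set nums = DigitRuns("poor 21" & vbCrLf & "weak 58" & vbCrLf & "moderate 343")

    For k = 1 To nums.Count
        Cells(1, k).Value = CLng(nums(k))   ' A1 = 21, B1 = 58, C1 = 343
    Next k
End Sub
```

In the original macro, the sample string would be replaced by something like doc.getElementsByTagName("tbody")(0).innerText, assuming the table labels contain no digits of their own.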
 
Apparently the web pages are inconsistent in their number of paragraphs, so using a static paragraph index will produce inconsistent results. The following "fix" assumes the "I have it" paragraph sits at an index between 3 and 8...

Code:
If Not rating Is Nothing Then
    Extract = doc.getElementsByClassName("effect6")(0).PreviousSibling.getElementsByTagName("span")(0).innerText
    Extract2 = doc.getElementsByClassName("effect6")(0).PreviousSibling.getElementsByTagName("span")(2).innerText
    Extract3 = doc.getElementsByTagName("tbody")(0).innerText
    Extract4 = doc.getElementsByTagName("tbody")(2).innerText
    ' Scan paragraphs 3 to 8 and stop at the one that starts with "I have it"
    For j = 3 To 8
        Extract5 = doc.getElementsByTagName("p")(j).innerText
        If Left(Extract5, 9) = "I have it" Then Exit For
    Next j
    ' j is 9 only if the loop finished without a match; do not keep the wrong paragraph
    If j > 8 Then Extract5 = ""
    pname = doc.getElementById("col1").getElementsByTagName("h1")(0).innerText
    Cells(I, 1).Value = Extract
    Cells(I, 2).Value = Extract2
    Cells(I, 3).Value = Extract3
    Cells(I, 4).Value = Extract4
    Cells(I, 5).Value = pname
    Cells(I, 6).Value = Extract5
End If

Probably not the most efficient approach. If you can find a way to identify a paragraph by its opening text, that might be quicker.

Oh, be sure to add "Dim j As Integer" at the top of the code.
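For what it's worth, identifying the paragraph by its opening text could look something like the sketch below. It loops over every "p" element instead of a fixed index range and returns the first one whose text starts with a given prefix. FindParagraphByPrefix is a hypothetical helper name, and the sketch assumes doc is the loaded HTML document already used in the macro.

```vba
' Return the innerText of the first <p> element whose text starts with
' the given prefix, or an empty string if no paragraph matches.
Function FindParagraphByPrefix(ByVal doc As Object, ByVal prefix As String) As String
    Dim p As Object
    Dim txt As String

    For Each p In doc.getElementsByTagName("p")
        txt = Trim$(p.innerText)            ' ignore leading/trailing spaces
        If Left$(txt, Len(prefix)) = prefix Then
            FindParagraphByPrefix = txt
            Exit Function
        End If
    Next p
    ' No match: the function returns "" by default
End Function
```

It would then be called as Extract5 = FindParagraphByPrefix(doc, "I have it"), which avoids guessing the 3 to 8 range and leaves Extract5 empty when the paragraph is missing.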
 
Tonyyy,

What I meant is that the following code (Extract5 = doc.getElementsByTagName("p")(5).innerText) results in something else for this link:
The One for Men Dolce&Gabbana cologne - a fragrance for men 2008

So, instead of extracting this data (I have it: 4222 I had it: 1503 I want it: 1823 My signature: 166) it extracts the text below (After launching more than successful fragrance for women, The One, Dolce&Gabbana house will, at the beginning of March 2008, launch a fragrance for men named The One for Men.)

Can this be fixed somehow?


Try this:

Code:
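Sub Ombir_21Dec2016()
' Requires references to "Microsoft Internet Controls" and "Microsoft HTML Object Library"
Dim j           As Long
Dim rawtext     As Variant
Dim ele         As Variant
Dim review      As String
Dim ie          As InternetExplorer
Dim doc         As HTMLDocument
Dim pname       As String
Dim reviews     As Object
Dim output      As Variant

Set ie = New InternetExplorer

With ie
    .Visible = True
    .Navigate "http://www.fragrantica.com/perfume/Dolce-Gabbana/The-One-for-Men-2056.html"
    ' Wait until the page has finished loading (4 = READYSTATE_COMPLETE)
    Do While .Busy Or .ReadyState <> 4: DoEvents: Loop
End With

Set doc = ie.Document
Set reviews = doc.getElementById("mainpicbox").getElementsByTagName("p")

' Take the last paragraph inside "mainpicbox" and split it into words
rawtext = Split(reviews(reviews.Length - 1).innerText)

' Column 1 = perfume name, columns 2 to 5 = one "label: number" pair each
ReDim output(1 To 1, 1 To 5)

output(1, 1) = doc.getElementById("col1").getElementsByTagName("h1")(0).innerText

j = 1
For Each ele In rawtext
    review = review & " " & ele
    ' A word starting with a digit closes the current "label: number" pair
    If ele Like "[0-9]*" And j < UBound(output, 2) Then
        j = j + 1
        output(1, j) = Mid(review, 2)
        review = ""
    End If
Next
ie.Quit
Range("A1").Resize(, UBound(output, 2)).Value = output
ActiveSheet.UsedRange.Columns.AutoFit
End Sub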
 