satish78

Board Regular
Joined
Aug 31, 2014
Messages
218
Hi Friends,

I tried finding VBA code on MRexcel forum. But did not find suitable one.

A VBA Macro to extract whole webpage data including hyperlinks for example url: https://www.dice.com/company/07960AP

<colgroup><col width="64"></colgroup><tbody>
</tbody>
Not only from this website, I want the macro to extract any website page on internet.
Just data and hyperlinks would work.
Later, I will fetch data according to my requirement on different website pages.
Help me.

Satish
 
Re: Anyway to loop through list of URLS instead of single website

I test again now. Your vba macro is not working well. As I said on previous post and even now.
Can you change my code to loop through.

Here is the code again, see if you can add your data to it, if it works let me know as i might also use it. See the bit in red, change 500 to a larger number to do more url

The bit in green will need changing, the rest of the code will do what you want.

The orange is optional

Code:
Private Sub CommandButton4_Click()

    Dim i, j, k, l As Integer
    i = 2
    k = 2
    l = 2
'SHEET2 as sheet with URL
    Dim wsSheet As Worksheet, Rows As Long, links As Variant, IE As Object, link As Variant
    Set wb = ThisWorkbook
    Set wsSheet = wb.Sheets("Sheet2")

'Set IE = InternetExplorer
    Set IE = CreateObject("InternetExplorer.Application")
    Rows = wsSheet.Cells(wsSheet.Rows.Count, "A").End(xlUp).Row
    links = wsSheet.Range("A1:A" & Rows)

'IE Open Time per page 5sec and check links on Sheet2 Column A
    With IE
        .Visible = True
        Application.Wait (Now + TimeValue("00:00:5"))

        For Each link In links
            .navigate (link)
            While .Busy Or .ReadyState <> 4: DoEvents: Wend

            [COLOR=#ff0000]For i = 1 To 500[/COLOR]
                On Error Resume Next

[COLOR=#008000]'Paste in sheet and column
                Dim rw As Long
                
                rw = Sheet1.Cells(Sheet1.Rows.Count, 1).End(xlUp).Row + 1
                Sheets("Sheet1").Range("A" & rw).Value = IE.Document.getElementsByTagName("a").Item(i).innerText

            Next i[/COLOR]

'Deletes duplicates in column A Sheet1
            Columns(1).RemoveDuplicates Columns:=Array(1)

'navigate links
        Next link


[COLOR=#ff8c00]Set sendAnEmail = IE.Document.getElementsByClassName("a-link-normal pr-email").Item(0)

sendAnEmail.Click

Application.Wait Now + TimeSerial(0, 0, 2)

retrievedEmail = sendAnEmail.innerText
    End With[/COLOR]

'Close IE Browser
    IE.Quit
    Set IE = Nothing

End Sub
 
Upvote 0

Excel Facts

How to create a cell-sized chart?
Tiny charts, called Sparklines, were added to Excel 2010. Look for Sparklines on the Insert tab.
Re: Anyway to loop through list of URLS instead of single website

Can you give more info, on what content you want from the page, you stated you want to "extract whole webpage content"

1) is it the source code, or what is displayed in the browser?
2) if its the browser are you after the text or text and pictures?
3) how do you want the data displaying? which column and cells?

my code pulls the data and places it in to column A in sheet1, but you don't want this so that rules out most of question 2
 
Upvote 0
Re: Anyway to loop through list of URLS instead of single website

I just need plain text no other image/pictures to extract.
Extracted plain text must go into ColumnA cell and wrap it.
so that, in each cell with each website content while looping through urls.
 
Upvote 0
Trying to extract snippet data from google search

Hi Friends,

Trying description/Snippet data as well from google search.
Below code extract only title and url only.
Can it be possible to mention wait seconds after 200 searches.

Sub XMLHTTP()


Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object
Dim start_time As Date
Dim end_time As Date


lastRow = Range("A" & Rows.Count).End(xlUp).Row


Dim cookie As String
Dim result_cookie As String


start_time = Time
Debug.Print "start_time:" & start_time


For i = 2 To lastRow


url = "https://www.google.co.in/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)


Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send


Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
Set objResultDiv = html.getElementById("rso")
Set objH3 = objResultDiv.getElementsByTagName("H3")(0)
Set link = objH3.getElementsByTagName("a")(0)




str_text = Replace(link.innerHTML, "", "")
str_text = Replace(str_text, "
", "")


Cells(i, 2) = str_text
Cells(i, 3) = link.href
DoEvents
Next


end_time = Time
Debug.Print "end_time:" & end_time

Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
 
Last edited:
Upvote 0
Re: Trying to extract snippet data from google search

I have merged your last three threads together, since they all seem to be dealing with the same question.

Please do not post the same question multiple times. All clarifications, follow-ups, and bumps should be posted back to the original thread. Per forum rules, posts of a duplicate nature will be locked or deleted (rule 12 here: Forum Rules).

If you do not receive a response, you can "bump" it by replying to it again, though we advise you to wait 24 hours before doing and not to bump a thread more than once a day.
 
Upvote 0

Forum statistics

Threads
1,216,100
Messages
6,128,834
Members
449,471
Latest member
lachbee

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top