Acquire text from multiple webpages

mdstrobel (New Member, joined Sep 19, 2011, 2 messages)
I have about 1,500 webpages from which I need to extract text. When I view the page source, the text is always inside this element:

<TABLE width=100% class=normal><TR><TD><span class="newstitle">Text to be Acquired</span><br/>

I'm not familiar with HTML syntax, so I guessed at what to copy into this post. The part I need is "Text to be Acquired".

My Excel file lists all 1,500 URLs in column A, starting at A2. I would like VBA code that visits each URL, grabs the text, and writes it to column B next to the URL.
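A minimal sketch of that loop, assuming each page can be fetched with a plain HTTP GET (MSXML2.XMLHTTP) instead of driving Internet Explorer, and that the marker text matches the page source exactly. FillColumnB, GetHtml and TextBetween are hypothetical names chosen for this example, not an existing API.

```vba
' Sketch: for each URL in column A (A2 downward), download the page source
' and write the text found between two markers into column B.
' Assumption: a plain GET returns the same HTML you see with "View Source".
Sub FillColumnB()
    Const START_MARKER As String = "<span class=""newstitle"">"
    Const END_MARKER As String = "</span>"
    Dim ws As Worksheet, lastRow As Long, r As Long, url As String

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For r = 2 To lastRow
        url = Trim$(CStr(ws.Cells(r, "A").Value))
        If Len(url) > 0 Then
            ws.Cells(r, "B").Value = TextBetween(GetHtml(url), START_MARKER, END_MARKER)
        End If
        Application.StatusBar = "Row " & r & " of " & lastRow
        DoEvents                           ' keep Excel responsive on a long run
    Next r
    Application.StatusBar = False          ' hand the status bar back to Excel
End Sub

' Downloads a page with a synchronous GET; returns "" on any failure.
Function GetHtml(ByVal url As String) As String
    Dim http As Object
    On Error GoTo Failed
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", url, False            ' False = wait for the response
    http.send
    If http.Status = 200 Then GetHtml = http.responseText
Failed:
End Function

' Returns the trimmed text between the first startMarker and the next
' endMarker (case-insensitive search), or "" if either marker is missing.
Function TextBetween(ByVal html As String, ByVal startMarker As String, _
                     ByVal endMarker As String) As String
    Dim p As Long, q As Long
    p = InStr(1, html, startMarker, vbTextCompare)
    If p = 0 Then Exit Function
    p = p + Len(startMarker)
    q = InStr(p, html, endMarker, vbTextCompare)
    If q = 0 Then Exit Function
    TextBetween = Trim$(Mid$(html, p, q - p))
End Function
```

Run FillColumnB from the sheet that holds the URLs. A plain string search is fragile: if a site writes the tag differently (extra attributes, different quoting) the marker won't match, and HTML entities such as &amp; are returned undecoded.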

Any ideas?

Excel 2007
Windows XP Professional SP3
 

Sure

Every site is a bit different, but I think I will be able to program in the differences if I can learn how to search the HTML source.

Below are a couple of sites and the HTML code that I am trying to extract.

http://ruhlman.com/2011/02/gluten-free-multigrain-bread-recip/
<h1 class="entry-title">Gluten-Free Multigrain Bread</h1>

http://www.glutenfreeclub.com/Recipe.aspx?nid=195&utm_nooverride=1
<span class="newstitle">Beef Stew</span>

There are several other domains included, but these will give you an example of what I am looking for.
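Since each site marks the title differently, one way to program in the differences is a rule per domain: a tag name plus a class name, taken from the two examples above. This sketch assumes the HTML has already been downloaded as a string (for instance with an XMLHTTP GET) and parses it with the late-bound "htmlfile" object; TitleFromHtml and FirstTextByTagAndClass are hypothetical names.

```vba
' Sketch: pick a (tag, class) rule from the URL's domain, then return the
' text of the first matching element. The two rules come from the examples
' in this thread; add one ElseIf per additional domain.
Function TitleFromHtml(ByVal url As String, ByVal html As String) As String
    Dim tagName As String, cssClass As String

    If InStr(1, url, "ruhlman.com", vbTextCompare) > 0 Then
        tagName = "h1": cssClass = "entry-title"
    ElseIf InStr(1, url, "glutenfreeclub.com", vbTextCompare) > 0 Then
        tagName = "span": cssClass = "newstitle"
    Else
        Exit Function                      ' no rule for this domain yet
    End If

    TitleFromHtml = FirstTextByTagAndClass(html, tagName, cssClass)
End Function

' Parses html with the "htmlfile" COM object and returns the innerText of
' the first element with this tag whose class list contains cssClass.
Function FirstTextByTagAndClass(ByVal html As String, ByVal tagName As String, _
                                ByVal cssClass As String) As String
    Dim doc As Object, el As Object

    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = html

    For Each el In doc.getElementsByTagName(tagName)
        ' Pad with spaces so "newstitle" also matches class="newstitle big"
        If InStr(" " & el.className & " ", " " & cssClass & " ") > 0 Then
            FirstTextByTagAndClass = Trim$(el.innerText)
            Exit Function
        End If
    Next el
End Function
```

Matching on tag and class rather than on a position inside the page tends to survive small layout changes better, though any site redesign can still break a rule.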

Once that works, I'd also like to extract the photo from the HTML source. Any insights on that would be appreciated.
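For the photo, one possible approach (an assumption, not checked against these sites) is to return the src attribute of the first <img> inside the post's main container. The container class you pass in (for example "entry-content" on a WordPress-style blog) is a guess you would confirm in each page's source; FirstImageSrc is a hypothetical name.

```vba
' Sketch: returns the src of the first <img> inside the first element whose
' class list contains containerClass, or inside the whole page when
' containerClass is "". Returns "" when nothing matches.
Function FirstImageSrc(ByVal html As String, ByVal containerClass As String) As String
    Dim doc As Object, container As Object, el As Object, imgs As Object

    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = html

    If Len(containerClass) = 0 Then
        Set container = doc.body
    Else
        ' Find the first element of any tag that carries the container class
        For Each el In doc.getElementsByTagName("*")
            If InStr(" " & el.className & " ", " " & containerClass & " ") > 0 Then
                Set container = el
                Exit For
            End If
        Next el
        If container Is Nothing Then Exit Function
    End If

    Set imgs = container.getElementsByTagName("img")
    ' "" & ... turns a missing (Null) src into an empty string
    If imgs.Length > 0 Then FirstImageSrc = "" & imgs.Item(0).getAttribute("src")
End Function
```

Relative image paths may come back in a form that needs the site's base URL prepended. Once you have a full URL, the file itself could be saved with the Windows URLDownloadToFile API from urlmon.dll.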

Thanks in advance for your willingness to help.

Hi, :)

Here is an example using 'getElementById'.

Code:
Option Explicit

' Example 1: ruhlman.com - the title sits inside the element with id "content".
Sub Main()
    Dim objIEApp As Object
    Dim objIEDocument As Object
    Dim objResult As Object
    On Error GoTo Fin
    Set objIEApp = CreateObject("InternetExplorer.Application")
    With objIEApp
        .Visible = False ' set to True to watch the page load
        .Navigate "http://ruhlman.com/2011/02/gluten-free-multigrain-bread-recip/"
        ' Wait until the page has fully loaded (4 = READYSTATE_COMPLETE);
        ' DoEvents keeps Excel responsive instead of spinning the CPU.
        Do While .Busy Or .ReadyState <> 4
            DoEvents
        Loop
        Set objIEDocument = .Document
    End With
    Set objResult = objIEDocument.getElementById("content")
    If Not objResult Is Nothing Then
        ' All(1) is the 2nd element inside "content" on this particular page;
        ' the index is page-specific and breaks if the layout changes.
        MsgBox objResult.All(1).InnerText
    End If
Fin:
    If Not objIEApp Is Nothing Then objIEApp.Quit
    Set objResult = Nothing
    Set objIEDocument = Nothing
    Set objIEApp = Nothing
    If Err.Number <> 0 Then MsgBox "Error: " & _
        Err.Number & " " & Err.Description
End Sub

' Example 2: glutenfreeclub.com - the title sits inside the DNN module container.
Sub Main_1()
    Dim objIEApp As Object
    Dim objIEDocument As Object
    Dim objResult As Object
    On Error GoTo Fin
    Set objIEApp = CreateObject("InternetExplorer.Application")
    With objIEApp
        .Visible = False ' set to True to watch the page load
        .Navigate "http://www.glutenfreeclub.com/Recipe.aspx?nid=195&utm_nooverride=1"
        ' Same wait as above: loop until IE reports READYSTATE_COMPLETE (4)
        Do While .Busy Or .ReadyState <> 4
            DoEvents
        Loop
        Set objIEDocument = .Document
    End With
    Set objResult = objIEDocument.getElementById("dnn_ctr530_ModuleContent")
    If Not objResult Is Nothing Then
        ' All(11) is specific to this page's layout, like All(1) above
        MsgBox objResult.All(11).InnerText
    End If
Fin:
    If Not objIEApp Is Nothing Then objIEApp.Quit
    Set objResult = Nothing
    Set objIEDocument = Nothing
    Set objIEApp = Nothing
    If Err.Number <> 0 Then MsgBox "Error: " & _
        Err.Number & " " & Err.Description
End Sub