VBA Parsing for values from a list of websites

kmpoaquests

New Member
Joined
Sep 26, 2011
Messages
33
I'd like to refer to this post for a better understanding of what I'm trying to do

http://www.mrexcel.com/forum/excel-...sing-webpage-specific-value-between-tags.html


I need help trying to extract specific values between tags from a list of websites.
The particular values I need to extract are :

[h=1]< dd> </dd>[/h]< h1> </h1 >

<dd></dd>

Yahoo contributor website example is here MD Lynn's Contributor Profile - Yahoo! Contributor Network - contributor.yahoo.com

Ideally Column A would contain the hyperlinks pointing to each website I need values scraped from
Column B,C,D,E would then contain the values scraped from yahoo contributor website such as page views , fans, intersets, experiences, education, etc

If anyone can help me I'm greatly in your debt

thank you
 
Option Base 1
Sub YahooContrib()
'
' Created by ukmikeb (MrExcel forum) on 14/05/2013
'
Dim IE As InternetExplorer
Dim Header As String, my_URL As String
Dim Rw As Long
Dim PtrCntnt, PtrCntrib, PtrFans
Dim lg4, lg5, lg6, lg7, lg8


Header = Array("URL", "Author", "General Background", "Page views", "Fans", "Contrib Since", "Education", "Affiliations", "Motto", "Interests")
Columns("A").ColumnWidth = 30
Columns("B").ColumnWidth = 30
Columns("C").ColumnWidth = 60
Columns("D").ColumnWidth = 12
Columns("E").ColumnWidth = 10
Columns("F").ColumnWidth = 10
Columns("G").ColumnWidth = 15
Columns("H").ColumnWidth = 30
Columns("I").ColumnWidth = 30
Columns("J").ColumnWidth = 30
Columns("K").ColumnWidth = 50


Range("A1:K1") = Header
With Range("A1:K1")
.Font.Bold = True
.HorizontalAlignment = xlCenter
End With

Set IE = New InternetExplorer
Rw = 2


Do While Not IsEmpty(Range("A" & Rw))
my_URL = Range("A" & Rw).Value
' IE.Visible = True 'Some things don't work unless it's visible


' Navigate the web to Race Result oage for race-id


IE.navigate my_URL

'Loop unitl ie page is fully loaded
Do Until IE.readyState = READYSTATE_COMPLETE
DoEvents
Loop

Do Until IE.document.readyState = "complete"
DoEvents
Loop


' Set document object
Set HTMLdocMain = IE.document

Range("B" & Rw) = HTMLdocMain.getElementsByTagName("h1")(0).innerText ' Author

Range("C" & Rw) = HTMLdocMain.getElementsByClassName("sec_col")(0).innerText ' General Background

lg4 = HTMLdocMain.getElementsByClassName("stats")(0).innerText ' Page views etc
PtrCntnt = InStr(lg4, "Content")
PtrFans = InStr(lg4, "Fans")
PtrCntrib = InStr(lg4, "Contributor")
Range("D" & Rw) = Mid(lg4, 11, PtrCntnt - 11)
Range("E" & Rw) = Mid(lg4, PtrCntnt + 7, PtrFans - (PtrCntnt + 7))
Range("F" & Rw) = Mid(lg4, PtrFans + 4, PtrCntrib - (PtrFans + 4))

Range("G" & Rw) = Right(lg4, Len(lg4) - (PtrCntrib + 16))

lg5 = HTMLdocMain.getElementsByClassName("prim_col_sect")(1).innerText ' Education/Experience
Range("H" & Rw) = Right(lg5, Len(lg5) - 20)

lg6 = HTMLdocMain.getElementsByClassName("prim_col_sect")(4).innerText ' Affiliations
Range("I" & Rw) = Right(lg6, Len(lg6) - 12)

lg7 = HTMLdocMain.getElementsByClassName("prim_col_sect")(3).innerText ' Motto
Range("J" & Rw) = Right(lg7, Len(lg7) - 5)

lg8 = HTMLdocMain.getElementsByClassName("prim_col_sect")(2).innerText 'Interests
Range("K" & Rw) = Right(lg8, Len(lg8) - 9)


Rw = Rw + 1
Loop
End Sub

run time error 13
type mismatch

references enabled

bb6bd380d6ce1e5a705acc6505e86e13.png
 
Upvote 0

Excel Facts

How to create a cell-sized chart?
Tiny charts, called Sparklines, were added to Excel 2010. Look for Sparklines on the Insert tab.
Probably my bad with this line (confirm it's line 83 in the code) :-
Code:
     Range("K" & Rw) = Right(lg8, Len(lg8) - 12)
which should be :-
Code:
     Range("K" & Rw) = Right(lg8, Len(lg8) - 9)

If it is not the line I have corrected above, could you let me know with the text of the line.

I am sorry I do not possess clairvoyant qualities so "Run time error 13 - Type mismatch" does not help me find the cause of the error. You should find that the line causing the error is highlighted in yellow.

ps You haven't amended the Header line as per my previous post.
 
Upvote 0
Hi

Column A should contain the URL like this :-
"http://contributor.yahoo.com/user/11173/md_lynn.html"

without the quotes of course.

hth
 
Upvote 0
Hi

Here are the results up to column G :
Excel Workbook
ABCDEFG
1URLAuthorGeneral BackgroundPage viewsContentFansContrib Since
2http://contributor.yahoo.com/user/11173/md_lynn.htmlMD LynnMD Lynn MD is an author & freelance writer, with a penchant for fiction, creative nonfiction and topics that pique her passion: alternative medicine, animals & pets, love & relationships, and her all-time favorite: food! She loves gourmet dining, gourmet cooking, comfort foods, sushi, seafood and dinner parties where she s both a guest and cook. She currently lives in the suburbs of the Texas Gulf Coast, almost exactly halfway between Galveston and Houston and enjoys the best both cities and surrounding areas of Texas have to offer. In particular, she loves Kemah s Boardwalk, ****inson s shrimp and fish boats, League City s historic district and parks where she lives and loves and cooks gourmet meals for unique family--two almost-baked children (25-girl, 18-boy) and her favorite furry child, Jake the Dog, and her extended family of choice. ... Read More 4,696,92348955627/05/2006
kampoquest
Excel 2007

It was caused by the incorrect definition for Header - apologies.

The following replaces the "Dim Header as String..etc" :-
Code:
Dim Header()
Dim my_URL As String

Add the following :-
Code:
IE.Quit
Set IE = Nothing


before the "End Sub" statement to close the Internet Explorer before exiting the module.

hth
 
Upvote 0
SOMEONE give this man the gold medal for best excel problem solver ever

It works flawlessly and even hides the Internet explorer instance which is even more awesome
 
Upvote 0

Forum statistics

Threads
1,214,929
Messages
6,122,314
Members
449,081
Latest member
tanurai

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top