Save the source code of a webpage as a text file

rutger

Hello all,

I have a question: is it possible to save the source code of a webpage as a text file using VBA?
Basically, I want to do the same as when I right-click in a website and choose to view the source code.
I am able to save a webpage as an HTML file, but when I save it as text it messes up the whole format.

I can't seem to figure this out.

Thanks in advance,

Greetz,
Rutger
 
Rutger

The code I posted only writes the body of the HTML code to the text file.

I couldn't quite find a way to write the whole thing, but it's probably there somewhere. I'll have a look later.

By the way, did you not see my comment about parsing the HTML?

There are other ways to extract info from a webpage.
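
For the whole page rather than just the body, here is a minimal sketch of one approach that might work, assuming the MSXML2.XMLHTTP object is available on your machine. It requests the page directly instead of going through a browser window, so the text should be close to what "view source" shows. The output path and the sub name are only placeholders, not anything from the earlier code.

```vba
Sub SavePageSource()
    Dim http As Object
    Dim fileNum As Integer

    ' Request the page without opening a browser window
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", "http://www.nu.nl/", False   ' False = wait for the response
    http.send

    If http.Status = 200 Then
        fileNum = FreeFile
        Open "C:\PageSource.txt" For Output As #fileNum   ' placeholder path
        Print #fileNum, http.responseText                 ' full HTML as text
        Close #fileNum
    Else
        MsgBox "Request failed with status " & http.Status
    End If
End Sub
```

If you stay with the InternetExplorer object instead, doc.documentElement.outerHTML should give the whole document as the browser parsed it, which is not necessarily identical to the original source. Also note that Print # writes in the system code page, so unusual characters may not survive.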
 
Hey Norie,

I've seen your reply, thanks.
I am planning to write the complete HTML code into text files.
I have a script that can read lines from a text file and copy them into an Excel sheet if the line contains certain values (hyperlinks, in this case).
After this is done, I can easily compare the next text file with this one and see if there are changes.

Hope this makes sense.

Rutger
 
Rutger

That's the sort of thing I thought you might be doing, and I still think there's a different approach you could take.

For example, if you wanted to extract every hyperlink from a web page, you could use something like this.
Code:
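Sub Test()
    Dim ie As Object, doc As Object, lnk As Object
    Dim fileNum As Integer

    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = True
        .Navigate "http://www.nu.nl/"
        ' Wait until the page has finished loading (4 = READYSTATE_COMPLETE)
        Do Until .ReadyState = 4: DoEvents: Loop

        Set doc = .document

        ' Write the URL of every hyperlink on the page, one per line
        fileNum = FreeFile
        Open "C:\TestHTML.txt" For Output As #fileNum
        For Each lnk In doc.links
            Print #fileNum, lnk.href
        Next lnk
        Close #fileNum
    End With
End Sub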
 
Norie,

Thanks a lot, this does exactly what I want to do.
The only thing I had in mind that might have worked the other way is that I might also be able to filter out the pictures corresponding to a hyperlink. But maybe I'll find another way to do that.

Thanks again.

Rutger
 
Rutger

That was only sample code. You could probably extract more information from the link objects, e.g. whether they correspond to an image.
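
As a rough sketch of that idea, assuming an image "corresponds" to a link when the img element sits inside the link element, something like the following might do. The sub name and output path are placeholders, and only the first image inside each link is written.

```vba
Sub ListImageLinks()
    Dim ie As Object, doc As Object
    Dim lnk As Object, imgs As Object
    Dim fileNum As Integer

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = False
    ie.Navigate "http://www.nu.nl/"
    Do Until ie.ReadyState = 4: DoEvents: Loop

    Set doc = ie.document
    fileNum = FreeFile
    Open "C:\ImageLinks.txt" For Output As #fileNum   ' placeholder path

    For Each lnk In doc.links
        ' Collect any <img> elements nested inside this link
        Set imgs = lnk.getElementsByTagName("img")
        If imgs.Length > 0 Then
            ' Link address, a tab, then the address of the first image
            Print #fileNum, lnk.href & vbTab & imgs.Item(0).src
        End If
    Next lnk

    Close #fileNum
    ie.Quit
End Sub
```

Each line of the output file then holds a link and its picture, separated by a tab, which should be easy to split when reading it back into a sheet.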

Check this link: http://msdn.microsoft.com/library/default.asp?url=/workshop/author/dhtml/reference/dhtml_reference_entry.asp
 
Norie,

Thanks, I see I have to dig into this deeper than I planned!

I'll check it all out and come back to you with what I come up with.

Thanks again,

greetz,

Rutger
 
Hmmmmmm,

Now I am able to retrieve all the hyperlinks from a webpage, but does anyone know if it's possible to loop through all the pages of a whole website and get all the links from there (so from multiple pages in one website)?

Thanks in advance,

Rutger
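
For reference, one possible way to approach this is a simple crawl: start at one page, write out its links, and queue every link on the same site that has not been seen yet. The sketch below is untested against any particular site and rests on several assumptions: "same site" just means the address starts with the start URL, MAX_PAGES is an arbitrary safety limit, and the sub name and output path are placeholders. Links that point at something other than an HTML page (a PDF, for instance) would likely need extra error handling.

```vba
Sub CrawlSiteLinks()
    Const START_URL As String = "http://www.nu.nl/"
    Const MAX_PAGES As Long = 25          ' arbitrary safety limit

    Dim ie As Object, lnk As Object
    Dim seen As Object                    ' addresses already queued
    Dim queue As New Collection           ' pages still to visit
    Dim url As String, href As String
    Dim visited As Long, fileNum As Integer

    Set seen = CreateObject("Scripting.Dictionary")
    Set ie = CreateObject("InternetExplorer.Application")

    queue.Add START_URL
    seen.Add START_URL, True

    fileNum = FreeFile
    Open "C:\SiteLinks.txt" For Output As #fileNum   ' placeholder path

    Do While queue.Count > 0 And visited < MAX_PAGES
        ' Take the next page off the front of the queue
        url = queue(1): queue.Remove 1
        visited = visited + 1

        ie.Navigate url
        Do While ie.Busy Or ie.ReadyState <> 4: DoEvents: Loop

        For Each lnk In ie.document.links
            href = lnk.href
            ' Record the page and the link found on it
            Print #fileNum, url & vbTab & href
            ' Queue only unseen links that stay on the same site
            If Left$(href, Len(START_URL)) = START_URL Then
                If Not seen.Exists(href) Then
                    seen.Add href, True
                    queue.Add href
                End If
            End If
        Next lnk
    Loop

    Close #fileNum
    ie.Quit
End Sub
```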
 