Importing web page text as browser displays it, not source HTML into sheet

Oliver Dewar

Hi All.

For an Excel app I'm building I need to (effectively) open a URL in Internet Explorer, select all, and paste onto a sheet... or the VBA equivalent. However, I'm not after the source code... I'm after the text that the website visitor would see in their browser. If images etc. end up coming as well that's fine, I'll strip them out afterwards.

I have the code to navigate to the page and that's working fine.

I can't find a way to copy the text on a web page as the browser displays it and then paste that onto a sheet. I've found many ways to grab the source code... but that's not what I need in this specific instance.

Can anyone help?

For the record, I can easily paste the HTML using the .document.body.innerHTML method. So I tried copying the source HTML and pasting it as Unicode text, but sometimes it still comes across as HTML.

I tried SendKeys to select all, copy, then paste... but it didn't seem to work at all. The text just doesn't seem to get onto the clipboard.

It doesn't matter if the text ends up in random columns and rows when it pastes, as I have a macro to handle that... I just need a helping hand to get the rendered version of the page text as opposed to the source HTML. Alternatively, I guess a way to paste the source code, or handle the source code so that you end up with only the final, readable text, would work too.

(I hope that makes sense... just swear at me and call me names if not and I'll rephrase).

Thanks everyone.
 

Hello, good question!
I would like to do the same, but to paste tables and data copied from a web page opened in Chrome (to retrieve the data and order it properly).
So I would be interested in some tips, although there might be some differences, such as piloting Chrome from VBA?
Thanks
 
Yeah, Rando, I think what you're looking for is going to be an entirely different topic, mate. Piloting a browser other than Microsoft's built-in IE is not straightforward, if it's even doable at all. But it's certainly a different thread.

Anyone have any suggestions about my original question?
 
Hi Oliver
These few lines of code:
Code:
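        IE.ExecWB 17, 0   ' 17 = OLECMDID_SELECTALL: select the whole rendered page
        Do Until IE.ReadyState = 4: DoEvents: Loop   ' 4 = READYSTATE_COMPLETE
        IE.ExecWB 12, 2   ' 12 = OLECMDID_COPY, 2 = OLECMDEXECOPT_DONTPROMPTUSER
        ActiveSheet.PasteSpecial Format:="HTML", Link:=False, DisplayAsIcon:=False, NoHTMLFormatting:=True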
will copy a web page into a blank sheet, IE being the Internet Explorer object.
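
For anyone wondering where the IE object comes from, here is a rough sketch of how the snippet above might sit inside a complete macro. The Sub name and the example.com URL are placeholders of mine, not part of the original code, and the late-bound CreateObject call is just one way to get hold of Internet Explorer. The named constants simply spell out the numbers 17, 12, 0, 2 and 4.

```vba
Sub CopyRenderedPageToSheet()
    ' Hypothetical wrapper around the select-all / copy / paste steps.
    Const OLECMDID_COPY As Long = 12
    Const OLECMDID_SELECTALL As Long = 17
    Const OLECMDEXECOPT_DODEFAULT As Long = 0
    Const OLECMDEXECOPT_DONTPROMPTUSER As Long = 2
    Const READYSTATE_COMPLETE As Long = 4

    Dim IE As Object
    ' Late binding, so no project reference is required
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    IE.Navigate "https://www.example.com/"   ' placeholder URL

    ' Wait until the page has finished loading
    Do While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    ' Select everything in the rendered page and copy it to the clipboard
    IE.ExecWB OLECMDID_SELECTALL, OLECMDEXECOPT_DODEFAULT
    IE.ExecWB OLECMDID_COPY, OLECMDEXECOPT_DONTPROMPTUSER

    ' Paste at A1 of the active sheet without the HTML formatting
    ActiveSheet.Range("A1").Select
    ActiveSheet.PasteSpecial Format:="HTML", Link:=False, _
        DisplayAsIcon:=False, NoHTMLFormatting:=True

    IE.Quit
End Sub
```

Note that this still goes through the clipboard, so anything the user had copied beforehand is overwritten.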

If you are after specific data within the page it might be easier to extract just that data rather than going through the process of searching for it and removing the data you are not interested in.

If you would like to share the URL and specify the area of the web page you are interested in it may be possible to show you alternative methods of extracting the data.

hth
 
If you just want the data from the page you could always try innerText/outerText.
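
A minimal sketch of the innerText route, assuming IE is an Internet Explorer object that has already finished loading the page. The Sub name and its parameters are made up for illustration. It avoids the clipboard by reading the rendered text directly and writing one line per row in column A.

```vba
Sub WriteInnerTextToSheet(ByVal IE As Object, ByVal ws As Worksheet)
    ' Hypothetical helper: IE is assumed to be a loaded
    ' InternetExplorer.Application object, ws the target sheet.
    Dim pageText As String
    Dim textLines() As String
    Dim i As Long

    ' innerText returns the text as rendered, without the HTML tags
    pageText = IE.document.body.innerText

    ' Normalise line endings so the text can be split on vbLf alone
    pageText = Replace(pageText, vbCrLf, vbLf)
    pageText = Replace(pageText, vbCr, vbLf)
    textLines = Split(pageText, vbLf)

    ' One line of page text per row; the leading apostrophe forces Excel
    ' to store text, so lines starting with "=" are not read as formulas
    For i = LBound(textLines) To UBound(textLines)
        ws.Cells(i + 1, 1).Value = "'" & textLines(i)
    Next i
End Sub
```

It could be called as WriteInnerTextToSheet IE, ActiveSheet once the page is ready. Unlike the clipboard paste, table layout is lost, since everything lands in a single column.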
 
Thanks Guys.

UKMikeB... you nailed it mate. That's fantastic. Working like a charm.

This is for a crawler so it needs to work for any kind of website and your solution achieves this. Of course, it's best to avoid the clipboard where possible... but in this case I think it might be the best 'one-size-fits-all' solution.
 