Source-Code - an old fashioned way/Some help please. 2 Source Code Scrape methods - want to combine

Spyros13

Board Regular
Joined
Mar 12, 2014
Messages
175
Office Version
  1. 365
  2. 2016
Platform
  1. Windows
I need/want to get the red bit of this macro


Sub GH()
    Dim i As Long
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://www.google.com", False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 5.1; rv:24.0) Gecko/20100101 Firefox/24.0"
        .send
        For i = 1 To Len(.responseText) Step 1023
            Cells(i \ 1023 + 1, "A").Value = "'" & Mid(.responseText, i, 1023)
        Next
    End With
End Sub

'Into' the blue bit of this macro:

Option Explicit
Private Const INTERNET_FLAG_NO_CACHE_WRITE = &H4000000
Private Declare Function InternetOpen Lib "Wininet.dll" Alias "InternetOpenA" (ByVal lpszAgent As String, ByVal dwAccessType As Long, ByVal lpszProxyName As String, ByVal lpszProxyBypass As String, ByVal dwFlags As Long) As Long
Private Declare Function InternetReadFile Lib "Wininet.dll" (ByVal hFile As Long, ByVal sBuffer As String, ByVal lNumBytesToRead As Long, lNumberOfBytesRead As Long) As Integer
Private Declare Function InternetOpenUrl Lib "Wininet.dll" Alias "InternetOpenUrlA" (ByVal hInternetSession As Long, ByVal sUrl As String, ByVal sHeaders As String, ByVal lHeadersLength As Long, ByVal lFlags As Long, ByVal lContext As Long) As Long
Private Declare Function InternetCloseHandle Lib "Wininet.dll" (ByVal hInet As Long) As Integer


Public Sub GetWebPageData()
    'Each variable needs its own "As Long", otherwise the first two become Variants
    Dim hInternet As Long, hSession As Long, lngDataReturned As Long
    Dim iReadFileResult As Integer
    Dim sBuffer As String * 128 'Must be at least as large as the number of bytes requested per read
    Dim sTotalData As String
    Dim sUrl As String
    Dim sLine As String

    sUrl = "http://www.engadget.com/" 'Long Website here
    hSession = InternetOpen("", 0, vbNullString, vbNullString, 0)

    If hSession Then hInternet = InternetOpenUrl(hSession, sUrl, vbNullString, 0, INTERNET_FLAG_NO_CACHE_WRITE, 0)

    If hInternet Then
        'Read one buffer at a time and keep only the bytes actually returned
        Do
            lngDataReturned = 0
            iReadFileResult = InternetReadFile(hInternet, sBuffer, Len(sBuffer), lngDataReturned)
            If iReadFileResult = 0 Or lngDataReturned = 0 Then Exit Do
            sTotalData = sTotalData & Left$(sBuffer, lngDataReturned)
        Loop
        iReadFileResult = InternetCloseHandle(hInternet)
    End If

    'Close the session handle as well as the URL handle
    If hSession Then iReadFileResult = InternetCloseHandle(hSession)

    'WEBPAGE loaded into sTotalData
    Cells(2, 2) = sTotalData
End Sub

Both macros work perfectly well in principle, but I can't get the 1st one to work in my Excel VBA.

They extract source code from a website into Excel.

The first one spreads it across as many cells as it takes to get the full source code pasted into Excel.

The second one puts it all into 1 cell, and so has the drawback that it can't fit all the data into that 1 cell.

The first one I can't use (or can't seem to use), no matter how many Tools > References I enable. It says Access Denied on .Send.

The second one I can use. It's a joy to see source code tangibly pasted into a cell, but due to the character limit of a cell, it obviously can't paste all the source code of the page I need.

I like the second one. It's better IMO: easier to understand, closer to home in a hands-on way. I could understand it better ... and no opening of IE required! And no malarkey in terms of JavaScript or XML/XMLCCMS/PHP, which I hate!!

I'm so old school.

I believe you can beat the computer with old-fashioned, logical, down-to-earth thinking.

Am I strange!?!?

Anyway, I truly believe that if we can re-jig the 2nd macro to do a

For i = 1 To Len(.responseText) Step 1023
    Cells(i \ 1023 + 1, "A").Value = "'" & Mid(.responseText, i, 1023)
Next

it would work the same and have the same functionality as the 1st, smaller macro.
 
So use the For Next loop of the first, changing ".responseText" to "sTotalData".

PS use CODE tags when posting VBA code.
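
A minimal sketch of that suggestion, in case it helps anyone reading later. The helper names SplitIntoChunks and WriteChunksToColumnA are made up for illustration and are not from either macro; the 1023-character chunk size and the leading apostrophe are taken from the first macro. It assumes the text should land in column A of the active sheet, one chunk per row.

```vba
Option Explicit

' Hypothetical helper: splits sText into pieces of at most chunkSize characters.
Public Function SplitIntoChunks(ByVal sText As String, ByVal chunkSize As Long) As Collection
    Dim chunks As New Collection
    Dim i As Long
    For i = 1 To Len(sText) Step chunkSize
        ' Mid$ returns whatever is left when fewer than chunkSize characters remain
        chunks.Add Mid$(sText, i, chunkSize)
    Next i
    Set SplitIntoChunks = chunks
End Function

' Hypothetical helper: writes one chunk per row in column A of the active sheet.
Public Sub WriteChunksToColumnA(ByVal sText As String)
    Dim chunk As Variant
    Dim rowNum As Long
    rowNum = 1
    For Each chunk In SplitIntoChunks(sText, 1023)
        ' The leading apostrophe makes Excel store the chunk as text
        Cells(rowNum, "A").Value = "'" & chunk
        rowNum = rowNum + 1
    Next chunk
End Sub
```

With these in the same module, the line Cells(2, 2) = sTotalData in GetWebPageData could be swapped for WriteChunksToColumnA sTotalData.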

John_W - I'll try your suggestion tomorrow.

I'm shattered.

Thank YOU!!!!!!
 
Any ideas how I could run the first macro with Chrome?

I ask because it doesn't return source code such as <label for="required-city">City:</label> <input class="validate[required] is-required" id="required-city" maxlength="100" name="required-city" placeholder="City" />, which shows up when you select View Source. Instead it returns source code which has Mozilla-specific elements and isn't the true source code. I get stuff like this instead, which doesn't appear in my View Source page:

.kprb:active{background-color:#b0281a;background-image:-moz-linear-gradient(top,#dd4b39,#b0281a);background-image:linear-gradient(top,#dd4b39,#b0281a)}.kpgb{background-color:#3d9400;background-image:-moz-linear-

Which is not in my Chrome source code (or the proper source code I need). It looks like it downloaded formatting source code, and Mozilla browser-specific source code, but not source code "proper".

Anyone understand?

For instance, in the View Source page, I have things like <label for="required-first">First Name:</label> <input class="validate[required] is-required" id="required-first" maxlength="100" name="required-first" placeholder="First Name"

but I do not even receive the 10 "input class=" entries in the source code I get back.

Sorry for this, it's mostly for my own notes, so I can move on from here to the solution later ....



 
You aren't making a lot of sense unfortunately ;)

If something isn't in source code, but is visible on the page, then it has been created by JavaScript. As such you've got a couple of options, you either need to:

Run the page in Internet Explorer to allow it to run the relevant javascript and generate the markup you want that you can then interact with
Or
Work out where the data is actually coming from and parse it directly from there
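
A rough sketch of the first option, for reference. It assumes Internet Explorer is installed and can be automated on the machine, and the function name GetRenderedHtml is made up for illustration. It only waits for the browser to report the page as loaded, so markup that scripts add later may still be missing.

```vba
Option Explicit

' Sketch only: returns the page's HTML after Internet Explorer has loaded it
' and run its scripts. GetRenderedHtml is a hypothetical name.
Public Function GetRenderedHtml(ByVal sUrl As String) As String
    Dim ie As Object
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = False
    ie.Navigate sUrl

    ' Wait until the browser is idle and the document is complete (ReadyState 4)
    Do While ie.Busy Or ie.ReadyState <> 4
        DoEvents
    Loop

    ' outerHTML reflects the current DOM, including elements built by JavaScript
    GetRenderedHtml = ie.Document.documentElement.outerHTML

    ie.Quit
    Set ie = Nothing
End Function
```

The returned string can be long, so it would need splitting across cells rather than writing into a single cell.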
 

Thanks Kyle,

It's starting to make sense now as to why IE versions of macros are preferred for downloading webpage (& JavaScript) source code.

I've come across this macro, which doesn't work for me but is well known on Stack Overflow, and which does seem to download the full source code:


Function GetSource(url As String) As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url
        .Send
        Do: DoEvents: Loop Until .Readystate = 4
        GetSource = .responsetext
        .abort
    End With
End Function

It seems to create a function that gets the source code from a website into a cell, but it doesn't even run, or isn't recognised as something to run.

I insert a new module, paste it, highlight it and run. But nothing. I get the Macros window instead, listing all my macros (including GH and GetWebPageData()) but not this one.

By the way, I find that "MSXML2.XMLHTTP" macros do not work for me at all.
 
So do I need to add this code on top of that piece of code to get the function working successfully?

Code:
Function GetMSXML() As Object  '  MSXML2.XMLHTTP60
    On Error Resume Next
    Set GetMSXML = CreateObject("MSXML2.XMLHTTP.6.0")
End Function

I found that here :- http://www.jpsoftwaretech.com/screen-scraping-101-with-vba/ ---> http://www.jpsoftwaretech.com/vba/msxml-object-library-routines/#standard
 
You need to call the Function from a Sub, assigning its result to a variable. You may also be able to use it on a worksheet like this:

=GetSource(A1)

if the URL's source code isn't too long.
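
For example, something along these lines. A Function, or any procedure that takes arguments, is not listed in the Macros dialog, which is why GetSource never showed up there. This sketch assumes the GetSource function posted above is in the same module; the Sub name ShowSourceLength and the choice of cell A1 are just for illustration.

```vba
Option Explicit

' Assumes the GetSource function quoted earlier in the thread is in the same module.
Public Sub ShowSourceLength()
    Dim sHtml As String

    ' Call the Function and assign its result to a variable
    sHtml = GetSource("http://www.engadget.com/")

    ' Report how much text came back (visible in the Immediate window, Ctrl+G)
    Debug.Print Len(sHtml) & " characters downloaded"

    ' Write only the first part to a cell; a full page is usually too long for one cell
    Range("A1").Value = "'" & Left$(sHtml, 1023)
End Sub
```

ShowSourceLength takes no arguments, so it does appear in the Macros dialog and can be run from there.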
 
All of which is a complete waste of time if the elements are built in javascript
 