Copy table from website into excel with VBA

mcclellan

New Member
Joined
Jan 27, 2009
Messages
4
Hello there,

I love this website.

I am trying to create a macro that will copy a table from a website and paste it into excel.

How do I do this and how do I identify the table name?

Any help is appreciated.

Michael
 
Wow, Guten Morgen Alle (Morning all)
. Wot a nice surprise to wake up to.

Jbeaucaire
. Many thanks for taking the time to come back to your old Post. Your suggestion ties up nicely with my current temporary solution. Your idea is a somewot better way than I am currently doing it so thanks. (Rather than trashing the whole sheet though I am thinking of exporting it to a text file before I use it and then trash it after, so that I have a back up should anything go wrong along the way, (Or the web site changes or vanishes in the future)). (I simply choose a text file as in total I will be looking at up to a million Url addresses, so using excel sheets for that mass of data may be a bit inefficient or swamp my memory space). I will just need to consider a good choice of separator to make sure I can easily re import the data if need be without any nasty problems coming up. (I will probably start a new thread asking experienced users for suggestions of a “Bullet-Proof” separator – I have endless problems with separators especially with the German / English different “ . “ and “ , “ conventions)
. I think there are no right ways to do this. The more methods the better I think. (Personally though I prefer to understand** the method before using it). As you say yours could also be another good elegant solution. So I will continue with that sort of solution in parallel.
.

…….But I am also very keen to explore and learn** the use of other Microsoft Libraries other than just the Excel Library. I have the feeling this is a more fundamental approach and so am not relying on more “nice” software that Excel provides and so is maybe a more flexible approach. (But I may be talking rubbish here –“ I am a Practical Physicist with no computer knowledge!” )

… so…..
Pike
. Thanks so much for your code. It was an unexpected bonus to have a solution using an extra library. I cannot wait to get my teeth into that code, stepping through it and “destroying it” with ‘green comment graffiti until I understand** it. (Have a feeling it will take me a while!)


…………

Vladimir Zakharov
. Thanks for that very clearly explained Code step(s) which is exactly wot I had been missing / looking for. And Doubly thanks for the corresponding Code bit for the other Library. It looks just like the sort of form I had been reading about or seeing in other threads, learn Videos etc, but as I am a bit slow at getting the exact syntax right in computing due to my lack of experience with computers, this sort of help is invaluable in getting me started.


Thanks again guys for the replies. It is very much appreciated. I will reply back here in this Thread and let you know how I got on and share my results once I have extensively tried out Pike and Vladimir’s Codes. (Also, if I may, I will come back here in this Thread with any follow up questions or problems)

Alan Elston
Bavaria


.
.
.




…………………………………………………………
P.s. Quick edit on my Post #6, as I seem to have mixed up maybe my Example files? Or at least the reference to them..

Here is the main .xlsm file of interest with all my macro’s to date:
https://app.box.com/s/6j9vt55zbbnd0u805tx6


And here just for completeness the .xlsx file where all the obtained data goes:
https://app.box.com/s/t0utcujdggd7sm8xr1x4

The relevant “Apfel” bits are highlighted in all the spreadsheets in Yellow in both Files
 
Upvote 0

Excel Facts

How to find 2nd largest value in a column?
MAX finds the largest value. =LARGE(A:A,2) will find the second largest. =SMALL(A:A,3) will find the third smallest
Hello Alan,

As promised, I put together a workbook that would download the web tables to a worksheet. The workbook contains 3 macros and each one is in a separate VBA module.

The macro that opens the web page uses the MSXML2 library. This is faster that using Internet Explorer and also runs synchronously with VBA. The data from the tables is collected into a 2-D array. This allows all the data to be written to the worksheet in s single operation.

The macro is setup to access a single page but can easily be modified to pull tables from other pages. I want to keep it simple at this point so you can see what is going on and understand the code before expanding it. I am sure you will have questions.


The sample workbook can be downloaded from here:
Import Web Tables.xlsm
 
Upvote 0
Hello Alan,

.........I put together a workbook that would download the web tables to a worksheet.........


....... I thought getting Xmas presents was over for this year, until I saw this. Many Thanks. I really appreciate another alternative code. And with those explaining comments and the File.. A great bonus..It really helps with understanding wot is going on which means I learn a lot at the same time, as well as having a working solution to the current problem.

. My long term “Pipe dream” for my project would be to have two parallel solutions to my project. One using mainly VBA with or without possibly some minor use of external libraries. The other I would like to use Microsoft Visual Basic along with external Libraries and extensive use of simple text Files for my massive amount of Data. So this “in - between “solution which has aspects of both is great in helping me get familiar with the vast area of programming I need which would otherwise be a number too high for a programming novice like me.
. I am very grateful. I shall do justice and study and use this code a lot along with the others I have been examining and get back with results.
. my initial observations our ….
……..The macro that opens the web page uses the MSXML2 library. This is faster that using Internet Explorer and also runs synchronously with VBA. The data from the tables is collected into a 2-D array. This allows all the data to be written to the worksheet in s single operation.
………….. I want to keep it simple at this point so you can see what is going on and understand the code before expanding it. I am sure you will have questions………..

.1 : Yes… Wot an amazing speed! As with the code from pike it is very fast. Yours may I expect may be the fastest. I will need to do some speed test along the lines of wot I did here
http://www.mrexcel.com/forum/excel-...ria-code-alternative-looping.html#post3937559
In order to see that. It opens up new possibilities to me. – For a shorter version of my project which would access a smaller amount of Nutritional information (such as that from the web page here and not including my own enormous data bank which initially was to be enhanced by the info scraped here) , I could access the data from the web in real time rather than accessing the data from my Data bank.

2. : As for having more questions. … I will try to understand as much as possible first before troubling you too much!… As a beginner I recognize very little in the code that is related to working with the HTML document!

. Alan


P.s. Sorry for my untypical late response. I am suffering under A Mother in Laws Holiday period “Cyber ban” so can’t get “on” just now too much, if at all!!
 
Upvote 0
Hello Doc,

You're welcome. I didn't expect an immediate response back from you during the holidays.

Experience is great teacher but takes a lot of time. Any questions you have about the code, just ask and I will answer them.
 
Upvote 0
In the above code ...............
For IE object it looks like this:
Set objTable = IE.Document.All.Item("container2").getElementsByTagName("table") '(0)

where "container2" comes from the site's HTML code of the required table: div id="container2"


Hi Vladimir Zakharov
. I have been successfully using your first code line in modified versions of pike’s code, as well as successfully using your second code line in developing my original code further. I have just one very minor follow up question:
. I have occasionally seen additional arguments on similar code lines to your second (internet Explorer ) code line and therefore wanted to check that the actual form you intended giving me should look like this
Code:
[color=darkblue]Set[/color] objTable = appIE.Document.All.Item("container2").getElementsByTagName("table") [color=green]'(0)[/color]
……..I just wanted to check that there was not a typo or MrExcel Editor problem that changed an argument that you intended into a strange comment ‘0
. Can you please confirm that the code line above is as you intended or if not please advise on the actual form that you intended.
Thanks
Alan
 
Upvote 0
another alterative way with two library refernces
Rich (BB code):
Option Explicit
 'reference to Microsoft Internet Controls
 'reference to Microsoft HTML Object Library

Sub Web_Table_Option_One()
Dim xml    As Object
Dim html   As Object
Dim objTable As Object
Dim result As String
Dim lRow As Long
Dim lngTable As Long
Dim lngRow As Long
Dim lngCol As Long
Dim ActRw As Long



another alterative way with two library references
Rich (BB code):
Option Explicit
 'reference to Microsoft Internet Controls
 'reference to Microsoft HTML Object Library

Sub Web_Table_Option_One()
……
.   ………….
……………………
……
………. Pike . Thanks so much for your code. It was an unexpected bonus to have a solution using an extra library. I cannot wait to get my teeth into that code, stepping through it and “destroying it” with ‘green comment graffiti until I understand…………..
Hi pike … I have successfully “destroyed” your code with ‘green comments etc. to the point that I have a very basic understanding of it all. . I had a couple of minor follow up questions: . You are using Late Binding for the external libraries , which I believe is the sensible thing to do when sharing files? . As always one of the things I do to help in my understanding of these types of codes is to try and get the code running with the equivalent Early Binding. This helps me get clear which libraries are used where, and allows me to use intellisense to aid in understanding and developing the code further. In doing so in the case of your code I hit on a couple of things I do not quite understand. Maybe you can help me.: .1 Late binding with __________.6.0 : You use Set xml = CreateObject("MSXML2.XMLHTTP.6.0") Rather than the more usual Set xml = CreateObject("MSXML2.XMLHTTP") . I note that both work in Late binding, but in early Binding only this works Dim xml As MSXML2.XMLHTTP . This for example does not work Dim xml As MSXML2.XMLHTTP.6.0 In Early Binding . - Could you explain your reasoning behind using MSXML2.XMLHTTP.6.0? . – is the answer that you specifically reference version 6 – I note that Early binding worked for me for versions 3, 4, 5, and 6 ( and crashes if I try to use a nonexistent number like MSXML2.XMLHTTP.2.0 ! ) .2 Libraries used. . You suggested you are using these two libraries 1) Microsoft HTML Object Library and 2) Microsoft Internet Controls. . My playing around with Early Binding suggests you are using 1) Microsoft HTML Object Library and 2) Microsoft XML (versions 3, 4 ,5 or 6). Can that be the case, or have I missed something more subtle..? (For example when late binding for the entire program I did not need to reference Microsoft Internet Controls) .3 Set ______ Nothing…. . I expect the answer to this is “yes, but you just left it out for simplicity..” : . Would it be a good idea to include this at the end of your code to be on the safe side?
Code:
[color=darkblue]Set[/color] xml = [color=darkblue]Nothing[/color]
…………………………………. . The last three questions are minor points to help me in understanding the code. I have Early and late binding versions successfully running in Xl 2003, 2007, 2010 on Vista and XP. So thanks again for the code Alan …………………………………….. . . . . P.s. More as amusement than anything else I include my destroyed version of your code that despite my best efforts still miraculously works!!!
Code:
[color=green]'  Pike: "another alterative way with two library refernces".......[/color]
 [color=green]'reference to Microsoft Internet Controls--WRONG maybe???!! soll  Microsoft HTML Object Library    http://stackoverflow.com/questions/20495035/vba-what-to-reference-for-htmlfile[/color]
 [color=green]'reference to Microsoft HTML Object Library>>>> Tick >> Microsoft XML, v6.0' For late binding ( or v4.0 or v5.0 or v3.0 )[/color]
[color=darkblue]Sub[/color] Web_Table_Option_OnepikeAlan1() [color=green]'version destroyed with comments etc. by Alan[/color]
 
[color=green]'Dim lRow As Long'?? Not  used at all (I think??)[/color]
 
Cells.Clear [color=green]'clear cells and give me time to see...[/color]
Application.Wait Now + TimeValue("00:00:02") [color=green]'....if code actually does anything[/color]
[color=green]'[/color]
[color=darkblue]Dim[/color] xml [color=darkblue]As[/color] [color=darkblue]Object[/color] ' Late Bin...d....
[color=green]'Set xml = CreateObject("MSXML2.XMLHTTP.6.0") ' ... ing --- original pike form (.3.0  .4.0 and  .5.0 works too)[/color]
[color=darkblue]Set[/color] xml = CreateObject("MSXML2.XMLHTTP") [color=green]' --- More usual form works as well[/color]
[color=green]'Dim xml As MSXML2.XMLHTTP: Set xml = New MSXML2.XMLHTTP 'Early Binding ( .6.0 bit don't work?)TOOLS >>> REFERENCES >> tick Microsoft XML, v6.0 (or v3.0 or v4.0 or v5.0) >> "if you dont have version 6.0 for Microsoft XML any previous version should do" > http://www.mrexcel.com/forum/excel-questions/759592-help-createobject-msxml2-xmlhttp-macro.html[/color]
[color=green]'[/color]
xml.Open "GET", "http://www.ernaehrung.de/lebensmittel/de/F110000/Apfel.php", [color=darkblue]False[/color] 'Probably getting / opening somehow...
xml.send [color=green]' to allow a highway in use like that in VBA "open for output as 1 etc.."[/color]
[color=darkblue]Dim[/color] result [color=darkblue]As[/color] String: [color=darkblue]Let[/color] result = xml.responseText: Debug.Print result [color=green]'The response - pushed out of the highway. Looks Like HTML info to the site but not the source code[/color]
[color=green]'Range("H1").Value = result'For me for a print out copy to text file etc.[/color]
 
[color=green]'Dim html  As Object 'late Binding[/color]
[color=green]'Set html = CreateObject("htmlfile")[/color]
[color=darkblue]Dim[/color] html [color=darkblue]As[/color] HTMLDocument: [color=darkblue]Set[/color] html = [color=darkblue]New[/color] HTMLDocument [color=green]'Early binding TOOLS >>> REFERENCES >>Microsoft HTML Object Library[/color]
 
[color=darkblue]Let[/color] html.body.innerHTML = result [color=green]'Somehow this gives enough info html so that it can be further used to get anything from the full HTML Soruce file[/color]
 
[color=darkblue]Dim[/color] objTable [color=darkblue]As[/color] [color=darkblue]Object[/color] [color=green]'An object to be set that has enough info in that we need to poppel out wot I want[/color]
[color=darkblue]Set[/color] objTable = html.getElementsByTagName("table") [color=green]'seems to get 8 items tying up with the 8 tables outputed. This is the one object[/color]
[color=darkblue]Dim[/color] ActRw [color=darkblue]As[/color] [color=darkblue]Long[/color] [color=green]'marker row incremented after writing out each table to just after last table[/color]
[color=darkblue]Dim[/color] lngRow [color=darkblue]As[/color] Long: [color=darkblue]Dim[/color] lngCol [color=darkblue]As[/color] [color=darkblue]Long[/color] [color=green]'Variables for each row and column in each table[/color]
 
[color=green]'Strategy is to go through each row in the objectTable then along the "cells" (columns) it has in that row and put it sequentially in the spreadsheet[/color]
[color=darkblue]Dim[/color] lngTable [color=darkblue]As[/color] Long, TableCount [color=darkblue]As[/color] Long: [color=darkblue]Let[/color] TableCount = objTable.Length [color=green]'Gives 8 length  is effectivelly Number/ Count[/color]
    [color=green]'For lngTable = 0 To objTable.Length - 1 'Going for each table .. -1 is probably the array start at 0 thing.[/color]
    [color=darkblue]For[/color] lngTable = 1 [color=darkblue]To[/color] objTable.Length - 1 [color=green]'The 1 mod over pike's original takes out something in <table> tags </table> thar is not really a table (that I want)[/color]
        [color=darkblue]For[/color] lngRow = 0 [color=darkblue]To[/color] objTable(lngTable).Rows.Length - 1 [color=green]'For each object this " objTable(lngTable).Rows.Length - 1 " gives the number of rows in that table[/color]
            [color=darkblue]For[/color] lngCol = 0 [color=darkblue]To[/color] objTable(lngTable).Rows(lngRow).Cells.Length - 1 [color=green]'This bit " objTable(lngTable).Rows(lngRow).Cells.Length " gives the columns ("Length" again really means Nzumber / Count)[/color]
                [color=darkblue]Let[/color] ThisWorkbook.Sheets("pike ").Cells(ActRw + lngRow + 1, lngCol + 1).Value = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText [color=green]'This  " objTable(items 1 - 8) thing " has a massive amount of stuff in it. In the watch window i have not pinned down yet the bit I want... but This " objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText " has!!!![/color]
            [color=darkblue]Next[/color] lngCol [color=green]'go to next "Cell" in that table row (=Column in the table)[/color]
        [color=darkblue]Next[/color] lngRow [color=green]'Go to next row in this table[/color]
        [color=darkblue]Let[/color] ActRw = ActRw + objTable(lngTable).Rows.Length + 1 [color=green]'make sure the next table starts at somewhere around the end of the last one[/color]
    [color=darkblue]Next[/color] lngTable [color=green]'Go to the next table[/color]
   
[color=darkblue]Set[/color] xml = [color=darkblue]Nothing[/color] [color=green]'Probably (no longer??) necerssary, but does no harm... who knows - compatibility with earliear versions etc..? that do not automatically shut things off??[/color]
   
[color=darkblue]End[/color] [color=darkblue]Sub[/color] [color=green]''Sub Web_Table_Option_OnepikeAlan1() 'version destroyed with comments etc. by Alan'  Usually I put a stop here so I do not loose Watch Window info[/color]
 
Upvote 0
Hi Leith

… I am sure you will have questions…..
Experience is great teacher but takes a lot of time. Any questions you have about the code, just ask and I will answer them.

. Actually after getting my head down and “Watching” in the watch window, F8 and all that, it was not too difficult to understand most of it. The code comments already there from you helped a lot. As did having a similar alternative code from pike which I went back to and worked through again, this time in parallel with working through yours.
.. some questions I do have:

.1a ) Would ( a very crude) “VBA” comparison of this…..
Code:
Request.Open "GET", URL, [color=darkblue]True[/color]
Request.send
. ….. be the VBA command something like this:
Code:
    [color=darkblue]Open[/color] Request [color=darkblue]For[/color] [color=darkblue]Output[/color] [color=darkblue]As[/color] 1 [color=green]' or 2 or 3 etc.....[/color]
… ( Both of these I would as a laymen describe as opening up for traffic a data “Highway” )?
.
.1b ) I could not get by Googling or by trial and error to see any difference in True or False in your (or pike’s) code. Could you explain that?

.2 a) Your Pagesrc String (equivalent to pike’s result String) is, I believe just approximately the first quarter of the full HTML Source document. Is it true to say the command s under in .3) somehow give your HTMLdoc Object (equivalent to pike’s html Object) the info it needs to obtain the full HTML Source File if, and when it needs it ?
( Possibly you answered this with your code comment
Code:
                      [color=green]' Convert the HTML code into an HTML Document Object.[/color]
, but I did not fully follow wot you were saying there)
.2b) Just out of interest, To get the Full HTMl source code I simply did the “right mouse click option thing” when in the “Apfel” internet site Browser Window (and copied and saved it to a Notepad ** file as HTML file..and printed it all out!!) ….. but I was wondering if I could get at it with a command through the HTMLdoc Object as an alternative??

.3) Early / late Binding in your Code. A always I try to convert Codes such as yours from the typically (and correctly so) late Binding form to the Early Binding equivalent as this helps me to understand.. I hit a problem with these code lines of yours:
Code:
HTMLdoc.write PageSrc
 
HTMLdoc.Close
.. as I could not get them to work with Early Binding. Can you explain this?

(I got your code to work (I think the same??) with early binding if I replaced those lines with a “pike- type equivalent”: -
Code:
HTMLdoc.body.innerHTML = PageSrc [color=green]'pike equivalent[/color]
)

.4 ) Color Formatting: Did you obtain the original Color Formatting from inspection of a pasted table in a spreadsheet, or is that info somewhere to be found in the HTML Source code and/ or the Heading or Table Objects. ( I expect it must be there somewhere but I could not find it!!)

. 5) A laymen’s explanation of your code and comparison with pike’s: Does this sound reasonable for a summary from a Layman with little programming knowledge? .. Your code and pike’s are basically the same. Yours is more implicit, …
. ……………..by that I mean _a) uses a larger, more detailed collection of Objects – in total about 7 x 7 (not all at the same time) compared with 1 from pike , _b) works through the HTML Source code, that is to say the selected tables therein, (using your Sub GetElemText >> Call GetElemText(oTable.Rows(r).Cells(c), Data(r, c)) ) and picks out the text. Compared with an equivalent command Data(r, c) = oTable.Rows(r).Cells(c).innerText which works as an alternative to calling your Sub GetElemText and is an alternative along the lines of wot pike’s does. ?
(. In this example for me there is no “right” way of doing it – As I am learning and also considering and comparing the possible methods for my RL project, I benefit from seeing as much detail and as many different methods as possible)

.6) Extremely minor point. (And maybe an obvious mistake by you to check out that I went thoroughly through all your code – I did! (I think) ): The last but 1 comment from you in Sub ScrapeData()
Code:
            [color=darkblue]Next[/color] n
       
      [color=green]' Output the tables and format the columns.[/color]
        Rng.Resize(ColumnSize:=colCnt).EntireColumn.AutoFit
Should just read “ format the columns “ as the tables are already outputted (each time in the loop).

7)………….! ….. Please just answer this last question if and when you have the time. It is maybe very vague and I am asking a bit much : (like someone whose only transport knowledge is a bicycle asking an expert mechanic to explain everything that is responsible for getting a car to move from A to B !!:…)….
…. Can you expand your ‘ comments to give a bit more detail of HOW your Sub GetElemText is working. – I got totally out of depth trying to navigate my way through those massive table Objects in the watch Window, - and my googling of “Childs” “Nodes” etc. did not help me much either!!…(I did incidentally I think get the point of using ByRef rather than the default ByValue - as a neat way ( I think?) to get the sub to work as a Function returning back those valueS )


……………………………
. Please answer only when you have the time. It is not urgent – mainly it is to satisfy my lust to learn. – I think I can just about understand now enough to be happy to start using versions of your code in my original RL requirement.

Alan

P.s. I hesitate to show here my desecrated version of your code here.. .. Which miraculously despite my best efforts still works.. but it can be found here… Sub ScrapeDataRuinedByAlan() in Module “Leith” which works on sheet “Leith” in this XL2007 .xlsm file:
https://app.box.com/s/6j9vt55zbbnd0u805tx6
And this XL2003 .xls File
https://app.box.com/s/omw1f7ffunilasii7hk1

………………………………………………………………………………
 
Upvote 0
To pike and Leith and Vladimir.
. It was particularly useful having all these codes. Sometimes when I got stuck trying to understand one I was able to go back and get further understanding the other and vice versa. It made it a particularly good learning exercise for me and reduced my learning time considerably. A very good example of the usefulness of these types of Forums.
. Thanks again
. Alan
(. P.s. I have checked various versions of your codes in XLs 2003 2007 2010 in Vista , XP and have encountered no problems with compatibility to date.)
 
Upvote 0
Hello Alan,

Where to start, where to start?

A 1a. No they are not the same.
This code
Code:
Request.Open "GET", URL, True
is an HTTP Request from the client to the server for information employing the "GET" method.

This code
Code:
Open Request For Output As #1
is used write data to a file.

A 1b. The True/False argument of the HTTP Request is the Asynchronous mode flag. If set False then control is immediately returns to VBA after Send is executed. If set True then control is returned to VBA after the server has sent back a response. I prefer to use asynchronous mode and test if my timeout period has expired to prevent the code from hanging due to an unresponsive server. In the example I provided I used synchronous mode to reduce the amount the code and keep it easier to understand.

A 2a. The responseText property returns the information requested by the Open method as a text string. In this case, the requested entity is the page source of the URL. No difference in what Pike and I are doing at this point.
However at this point in Pike's code, there is a difference.
Code:
Set html = CreateObject("htmlfile")
html.body.innerHTML = result
Pike uses only the Body of the Page Source code when converting it to an HTML DOM document. This excludes the Meta data, Java scripts, and Class information from being converted. Generally speaking, this information is not used when retrieving only text data from a web page.

You are correct that my code does convert all the code into a complete HTML DOM document. There is nothing wrong with Pike's method of converting the page source into an HTML Dom document. I simply prefer to the Write method of an HTML file. Since this method is designed to convert the page source text into an HTML DOM document. Both methods achieve the same results.

A 3. This is a case where late binding has to be used. The htmlfile is an ActiveX object that is a wrapper function for the IHTMLDocument2 interface in MSXML2. This gets into a lot of low level system operation that I will not discuss here.

A 4. I mean no disrespect to Pike. His code is short and does work - in this case. HTML code is very loosely defined. Because of this lack of enforced rules, I have learned over time not to get to comfortable with "short" code. It will come back and bite you. It is better to take the extra time and do it right the first time than have to return and do it again. Defining object variables makes the code more flexible and more easily adaptable to future changes.

The code I provide also more closely reproduces the original layout, including the color. The color is defined in a Cascading Stlye Sheet or CSS. This is reference is included in the HTML document area Pike excluded. It is a relative URL to server held text document containing the HTML CSS constructors. Again, this is topic that is too in depth to include in this answer. I found the colors used for the Table Headers and Grid lines and included them in my code. They are RGB colors.

A 5. There is not much difference in Pike's code and mine. I don't use innerHTML to retrieve text. This property is designed to return the inner HTML code for node element. That is the Child nodes within the node element itself. Nodes can have pure Text Nodes. If this all the have then innerHTML will return only the text and not the HTML code.

The recursion method I employ retrieves all text from the starting element node and all it's associated child nodes. Yo can return text like this using the innerHTML attribute. This very useful in HTML tables as a cell can contain a Table. I think you will see how tricky this gets with using recursion.

A 6. Yes, you are correct. The syntax is wrong. I moved that line from another place in the code and forgot to change it.

A 7. The process of recursion, a routine that calls itself, can be difficult to understand. When the macro is first called two variables are passed to it. Elem, the HTML object we want to get the text from, and ElemText which contains the concatenated text from the the element and it child nodes.

The macro will examine the element for a Text Node. If found the text is concatenated with the ElemText String. If this is the ElemText string is empty then ElemText is set to this value. If not then this value is concatenated with any previous text and separated by a tilde character. This character can be used later to parse the text string into the individual strings from each element. The macro will exit the Sub at this point. When this happens, this copy of the macro is removed from the call stack.

If the element node type is not 3 (a text node) then there are possibly child nodes that need to examined. The macro calls itself and passes the new Elem, Child Node, and old ElemText to itself. At this point a copy of the macro has been added to the Call Stack. This stack keeps track of the running macros. A macro is only removed when either End Sub or End Function has executed. Then the previous macro will start running again.

Hopes this helps. I know it is a lot to take in.
 
Last edited:
Upvote 0

Forum statistics

Threads
1,215,103
Messages
6,123,103
Members
449,096
Latest member
provoking

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top