The best method is the one that works, is reliable and relatively fast. The method you choose depends on the specific web site and some methods may not work.
Good, but only if you're using the MS HTML Object Library to find and parse HTML elements. Some people parse the whole outerHTML using VBA string functions like Instr, Mid, etc., which I think is more prone to web site design changes.2) Get inner text, outer text, inner html or outer html from internet explorer (IE)- this method was by far the most reliable but the data was, at times, very hard to parse and did not always contain everything I needed (good for some applications but bad for others)
Did you try IE.ExecWB to select all, and copy to the clipboard? Forget SendKeys and AppActivate; there is better code on the web for sending keystrokes and mouse clicks to a specific window, which VBA SendKeys can't do - see SendKeys.3) automated Copy and Paste from IE- this was tantalisingly close to being perfect but is given to throwing up inexplicable errors whereby the type of information copied to the clipboard differs depending on whether it is done manually (ie CTRL+A, CTRL+C) or through the automated process (with the former I could get the HTML structure- ie tables, with the latter only text). The enigma here was that I could get the automated copy/paste to give me the right info IF I FIRST CLICKED INSIDE THE IE WINDOW USING MOUSE POINTER- however I was unable to automate this using a VB MACRO (I tried sendkeys and various other methods)
This is the easiest method because no knowledge of HTML is needed, but also very slow, as you've seen, because I think it renders the web page behind the scenes using IE. It also only works if the web page has HTML tables, though it is very good at parsing the table, dealing with odd HTML formatting like < BR > tags and importing into Excel cells. For multiple web query retrievals, have a single web query on a separate sheet and change its Connection property for each URL. I've also seen it freezing or crashing when many web queries are done; the solution is to clear the IE cache every 40-50 queries - there is code on the web to do this. Other methods (XMLhttp, IE automation, CreateDocumentFromURL) can be near instantaneous because they might be retrieving the data from the local cache instead of requesting it from the web site again.4) By automating an excel webquery- I recorded a macro of a query, this worked flawlessly giving me the structure of tables I needed. Snag was it was very very slow- even for a single page it might take 14 to 16 seconds (some of the other methods I used were near instantaneous). Also this method appears to encounter severe lagging/crashing problems when many refreshes are done (that may be because I don't know how to update the queries with different criteria, or properly extinguish them)
Do you mean XMLhttp request? Or XMLhttp DOMdocument? Very few web pages are XML documents with properly structured nodes.5) Loading the page as an XML document- I am investigating this method now- I know next-to-nothing about XML but have a hunch the sort of pages I am scraping (see example above) are suitable for this. I have managed to load the pages as an XML object but at present seem to be running into difficulties trying to parse the structure (ie various nodes) to extract text- keep running into object errors.
The fastest method is generally an XMLhttp request combined with the HTML Object Library. For a mega-fast technique which retrieves multiple pages simultaneously instead of sequentially, use these methods with multiple VBScript processes.
CreateDocumentFromURL (HTML Object Library method) is also fast but does some rendering of the web page. Advantage - you don't need XMLhttp.
IE automation is the next best, but relatively slow because it renders the web page. However some web pages require rendering before you can retrieve the data so this might be the only method that works. And some sites generate pages based on input boxes, dropdown menus etc., which require IE automation. Combine IE automation with the HTML Object Library.
Excel web query - the slowest, but also the easiest to use if the web page has HTML tables, write code for and reliable.