Can't find META tags in HTML when using WinHTTP Request

Julesdude

Board Regular
Joined
Jan 24, 2010
Messages
196
Hi all,

I am trying to locate Meta tags from the following URL:

https://england.shelter.org.uk/professional_resources/policy_and_research/policy_library/policy_library_folder/2020_group_-_fiscal_stimulus_paper

However, what is passed from the result string to HTMLDoc.body.all is only a section of the actual HTML and the meta tags at the beginning of the script are absent and so cannot be found. How do I overcome this?

my code is as follows:

Code:
Dim Http2 As New WinHttpRequest
Dim i as long
dim result as string
dim elements
dim element

Set HTMLDoc = New MSHTML.HTMLDocument



    Http2.Open "GET", url, False
    ' send request
    Http2.send
    result = Http2.responseText



    'pass text of HTML document returned
    HTMLDoc.body.all = result


    Set Elements = HTMLDoc.all.tags("META")


        For Each singleElement In Elements 


        ActiveSheet.Cells(i, "A") = url
        ActiveSheet.Cells(i, "B") = "META " & singleElement.Name
            
        ActiveSheet.Cells(i, "D").NumberFormat = "@" ' text format for date
        ActiveSheet.Cells(i, "D") = singleElement.Content
        
        
        i = i + 1
        
        Next 
    
    
    Set Elements = Nothing
 

ScottR

Well-known Member
Joined
Apr 17, 2008
Messages
545
Meta tags are in at HTML documents head, not it's body. Not in a position to check but I think
Code:
HTMLDoc.all
will include Meta tags

Edit: Sorry I only read your introduction and not the code. You should have what you need but obviously don't! I'll try and look later when at an appropriate machine instead of posting unhelpful suggestions!
 
Last edited:

Kyle123

Well-known Member
Joined
Jan 24, 2012
Messages
2,667
Bit grotty, but gives you an idea:
Rich (BB code):
    Http2.Open "GET", Url, False
    ' send request
    Http2.send
    result = Http2.responseText

    'pass text of HTML document returned
    f = Split(result, "****** ")
    For x = LBound(f) + 1 To UBound(f) + 1
        On Error Resume Next
        Debug.Print Split(Split(f(x), "name=""")(1), """")(0), Split(Split(f(x), "content=""")(1), """")(0)
    Next x
 

Julesdude

Board Regular
Joined
Jan 24, 2010
Messages
196
Meta tags are in at HTML documents head, not it's body. Not in a position to check but I think
Code:
HTMLDoc.all
will include Meta tags

Edit: Sorry I only read your introduction and not the code. You should have what you need but obviously don't! I'll try and look later when at an appropriate machine instead of posting unhelpful suggestions!
Hi Scott, thanks for helping out.

I actually did try that thinking it would work and I get a type mismatch error message when I try to pass the result string to it.

I'm declaring HTMLDoc as:

Code:
Dim HTMLDoc As MSHTML.HTMLDocument
Set HTMLDoc = New MSHTML.HTMLDocument

i also changed
Code:
HTMLDoc.body.all

'to

HTMLDoc.body.innerHTML
This parses ok but cuts out the meta tags.

Not sure if that interrupts the passing? But anyway, it's still not working :(
 
Last edited:

Julesdude

Board Regular
Joined
Jan 24, 2010
Messages
196
Thanks Kyle,
When I try this I get an error - the f is highlighted 'can't assign to array'
How are you declaring the 'f'?
 

John_w

MrExcel MVP
Joined
Oct 15, 2007
Messages
5,945
A trick I discovered is that, if instead of early binding the HTMLDocument object and loading it with the usual method like this:

Code:
    Dim HTMLdoc As HTMLDocument
    Set HTMLdoc = New HTMLDocument
    HTMLdoc.body.innerHTML = WinHttpRequest.responseText
you late bind HTMLDocument and load the response with its Write method, then the HEAD tag, which includes META tags, is fully populated:

Code:
    Dim HTMLdoc As Object
    Set HTMLdoc = CreateObject("HTMLfile")
    HTMLdoc.Open
    HTMLdoc.Write WinHttpRequest.responseText     
    HTMLdoc.Close
In your case the above Write line should be HTMLdoc.Write Http2.responseText
 

Kyle123

Well-known Member
Joined
Jan 24, 2012
Messages
2,667
The only issue with that though John is that you lose the good stuff like getting elements by class name etc and the more modern stuff. Though in this case it's very unlikely to matter
 

Julesdude

Board Regular
Joined
Jan 24, 2010
Messages
196
John, thank you. This works perfectly.
Will this capture all meta tags though. For example, ones like:
Code:
meta name = "...." content= "....">
meta property = "...." content = "....">

I would want to capture both elements/tags within.

Here's my code in full with your additions:

Code:
    Dim HTMLdoc As Object
   
 Set HTMLdoc = CreateObject("HTMLfile")
    HTMLdoc.Open
    HTMLdoc.Write Http2.responseText
  
    Set Elements = HTMLdoc.all.tags("META")
    
    
    ' add a new worksheet for the webpage and provide headers
    ActiveWorkbook.Sheets.Add
    Cells(1, 1) = "URL"
    Cells(1, 2) = "TAG"
    Cells(1, 3) = "LINK"
    Cells(1, 4) = "TEXT"
    
    i = ActiveSheet.Range("A" & Rows.Count).End(xlUp).Row + 1 ' find the last row


        For Each singleElement In Elements ' new
        ActiveSheet.Cells(i, "A") = url
        ActiveSheet.Cells(i, "B") = "META " & singleElement.Name
            
        ActiveSheet.Cells(i, "D").NumberFormat = "@" ' text format for date
        ActiveSheet.Cells(i, "D") = singleElement.Content
        
        
        i = i + 1
        
        Next ' new
 HTMLdoc.Close
 
Last edited:

John_w

MrExcel MVP
Joined
Oct 15, 2007
Messages
5,945
Correct Kyle, so declare two HTMLDocuments, one late bound to get the META elements and the other early bound for getElementsByClassName, etc!
 

Forum statistics

Threads
1,077,851
Messages
5,336,753
Members
399,100
Latest member
darcob

Some videos you may like

This Week's Hot Topics

Top