VBA to extract info from web HTML

camle

Board Regular
Joined
Jan 10, 2013
Messages
216
I'm trying to pull info from a website with VBA.

With this code:
Code:
Option Explicit

Sub BrowseTVDBWithQueryStringAndXML()

    Dim XMLPage As New MSXML2.XMLHTTP60
    Dim HTMLdoc As New MSHTML.HTMLDocument

    Cells.Select
    Selection.ClearContents

    XMLPage.Open "GET", "https://www.thetvdb.com/series/quantico/seasons/all", False

    XMLPage.send

    HTMLdoc.body.innerHTML = XMLPage.responseText

    ProcessHTMLPage HTMLdoc

End Sub


Sub ProcessHTMLPage(HTMLPage As MSHTML.HTMLDocument)

    Dim HTMLTable As MSHTML.IHTMLElement
    Dim HTMLTables As MSHTML.IHTMLElementCollection
    Dim HTMLRow As MSHTML.IHTMLElement
    Dim HTMLCell As MSHTML.IHTMLElement
    Dim RowNum As Long, ColNum As Integer

    Set HTMLTables = HTMLPage.getElementsByTagName("table")

    RowNum = 1
    ColNum = 2

    For Each HTMLTable In HTMLTables


        Worksheets("Sheet1").Select

        For Each HTMLRow In HTMLTable.getElementsByTagName("a")
            'Debug.Print vbTab & HTMLRow.innerText

         ColNum = 2
            For Each HTMLCell In HTMLRow.Children

                Cells(RowNum, ColNum) = HTMLCell.innerText
                ColNum = ColNum + 1

            Next HTMLCell

            RowNum = RowNum + 1
        Next HTMLRow

    Next HTMLTable
    
    ActiveSheet.UsedRange.SpecialCells (xlCellTypeLastCell)

        Columns("C:Z").Select
    Selection.ClearContents
    Columns("B:B").Select
    Selection.SpecialCells(xlCellTypeBlanks).Select
    Selection.Delete Shift:=xlUp
    Range("A1").Select
    
    'Test for season episode
    
Dim http As Object, html As New HTMLDocument, topics As Object, titleElem As Object, detailsElem As Object, topic As HTMLHtmlElement
Dim i As Integer
Set http = CreateObject("MSXML2.XMLHTTP")
http.Open "GET", "https://www.thetvdb.com/series/quantico/seasons/all", False
http.send
html.body.innerHTML = http.responseText
Set topics = html.getElementsByClassName("table")
i = 1
    
    For Each topic In topics
Set titleElem = topic.getElementsByTagName("td")(0)
Sheets(1).Cells(i, 1).Value = titleElem.getElementsByTagName("a")(0).innerText

i = i + 1
Next
End Sub

I Get:
Code:
  [TABLE]
<tbody>[TR]
[TD="align: right"]1[/TD]
[TD="align: left"]Inside Look[/TD]
[/TR]
[TR]
[TD="align: right"]1[/TD]
[TD="align: left"]Run[/TD]
[/TR]
[TR]
[TD="align: right"]1[/TD]
[TD="align: left"]America[/TD]
[/TR]
[TR]
[TD="align: right"]1[/TD]
[TD="align: left"]Cover[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Kill[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Found[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]God[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Go[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Over[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Guilty[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Quantico[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Inside[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Alex[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Clear[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Answer[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Turn[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Clue[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Care[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Soon[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Fast[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Drive[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Right[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Yes[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]KUDOVE[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]LIPSTICK[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]STESCALADE[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]KUBARK[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]KMFORGET[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]AQUILINE[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]LCFLUTTER[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]ODENVY[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]CLEOPATRA[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]JMPALM[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]ZRTORCH[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]FALLENORACLE[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]EPICSHELTER[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]LNWILT[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]MOCKINGBIRD[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]MKTOPAZ[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]ODYOKE[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]KUMONK[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]MHORDER[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]GLOBALREACH[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]RAINBOW[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]RESISTANCE[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]The Conscience Code[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Fear and Flesh[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Hell's Gate[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Spy Games[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]The Blood of Romeo[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Flashbang[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]And the Heavens Fall[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Deep Cover[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Fear Feargach[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]No Place Is Home[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]The Art of War[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Ghosts[/TD]
[/TR]
[TR]
[TD][/TD]
[TD="align: left"]Who Are You?[/TD]
[/TR]
</tbody>[/TABLE]

The 1's in the 1st column are wrong. Need them to be 0x1, 0x2 ....

The list of episodes are in categories:
Specials, Season 1 on up as high as they have been on.
some seasons are like 2017, 2018

I would like to be able to match the headers:

Specials:
to match episodes in specials list
then the A column would read 0x1, 0x2 and so on in the specials list

then same with the seasons, matching the season numbers 1x1,1x2 or 2018x1, 2018x2

Not sure how the pull out and separate the html info

also pull out only English, some have many languages.

Thanks
 

Excel Facts

Enter current date or time
Ctrl+: enters current time. Ctrl+; enters current date. Use Ctrl+: Ctrl+; Enter for current date & time.

Forum statistics

Threads
1,215,064
Messages
6,122,936
Members
449,094
Latest member
teemeren

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top