Importing data from multiple pages of a single website

gsingh4

New Member
Joined
Jun 3, 2016
Messages
10
Hi all,

I am new to VBA and have been facing a lot of trouble with figuring out how to pull data from the website boxofficemojo.com. I am trying to extract weekly data for the years 2010-2015. So i found a code that did something along the same lines and changed it to suit my needs. It is as follows:

Sub Movies()
Dim nextRow As Integer, i As Integer
Application.ScreenUpdating = False
Application.DisplayStatusBar = True
For i = 1 To 52 'this is the page range to be captured. At the time there was 4083 total.
Application.StatusBar = "Processing Page " & i
nextRow = ActiveSheet.Cells(Rows.Count, "A").End(xlUp).Row + 1
With ActiveSheet.QueryTables.Add(<wbr>Connection:= _
"URL;http://www.boxofficemojo.<wbr>com/weekly/chart/?yr=2015&wk=&<wbr>p=.htm" & i, _
Destination:=Range("A" & nextRow))

.Name = "weekly/chart/?yr=2015&wk=&p=.<wbr>htm"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlSpecifiedTables
.WebFormatting = xlWebFormattingAll
.WebTables = "5"
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = True
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
ThisWorkbook.Save
Next i
Application.StatusBar = False
End Sub

However instead of pulling the data for weeks 1 - 52 of 2015, it keeps on pulling data for the latest week of 2016 and repeats it 52 times. I am not sure what's wrong here and any help would be really really appreciated.

Thanks for your effort.




 

Excel Facts

Repeat Last Command
Pressing F4 adds dollar signs when editing a formula. When not editing, F4 repeats last command.
Looks like the wk number goes here (red) instead of at the end of the URL.

Code:
With ActiveSheet.QueryTables.Add(Connection:= _
      "URL;http://www.boxofficemojo.com/weekly/chart/?yr=2015&wk=[COLOR=#ff0000]" & Format(i, "00") & "[/COLOR]&p=.htm", _
       Destination:=Range("A" & nextRow))

And format the week number as two digits e.g.; 01
 
Last edited:
Upvote 0
Hey

Thanks a lot for the reply. I tried what you suggested by keeping everything the same and adding the red part. But for some reason it keeps skipping 10 weeks. Do you know why that might be happening?
 
Upvote 0
I didn't update the format on both the places (the URl and the Name). It's fixed now. Thanks a lot.

I have another question, if you will.

You may have noticed that this code downloads data with a hyperlink to the movies. I have another code that runs and extracts table 5 from each movie page by going to the hyperlink. Would you happen to know how I can skip the code from running on hyperlink that is broken or a movie that does not have a table 5? I can send you the code and explain in more detail if you're able to provide me some insights.

Thank :)
 
Upvote 0
Post the code in this thread if you want. See my signature block below about the use of code tag.
 
Upvote 0
So after I get the movies list, the code below runs in the following steps:

1) Opens the hyperlink of the movie
2)Extracts the table that's below the title of the movie. For example, it will open up the link of the movie In the Heart of the Sea (2015) - Weekly Box Office Results - Box Office Mojo and extract the one table with info like Distributor, release date, Rating etc.
3) Pastes it on a different sheet in the same workbook
4) Copies the information from that sheet to the master sheet

Code:
Sub getMovieData()
    Dim index As Integer
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Sheets.Add(After:=ThisWorkbook.Sheets(ThisWorkbook.Sheets.Count))
    ws.Name = "Beta1"
    
    Call ClearData
    index = 4073
    For Each Rng In Worksheets("Sheet1").Range("C4073:C5678")
        Dim address As String
        address = Rng.Hyperlinks(1).address
        Call ClearTempData
        GetMovieDetails (address)
        CopyData (index)
        index = index + 1
    Next
End Sub


Sub GetMovieDetails(URL As String)
'
' Reads movie data from Box office Mojo
'
'
        URL = "URL;" & URL
        Worksheets("Beta1").Activate
        With ActiveSheet.QueryTables.Add(Connection:=URL, Destination:=Range("A1"))
            .FieldNames = True
            .RowNumbers = False
            .FillAdjacentFormulas = False
            .PreserveFormatting = True
            .RefreshOnFileOpen = False
            .BackgroundQuery = True
            .RefreshStyle = xlInsertDeleteCells
            .SavePassword = False
            .SaveData = True
            .AdjustColumnWidth = True
            .RefreshPeriod = 0
            .WebSelectionType = xlSpecifiedTables
            .WebFormatting = xlWebFormattingNone
            .WebTables = "6"
            .WebPreFormattedTextToColumns = True
            .WebConsecutiveDelimitersAsOne = True
            .WebSingleBlockTextImport = False
            .WebDisableDateRecognition = False
            .WebDisableRedirections = False
            .Refresh BackgroundQuery:=False
        End With


End Sub




Sub CopyData(index As Integer)


Dim ws As Worksheet
Dim temp As Worksheet
Set temp = Sheets("Beta1")


Dim Dest As Worksheet
Set Dest = Sheets("Sheet1")


Set CurCell = temp.Cells(2, 1)
ReadText = CurCell.Value
Set CurCell = Dest.Cells(index, 15)
CurCell.Value = GetValue(ReadText)


Set CurCell = temp.Cells(3, 1)
ReadText = CurCell.Value
Set CurCell = Dest.Cells(index, 16)
CurCell.Value = GetValue(ReadText)


Set CurCell = temp.Cells(4, 1)
ReadText = CurCell.Value
Set CurCell = Dest.Cells(index, 17)
CurCell.Value = GetValue(ReadText)


Set CurCell = temp.Cells(2, 2)
ReadText = CurCell.Value
Set CurCell = Dest.Cells(index, 18)
CurCell.Value = GetValue(ReadText)


Set CurCell = temp.Cells(3, 2)
ReadText = CurCell.Value
Set CurCell = Dest.Cells(index, 19)
CurCell.Value = GetValue(ReadText)


Set CurCell = temp.Cells(4, 2)
ReadText = CurCell.Value
Set CurCell = Dest.Cells(index, 20)
CurCell.Value = GetValue(ReadText)


End Sub


Sub ClearData()


Dim Dest As Worksheet
Set Dest = Sheets("Sheet1")


numx = 2


Dest.Range("Q:Q").Clear


End Sub




Sub ClearTempData()


Dim Dest As Worksheet
Set Dest = Sheets("Beta1")


numx = 2


Dest.Range("A:A").Clear


End Sub


Function GetValue(str As Variant) As String
Dim Result() As String


Result = Split(str, ":")


    GetValue = Result(1)
End Function




But sometimes, the movie's page would have no table or a table with a different format and that stops the code from running altogether. And this is happening rather frequently since there's 5000+ movies in a sheet. I have to restart the code again. For eg, this movie has no table

Box Office Mojo - Movie Index, A-Z

Is it possible to add a function or a sub that just skips the movie if it doesn't have the table or table in a different form than what i need? I tried searching how to do it but all in vain.

I'd really appreciate any insights.
 
Upvote 0
In your CopyData procedure, add On Error Resume Next at the top to ignore errors, or better yet, add some If-Then lines to test if there is a table and test the table format.
 
Last edited:
Upvote 0

Forum statistics

Threads
1,214,822
Messages
6,121,770
Members
449,049
Latest member
greyangel23

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top