URL Query Loop Freezing Excel 2010

Vistalite

New Member
Joined
Dec 27, 2013
Messages
1
Hello all,

I am in the process of creating a VBA macro in Excel 2010 to scrape text from a list of websites to examine the frequency of words. Ideally, I would like to plop in a list of thousands of websites and let the macro run overnight. I’m running Windows 7 64bit.

Ill post the entire Module below:
Code:
Sub WebsiteScrape1()
    Dim Query_URL, URL, txt, c, MyStr As String
    Dim NumSites As Integer
    Dim PuncChars, StopWords, x As Variant
    Dim i, j, k, r, wordCnt As Long
    Dim Match As Boolean
    Dim Start, Finish
    Dim AllWords As Range
    Dim InputSheet As Worksheet
    Dim pc As PivotCache
    Dim PT As PivotTable
    
    NumSites = 4 'Application.WorksheetFunction.CountA(Sheets("Site List").Range("A2", Sheets("Site List").Range("A2").End(xlDown)))
    
    PuncChars = Array(".", ",", ";", ":", "'", "!", "#", _
    "$", "%", "&", "(", ")", " - ", "_", "--", "+", _
    "=", "~", "/", "\", "{", "}", "[", "]", """", "?", "*")
    StopWords = RangeToArray(Sheets("Stop Words").Range("A1", Sheets("Stop Words").Range("A" & Rows.Count).End(xlUp)))
        
    For i = 0 To (NumSites - 1)
        Debug.Print ""
        Debug.Print i, "First Line"                          ' First Line Success
        j = k = r = 0
        Query_URL = URL = txt = c = ""
        
        Application.ScreenUpdating = False
        Start = Timer
        URL = Sheets("Site List").Range("A2").Offset(i, 0).Value
        Query_URL = "URL;http://www." & URL
        Sheets("JunkYard").UsedRange.Delete
        Sheets("Word Frequency").UsedRange.Delete
        With Sheets("JunkYard").QueryTables.Add(Connection:=Query_URL, Destination:=Sheets("JunkYard").Range("A1"))
            .Name = "Website Text"
            .FieldNames = True
            .RowNumbers = False
            .FillAdjacentFormulas = False
            .PreserveFormatting = True
            .RefreshOnFileOpen = False
            .BackgroundQuery = False
            .RefreshStyle = xlOverwriteCells
            .SavePassword = False
            .SaveData = False
            .AdjustColumnWidth = True
            .RefreshPeriod = 0
            .WebSelectionType = xlEntirePage
            .WebFormatting = xlWebFormattingNone
            .WebPreFormattedTextToColumns = True
            .WebConsecutiveDelimitersAsOne = True
            .WebSingleBlockTextImport = False
            .WebDisableDateRecognition = False
            .WebDisableRedirections = False
        On Error GoTo Error_Query:
        Debug.Print i, "Query Start"                      ' Query Start
            .Refresh BackgroundQuery:=False
        Debug.Print i, "Query Done"                       ' Query Done
        End With
        On Error GoTo 0
        
        Sheets("JunkYard").Activate
        Columns("A:A").Select
        ActiveWorkbook.Worksheets("JunkYard").Sort.SortFields.Clear
        ActiveWorkbook.Worksheets("JunkYard").Sort.SortFields.Add Key:=Range("A1"), SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortTextAsNumbers
        If Range("A1") <> "" Then
            With ActiveWorkbook.Worksheets("JunkYard").Sort
                .SetRange Range("A1", Range("A" & Rows.Count).End(xlUp))
                .Header = xlNo
                .MatchCase = False
                .Orientation = xlTopToBottom
                .SortMethod = xlPinYin
                .Apply
            End With
        End If
        Set InputSheet = ActiveSheet
        Sheets("Word Frequency").Range("A1") = "All Words"
        InputSheet.Activate
        wordCnt = 2
        r = 1
        
        Debug.Print i, "Cleaning Start"                 ' Cleaning Start
        'Loop until blank cell is encountered
        Do While Cells(r, 1) <> ""
            'covert to UPPERCASE
            txt = UCase(Cells(r, 1))
            
           'Remove punctuation
            For j = 0 To UBound(PuncChars)
                txt = Replace(txt, PuncChars(j), "")
            Next j
           'Remove excess spaces
            txt = WorksheetFunction.Trim(txt)
           'Extract the words
            x = Split(txt)
            
            For j = 0 To UBound(x)
                If x(j) Like "*[!A-Z]*" Then
                    For k = 1 To Len(x(j))
                        c = Mid$(x(j), k, 1)
                        If c Like "[!A-Z]" Then x(j) = Replace(MyStr, c, "", k)
                    Next k
                End If
                Match = False
                For k = 1 To UBound(StopWords)
                    If x(j) = StopWords(k) Then
                        Match = True
                        Exit For
                    End If
                Next k
                If Match = False Then

                    ' Removing this line will solve the freezing!! (but then output of the macro will be blank)
                    Sheets("Word Frequency").Range("A" & wordCnt).FormulaR1C1 = x(j)

                    wordCnt = wordCnt + 1
                End If
            Next j
        r = r + 1
        Loop
        
        Debug.Print i, "Cleaning Done"                    ' Cleaning Done
        'Create pivot table
        Sheets("Word Frequency").Activate
        Set AllWords = Range("A1", ActiveSheet.Range("A" & Rows.Count).End(xlUp))
        On Error Resume Next
        Set pc = ActiveWorkbook.PivotCaches.Add(xlDatabase, "'" & AllWords.Parent.Name & "'!" & AllWords.Address(ReferenceStyle:=xlR1C1))
        'Set PC = ActiveWorkbook.PivotCaches.Add(xlDatabase, AllWords)
        Set PT = pc.CreatePivotTable(ActiveSheet.Range("C1"))
        With PT
            .AddDataField .PivotFields("All Words")
            .PivotFields("All Words").Orientation = xlRowField
        End With
        Debug.Print i, "Pivot Done"                         ' Pivot Done
        On Error GoTo 0

        Sheets("Word Frequency").Activate
        Columns("C:D").Copy
        Columns("C:D").PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
            :=False, Transpose:=False
        Rows("1:1").Select
        Application.CutCopyMode = False
        Selection.Delete Shift:=xlUp
        Columns("C:D").AutoFilter
        ActiveWorkbook.ActiveSheet.AutoFilter.Sort.SortFields.Clear
        ActiveWorkbook.ActiveSheet.AutoFilter.Sort.SortFields.Add Key:=Range _
            ("D1", ActiveSheet.Range("D1").End(xlDown)), SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:= _
            xlSortTextAsNumbers
        On Error GoTo Error_Sort
        With ActiveWorkbook.ActiveSheet.AutoFilter.Sort
            .Header = xlYes
            .MatchCase = False
            .Orientation = xlTopToBottom
            .Apply
        End With
        Debug.Print i, "Sort Done"                   ' Sort Done
        On Error GoTo 0
        Columns("A:B").Delete Shift:=xlToLeft
        Range("A2:B51").Copy
        Sheets("Main").Activate
        ActiveSheet.Range("C2").Offset(i * 2, 0).PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks:=False, Transpose:=True

Continue:
        Sheets("Site List").Range("A2").Offset(i, 0).Copy
        Sheets("Main").Activate
        Sheets("Main").Range("A2").Offset(i * 2, 0).Select
        ActiveSheet.Paste
        
        Do While Application.Ready = False
        Debug.Print "Not Ready", i
        Loop
        
        With ActiveWorkbook
            For Each cn In .Connections
                cn.Delete
            Next
            For Each pc In .PivotCaches
                pc.MissingItemsLimit = xlMissingItemsNone
            Next pc
        End With
    
        With Sheets("JunkYard")
            For j = .QueryTables.Count To 1 Step -1
                .QueryTables(j).Delete
            Next
        End With
        
        Debug.Print i, "Memory Cleanup Done"            ' Memory Cleanup Done
        
        Finish = Timer
        Application.ScreenUpdating = True
        Sheets("Main").Range("B2").Offset(i * 2, 0).Select
        ActiveCell.Formula = Finish - Start

        Debug.Print i, "Last Line"                          ' Last Line
    Next i
Exit Sub

Error_Sort:
    MsgBox ("Sort failed.")
Resume Continue

Error_Query:
    Debug.Print i, "Query Error"                        ' Query Error
Resume Continue

Error_Pivot:
    MsgBox ("Pivot Table could not be created.")
Resume Continue
    
End Sub

Function RangeToArray(ByVal my_range As Range) As String()

    Dim vArray As Variant
    Dim sArray() As String
    Dim i As Long
    
    vArray = my_range.Value
    ReDim sArray(1 To UBound(vArray))
    
    For i = 1 To UBound(vArray)
        sArray(i) = vArray(i, 1)
    Next
    
    RangeToArray = sArray()

End Function

Most of my code is copied from various places on the internet and modified to fit this particular macro. I'm not the best with VBA, so please don't hesitate to dumb-down any explanations for me! Also, please feel free to criticize and/or recommend other improvements to my sloppily-written code! Now, to the freezing:

Problem:
When executed, the macro starts running perfectly. I wrote some debug.print lines in my macro to help me keep track of where the freezing occurs. Excel 2010 will freeze & the title bar will read (not responding) on the URL query after the sum of the "Grand Total" values on the Worksheet named "Main" is greater than about 7000. In other words:

If (Sum of all numbers in Column C in Worksheets("Main") > 7000) Then (the next URL Query will permanently freeze Excel)

Attempted Problem Solving:

I am thoroughly perplexed by the previous relationship, but my hypothesis has held for about 3 days worth of fighting/debugging. I added a bit of memory & connection cleaning code to the end of every iteration, wondering if there is some sort of 'buffer' I am filling..?

I determined which line causes the URL Query to fail, but I am confused as to why they are related at all, and how to fix the freezing while still getting output.
Code:
' Removing this line will solve the freezing!! (but then output of the macro will be blank)
                    Sheets("Word Frequency").Range("A" & wordCnt).FormulaR1C1 = x(j)

I have tried using variations such as .Text = x(j), .Value = x(j), Cells(WordCnt, 1) = x(j), but nothing solves the freezing
Stepping through the macro with F8 produces perfect results every time, no matter how many sites or words I iterate through.
Running the macro for one site, then starting it immediately on another site produces perfect results every time, no matter how many sites or words I iterate through.
Disabling the single line of code above produces perfect results every time (besides having the word-frequency output), no matter how many sites or words I iterate through.

Other Notes:
If a website Query fails, I am totally fine with just skipping to the next site. Plenty of fish in the sea, and the internet is a messy place. On the other hand, If there something I can do to decrease the number of failed sites, I'm all ears.

Here's a link to the full .xlsm file, for your viewing pleasure. If the link breaks, or is removed, here's a quick description so you can make your own:

https://dl.dropboxusercontent.com/u/42453705/Frequency Test/Keyword Frequency Test 1.xlsm

One worksheet named "Main" - Formatted output goes here
One worksheet named "JunkYard" - Workspace for messy data
One worksheet named "Word Frequency" - Aggregate messy data
One worksheet named "Site List" - Website #1 in A2, Website #2 in A3, etc. (remove "www."; "www.google.com" should be "google.com")
One worksheet named "Stop Words" - List containing filtered words like "and", "but", "also", etc. (Word #1 in A1, Word #2 in A2, etc.)

Thank you for any advice on this problem. I really appreciate the feedback to improve my coding skills. Please ask for clarification if anything in here if I've explained poorly!

- Vistalite
 

Some videos you may like

Excel Facts

Get help while writing formula
Click the italics "fx" icon to the left of the formula bar to open the Functions Arguments dialog. Help is displayed for each argument.

William_B

New Member
Joined
Dec 31, 2013
Messages
1
I actually found your post looking for the answer to the same question. I had a 3000+ line spreadsheet that I was comparing line by line to itself essentially(duplicate check). It froze after a few iterations, but I checked the poor dresses in the task manager to see Excel maxed out and after about 20 minutes like that it did finish it's task.

That said I would probably make sure whatever you're running yours on is aptly cooled, maybe break your workbook into smaller workbooks so it won't run your system at peak until failure/completion. Hope that helps some.
 

Watch MrExcel Video

Forum statistics

Threads
1,127,107
Messages
5,622,782
Members
415,927
Latest member
vedasinternational

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top