Specify URL Range and import table data only from each on one worksheet

Thomas99

New Member
Joined
Jul 9, 2012
Messages
8
Objective: Load only the table data from http://www.mdb.uscourts.gov/unclaimed_funds_results?foo=a&bar=1 (notice no "page=" is specified) and then from http://www.mdb.uscourts.gov/unclaimed_funds_results?page=1&foo=a&bar=1. I want to be able to specify the number of pages for the second URL (http://www.mdb.uscourts.gov/unclaimed_funds_results?page=1&foo=a&bar=1). In this case there are 128 pages. I notice the only variable that changes is the "page=x" (where x= the page number). I would like the import to be into a single worksheet and allow me to specify the number of pages. I want to run this on a daily basis comparing this worksheet to a new worksheet (the next day's query) and highlight what is new in the next day's query page in one color and what has been removed from this worksheet (the previous day's query) in a different color.

I am clearly over my head. I searched the internet and various forums (this seems to be the best). I figured someone has wanted to import data from a website returning a large amount of data, but only allowing visitor to view 20 results per page.


I started by using the Excel web query tool. Problem #1 is that I cannot get it to bring in just the table, it brings in the entire page. It appears all I should have to do to accomplish this is move the "place the arrow on the table I want to import". I cannot move the arrow. I am not sure if this is due to the design of the website or something I just do not understand. NOTE: When looking at the source code for the page in html it shows the data I want as a table. The site was developed in a CMS (i.e Drupal 7)

At the moment, I am having difficulty with importing both URLs and inputting the page range for just the second URL to pull in the rest of the data into one worksheet. Additionally, I am not sure how to address the additional rows I have imported (i.e. the rest of the page vs. table only).

I have a decent compare macro that will compare the two pages and highlight the difference in color. I could call this separately once I get the data into worksheets.

I would appreciate some direction and or specifics as to the best method for importing all this data in a clean fashion. I have been working on this for 6 hours (including research) without any real progress other than being able to import via the Web Query tool. I would really appreciate some assistance.

Thomas99
 

Excel Facts

Format cells as date
Select range and press Ctrl+Shift+3 to format cells as date. (Shift 3 is the # sign which sort of looks like a small calendar).
This will get you the data, but you'll need to be more specific about how to define whether the row has been removed, a lot of the case numbers are the same.

Requirements for the below:
1. Have IE 9 installed
2. Add a reference to Microsoft HTML Object Library

Code:
Public Function GetTable(pageNo As Long) As Variant

Dim htmlFile As HTMLDocument
Dim tab1 As HTMLTable
Dim rowCount As Long
Dim cellCount As Long
Dim data() As String


Dim url As String
If pageNo = 0 Then
    url = "http://www.mdb.uscourts.gov/unclaimed_funds_results?&foo=a&bar=1"


Else
    url = "http://www.mdb.uscourts.gov/unclaimed_funds_results?page=" & pageNo & "&foo=a&bar=1"
End If


With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", url
    .send
    Do: DoEvents: Loop Until .readystate = 4
    Set htmlFile = New HTMLDocument
    htmlFile.body.innerHTML = .responsetext
    .abort
End With


With htmlFile
    Set tab1 = .getElementsByClassName("sticky-enabled").Item(0)
    ReDim data(1 To tab1.Rows.Length - 1, 1 To tab1.Rows(0).Cells.Length)
        For rowCount = 1 To tab1.Rows.Length - 1
            For cellCount = 1 To tab1.Rows(0).Cells.Length
                data(rowCount, cellCount) = tab1.Rows(rowCount).Cells(cellCount - 1).innerHTML
        Next cellCount
    Next rowCount
End With


GetTable = data


End Function


Sub GetData()
Dim sheetname As String
Dim pageNo As Long
Dim ltable As Long
Dim data As Variant




sheetname = InputBox("Type the name of the sheet you would like the data adding to")
pageNo = InputBox("Type the last page number that you want to retrieve")


If Not IsNumeric(pageNo) Then
    MsgBox "You must Enter a valid number"
    Exit Sub
End If


With Sheets(sheetname)
    .Cells(1, 1).Resize(, 4).Value = Array("Case No", "Debtor", "Claimant Name", "Amount")
    For ltable = 0 To CLng(pageNo)
        data = GetTable(ltable)
        With .Cells(1, 1).CurrentRegion
            .Offset(.Rows.Count).Resize(UBound(data), 4).Value = data
        End With
    Next ltable
End With


End Sub
 
Upvote 0
Wow! First off Thank you Kyle123. I added the Microsoft HTML Object Library by selecting Tools->References and then checking the option.

NOTE: I do not have IE9. I am running XP with IE8.

When running the script I get a "Run-time error '9' : Subscript out of range". I researched this and found that it typically happens when referencing a non-existing array element, didn't specify the number of array elements, or referenced a non-existent collection member. As far as I can tell, it looks like the script properly calls out the array elements. They are the exact names in the table header. It looks like the number of array elements (4) is defined using UBound. I am not certain about the non-existent collection member.

I attempted to get this working without doing so.

Any further assistance in making the script work is appreciated.
 
Upvote 0
Hi Thomas,

This code won't work with IE8, it has to have IE9, although I'm intrigued by the subscript error since this is one I wouldn't expect - what line do you get this on in the code?

I'll have a think when I get chance today and post a solution for IE8 if you can't install IE9.

Cheers

Kyle
 
Upvote 0
Microsoft does not support IE9 on XP. Of course, I always have the option of upgrading my system or purchasing and installing office on my other system with Windows 7. If you are aware of a solution to make it work on XP, I would be interested so I do not have to upgrade or make a procurement at this time.

When stepping through the script the error is not thrown until the very last line "End Sub".

NOTE: When executing the script, I enter abc for the sheet name and 3 for the number of pages. Are you able to execute it without error on your system?

Thomas99
 
Upvote 0
Try changing:
Code:
Set tab1 = .getElementsByClassName("sticky-enabled").Item(0)
To:
Code:
Set tab1 = .getElementsByTagName("Table").Item(0)

Yes, the code runs perfectly for me :)
 
Upvote 0
I installed Office 2010 on my Win 7 system and received the same "Run-time error '9' : Subscript out of range" error. I tried your suggestion above and still got the error. However, I stepped through the script again and it failed just after the line that says "With Sheets(sheetname)" (i.e. The pointer was on that line and I pressed F8 and the error appeared). I have copied the text from that section of my script below. NOTE: I copied the test from your post above and pasted it as well as verified it visually. Considering the script runs perfectly for you and the array looks correct to me, I am still seeking your input. I also inserted an image below of my script and references I am using. Please look it over and if you see anything wrong, let me know. I am open to any and all suggestions at this point.

======

With Sheets(sheetname)
.Cells(1, 1).Resize(, 4).Value = Array("Case No", "Debtor", "Claimant Name", "Amount")
For ltable = 0 To CLng(pageNo)
data = GetTable(ltable)
With .Cells(1, 1).CurrentRegion
.Offset(.Rows.Count).Resize(UBound(data), 4).Value = data
End With
Next ltable
End With


End Sub
============


image2.jpg
 
Upvote 0
I wanted to add to my above post that I did try this in a "clean" workbook without the other macros loaded.

I also wanted to note that in debug when I hover over the "With Sheets" text I get the following "Sheets(sheetname) = Subscript out of range<subscript out="" of="" range=""><subscript out="" of="" range="">" (with less than sign before Subscript out of range and a greater than sign after it. When I hover over "(sheetname)" I get "sheetname = "abc"". I searched for forum for subscript our of range errors, but did not find anything that seemed to directly relate to my situation. I tried commenting out the get data line for the sheetname and the inputbox for sheetname and putting in the name of the sheet (Sheet1), but that did not work. I probably did something incorrectly.</subscript></subscript>
 
Last edited:
Upvote 0
Hi Thomas,

The error doesn't have anything to do with your version if excel, it is simply that you don't have a sheet named arbc . The side is asking you where you'd like to dump the data, so you need to type the name of a valid sheet, eg

Hope that helps :)
 
Upvote 0
Oh my goodness. I could have swore, I ran the script a few times specifying "Sheet1" as the name. I created a sheet named "abc" and ran the script as previously stated on the system with IE9 and it worked. THANK YOU! I appreciate all of your assistance and patience. I will work on sorting and comparing from here.

NOTE: As you stated it did not work on IE8.

Thomas99
 
Upvote 0

Forum statistics

Threads
1,214,593
Messages
6,120,435
Members
448,962
Latest member
Fenes

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top