Scraping a Site with VBA

spneufel

New Member
Joined
Feb 23, 2018
Messages
2
This seems so simple, but I'm stuck.

I need to scrape data from www.ibanknet.com. This site updates their data monthly, which is great, but the issue is I have a lot of rows that need updating.
The website uses a ton of frames and I'm having issues trying to pull out the data.

Format:
This is how I'm setup (I have a lot more info, but this will give you a good idea)


Goal is to open active link, pull the data that I want, paste it, move to next row. Thats it... simple right? lol


This is sort of what I'm looking for, but it isn't looping to the next row. I would need to add in the functionality to copy and paste the data I want to the selected rows based on criteria.

Sub ImportXrates()
Dim ws As Worksheet
Dim qt As QueryTable
Dim URL As String

URL = "http://www.x-rates.com/table/?from=USD&amount=1"
Set ws = Worksheets.Add

Set qt = ws.QueryTables.Add( _
Connection:="URL;" & URL, _
Destination:=Range("A5"))


With qt
.RefreshOnFileOpen = True
.Name = "Xrates"
.WebFormatting = xlWebFormattingRTF
.WebSelectionType = xlSpecifiedTables
.WebTables = "1"
.Refresh
End With

End Sub


Sub UpdateXrates()

Dim qt As QueryTable
Dim URL As String

If wsRates.Range("B1").Value = "" Then
MsgBox "You must choose a currency!", vbExclamation
Exit Sub
End If
If Not IsNumeric(wsRates.Range("B2").Value) Then
MsgBox " The amount must be a number!", vbExclamation
Exit Sub
End If

If wsRates.Range("B2").Value < 0.1 Or wsRates.Range("B2").Value > 100000 Then
MsgBox " That amount is too low or too high!", vbExclamation
End If


URL = _
"http://www.x-rates.com/table/?from=" & _
wsRates.Range("B1").Value & "&amount=" & _
wsRates.Range("B2").Value

Set qt = wsRates.QueryTables("XRates")

With qt
.Connection = "URL;" & URL
.Refresh
End With

End Sub






This is something else I found which can work if the looping part is added in. I would just need formula's added in to copy and paste the data into the correct cells by reference ID prior to it clearing the sheet and moving to next row. Obviously any reference to the passwords is meaningless since the site I'm scraping doesn't need and it has a bypass on the vba.
The name of my sheet is called Lists and it starts on A6 as shown below.

Private Sub CommandButton1_Click()


Dim IE As Object
Dim doc As Object
Dim strURL As String
Dim USERNAMEinput As String
Dim PASSWORDinput As String

Application.DisplayAlerts = False
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Application.EnableEvents = False

'On Error GoTo CleanUp 'going to clean up due to manual calculation still set and causes errors that break sheet with NA's until forced recalculation

' Statusbar
Application.StatusBar = "CODE is loading. Please wait, this should take a minute or so..."

Worksheets("Auto SSD Here").Cells.ClearContents


strURL = Sheets("List").Range("A6").Value 'THIS SETS THE URL, SWITCH TO HARD CODED URL, OR SWITCH REFERENCE TO THE CELL THAT CONTINES YOUR URL

Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = True 'Make this true to see the internet explorer window excel is using, False to see nothing

.navigate strURL
Do Until .ReadyState = 4: DoEvents: Loop
Do While .Busy: DoEvents: Loop

'Resume SignIn
SignIn:


'on error skip to scraping data because the error is likely that the internet explorer object is already signed in
On Error GoTo ScrapingData




' Wait while IE loading...
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop


'the line below is working fine, take out the ' to login hard coded again, but now using a user input box to pass the value to the login form
'IE.Document.getElementByID("Ecom_User_ID").Value = "HARD CODED USERNAME GOES HERE"
IE.document.getElementByID("Ecom_User_ID").Value = Range("A1")
'above will try to input A1 in user ID, keeping it in simply because if it is already signed in, it will fail, and error to scraping data BUT CHANGE THIS FOR YOUR URL HTML, BECAUSE IT IS PROBABLY NOT CALLED ECOMUSERID FOR YOU


USERNAMEinput = Application.InputBox("Enter your ID Number/User Name:", "Input Box Text", Type:=2)
IE.document.getElementByID("Ecom_User_ID").Value = USERNAMEinput
'CHANGE ABOVE FOR YOUR URL HTML

' Wait while IE loading...
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop



PASSWORDinput = Application.InputBox("Enter your Password:", "Input Box Text", Type:=2)
IE.document.getElementByID("password-password").Value = PASSWORDinput
'CHANGE ABOVE FOR YOUR URL HTML


' Wait while IE loading...
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop



IE.document.getElementByID("loginButton").Click
'CHANGE ABOVE FOR YOUR URL HTML



' Wait while IE loading...
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop




ScrapingData:
Resume ScrapingData1
ScrapingData1:


Set doc = IE.document

GetAllTables doc 'this calls the entire scraping macro below

.Quit
End With


'THIS LINE ALERTS THE USER THAT LOGIN FAILED (IT CHECKS TO SEE IF ROW 8 IS BLANK BECAUSE ROW 8 SHOULD CONTAIN A TABLE TITLE)
'CHANGE THIS FOR YOUR CODE BECAUSE ROW 8 MAY NOT ALWAYS CONTAIN SOMETHING, BUT THIS WAS THE BEST WAY I COULD FIGURE OUT HOW TO ALERT THE USER THAT THE SIGN IN PROCESS FAILED AND SO SCRAPING WRONG INFO
If ThisWorkbook.Worksheets("Auto SSD Here").Range("A8").Value = "" Then
MsgBox "Your login credentials were incorrect, try logging in again. (If you are SURE you logged in correctly, then the code might need attention, contact ME.)"
End If

CleanUp:

' Clean up
Application.StatusBar = ""
Application.DisplayAlerts = True
Application.Calculation = xlCalculationAutomatic
Application.EnableEvents = True
Application.ScreenUpdating = True







End Sub





Sub GetAllTables(doc As Object)

' get all the tables from a webpage document, doc, and put them in a new worksheet

Dim ws As Worksheet
Dim rng As Range
Dim tbl As Object
Dim rw As Object
Dim cl As Object
Dim tabno As Long
Dim nextrow As Long
Dim I As Long


Application.DisplayAlerts = False
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual



Set ws = Worksheets("Auto SSD Here")

For Each tbl In doc.getElementsByTagName("TABLE")
tabno = tabno + 1
nextrow = nextrow + 1
Set rng = ws.Range("B" & nextrow)
rng.Offset(, -1) = "Table " & tabno
For Each rw In tbl.Rows
For Each cl In rw.Cells
rng.Value = cl.outerText
Set rng = rng.Offset(, 1)
I = I + 1
Next cl
nextrow = nextrow + 1
Set rng = rng.Offset(1, -I)
I = 0
'below I am trying to insert a status bar update for every new row
'CHANGE THIS TO SUIT YOUR SHEET SHEET THOUGH, BECAUSE THIS IS ASSUMING THERE ARE ABOUT 5-6 HUNDRED ROWS TO SCRAPE. I WASN'T SURE HOW TO MAKE THE MACRO COUNT THE ROWS, AND THEN COMPARE THE REAL ROWS
'AGAINST THE COMPLETED ROWS, SO INSTEAD IT IS AN ESTIMATE. MACRO MAY FINISH BEFORE REACHING 100% OR AFTER REACHING 120% OR SO IN MY CASE, BUT THIS IS OKAY FOR MY PURPOSES.
Application.StatusBar = "Approx. " & nextrow / 5.5 & "% complete."


Next rw

Next tbl

'ws.Cells.ClearFormats

End Sub



I could really use a hand with this. I'm stuck and I really don't want to update manually... it takes forever and I feel like stabbing myself in the eyes after a hundred rows...

Thanks People!
 

Excel Facts

Who is Mr Spreadsheet?
Author John Walkenbach was Mr Spreadsheet until his retirement in June 2019.
I should also mention that when copying based on frame name, the data doesn't seem to be consistent. Sometimes the field labeled "number of offices" is on different rows. Obviously a vlookup or index match would solve this, but just wanted to make sure anyone looking at this realizes that the frames don't seem to be consistent from 1 page to another other than name
 
Upvote 0

Forum statistics

Threads
1,214,919
Messages
6,122,260
Members
449,075
Latest member
staticfluids

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top