Excel VBA to Sign Into and then Extract Data from a Website

Kc7487

New Member
Joined
Mar 19, 2016
Messages
20
Hello Mr. Excel folks!

I have made great strides on my attempts to automate logging into, and then extracting data, from a website.

I have gotten all the way to the point where the data is indeed in excel! However, the data is all going in only one column, column A, and I cannot figure out how to fix this problem. I was hoping somebody could take a look at this code and maybe give me a hint as to what I need to do to make the table populate properly (each cell in a different cell).

It is so close, but with the entire row (like 30 columns or so) populating into ONLY one column (column A) it just won't work for me.

Please, any help would be greatly appreciated.

I will post my current code below.


Code:
Private Sub CommandButton1_Click()


' CREATE_IEAPP Macro
' Testing signing in creating an internet explorer application and passing username and pw to it
'

'THIS MACRO IS EDITED FROM IEAPP MACRO, trying to pass values directly to the control boxes on the sign in form and not use send keys


    Dim i As Long
    Dim IE As Object
    Dim objElement As Object
    Dim objCollection As Object
Application.DisplayAlerts = False
    
    Worksheets("Auto SSD Here").Cells.ClearContents
    
    
    ' Create InternetExplorer Object
    Set IE = CreateObject("InternetExplorer.Application")
 
    ' You can uncoment Next line To see form results
    IE.Visible = True
 
    ' Send the form data To URL As POST binary request
    IE.Navigate Sheets("Auto Todays SSD").Range("A1").Value 'Changed from MACRO WORKAROUND A13 due to Web Querry not working....trying to go straight to the SSD to login
 
    ' Statusbar
        Application.StatusBar = "My CODE is loading. Please wait..."
 
    ' Wait while IE loading...
    Do While IE.Busy
        Application.Wait DateAdd("s", 1, Now)
    Loop
  
  'on error scip to scraping data because the error is likely that the internet explorer object is already signed in
  On Error GoTo ScrapingData
  
  
    
    ' Wait while IE loading...
    Do While IE.Busy
        Application.Wait DateAdd("s", 1, Now)
    Loop
             'the line below is working fine, take out the ' to login hard coded again, but trying to use a cell to pass the value to the login form, SEEMS TO WORK ON LINE BELOW IT
             'IE.Document.getElementByID("Ecom_User_ID").Value = "HARD CODED USERNAME GOES HERE" 
              IE.document.getElementByID("Ecom_User_ID").Value = Range("B14")
    
      ' Wait while IE loading...
    Do While IE.Busy
        Application.Wait DateAdd("s", 1, Now)
    Loop
             'the line below is working fine, take out the ' to login hard coded again, but trying to use a cell to pass the value to the login form, SEEMS TO WORK ON LINE BELOW IT
             'IE.Document.getElementByID("password-password").Value = "HARD CODED PASSWORD GOES HERE"
              IE.document.getElementByID("password-password").Value = Range("B15")
             
      ' Wait while IE loading...
    Do While IE.Busy
        Application.Wait DateAdd("s", 1, Now)
    Loop
    
            IE.document.getElementByID("loginButton").Click
 
      ' Wait while IE loading...
    Do While IE.Busy
        Application.Wait DateAdd("s", 1, Now)
    Loop
 
 
 
 
 'THIS SCRAPING DATA PART IS WHAT IS GIVING ME TROUBLE
 'SCRAPING THE DATA USING A WEB QUERY DOES NOT SEEM TO WORK, BECAUSE THE WEB QUERY IS NOT RECOGNIZED AS LOGGED IN, EVEN IF EXCEL OPENED UP AN INTERNET EXPLORER OBJECT AND LOGGED IN A FEW SECONDS BEFORE.
 
 ' THEREFORE TRYING TO SCRAPE THE DATA INSIDE OF THE SAME INTERNET EXPLORER OBJECT WHICH EXCEL VBA USES TO LOG IN. USING COPY/PASTE DOES ACTUALLY USUALLY SEEM TO WORK (USING SEND KEYS) BUT UNFORTUNATELY
 'SOMETIMES INSTEAD OF COPYING AND PASTING THE SSD PART, IT WILL SOMEHOW SKIP THE COPY PORTION, AND SIMPLY PASTE WHATEVER WAS LAST COPIED TO THE CLIPBOARD IN CELL A1
 
 'Furthermore and even more unfortunately, a windows security dialog box pops up which requires clicking on cancel like 39 times before the SSD appears....
 
 'NOW TRYING TO USE A DIFFERENT CODE THAT DOES NOT UTILIZE THE CLIPBOARD OR SENDKEYS, AND COMMENTING AWAY ALL OF THIS TEMPORARILY.
 
'ScrapingData:
'
'Application.DisplayAlerts = False
'
'    SendKeys "^a"
'
'    Do While IE.Busy
'        Application.Wait DateAdd("s", 1, Now)
'    Loop
'
'    SendKeys "^c"
'
'    Do While IE.Busy
'        Application.Wait DateAdd("s", 1, Now)
'    Loop
'
'
'   Worksheets("Auto SSD Here").Cells.UnMerge
'   Worksheets("Auto SSD Here").Paste Destination:=Worksheets("Auto SSD Here").Range("A1")
'   Worksheets("Auto SSD Here").Cells.UnMerge
'
'    ' Clean up
'    'IE.Quit 'Closes the window,deactivate if you want to leave ssd window open to make sure pasted properly, etc...
'    Set IE = Nothing
'    Set objElement = Nothing
'    Set objCollection = Nothing
'
'    Application.StatusBar = ""
'    Application.DisplayAlerts = True
'
'Exit Sub
'ErrorMessageBox:
'    'MsgBox "Error, you are probably already signed in, OR the code is no longer working and needs attention, contact xxxxxxxxxxx."





'This is the new code for scraping that does not use the clipboard and sendkeys, the portion above has all been commented INOP so that if needed it can be re-activated
'This code below has a problem too though, it only populates into the A column, all data is in one column.... inquiring on Mr Excel

ScrapingData:

With IE
            x = .document.body.innertext
            x = Replace(x, Chr(10), Chr(13))
            x = Split(x, Chr(13))
            Worksheets("Auto SSD Here").Range("A1").Resize(UBound(x)) = Application.Transpose(x) 'WHAT DO THE LAST 3 LINES DO???

            .Quit
        End With

        'Cleaning up
         Set IE = Nothing
         Set objElement = Nothing
         Set objCollection = Nothing
         Application.StatusBar = ""
         Application.DisplayAlerts = True


'The problem with this code above is that every line of the SSD populates into only 1 column. It does get all of the data, and put each row on a different row (but also scrapes notes).
         
         
         
         
         
         
    End Sub
 

Some videos you may like

Excel Facts

Quick Sum
Select a range of cells. The total appears in bottom right of Excel screen. Right-click total to add Max, Min, Count, Average.

John_w

MrExcel MVP
Joined
Oct 15, 2007
Messages
6,276
If the web data is in a HTML table element then you could extract the rows and columns like this:
Code:
    Dim table As Object, tRow As Object, tCell As Object
    Dim rowIndex
    Set table = IE.document.getElementsByTagName("TABLE")(0)   '0 = 1st table, 1 = 2nd table, etc.
    For Each tRow In table.Rows
        For Each tCell In tRow.Cells
            Worksheets("Auto SSD Here").Range("A1").Offset(tRow.rowIndex, tCell.cellIndex).Value = tCell.innerText
        Next
    Next
The above code assumes the data is in the first table. For a different table change the 0 in the ("TABLE")(0) line. You can determine the correct table by looking at the HTML source, or trial and error by changing the 0 to 1, 2, 3, etc.
 

Kc7487

New Member
Joined
Mar 19, 2016
Messages
20
Thanks so much John_w!

I imputed your code instead of the part of my code that was giving me trouble. This seems MUCH closer. Unfortunately though, I am still getting an error. It actually does pull 3 rows completely successfully with all 31 columns. Then it pulls a 4th row with only 22 columns, and the rest of the 500 or so rows it is not pulling up at all (I assume because the error is stopping it).

It errors out on this line inside of the loop:
Code:
Worksheets("Auto SSD Here").Range("A1").Offset(tRow.rowIndex, tCell.cellIndex).Value = tCell.innerText
Inside of the other line of code where you specify which table to grab, I tried it with the 0, with a 1, then a 2, then a 3. It seems 3 is the table I need. (1 and 2 were some header type things, 4,5,6 were extraneous tables that are NOT the missing rows, and not information I actually need. Maybe it would somehow help to just pull ALL tables, but somehow I doubt it.....)

I don't know if this helps, but I did read the HTML source. I'm not great at reading this stuff, but it looks like when the coding gets to the table I want, there is a TON of lines of code. All of the lines of code look pretty much the same, except for the individual content of each individual cell. It looks as though each new row starts with a TR and then end with a /TR, and then every new column in that row starts with a TD, and then ends with a /TD. I would simply post exactly what page I'm trying to scrape, but of course it is proprietary.

I was wondering John_w if you (or anybody else too for that matter?) might have some insight on this issue? Thanks!
 

John_w

MrExcel MVP
Joined
Oct 15, 2007
Messages
6,276
What is the error?

Is the table being loaded/generated dynamically? It could be that the table is not completely loaded when the loop is running. In that case the code should wait until the table is completely loaded before looping through the rows and columns. However, without seeing the HTML I can't really help further.

It looks as though each new row starts with a TR and then end with a /TR, and then every new column in that row starts with a TD, and then ends with a /TD.
That is a HTML table structure. TR tags are the rows and within each row there are TD tags for the columns.
 

Kc7487

New Member
Joined
Mar 19, 2016
Messages
20
Oops! I thought I posted the error, sorry about that. The error which Excel is throwing on the line I specified earlier (...Worksheets("Auto SSD Here").Range("A1").Offset(tRow.rowIndex, tCell.cellIndex).Value = tCell.innerText...) is as follows:

..."Run-time error '1004': Application-defined or object-defined error"...

In terms of the table being loaded/generated dynamically, I am uncertain of what this means exactly, however I do have the internet explorer object window visible, and can see the entirety of the table loads up and displays inside the window at the same time, (so I believe it should all be completely loaded while the loop is running???)

I wanted to like to experiment with simply skipping the error, and trying to load whatever cells come next in the loop, but the coding I have been adding (On Error Resume Next) does not seem to fit this particular situation, it still errors the same way as before. Any idea what error handling procedure I could use to try and accomplish skipping the error to see if I still get the proper results?

Or I suppose even better yet, any idea on how I might try to properly resolve the error with this additional information? Or any other information I should try to figure out that could help? (I realize of course it would be easier given the HTML code, my apologies, I know it makes it much more broad of an issue to solve, I wish I could just post it :-/).
 

John_w

MrExcel MVP
Joined
Oct 15, 2007
Messages
6,276
I don't know why that error would occur there. Perhaps you could check the values of tRow.rowIndex, tCell.cellIndex and tCell.innerText to see if they are sensible values.

Sorry, but I can't help further without the URL and access to the page.
 

Kc7487

New Member
Joined
Mar 19, 2016
Messages
20
Thank you for the remarks John_w, I have good news! After working on this for a while. I got it to work! I will post the code here just in case anybody is looking for it. It actually signs into the website too, and then pulls the table. MAKE SURE IF YOU USE THIS CODE THAT YOU CHANGE THE USERNAME AND PASSWORD HTML NAMES AND SUBMIT BUTTON HTML NAME TO WHATEVER THE HTML CALLS THEM ON YOUR PAGE, VERY LIKELY IT WILL NOT BE THE SAME AS MINE. I ended up scraping ALL tables, and then use excel functions to filter down to what I need. This seemed to work better for my case. I believe (but am not certain at all) that I may have had other tables INSIDE OF the table I wanted? And possibly this was somehow causing an error? In any case, this worked for me.

Code:
Private Sub CommandButton1_Click()


    Dim IE As Object
    Dim doc As Object
    Dim strURL As String
    Dim USERNAMEinput As String
    Dim PASSWORDinput As String
    
Application.DisplayAlerts = False
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Application.EnableEvents = False

'On Error GoTo CleanUp 'going to clean up due to manual calculation still set and causes errors that break sheet with NA's until forced recalculation

' Statusbar
        Application.StatusBar = "CODE is loading. Please wait, this should take a minute or so..."

 Worksheets("Auto SSD Here").Cells.ClearContents


    strURL = Sheets("URL Reference").Range("A30").Value 'THIS SETS THE URL, SWITCH TO HARD CODED URL, OR SWITCH REFERENCE TO THE CELL THAT CONTINES YOUR URL

    Set IE = CreateObject("InternetExplorer.Application")
    With IE
         .Visible = False 'Make this true to see the internet explorer window excel is using, False to see nothing

        .navigate strURL
        Do Until .ReadyState = 4: DoEvents: Loop
        Do While .Busy: DoEvents: Loop
        
'Resume SignIn
SignIn:
            
            
            'on error skip to scraping data because the error is likely that the internet explorer object is already signed in
  On Error GoTo ScrapingData
  
  
  
    
    ' Wait while IE loading...
    Do While IE.Busy
        Application.Wait DateAdd("s", 1, Now)
    Loop
    
    
             'the line below is working fine, take out the ' to login hard coded again, but now using a user input box to pass the value to the login form
             'IE.Document.getElementByID("Ecom_User_ID").Value = "HARD CODED USERNAME GOES HERE"
              IE.document.getElementByID("Ecom_User_ID").Value = Range("A1")
              'above will try to input A1 in user ID, keeping it in simply because if it is already signed in, it will fail, and error to scraping data BUT CHANGE THIS FOR YOUR URL HTML, BECAUSE IT IS PROBABLY NOT CALLED ECOMUSERID FOR YOU
              
              
              USERNAMEinput = Application.InputBox("Enter your ID Number/User Name:", "Input Box Text", Type:=2)
               IE.document.getElementByID("Ecom_User_ID").Value = USERNAMEinput
              'CHANGE ABOVE FOR YOUR URL HTML
    
      ' Wait while IE loading...
    Do While IE.Busy
        Application.Wait DateAdd("s", 1, Now)
    Loop
    
              
              
              PASSWORDinput = Application.InputBox("Enter your Password:", "Input Box Text", Type:=2)
              IE.document.getElementByID("password-password").Value = PASSWORDinput
              'CHANGE ABOVE FOR YOUR URL HTML
              
             
      ' Wait while IE loading...
    Do While IE.Busy
        Application.Wait DateAdd("s", 1, Now)
    Loop
    
    
    
            IE.document.getElementByID("loginButton").Click
            'CHANGE ABOVE FOR YOUR URL HTML
            
            
 
      ' Wait while IE loading...
    Do While IE.Busy
        Application.Wait DateAdd("s", 1, Now)
    Loop
 
 

            
ScrapingData:
Resume ScrapingData1
ScrapingData1:
            
            
                Set doc = IE.document
                
                GetAllTables doc 'this calls the entire scraping macro below

                .Quit
            End With
            
            
            'THIS LINE ALERTS THE USER THAT LOGIN FAILED (IT CHECKS TO SEE IF ROW 8 IS BLANK BECAUSE ROW 8 SHOULD CONTAIN A TABLE TITLE)
'CHANGE THIS FOR YOUR CODE BECAUSE ROW 8 MAY NOT ALWAYS CONTAIN SOMETHING, BUT THIS WAS THE BEST WAY I COULD FIGURE OUT HOW TO ALERT THE USER THAT THE SIGN IN PROCESS FAILED AND SO SCRAPING WRONG INFO
            If ThisWorkbook.Worksheets("Auto SSD Here").Range("A8").Value = "" Then
                MsgBox "Your login credentials were incorrect, try logging in again. (If you are SURE you logged in correctly, then the code might need attention, contact ME.)"
            End If
            
CleanUp:
 
    ' Clean up
    Application.StatusBar = ""
    Application.DisplayAlerts = True
    Application.Calculation = xlCalculationAutomatic
    Application.EnableEvents = True
    Application.ScreenUpdating = True

            
            
            
            
            
            
        End Sub
        
        
        
        

        Sub GetAllTables(doc As Object)

             ' get all the tables from a webpage document, doc, and put them in a new worksheet

            Dim ws As Worksheet
            Dim rng As Range
            Dim tbl As Object
            Dim rw As Object
            Dim cl As Object
            Dim tabno As Long
            Dim nextrow As Long
            Dim I As Long
            
            
Application.DisplayAlerts = False
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual



            Set ws = Worksheets("Auto SSD Here")

            For Each tbl In doc.getElementsByTagName("TABLE")
                tabno = tabno + 1
                nextrow = nextrow + 1
                Set rng = ws.Range("B" & nextrow)
                rng.Offset(, -1) = "Table " & tabno
                For Each rw In tbl.Rows
                    For Each cl In rw.Cells
                        rng.Value = cl.outerText
                        Set rng = rng.Offset(, 1)
                        I = I + 1
                    Next cl
                    nextrow = nextrow + 1
                    Set rng = rng.Offset(1, -I)
                    I = 0
                'below I am trying to insert a status bar update for every new row
'CHANGE THIS TO SUIT YOUR SHEET SHEET THOUGH, BECAUSE THIS IS ASSUMING THERE ARE ABOUT 5-6 HUNDRED ROWS TO SCRAPE. I WASN'T SURE HOW TO MAKE THE MACRO COUNT THE ROWS, AND THEN COMPARE THE REAL ROWS
'AGAINST THE COMPLETED ROWS, SO INSTEAD IT IS AN ESTIMATE. MACRO MAY FINISH BEFORE REACHING 100% OR AFTER REACHING 120% OR SO IN MY CASE, BUT THIS IS OKAY FOR MY PURPOSES.
                Application.StatusBar = "Approx. " & nextrow / 5.5 & "% complete."

                
                Next rw
                
            Next tbl

            'ws.Cells.ClearFormats

        End Sub
 

sylvesterbetta

New Member
Joined
Nov 29, 2018
Messages
12
If the web data is in a HTML table element then you could extract the rows and columns like this:
Code:
    Dim table As Object, tRow As Object, tCell As Object
    Dim rowIndex
    Set table = IE.document.getElementsByTagName("TABLE")(0)   '0 = 1st table, 1 = 2nd table, etc.
    For Each tRow In table.Rows
        For Each tCell In tRow.Cells
            Worksheets("Auto SSD Here").Range("A1").Offset(tRow.rowIndex, tCell.cellIndex).Value = tCell.innerText
        Next
    Next
The above code assumes the data is in the first table. For a different table change the 0 in the ("TABLE")(0) line. You can determine the correct table by looking at the HTML source, or trial and error by changing the 0 to 1, 2, 3, etc.
Hi John,

Will like to check how should I chng the above coding so that I can get a specific row of the table.

Please assist.

Many thanks.
 

John_w

MrExcel MVP
Joined
Oct 15, 2007
Messages
6,276
Code:
    Set table = IE.document.getElementsByTagName("TABLE")(0)   '0 = 1st table, 1 = 2nd table, etc.
    Set tRow = table.Rows(0)  '0 = 1st row, 1 = 2nd row
    For Each tCell In tRow.Cells
        Worksheets("Auto SSD Here").Range("A1").Offset(0, tCell.cellIndex).Value = tCell.innerText
    Next
 

sylvesterbetta

New Member
Joined
Nov 29, 2018
Messages
12
Code:
    Set table = IE.document.getElementsByTagName("TABLE")(0)   '0 = 1st table, 1 = 2nd table, etc.
    Set tRow = table.Rows(0)  '0 = 1st row, 1 = 2nd row
    For Each tCell In tRow.Cells
        Worksheets("Auto SSD Here").Range("A1").Offset(0, tCell.cellIndex).Value = tCell.innerText
    Next
Hi,

will like to check how can I change the coding so as that I can a specific column of a table.

Please assist.
Thanks
 

Watch MrExcel Video

Forum statistics

Threads
1,099,740
Messages
5,470,463
Members
406,701
Latest member
Mark Rob

This Week's Hot Topics

Top