Web Scraping using VBA

atr140

Board Regular
Joined
Nov 27, 2012
Messages
72
All,

I am trying to scrape some data from a password protected website. A quick search provided some code to login to the website. After the login, I am trying to grab some specific information; however, all of my attempts are coming up short. I was hoping to get some help understanding the HTML. Some research tells me that the element I am trying to access is buried in nested tables and forms. I am having trouble understanding how to access the elements. Using internet explorer, I used F12 to access the HTML code and then Ctrl + B to inspect the element. This provided me with what I thought would be enough but I am not getting anywhere.

The dropdown I am trying to access is shown below. When I hover over the tabs, several show ID names an several show class names. Others such as 'table' and 'tr' are just labeled as elements with no class or ID.

My assumption was I could get through the nesting to the bottom with something like:

VBA Code:
 Set eleBdy = HTMLDoc.getElementById("Bdy")
 Set eledataView = eleBdy.getElementById("dataView")
 Set cHeader = eledataView.getElementsByClassName("cheader")

It doesn't seem to be working. Thanks in advanced for the help. Please let me know if any other information is required!



HTML.JPG








Reference: How to Use Excel VBA to Log-In to a Website -

VBA Code:
Dim HTMLDoc As HTMLDocument
Dim oBrowser As InternetExplorer
Sub Website_Login_Test()

Dim oHTML_Element As IHTMLElement
Dim sURL As String

On Error GoTo Err_Clear
sURL ="[URL='https://www.examplewebsite.co.uk/login.aspx'][U]https://www.examplewebsite.co.uk/login.aspx[/U][/URL]"
Set oBrowser = New InternetExplorer
oBrowser.Silent = True
oBrowser.timeout = 60
oBrowser.navigate sURL
oBrowser.Visible = True

Do
' Wait till the Browser is loaded
Loop Until oBrowser.readyState = READYSTATE_COMPLETE

Set HTMLDoc = oBrowser.Document

HTMLDoc.all.loginID.Value = "exampleusername"
HTMLDoc.all.Password.Value = "examplepassword"

For Each oHTML_Element In HTMLDoc.getElementsByTagName("input")
If oHTML_Element.Type = "submit"
_Then oHTML_Element.Click: Exit For
Next

 Set eleBdy = HTMLDoc.getElementById("Bdy")
 Set eledataView = eleBdy.getElementById("dataView")
 Set cHeader = eledataView.getElementsByClassName("cheader")


' oBrowser.Refresh ' Refresh If Needed
Err_Clear:
If Err <> 0 Then
Err.Clear
Resume Next
End If
End Sub

HTML:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">


<HTML>





 
 



<HEAD>             


<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
<META HTTP-EQUIV="Content-Style-Type" CONTENT="text/css">
<META HTTP-EQUIV="X-UA-Compatible" CONTENT="IE=edge">
<META NAME="GENERATOR" CONTENT="IBM Software Development Platform">
<META NAME="keywords" CONTENT="BMS2, PA Department of Transportation, Registration, Doing Business">
<META NAME="description" CONTENT="Bridge Management System">


<LINK rel="stylesheet" type="text/css" href="css/body.css">
<LINK rel="stylesheet" type="text/css" href="css/calendar.css">
<LINK rel="stylesheet" type="text/css" href="css/menu.css">
<LINK rel="stylesheet" type="text/css" href="css/banner.css">
<LINK rel="stylesheet" type="text/css" href="css/toolbar.css">
<LINK rel="stylesheet" type="text/css" href="css/bms.css">
<LINK rel="stylesheet" type="text/css" href="https://fonts.googleapis.com/css?family=Montserrat:400,500,900">
<link rel="stylesheet" type="text/css" href="https://fonts.googleapis.com/css?family=Roboto+Mono">
<script language=JavaScript src="javascript/common.js"></script>
<script language=JavaScript src="javascript/buttonhandler.js"></script>
<script language=JavaScript src="javascript/CascadeMenu.js"></script>
<script language=JavaScript src="javascript/ECMSCommon.js"></script>
<script language=JavaScript src="javascript/validation.js"></script>
<script language=JavaScript src="javascript/BMSCommon.js"></script>
<script language=JavaScript src="javascript/ajax.js"></script>

<script language=JavaScript src="javascript/toolbar.js"></script>


    <TITLE>Structure Home</TITLE>
<script src="javascript/jquery-1.6.4.min.js"></script>
<script src="javascript/jquery-ui.min-1.8.16.js"></script>
<script src="javascript/json2.js"></script>
<script src="javascript/jquery.infoDialog.js"></script>
<script src="javascript/jquery.structureHome.js" charset="UTF-8"></script>
<link rel="stylesheet" href="css/jquery-ui-1.8.16.css" type="text/css">

</HEAD>








<BODY ID="Bdy"  OnLoad="bodyOnLoad();"  onResize="bodyOnResize();" leftmargin="0" rightmargin="0" topmargin="0"  marginwidth="0" marginheight="0">
<!-- 
<table cellpadding="0" cellspacing="0" border="0" width="100%" background="images/bms2Header1.gif">
        <tr height="61"><td>&nbsp;</td></tr>
</table>
-->

    
<style>
.dropdown {
    position: absolute;
    display: inline-block;
    float: right;
    padding-right: 25px;
    top: 55px;
    right: 5px;
}

.dropdown-content {
    display: none;
    position: absolute;
    background-color: #f9f9f9;
    min-width: 160px;
    box-shadow: 0px 8px 16px 0px rgba(0,0,0,0.2);
    padding: 12px 16px;
    z-index: 1;
}

.dropdown:hover .dropdown-content {
    display: block;
}

.dropdown-header {
    color: #FFFFFF;
}

.dropdown-header:hover {
    cursor: pointer;
}


.dropdown-section {
    border-style: solid;
    border-width: 0px 0px 1px 0px;
    /* font-weight: 500; */
}

</style>

    
<table class="banner">
    <colgroup>
        <col WIDTH='30%' >
        <col WIDTH='40%' >
        <col WIDTH='30%' >
    </colgroup>
    <tr class="banner">
    <td class="bannerleft">
    
        <div class="penndotLink">
            <a target="_blank" href="http://www.penndot.gov" class="banner" >
                <img class="banner" alt="PennDOT" title="PennDOT" src="images/PennDOTColorLogo2.gif" height="81">
            </a>
        </div>
        
        <div class="paLink">
            <a  target="_blank" href="http://www.pa.gov"  class="bannerlink" >
                <img class="banner" alt="Pennsylvania" title="Pennsylvania" src="images/pagov_logo_2.png" style="height:24px;">
            </a>
        </div>

         <div class="homelink">
            <a href="SVCOMLogin?action=init" class="bannerlink">
                <img class="banner" alt="BMS Home" title="BMS Home" src="images/house_logo_2.png" style="width:25px;">
            </a>
        </div>         

    </td>
        
    <td class = "bannercenter" align="center">       
        <div>
            <span style="color: #0B2B6E;font-size: 50px; font-weight: 900;">BMS</span><span style="color: #FFCC33;font-size: 50px; font-weight: 900;">2</span>
        </div>
        <div class="bannerenvironmenttext">
            
        </div>

    </td>
    
    <td class="bannerright">
        <div style="position: absolute; top: 10px; right: 5px; float: right;">

            <table style="width: 330px;">
                <tr><td colspan="4" width="200px"><span style="color: #FFFFFF;font-size: 18px;">BRIDGE MANAGEMENT SYSTEM</span> <span style="color: #FFCC33;font-size: 30px; line-height: .8;">2</span></td></tr>
            </table>

        </div>
        
        

                <div class="dropdown">
                     <span class="dropdown-header">Welcome Aaron Rovenolt</span>
                    <img class="banner" src="images/user_logo.png" style="width:20px; position:relative; bottom: -1px;">             
                      <div class="dropdown-content">
                          <p class="dropdown-section">Account Actions</p>
                          
                          
                          <!-- <p><a href="#">iForms Password</a></p> -->
                          <p><a href="SVCOMLogin?action=logout">Logout</a>
                          <p><br></p>
                          <p class="dropdown-section">User Information</p>
                          <p>Userid: arovenolt</p>
                          <p>District: </p>
                          <p>Email: arovenolt@larsondesigngroup.com</p>
                     </div>
                </div>
                
                
                
        
                    

    </td>
    
    </tr>
            

    </table>



 

    




<IFRAME id='button1_frame' class='button' src='javascript:false;' ></IFRAME>
<DIV ID='button1' class='button' >
    <DIV ID='buttonItem1_2' class='buttonItem'  title='Return to Structure Home' cmd="javascript:doPDButton(&quot;href&quot;,&quot;PDTagServlet?action=pop&value=2&quot;,&quot;&quot;)"  onBefore="checkSelectionConfirmation(&quot;&quot;,&quot;&quot;,&quot;&quot;)"  >
        <A href='' ID='buttonItem1_2_Link' onclick='return false;' class='buttonItem' >Structure Home</A>
    </DIV>
    <DIV ID='buttonItem1_3' class='buttonItem'  title='Return to Structure Home' cmd="javascript:doPDButton(&quot;href&quot;,&quot;PDTagServlet?action=pop&value=3&quot;,&quot;&quot;)"  onBefore="checkSelectionConfirmation(&quot;&quot;,&quot;&quot;,&quot;&quot;)"  >
        <A href='' ID='buttonItem1_3_Link' onclick='return false;' class='buttonItem' >Structure Home</A>
    </DIV>
</DIV>
<DIV ID='toolbarBar' class='toolbarBar' >
    <DIV ID='buttonBar1' class='toolbarButton'  menu='button1'  dropdownId='Toolbar_Arrow_1' title='Back' cmd="javascript:doPDButton(&quot;href&quot;,&quot;PDTagServlet?action=pop&value=2&quot;,&quot;&quot;)"  onBefore="checkSelectionConfirmation(&quot;&quot;,&quot;&quot;,&quot;&quot;)"  >
        <A href='' ID='buttonBar1_Link' onclick='return false;' class='toolbarButton' ><img src='images/i_back.gif' border=0 align='left'>Back&nbsp;<Span Id='Toolbar_Arrow_1' class='toolbarArrow'>6</Span>&nbsp;&nbsp;&nbsp;</A>
    </DIV>
    <DIV ID='buttonBar2' class='toolbarButton'  title='Save' cmd="javascript:doPDButton(&quot;submit&quot;,&quot;pdform&quot;,&quot;&quot;)"  onBefore="checkSelectionConfirmation(&quot;true&quot;,&quot;setAction('Save')&quot;,&quot;&quot;)"  >
        <A href='' ID='buttonBar2_Link' onclick='return false;' class='toolbarButton' ><img src='images/i_save.gif' align='left' >&nbsp;Save</A>
    </DIV>
    <DIV ID='buttonBar3' class='toolbarButton'  title='Print' cmd="javascript:doPDButton(&quot;window&quot;,&quot;PDTagServlet?action=printerFriendly&jspName=WEB-INF/jsp/INVstructureHome.jsp&quot;,&quot;&quot;)"  onBefore="checkSelectionConfirmation(&quot;&quot;,&quot;&quot;,&quot;&quot;)"  >
        <A href='' ID='buttonBar3_Link' onclick='return false;' class='toolbarButton' ><img src='images/i_printer.gif' border=0 align='left'></A>
    </DIV>
    <DIV ID='buttonBar4' class='toolbarButton'  title='Click here to lauch screen help.' cmd="javascript:doPDButton(&quot;script&quot;,&quot;openScreenHelpWindow('WEB-INF/jsp/INVstructureHome.jsp')&quot;,&quot;&quot;)"  onBefore="checkSelectionConfirmation(&quot;&quot;,&quot;&quot;,&quot;&quot;)"  >
        <A href='' ID='buttonBar4_Link' onclick='return false;' class='toolbarButton' ><img src='images/i_help.gif' border=0 align='left'>Help</A>
    </DIV>
</DIV>
<SCRIPT> InitToolbar(); </SCRIPT>


<!-- Include file for displaying current ECMSMessages -->









<div id="dataView" style="width:100%; height:100%; overflow:auto;">

<script language=JavaScript src="javascript/calendarControl.js"></script>
<table><tr><td>


<script type="text/javascript">
    function submitHeader(s){
            document.pdformheader.action.value = s;
            document.pdformheader.submit();
    }
</script>

<style>

span.validation_counts {
    color: white;
    border-radius: 5px;
    padding: 0px 6px 0px 6px;
    border-color:white;
    border-width: 2px;   
}

span.validation_failures {
    background: #A93226;
    padding: 0px 6px 1px 6px;
}

span.zero_validation_failures {
    color: #A93226;
    padding: 0px 6px 1px 6px;
}

span.validation_errors {
    background: #BA4A00;
}

span.zero_validation_errors {
    color: #BA4A00;
}

span.validation_warnings {
    background: #D4AC0D;
}

span.zero_validation_warnings {
    color: #D4AC0D;
}

span.validation_infos {
    background: #2E86C1;
}

span.zero_validation_infos {
    color: #2E86C1;
}

</style>

<form name="pdformheader" action = "SVCOMScreenHeader">
    <input type="hidden" name=action>
    <TABLE class="title">
        <colgroup>
            <col width="100%" class="left" />
              <col width="12%" class="center" />
        </colgroup>
        <TBODY>   
            <tr class="title">
                <td class="title">
                    Inventory - Structure Home<input type=hidden name=HEADER_TITLE value='Inventory - Structure Home'>
                    
                </td>
                <TD valign="bottom" bgcolor="#4791C5" nowrap>
                      
                        
                        
                            <A     class="qlinkshowhide" id="hqlinks"
                                href="javascript: toggleQlinks('hqlinks','qlinks')">
                                Hide Quick Links
                            </A>
                        
                    
                </TD>
            </tr>
        </TBODY>
    </TABLE>
    <TABLE class="cheader" >
        <colgroup>
            <col width="100%" class="center" />
        </colgroup>
        <TBODY>
            <TR>
                <TD style="padding: 5px;">
                    <TABLE class="body">
                        <COLGROUP width="10%"/>
                        <COLGROUP width="18%"/>
                        <COLGROUP width="10%"/>
                        <COLGROUP width="10%"/>
                        <COLGROUP width="9%"/>
                        <COLGROUP width="16%"/>
                        <COLGROUP width="4%"/>
                        <COLGROUP width="23%"/>
                        <TBODY>
                            <TR>                   
                                <TD class="labelright" nowrap>5A01 SR ID:</TD>
                                <TD class="dataleft"  nowrap>
                                    <input type='text' name='HEADER_BRIDGE_ID' value='08018704200018' onBlur='return true;'   onkeydown="return processEnterKey('findbysrid');" size='18' maxlength='14' class="PDTextbox"  >
                                    <input     id="findbysrid" class="goButton" align="middle" type="image" alt="Find Structure by SR ID" title="Find Structure by SR ID"
                                            src="images/go5.gif" name="sridgo" width="21" height="21"
                                            onclick="submitHeader('SHOWBYSRID');"/>
                                </TD>
                                <TD class="labelright" nowrap>5A03 BRKEY:</TD>
                                <TD class="dataleft"  nowrap>
                                    <input type='text' name='HEADER_BRKEY' value='6105' onBlur='return true;'   onkeydown="return processEnterKey('findbybrkey');" size='6' maxlength='6' class="PDTextbox"  >     
                                    <input     id="findbybrkey" class="goButton" align="middle" type="image" alt="Find Structure by BRKEY" title="Find Structure by BRKEY"
                                            src="images/go5.gif" name="brkeygo" width="21" height="21"
                                            onclick="submitHeader('SHOWBYBRKEY');"/>
                                </TD>
                                <TD class="labelright" nowrap>Agency ID:</TD>
                                <TD class="dataleft" nowrap>
                                    <input type='text' name='HEADER_AGENCY_ID' value='' onBlur='return true;'   onkeydown="return processEnterKey('findbyagid');" size='15' maxlength='15' class="PDTextbox"  >
                                    <input     id="findbyagid" class="goButton" align="middle" type="image" alt="Find Structure by Agency ID" title="Find Structure by Agency ID"
                                            src="images/go5.gif" name="agencyidgo" width="21" height="21"
                                            onclick="submitHeader('SHOWBYAGENCYID');"/>
                                </TD>
                                <TD class="labelright" nowrap>Go To:</TD>
                                <TD class="dataleft" nowrap>
                                    <select name=HEADER_CODE size=1  maxlength='30' class="PDDropdown" >
<option value='99' false>ADMIN CONFIGURATION</option>
<option value='52' false>Bulletin Board Message List</option>
<option value='98' false>Configuration - Element NBE</option>
<option value='82' false>Email Configuration</option>
<option value='83' false>Email Notifications</option>
<option value='78' false>FHWA Submittal</option>
<option value='79' false>Feature Manager</option>
<option value='5' false>Inspection - Agency Inspection</option>
<option value='10' false>Inspection - Comments</option>
<option value='32' false>Inspection - Fracture Critical</option>
<option value='3' false>Inspection - Load Rating</option>
<option value='59' false>Inspection - Misc</option>
<option value='2' false>Inspection - Ratings, Schedule</option>
<option value='63' false>Inspection - Safety Features</option>
<option value='62' false>Inspection - Signings</option>
<option value='57' false>Inspection - Signs and Lights</option>
<option value='15' false>Inspection - Underwater</option>
<option value='58' false>Inspection - Walls</option>
<option value='47' false>Inspection-Element List</option>
<option value='14' false>Inventory - Agency Bridge</option>
<option value='11' false>Inventory - BP Assignment</option>
<option value='7' false>Inventory - Design</option>
<option value='8' false>Inventory - Drawings Notes</option>
<option value='4' false>Inventory - Features Intersected</option>
<option value='6' false>Inventory - Inspection Planning</option>
<option value='55' false>Inventory - Misc</option>
<option value='64' false>Inventory - Paint</option>
<option value='43' false>Inventory - Posting</option>
<option value='61' false>Inventory - Risk Assessment</option>
<option value='56' false>Inventory - Signs and Lights</option>
<option value='84' false>Inventory - Structure Group</option>
<option value='1' selected true>Inventory - Structure Home</option>
<option value='9' false>Inventory - Structure Units</option>
<option value='20' false>Inventory - Tunnel</option>
<option value='54' false>Inventory - Walls</option>
<option value='66' false>Other - APRAS Data</option>
<option value='41' false>Other - Maintenance - Completed Maintenance</option>
<option value='42' false>Other - Maintenance - Proposed  Maintenance</option>
<option value='53' false>Other - Reports</option>
<option value='27' false>Other - Structure EDMS Document List</option>
<option value='81' false>Other - Validate Structure</option>
<option value='75' false>Special Permission</option>
<option value='80' false>Validation Expressions</option>
</select>
 

Excel Facts

How to find 2nd largest value in a column?
MAX finds the largest value. =LARGE(A:A,2) will find the second largest. =SMALL(A:A,3) will find the third smallest
You probably need to switch frames in order to click certain text boxes and buttons. They may seem like they would be in the same frame, but they are not. Here is a Google Chrome app that helped me figure out which frame I am in. SideeX - Smart Record-Playback Browser Automation (I know you are using IE). If you have Google Chrome, download the Sideex app and it will record every button you click and which frame you are in.
 
Upvote 0
Thanks for the reply... Unfortunately our IT department doesn't allow us to use Chrome. I will ask, but I am not liking my chances.
 
Upvote 0
I was correct. IT wont allow Chrome or any 3rd party apps. You wouldn't happen to have an IE solution?
 
Upvote 0

Forum statistics

Threads
1,214,823
Messages
6,121,779
Members
449,049
Latest member
greyangel23

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top