Extract data from webpage using IE

somnath_it2006

Well-known Member
Joined
Apr 11, 2009
Messages
574
Hi All,

From last few days I was working on website crawlers (Sorry I don't know the exact word).

By using some references I used IE control for extract data from website as per form submitting and etc..

And this is working very effectively in Data Extraction.

Here I am just sharing that methods...

References Needs to add

1. Microsoft Internet Control
2. Microsoft HTML Object Liabrary


Declarations
Code:
    Dim ie As InternetExplorer
    Dim HTMLdoc As HTMLDocument
    Dim HTMLlinks As HTMLAnchorElement

Navigate URL in IE
Code:
    ' Navigate the web
    ie.Navigate "https://www.xyz.com"
    
    ie.Visible = True
    
    'Loop unitl ie page is fully loaded
    Do Until ie.ReadyState = READYSTATE_COMPLETE
        DoEvents
    Loop
    Do Until ie.Document.ReadyState = "complete"
        DoEvents
    Loop

Extract All the hyperlinks from Webpage
Code:
    ' Set document object
    Set HTMLdoc = ie.Document
    
    ' Loop thru each link
    For Each HTMLlinks In HTMLdoc.Links
        
        ' Hyperlink
        MsgBox HTMLlinks.href
        
        ' For click on Hyperlink
        HTMLlinks.Click
        
        ' Check all the propartiese of "HTMLlinks"
        
    Next HTMLlinks

Get HTML Source data of Webpage
Code:
    ' Many times you were no able to see the HTML source data
    ' In this case you can extract using innerHTML object
    
    ' Text file created using Scripting Runtime refrence
    
    Dim EHTML
    
    ' Set object
    Set EHTML = ie.Document.all.Item
    
    
    Dim FSO As FileSystemObject
    Dim FSOFile As TextStream
    
    Set FSO = New FileSystemObject
    Set FSOFile = FSO.OpenTextFile("c:\WriteTest.txt", 2, True)
    
    FSOFile.Write EHTML.innerHTML
    
    FSOFile.Close

Extract values from form control or assign values to Form control
Code:
   ' Text box
    ie.Document.all.Item("cmpnyID").Value = Range("A1").Value
    
    ' or
    
    ie.Document.getElementById("cmpnyID").Value = Range("A1").Value
    
    
    
    ' Click on button
    ie.Document.getElementById("Default").Click
    
    
    ' You can get this "Element Id" or "Item ID" from source code

Get all the form controls from webpage
Code:
    'loop forms
    For x = 0 To ie.Document.forms - 1
        
        StrString = ie.Document.Text
        
        'loop items in form
        For i = 0 To ie.Document.forms(x).Length - 1
        
            ' name and type of the item
            MsgBox ie.Document.forms(x)(i).Name & " " & ie.Document.forms(x)(i).Type
                  
            'if it is a submit button, click it
            If ie.Document.forms(x)(i).Type = "submit" Then
                ie.Document.forms(x)(i).Click
            End If
    
        Next i
    
    Next x

Copy whole webpage on excel sheet
Code:
    ie.ExecWB 17, 0
    ie.ExecWB 12, 2
    
    ActiveSheet.PasteSpecial Format:="HTML", link:=False, DisplayAsIcon:=False
    
    Application.CutCopyMode = False

For more references please check this useful links-

http://www.tushar-mehta.com/publish_train/xl_vba_cases/vba_web_pages_services/index.htm
http://www.xtremevbtalk.com/showthread.php?t=79552
http://www.vbforums.com/showthread.php?t=406671
http://www.andreavb.com/forum/viewtopic_6988.html
http://forums.devshed.com/visual-basic-programming-52/the-web-browser-control-tips-and-more-345993.html
http://www.mrexcel.com/forum/showthread.php?t=449840
http://www.mrexcel.com/forum/showthread.php?p=2240244#post2240244


If you know something more please add to this thread...
 
Last edited:

smartguy

Well-known Member
Joined
Jul 14, 2009
Messages
768
How can i use Loop....


Hi All,

From last few days I was working on website crawlers (Sorry I don't know the exact word).

By using some references I used IE control for extract data from website as per form submitting and etc..

And this is working very effectively in Data Extraction.

Here I am just sharing that methods...

References Needs to add

1. Microsoft Internet Control
2. Microsoft HTML Object Liabrary


Declarations
Code:
    Dim ie As InternetExplorer
    Dim HTMLdoc As HTMLDocument
    Dim HTMLlinks As HTMLAnchorElement

Navigate URL in IE
Code:
    ' Navigate the web
    ie.Navigate "https://www.xyz.com"
    
    ie.Visible = True
    
    'Loop unitl ie page is fully loaded
    Do Until ie.ReadyState = READYSTATE_COMPLETE
        DoEvents
    Loop
    Do Until ie.Document.ReadyState = "complete"
        DoEvents
    Loop

Extract All the hyperlinks from Webpage
Code:
    ' Set document object
    Set HTMLdoc = ie.Document
    
    ' Loop thru each link
    For Each HTMLlinks In HTMLdoc.Links
        
        ' Hyperlink
        MsgBox HTMLlinks.href
        
        ' For click on Hyperlink
        HTMLlinks.Click
        
        ' Check all the propartiese of "HTMLlinks"
        
    Next HTMLlinks

Get HTML Source data of Webpage
Code:
    ' Many times you were no able to see the HTML source data
    ' In this case you can extract using innerHTML object
    
    ' Text file created using Scripting Runtime refrence
    
    Dim EHTML
    
    ' Set object
    Set EHTML = ie.Document.all.Item
    
    
    Dim FSO As FileSystemObject
    Dim FSOFile As TextStream
    
    Set FSO = New FileSystemObject
    Set FSOFile = FSO.OpenTextFile("c:\WriteTest.txt", 2, True)
    
    FSOFile.Write EHTML.innerHTML
    
    FSOFile.Close

Extract values from form control or assign values to Form control
Code:
   ' Text box
    ie.Document.all.Item("cmpnyID").Value = Range("A1").Value
    
    ' or
    
    ie.Document.getElementById("cmpnyID").Value = Range("A1").Value
    
    
    
    ' Click on button
    ie.Document.getElementById("Default").Click
    
    
    ' You can get this "Element Id" or "Item ID" from source code

Get all the form controls from webpage
Code:
    'loop forms
    For x = 0 To ie.Document.forms - 1
        
        StrString = ie.Document.Text
        
        'loop items in form
        For i = 0 To ie.Document.forms(x).Length - 1
        
            ' name and type of the item
            MsgBox ie.Document.forms(x)(i).Name & " " & ie.Document.forms(x)(i).Type
                  
            'if it is a submit button, click it
            If ie.Document.forms(x)(i).Type = "submit" Then
                ie.Document.forms(x)(i).Click
            End If
    
        Next i
    
    Next x

Copy whole webpage on excel sheet
Code:
    ie.ExecWB 17, 0
    ie.ExecWB 12, 2
    
    ActiveSheet.PasteSpecial Format:="HTML", link:=False, DisplayAsIcon:=False
    
    Application.CutCopyMode = False

For more references please check this useful links-

http://www.tushar-mehta.com/publish_train/xl_vba_cases/vba_web_pages_services/index.htm
http://www.xtremevbtalk.com/showthread.php?t=79552
http://www.vbforums.com/showthread.php?t=406671
http://www.andreavb.com/forum/viewtopic_6988.html
http://forums.devshed.com/visual-basic-programming-52/the-web-browser-control-tips-and-more-345993.html
http://www.mrexcel.com/forum/showthread.php?t=449840
http://www.mrexcel.com/forum/showthread.php?p=2240244#post2240244


If you know something more please add to this thread...
 

phient

New Member
Joined
May 17, 2016
Messages
1
Hi,

I'm trying to automate the downloading of an Excel file (save as) using VBA from a secure website using IE. I have automated the navigation but can't seem to select the "Save As" button on the dialog box. Can anyone help?

Thanks,
Phient


Hi All,

From last few days I was working on website crawlers (Sorry I don't know the exact word).

By using some references I used IE control for extract data from website as per form submitting and etc..

And this is working very effectively in Data Extraction.

Here I am just sharing that methods...

References Needs to add

1. Microsoft Internet Control
2. Microsoft HTML Object Liabrary


Declarations
Code:
    Dim ie As InternetExplorer
    Dim HTMLdoc As HTMLDocument
    Dim HTMLlinks As HTMLAnchorElement

Navigate URL in IE
Code:
    ' Navigate the web
    ie.Navigate "https://www.xyz.com"
    
    ie.Visible = True
    
    'Loop unitl ie page is fully loaded
    Do Until ie.ReadyState = READYSTATE_COMPLETE
        DoEvents
    Loop
    Do Until ie.Document.ReadyState = "complete"
        DoEvents
    Loop

Extract All the hyperlinks from Webpage
Code:
    ' Set document object
    Set HTMLdoc = ie.Document
    
    ' Loop thru each link
    For Each HTMLlinks In HTMLdoc.Links
        
        ' Hyperlink
        MsgBox HTMLlinks.href
        
        ' For click on Hyperlink
        HTMLlinks.Click
        
        ' Check all the propartiese of "HTMLlinks"
        
    Next HTMLlinks

Get HTML Source data of Webpage
Code:
    ' Many times you were no able to see the HTML source data
    ' In this case you can extract using innerHTML object
    
    ' Text file created using Scripting Runtime refrence
    
    Dim EHTML
    
    ' Set object
    Set EHTML = ie.Document.all.Item
    
    
    Dim FSO As FileSystemObject
    Dim FSOFile As TextStream
    
    Set FSO = New FileSystemObject
    Set FSOFile = FSO.OpenTextFile("c:\WriteTest.txt", 2, True)
    
    FSOFile.Write EHTML.innerHTML
    
    FSOFile.Close

Extract values from form control or assign values to Form control
Code:
   ' Text box
    ie.Document.all.Item("cmpnyID").Value = Range("A1").Value
    
    ' or
    
    ie.Document.getElementById("cmpnyID").Value = Range("A1").Value
    
    
    
    ' Click on button
    ie.Document.getElementById("Default").Click
    
    
    ' You can get this "Element Id" or "Item ID" from source code

Get all the form controls from webpage
Code:
    'loop forms
    For x = 0 To ie.Document.forms - 1
        
        StrString = ie.Document.Text
        
        'loop items in form
        For i = 0 To ie.Document.forms(x).Length - 1
        
            ' name and type of the item
            MsgBox ie.Document.forms(x)(i).Name & " " & ie.Document.forms(x)(i).Type
                  
            'if it is a submit button, click it
            If ie.Document.forms(x)(i).Type = "submit" Then
                ie.Document.forms(x)(i).Click
            End If
    
        Next i
    
    Next x

Copy whole webpage on excel sheet
Code:
    ie.ExecWB 17, 0
    ie.ExecWB 12, 2
    
    ActiveSheet.PasteSpecial Format:="HTML", link:=False, DisplayAsIcon:=False
    
    Application.CutCopyMode = False

For more references please check this useful links-

VBA web services
Need Help - "click" submit on web page using VB Web Browser control - Xtreme Visual Basic Talk
vb web browser tutorial-VBForums
Web Browser control - AndreaVB Visual Basic and VB.NET source code discussion board
http://forums.devshed.com/visual-basic-programming-52/the-web-browser-control-tips-and-more-345993.html
VBA Into IE Automation problem
How to submit a web form without a name?


If you know something more please add to this thread...
 

Forum statistics

Threads
1,082,276
Messages
5,364,195
Members
400,786
Latest member
ismi88

Some videos you may like

This Week's Hot Topics

  • populate from drop list with multiple tables
    Hi All, i have a drop list that displays data, what i want is when i select one of those from the list to populate text from different tables on...
  • Find list of words from sheet2 in sheet1 before a comma and extract text vba
    Hi Friends, Trying to find the solution on my task. But did not find suitable one to the need. Here is my query and sample file with details...
  • Dynamic Formula entry - VBA code sought
    Hello, really hope one of you experts can help with this - i've spent hours on this and getting no-where. .I have a set of data (more rows than...
  • Listbox Header
    Have a named range called "AccidentsHeader" Within my code I have: [CODE]Private Sub CommandButton1_Click() ListBox1.RowSource =...
  • Complex Heat Map using conditional formatting
    Good day excel world. I have a concern. Below link have a list of countries that carries each country unique data. [URL...
  • Conditional formatting
    Hi good morning, hope you can help me please, I have cells P4:P54 and if this cell is equal to 1 then i want row O to say "Fully Utilised" and to...
Top