Extract data from webpage using IE

somnath_it2006

Well-known Member
Joined
Apr 11, 2009
Messages
574
Hi All,

From last few days I was working on website crawlers (Sorry I don't know the exact word).

By using some references I used IE control for extract data from website as per form submitting and etc..

And this is working very effectively in Data Extraction.

Here I am just sharing that methods...

References Needs to add

1. Microsoft Internet Control
2. Microsoft HTML Object Liabrary


Declarations
Code:
    Dim ie As InternetExplorer
    Dim HTMLdoc As HTMLDocument
    Dim HTMLlinks As HTMLAnchorElement


Navigate URL in IE
Code:
    ' Navigate the web
    ie.Navigate "https://www.xyz.com"
    
    ie.Visible = True
    
    'Loop unitl ie page is fully loaded
    Do Until ie.ReadyState = READYSTATE_COMPLETE
        DoEvents
    Loop
    Do Until ie.Document.ReadyState = "complete"
        DoEvents
    Loop


Extract All the hyperlinks from Webpage
Code:
    ' Set document object
    Set HTMLdoc = ie.Document
    
    ' Loop thru each link
    For Each HTMLlinks In HTMLdoc.Links
        
        ' Hyperlink
        MsgBox HTMLlinks.href
        
        ' For click on Hyperlink
        HTMLlinks.Click
        
        ' Check all the propartiese of "HTMLlinks"
        
    Next HTMLlinks


Get HTML Source data of Webpage
Code:
    ' Many times you were no able to see the HTML source data
    ' In this case you can extract using innerHTML object
    
    ' Text file created using Scripting Runtime refrence
    
    Dim EHTML
    
    ' Set object
    Set EHTML = ie.Document.all.Item
    
    
    Dim FSO As FileSystemObject
    Dim FSOFile As TextStream
    
    Set FSO = New FileSystemObject
    Set FSOFile = FSO.OpenTextFile("c:\WriteTest.txt", 2, True)
    
    FSOFile.Write EHTML.innerHTML
    
    FSOFile.Close


Extract values from form control or assign values to Form control
Code:
   ' Text box
    ie.Document.all.Item("cmpnyID").Value = Range("A1").Value
    
    ' or
    
    ie.Document.getElementById("cmpnyID").Value = Range("A1").Value
    
    
    
    ' Click on button
    ie.Document.getElementById("Default").Click
    
    
    ' You can get this "Element Id" or "Item ID" from source code


Get all the form controls from webpage
Code:
    'loop forms
    For x = 0 To ie.Document.forms - 1
        
        StrString = ie.Document.Text
        
        'loop items in form
        For i = 0 To ie.Document.forms(x).Length - 1
        
            ' name and type of the item
            MsgBox ie.Document.forms(x)(i).Name & " " & ie.Document.forms(x)(i).Type
                  
            'if it is a submit button, click it
            If ie.Document.forms(x)(i).Type = "submit" Then
                ie.Document.forms(x)(i).Click
            End If
    
        Next i
    
    Next x


Copy whole webpage on excel sheet
Code:
    ie.ExecWB 17, 0
    ie.ExecWB 12, 2
    
    ActiveSheet.PasteSpecial Format:="HTML", link:=False, DisplayAsIcon:=False
    
    Application.CutCopyMode = False


For more references please check this useful links-

http://www.tushar-mehta.com/publish_train/xl_vba_cases/vba_web_pages_services/index.htm
http://www.xtremevbtalk.com/showthread.php?t=79552
http://www.vbforums.com/showthread.php?t=406671
http://www.andreavb.com/forum/viewtopic_6988.html
http://forums.devshed.com/visual-ba...web-browser-control-tips-and-more-345993.html
http://www.mrexcel.com/forum/showthread.php?t=449840
http://www.mrexcel.com/forum/showthread.php?p=2240244#post2240244


If you know something more please add to this thread...
 
Last edited:

Excel Facts

Which came first: VisiCalc or Lotus 1-2-3?
Dan Bricklin and Bob Frankston debuted VisiCalc in 1979 as a Visible Calculator. Lotus 1-2-3 debuted in the early 1980's, from Mitch Kapor.
How can i use Loop....


Hi All,

From last few days I was working on website crawlers (Sorry I don't know the exact word).

By using some references I used IE control for extract data from website as per form submitting and etc..

And this is working very effectively in Data Extraction.

Here I am just sharing that methods...

References Needs to add

1. Microsoft Internet Control
2. Microsoft HTML Object Liabrary


Declarations
Code:
    Dim ie As InternetExplorer
    Dim HTMLdoc As HTMLDocument
    Dim HTMLlinks As HTMLAnchorElement


Navigate URL in IE
Code:
    ' Navigate the web
    ie.Navigate "https://www.xyz.com"
    
    ie.Visible = True
    
    'Loop unitl ie page is fully loaded
    Do Until ie.ReadyState = READYSTATE_COMPLETE
        DoEvents
    Loop
    Do Until ie.Document.ReadyState = "complete"
        DoEvents
    Loop


Extract All the hyperlinks from Webpage
Code:
    ' Set document object
    Set HTMLdoc = ie.Document
    
    ' Loop thru each link
    For Each HTMLlinks In HTMLdoc.Links
        
        ' Hyperlink
        MsgBox HTMLlinks.href
        
        ' For click on Hyperlink
        HTMLlinks.Click
        
        ' Check all the propartiese of "HTMLlinks"
        
    Next HTMLlinks


Get HTML Source data of Webpage
Code:
    ' Many times you were no able to see the HTML source data
    ' In this case you can extract using innerHTML object
    
    ' Text file created using Scripting Runtime refrence
    
    Dim EHTML
    
    ' Set object
    Set EHTML = ie.Document.all.Item
    
    
    Dim FSO As FileSystemObject
    Dim FSOFile As TextStream
    
    Set FSO = New FileSystemObject
    Set FSOFile = FSO.OpenTextFile("c:\WriteTest.txt", 2, True)
    
    FSOFile.Write EHTML.innerHTML
    
    FSOFile.Close


Extract values from form control or assign values to Form control
Code:
   ' Text box
    ie.Document.all.Item("cmpnyID").Value = Range("A1").Value
    
    ' or
    
    ie.Document.getElementById("cmpnyID").Value = Range("A1").Value
    
    
    
    ' Click on button
    ie.Document.getElementById("Default").Click
    
    
    ' You can get this "Element Id" or "Item ID" from source code


Get all the form controls from webpage
Code:
    'loop forms
    For x = 0 To ie.Document.forms - 1
        
        StrString = ie.Document.Text
        
        'loop items in form
        For i = 0 To ie.Document.forms(x).Length - 1
        
            ' name and type of the item
            MsgBox ie.Document.forms(x)(i).Name & " " & ie.Document.forms(x)(i).Type
                  
            'if it is a submit button, click it
            If ie.Document.forms(x)(i).Type = "submit" Then
                ie.Document.forms(x)(i).Click
            End If
    
        Next i
    
    Next x


Copy whole webpage on excel sheet
Code:
    ie.ExecWB 17, 0
    ie.ExecWB 12, 2
    
    ActiveSheet.PasteSpecial Format:="HTML", link:=False, DisplayAsIcon:=False
    
    Application.CutCopyMode = False


For more references please check this useful links-

http://www.tushar-mehta.com/publish_train/xl_vba_cases/vba_web_pages_services/index.htm
http://www.xtremevbtalk.com/showthread.php?t=79552
http://www.vbforums.com/showthread.php?t=406671
http://www.andreavb.com/forum/viewtopic_6988.html
http://forums.devshed.com/visual-ba...web-browser-control-tips-and-more-345993.html
http://www.mrexcel.com/forum/showthread.php?t=449840
http://www.mrexcel.com/forum/showthread.php?p=2240244#post2240244


If you know something more please add to this thread...
 
Upvote 0
Hi,

I'm trying to automate the downloading of an Excel file (save as) using VBA from a secure website using IE. I have automated the navigation but can't seem to select the "Save As" button on the dialog box. Can anyone help?

Thanks,
Phient


Hi All,

From last few days I was working on website crawlers (Sorry I don't know the exact word).

By using some references I used IE control for extract data from website as per form submitting and etc..

And this is working very effectively in Data Extraction.

Here I am just sharing that methods...

References Needs to add

1. Microsoft Internet Control
2. Microsoft HTML Object Liabrary


Declarations
Code:
    Dim ie As InternetExplorer
    Dim HTMLdoc As HTMLDocument
    Dim HTMLlinks As HTMLAnchorElement


Navigate URL in IE
Code:
    ' Navigate the web
    ie.Navigate "https://www.xyz.com"
    
    ie.Visible = True
    
    'Loop unitl ie page is fully loaded
    Do Until ie.ReadyState = READYSTATE_COMPLETE
        DoEvents
    Loop
    Do Until ie.Document.ReadyState = "complete"
        DoEvents
    Loop


Extract All the hyperlinks from Webpage
Code:
    ' Set document object
    Set HTMLdoc = ie.Document
    
    ' Loop thru each link
    For Each HTMLlinks In HTMLdoc.Links
        
        ' Hyperlink
        MsgBox HTMLlinks.href
        
        ' For click on Hyperlink
        HTMLlinks.Click
        
        ' Check all the propartiese of "HTMLlinks"
        
    Next HTMLlinks


Get HTML Source data of Webpage
Code:
    ' Many times you were no able to see the HTML source data
    ' In this case you can extract using innerHTML object
    
    ' Text file created using Scripting Runtime refrence
    
    Dim EHTML
    
    ' Set object
    Set EHTML = ie.Document.all.Item
    
    
    Dim FSO As FileSystemObject
    Dim FSOFile As TextStream
    
    Set FSO = New FileSystemObject
    Set FSOFile = FSO.OpenTextFile("c:\WriteTest.txt", 2, True)
    
    FSOFile.Write EHTML.innerHTML
    
    FSOFile.Close


Extract values from form control or assign values to Form control
Code:
   ' Text box
    ie.Document.all.Item("cmpnyID").Value = Range("A1").Value
    
    ' or
    
    ie.Document.getElementById("cmpnyID").Value = Range("A1").Value
    
    
    
    ' Click on button
    ie.Document.getElementById("Default").Click
    
    
    ' You can get this "Element Id" or "Item ID" from source code


Get all the form controls from webpage
Code:
    'loop forms
    For x = 0 To ie.Document.forms - 1
        
        StrString = ie.Document.Text
        
        'loop items in form
        For i = 0 To ie.Document.forms(x).Length - 1
        
            ' name and type of the item
            MsgBox ie.Document.forms(x)(i).Name & " " & ie.Document.forms(x)(i).Type
                  
            'if it is a submit button, click it
            If ie.Document.forms(x)(i).Type = "submit" Then
                ie.Document.forms(x)(i).Click
            End If
    
        Next i
    
    Next x


Copy whole webpage on excel sheet
Code:
    ie.ExecWB 17, 0
    ie.ExecWB 12, 2
    
    ActiveSheet.PasteSpecial Format:="HTML", link:=False, DisplayAsIcon:=False
    
    Application.CutCopyMode = False


For more references please check this useful links-

VBA web services
Need Help - "click" submit on web page using VB Web Browser control - Xtreme Visual Basic Talk
vb web browser tutorial-VBForums
Web Browser control - AndreaVB Visual Basic and VB.NET source code discussion board
http://forums.devshed.com/visual-ba...web-browser-control-tips-and-more-345993.html
VBA Into IE Automation problem
How to submit a web form without a name?


If you know something more please add to this thread...
 
Upvote 0

Forum statistics

Threads
1,214,606
Messages
6,120,483
Members
448,967
Latest member
visheshkotha

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top