copy specific link from explorer to excel

nemadedurgesh

New Member
Joined
Dec 2, 2016
Messages
5
Hi,

I have created macro which copy links from HTML and paste in excel sheet. I need specific links to selected and others to be ignored. Below is the sample of HTML code and VBA which I have created. Can anybody please help me so that I can select specific links

From below HTML, I need links to be selected if h2 class="post-box-title"

==========================
HTML:
<article class="item-list">
         
                   <h2 class="post-box-title">
                             <a href="http://www.ingovtjob.com/upsc-recruitment-2016-17-notification-12-economics-officer-posts/">UPSC recruitment 2016-17 notification 12 economics officer posts</a>
                   </h2>
                  
                   <p class="post-meta">
===========================




Below is the VBA Code which selects all link.

===========================
Code:
Sub Button1_Click()
Dim IE As Object
Dim doc, element
Dim doc1, element1
Dim doc2, element2
Dim IEDoc
Dim d As Variant
Dim e As Variant
Dim f As Variant
 
Dim ElementCol As Object
Dim Link As Object
Dim erow As Long
 
    Set objWSS = CreateObject("WScript.Shell")
    Set IE = CreateObject("InternetExplorer.Application")
     
    With IE
            .Visible = True
            .navigate "http://www.ingovtjob.com"
        
Application.StatusBar = "Trying to go to website…"
DoEvents
'Loop
Set html = IE.document
'Display text of HTML document returned in a cell
Range("A1") = html.DocumentElement.innerHTML
Set ElementCol = html.getElementsByTagName("a")
 
'--------------------------------
For Each Link In ElementCol
 
            erow = Worksheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
            Cells(erow, 1).Value = Link
            Cells(erow, 1).Columns.AutoFit
 
Next
'close down IE, reset status bar & turn on screenupdating
Set IE = Nothing
Application.StatusBar = ""
Application.ScreenUpdating = True
 
 
End With
End Sub
===========================
 
Last edited by a moderator:

Excel Facts

Format cells as date
Select range and press Ctrl+Shift+3 to format cells as date. (Shift 3 is the # sign which sort of looks like a small calendar).

nemadedurgesh

New Member
Joined
Dec 2, 2016
Messages
5
below is html sample...

<article class="item-list">

<h2 class="post -box-title">
<a href="http://www.ingovt_job.com/upsc-recruitment-2016-17-notification-12-economics-officer-posts/">UPSC recruitment 2016-17 notification 12 economics officer posts</a>
</h2>

<p class="post-meta">
 
Last edited by a moderator:

Ombir

Active Member
Joined
Oct 1, 2015
Messages
433
Where these links exists on this page. Need some sample of the links ?
 

Ombir

Active Member
Joined
Oct 1, 2015
Messages
433
Try this:
Code:
Sub Ombir_03Dec2016()
Dim index       As Integer
Const pageno    As Integer = 3
Dim i           As Long
Dim ie          As InternetExplorer
Dim doc         As HTMLDocument
Dim url         As HTMLAnchorElement
Dim urlcoll     As Object
Dim data()      As String

Set ie = New InternetExplorer

index = 1
ReDim data(1 To 6000, 1 To 2)
weblink = "http://www.ingovtjob.com/"

Do While index <= pageno

index = index + 1

    With ie
        .Visible = True
        .Navigate weblink
        Do While .ReadyState <> 4: DoEvents: Loop
    End With
    
    Set doc = ie.Document
    Set urlcoll = doc.Links
    
    For Each url In urlcoll
        If url.parentElement.className = "post-box-title" Then
            i = i + 1
            data(i, 1) = url.innerText
            data(i, 2) = url.href
        End If
    Next
    weblink = "http://www.ingovtjob.com/" & "page/" & index
Loop
Range("A1:B1") = Array("Title", "Link")
Range("A2").Resize(i, 2) = data
ActiveSheet.UsedRange.Columns.AutoFit
ie.Quit
End Sub

Change this line in the above code
Code:
"Const pageno    As Integer = 3"
to number of pages you want to scrap for link. As of now I've included up to 3 pages.

I used early binding so you need reference below two libraries from VBE editor before running the code.

1. Microsoft HTML object library
2. Microsoft Internet controls

Regards,
Ombir
 

basit

New Member
Joined
Aug 26, 2017
Messages
1
Try this:
Code:
Sub Ombir_03Dec2016()
Dim index       As Integer
Const pageno    As Integer = 3
Dim i           As Long
Dim ie          As InternetExplorer
Dim doc         As HTMLDocument
Dim url         As HTMLAnchorElement
Dim urlcoll     As Object
Dim data()      As String

Set ie = New InternetExplorer

index = 1
ReDim data(1 To 6000, 1 To 2)
weblink = "http://www.ingovtjob.com/"

Do While index <= pageno

index = index + 1

    With ie
        .Visible = True
        .Navigate weblink
        Do While .ReadyState <> 4: DoEvents: Loop
    End With
    
    Set doc = ie.Document
    Set urlcoll = doc.Links
    
    For Each url In urlcoll
        If url.parentElement.className = "post-box-title" Then
            i = i + 1
            data(i, 1) = url.innerText
            data(i, 2) = url.href
        End If
    Next
    weblink = "[url=http://startjobs.pk/category/nts/]NTS Jobs 2017 in Pakistan www.nts.org.pk Apply Online Latest[/url]" & "page/" & index
Loop
Range("A1:B1") = Array("Title", "Link")
Range("A2").Resize(i, 2) = data
ActiveSheet.UsedRange.Columns.AutoFit
ie.Quit
End Sub

Change this line in the above code
Code:
"Const pageno    As Integer = 3"
to number of pages you want to scrap for link. As of now I've included up to 3 pages.

I used early binding so you need reference below two libraries from VBE editor before running the code.

1. Microsoft HTML object library
2. Microsoft Internet controls

Regards,
Ombir

great tips you've shared .
 

Watch MrExcel Video

Forum statistics

Threads
1,129,588
Messages
5,637,275
Members
416,963
Latest member
samfuge

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top