Extract all the links of a website

Hello. Is it possible to slightly modify this macro in such a way that it extrapolates for each link listed in the column the link of the next level to the first that contains the word indicated as column header for each heading inserted?
Thank you
 
Upvote 0

Excel Facts

Format cells as date
Select range and press Ctrl+Shift+3 to format cells as date. (Shift 3 is the # sign which sort of looks like a small calendar).
Good evening.
Can you tell me if it is possible to use the macro to be able to extrapolate for each url indicated in the column to only the links that have within them the string indicated in the column header (from column b onwards)?
Thank you
 
Upvote 0
Hello

Place the desired string on the second row of each column.

Code:
Sub Read_Linksb()
Dim c00, myimg, i%, Lk, it, lr%, c%
With CreateObject("MSXML2.XMLHTTP")
    .Open "Get", "https://www.MrExcel.com", False
    .Send
    c00 = .responseText
End With
Cells(1, 1) = "List"
With CreateObject("htmlfile")
    .Body.innerHTML = c00
    i = 1
    Set myimg = .getElementsByTagName("img")
    For Each Lk In myimg
        i = i + 1
        Cells(i, 1) = Lk.src
    Next
    For Each it In .Links
        i = i + 1
        Cells(i, 1) = it.href
    Next
End With
For c = 2 To Cells(2, Columns.Count).End(xlToLeft).Column
    Cells(1, c) = Cells(1, 1)
    Cells(2, c) = "*" & Cells(2, c) & "*"
    Range("a1:a" & Range("a" & Rows.Count).End(xlUp).Row).AdvancedFilter _
    xlFilterCopy, Range(Cells(1, c), Cells(2, c)), Cells(3, c), False
Next
End Sub
 
Upvote 0
Buonasera

Code:
Sub Read_Linksb()
Dim c00, myimg, i%, Lk, it, lr%, c%, nu%
i = 1
For nu = 3 To Range("b" & Rows.Count).End(xlUp).Row ' start at B3
    With CreateObject("MSXML2.XMLHTTP")
        .Open "Get", Cells(nu, 2), False
        .Send
        c00 = .responseText
    End With
    Cells(1, 1) = "List"
    With CreateObject("htmlfile")
        .Body.innerHTML = c00
        Set myimg = .getElementsByTagName("img")
        For Each Lk In myimg
            i = i + 1
            Cells(i, 1) = Lk.src
        Next
        For Each it In .Links
            i = i + 1
            Cells(i, 1) = it.href
        Next
    End With
Next
For c = 3 To Cells(2, Columns.Count).End(xlToLeft).Column
    Cells(1, c) = Cells(1, 1)
    Cells(2, c) = "*" & Cells(2, c) & "*"
    Range("a1:a" & Range("a" & Rows.Count).End(xlUp).Row).AdvancedFilter _
    2, Range(Cells(1, c), Cells(2, c)), Cells(3, c), False
Next
End Sub
 
Upvote 0
I tried the macro. In essence it extrapolates links as the previous macro and then transfers only the links that contain the string in the header.
You could avoid the links coming out in the "a" column because if the urls are so many then the excel sheet would not be enough. At most you could transfer all the links extracted in the other sheet occupying the first column then the second etc.
Then, at the moment the macro is started, the domain level of the links to be transferred can be established (for example, only the links of the second or third level, etc.).


My final intention is to extract from the link the emails of the page to which the column titles refer.


Thank you
 
Upvote 0
I am not sure if I understand what you want. Do you need to explore all site levels?
This would mean accessing all subpages, until exhausting all internal site links.
It should be possible to list all site links, including subpages.
 
Upvote 0
It would be useful to have them all and subsequently to be able to decide what link the link inserts into the dedicated column at the correspondence of the string given.
So there will be: 1) the extraction of all links;
2) selection by the user of what link insertion (if the maximum level of domains and 6 and user chosen 2 in the columns will only be inserted into link to the second level).
Could you? Of course, the user could write it from 1 to Num Massimo.
 
Upvote 0
I would say it is tricky. I am having a busy week but will work on this as soon as possible...
 
Upvote 0

Forum statistics

Threads
1,214,630
Messages
6,120,634
Members
448,973
Latest member
ChristineC

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top