VBA HTML scraping problems

deltamu

New Member
Joined
Jun 14, 2018
Messages
10
Hi,
I would like to get data from a protected website. Unfortunately I cannot give you the link because it contains sensitive data.
I succeed in login via vba code, then I select the page where to search for data and I get the page with results. Below is an excerpt of the html code of the page containing a part of the results (I replaced sensitive data with fakes).
HTML:
...
<div class="main-content colore">
        <div class="centered">
            <table summary="List" width="60%">   
                   <tbody><tr>
                    <th scope="col">code</th>                       
                    <th scope="col">date</th>
                    <th scope="col"></th>                                                               
                </tr>
                <tr class="">
                    <td class="centered">025304812454</td>                       
                    <td class="centered">11/06/2020</td>
                    <td>
                        <a href="/link/search?path=5:rows:0:pdf&amp;interface=ILinkListener">
                            <img class="no-border" src="/link/images/pdf.gif" alt="PDF form">
                        </a>
                    </td>                       
                   </tr><tr class="odd">
                    <td class="centered">02223348970</td>                       
                    <td class="centered">03/04/2019</td>
                    <td>
                        <a href="/link/search?path=5:rows:1:pdf&amp;interface=ILinkListener">
                            <img class="no-border" src="/link/images/pdf.gif" alt="PDF form">
                        </a>
                    </td>                       
                   </tr><tr class="">
                    <td class="centered">54879898989</td>                       
                    <td class="centered">18/02/2019</td>
                    <td>
                        <a href="/link/searh?path=5:rows:2:pdf&amp;interface=ILinkListener">
                            <img class="no-border" src="/link/images/pdf.gif" alt="PDF form">
                        </a>
                    </td>                       
                   </tr><tr class="odd">
                    <td class="centered">458789898987</td>                       
                    <td class="centered">03/05/2017</td>
                    <td>
                        <a href="/link/searh?path=5:rows:3:pdf&amp;interface=ILinkListener">
                            <img class="no-border" src="/link/images/pdf.gif" alt="PDF form">
                        </a>
                    </td>                       
                   </tr><tr class="">
                    <td class="centered">95841225458</td>                       
                    <td class="centered">08/08/2017</td>
                    <td>
                        <a href="/link/searh?path=5:rows:4:pdf&amp;interface=ILinkListener">
                            <img class="no-border" src="/link/images/pdf.gif" alt="PDF form">
                        </a>
                    </td>                       
                   </tr><tr class="odd">
                    <td class="centered">1221455487878</td>                       
                    <td class="centered">23/01/2017</td>
                    <td>
                        <a href="/link/searh?path=5:rows:5:pdf&amp;interface=ILinkListener">
                            <img class="no-border" src="/link/images/pdf.gif" alt="PDF form">
                        </a>
                    </td>                       
                  
...
...
...
...                         
            </tbody></table>
        </div>       
    </div>
...

A first step would be to copy the "List" table to an Excel sheet. Then I should check on the dates to download specific PDFs. The problem is that despite many attempts I have never been able to select the part I'm interested in.

Suggestions are welcome. Thank you in advance.
 

Excel Facts

Format cells as time
Select range and press Ctrl+Shift+2 to format cells as time. (Shift 2 is the @ sign).
Maybe something like this:
VBA Code:
    Dim HTMLdoc As HTMLDocument
    Dim table As HTMLTable, tRow As HTMLTableRow, tCell As HTMLTableCell
    Dim destCell As Range
    
    Set destCell = ActiveSheet.Range("A1")
    Set HTMLdoc = IE.document
    Set table = HTMLdoc.querySelector("table[summary='List']")
    For Each tRow In table.Rows
        For Each tCell In tRow.Cells
            destCell.Offset(tRow.RowIndex, tCell.cellIndex).Value = tCell.innerText
        Next
    Next
 
Upvote 0
Thank you very much for your help. After some other attempts I managed to do the job in the end but in a very artisan and unorthodox way. Your way is much more tidy, I will treasure it
 
Upvote 0

Forum statistics

Threads
1,214,632
Messages
6,120,655
Members
448,975
Latest member
sweeberry

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top