Excel VBA Scraping IMAGES from an HTML data table

Kc7487

New Member
Joined
Mar 19, 2016
Messages
20
I have an HTML data table on a website, and have set up a code to scrape all of the data from it. HOWEVER, it does not scrape the images in this HTML data table. I want to scrape the images and have them go in their associated column of their associated rows. If I cannot scrape the images, it would be just as good to scrape the text that indicates there is supposed to be an image in that cell.

I asked on another post, but I think I made it too wordy and did not specify scraping an IMAGE in the title. Here is the link to that post: http://www.mrexcel.com/forum/excel-questions/935067-web-data-scraping-html-data-table.html

The HTML from one cell of one row of the table I am trying to scrape is as follows. There are many rows and many columns, but the cells that contain images are always in the 2nd column, and it is always the same image, with the HTML looking like this. There is not always an image for each row, but I need to know when there is. Some times there is a "G" like there is in this example. Other times no "G" and only the image. I have removed the <'s so that it displays on this forum.

TD colspan=1 align="RIGHT" class="USUALCLASS" NOWRAP BGCOLOR="WHITE">
FONT Face='Arial,Verdana,Helvetica' Size=1 COLOR="BLACK"> img src= '../images/Icons/S.gif' alt='This indicates STATUS' width=10 align=middle> Font color='#0070C0'>G/Font>/FONT>
/TD>
 
Last edited:

Excel Facts

How to change case of text in Excel?
Use =UPPER() for upper case, =LOWER() for lower case, and =PROPER() for proper case. PROPER won't capitalize second c in Mccartney

John_w

MrExcel MVP
Joined
Oct 15, 2007
Messages
6,751
Referring to the cl variable in the code in your other thread (a bit difficult to follow when you've split your question like this) you can get the image URL (or determine if there is an image) like this:
Code:
    If Not cl.getElementsByTagName("IMG")(0) Is Nothing Then
        Debug.Print "URL = " & cl.getElementsByTagName("IMG")(0).src
    End If
and then use the URLDownloadToFile Windows API function to download it and insert it as a picture in the Excel cell.
 
Last edited:

Kc7487

New Member
Joined
Mar 19, 2016
Messages
20
Thanks again John_w!!

I have experimented with putting this code into mine, but am still uncertain onwhere exactly to put it?

Also you'd mentioned "and then use the URLDownloadToFile Windows API function to downloadit and insert it as a picture in the Excel cell"...... When I put thiscode in the right spot, is this code doing this already? Downloading andinserting the image in the Excel cell? Or is that the next step I have totake AFTER this code?

Alternatively, I do not care about retaining the image, even if it can dosomething as simple as putting an "S" in the cell, or displaying thewords from the HTML "'This indicates STATUS" or anything at all thatis fine. The image is not needed, just an indication of the fact that there ISan image IN the those certain cells. I assume it would probably be quickerto do that anyway coding-wise?

For reference, this is my current looping code that pulls all the text from thetable and puts it in the proper cells, (but is not pulling the images andputting them in the proper cells (or an S, or anything)):

Code:
Set ws = ThisWorkbook.Worksheets("WORKSHEET NAME")

            For Each tblIn doc.getElementsByTagName("TABLE")
               tabno = tabno + 1
               nextrow = nextrow + 1
               Set rng = ws.Range("B" & nextrow)
               rng.Offset(, -1) = "Table " & tabno
               For Each rw In tbl.Rows
                   For Each cl In rw.Cells
                       rng.Value = cl.outerText
                       Set rng = rng.Offset(, 1)
                       I = I + 1
                   Next cl
                   nextrow = nextrow + 1
                   Set rng = rng.Offset(1, -I)
                   I = 0
               'below I am trying to insert a status bar update for every new row
               Application.StatusBar = "Approx. " & nextrow / 5.3 & "%complete."

               
               Next rw
               
            Next tbl

Thanks for the assistance!
 

John_w

MrExcel MVP
Joined
Oct 15, 2007
Messages
6,751
Given a URL to a specific file, URLDownloadFile simply downloads the file to a folder on your computer. If the file is an image, a separate step is needed to insert it as a picture into an Excel cell. The code is basically Worksheet.Pictures.Insert(file), but you could record a macro to get more complete code.

Put my code inside the For Each cl loop, since that is where you refer to the HTML table cell.
 

Kc7487

New Member
Joined
Mar 19, 2016
Messages
20

ADVERTISEMENT

Thanks!
Great I seem to be getting somehwere. But not quite there yet.... I put your code in the For Each cl loop, and it is running. It seems that if I press "Ctrl" and "G" together that I pull up the "Immediate Window", and I can see a list of the images this code found inside of the immediate window.

However, in this list of the images found, it does not say which cell the code found it in. And the code does not put an S or the image URL or anything in the cell it pulled it from. As such I am not sure what to do next to get an indication of which cells have the image and which do not. (I would use the macro recorder, but I don't know what actions to perform so that Excel puts "S"'s inside of the cell's that have the image in the HTML, I'm not really sure that is something I can even do manually to get the code?)

Thanks again for any and all assistance!</SPAN>
 

John_w

MrExcel MVP
Joined
Oct 15, 2007
Messages
6,751
I put your code in the For Each cl loop, and it is running. It seems that if I press "Ctrl" and "G" together that I pull up the "Immediate Window", and I can see a list of the images this code found inside of the immediate window.
Yes, because Debug.Print outputs to the Immediate Window as a way of developing, testing and debugging the code.

However, in this list of the images found, it does not say which cell the code found it in. And the code does not put an S or the image URL or anything in the cell it pulled it from. As such I am not sure what to do next to get an indication of which cells have the image and which do not.
Try changing the inner loop to:
Code:
                    For Each cl In rw.Cells
                        If Not cl.getElementsByTagName("IMG")(0) Is Nothing Then
                            rng.Value = "S"  'the HTML cell contains an image
                        Else
                            rng.Value = cl.innerText
                        End If
                        Set rng = rng.Offset(, 1)
                        I = I + 1
                    Next cl
 

Kc7487

New Member
Joined
Mar 19, 2016
Messages
20

ADVERTISEMENT

This did just the trick, I can't thank you enough John_w, really appreciate the help!

If you get a moment, I wonder if it would be possible to only have this code only do the image check and insert an S in one specific column? The second column in the HTML Data Table to be specific? The rest of the columns can just do normal inner text without any worrying about any images, it is only in the second column that I need images to turn into an "S".

The reason I ask: I realized after adding and running this code that there were more images in the table that I had not previously noticed, and of course the code did not previously care about. I don't need these images, but now the text inside of their cells is all being replaces with an "S". (for instance, a column header in the table has a small image of an arrow next to the name of the column. The arrow appears because it is by default sorted in order by that column. Because of the image of this arrow, the code turns the column header into "S" instead of the usual text of the column header).

Or alternatively, maybe it would be possible to reference the specific images that I'd like to turn into an S? Like by referencing the "image source" as well (in the HTML it is.... img src= '../images/Icons/S.gif'........ or maybe by referencing the "alt" (I'm not sure what the "alt" is) in the HTML is............alt='This indicates STATUS'

Thanks!
 

John_w

MrExcel MVP
Joined
Oct 15, 2007
Messages
6,751
If you get a moment, I wonder if it would be possible to only have this code only do the image check and insert an S in one specific column? The second column in the HTML Data Table to be specific? The rest of the columns can just do normal inner text without any worrying about any images, it is only in the second column that I need images to turn into an "S".
Replace the For Each cl loop with:
Code:
                    For Each cl In rw.Cells
                        IF cl.cellIndex = 1 And Not cl.getElementsByTagName("IMG")(0) Is Nothing Then
                            rng.Value = "S"  'the 2nd HTML columnn cell contains an image
                        Else
                            rng.Value = cl.innerText
                        End If
                        Set rng = rng.Offset(, 1)
                        I = I + 1
                    Next cl
 

Kc7487

New Member
Joined
Mar 19, 2016
Messages
20
Thanks for all of your assistance with this stuff John_w, works like a charm! I appreciate it.
 

Watch MrExcel Video

Forum statistics

Threads
1,129,555
Messages
5,637,021
Members
416,954
Latest member
Gohar hussain

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top