Web Data Scraping from HTML Data Table

Kc7487

New Member
Joined
Mar 19, 2016
Messages
20
Hello all!

I posted a couple questions about an ongoing project a while back. With the help of people here, and some modifications to the code, I now have a code which signs into the website specified in a certain cell, using user input boxes for username and password. The code will skip this sign in process if the user is already logged in, and go straight to the scraping data part of the code. The code will then use the TR and TD tags to copy over all the data from the HTML table to excel. The code then does some error checking stuff, some formatting, and other various things. The code then schedules the same thing to happen every ten minutes. (I will trim the code down to only the troublesome part however.)

I have one main problem now. In the HTML table, there is an image in one column (always the same column, the 2nd column, which appears in column C in Excel due to column A containing only the table name of the scrapped table) of some of the rows. I found where it is in the HTML, and will paste one row of the table below (heavily edited to protect proprietary stuff, but the row with the image is mostly exact from the source with a few edits that I kept track of so can change wording back.) The HTML looks like this:

EDIT, I couldn't post the exact HTML, it got all wonky on the website.... I took out the < and </ on a bunch of stuff

Code:
TR>
	

	TR STUFF
		
		
		
		TD STUFF>
		TD>		



THIS IS THE IMPORTANT PART RIGHT HERE:
		TD colspan=1 align="RIGHT" class="USUAL CLASS" OTHER STUFF HERE DON'T THINK WILL BE NEEDED>		
			FONT Face='Arial,Verdana,Helvetica' Size=1 COLOR="BLACK">    img src= '../images/Icons/S.gif' alt='This indicates STATUS' width=10 align=middle> Font color='#FONTCOLOR'>GFont>/FONT>
		TD>		
	


				TD STUFF">
					<STUFF>		
				/TD>
	
				TD STUFF			
					
				/TD>
	
	TDSTUFF</TD>
	

TD STUFF/TD>


THERE ARE LIKE 20-30 MORE TD AND /TD TAGS THEN:

	TR>
	
THEN THE NEXT ROW STARTS, SOME HAVE THAT IMAGE THING AND SOME DON'T


I can see where this appears in the HTML, but I CANNOT get this image to appear when running my code. I will post only the part which is actually scrapping the data below.

Code:
Set ws = ThisWorkbook.Worksheets("WORKSHEET NAME")

            For Each tbl In doc.getElementsByTagName("TABLE")
                tabno = tabno + 1
                nextrow = nextrow + 1
                Set rng = ws.Range("B" & nextrow)
                rng.Offset(, -1) = "Table " & tabno
                For Each rw In tbl.Rows
                    For Each cl In rw.Cells
                        rng.Value = cl.outerText
                        Set rng = rng.Offset(, 1)
                        I = I + 1
                    Next cl
                    nextrow = nextrow + 1
                    Set rng = rng.Offset(1, -I)
                    I = 0
                'below I am trying to insert a status bar update for every new row
                Application.StatusBar = "Approx. " & nextrow / 5.3 & "% complete."

                
                Next rw
                
            Next tbl

With this information, (the HTML sample row and the current code) would anybody be able to help point me in the right direction so that this code ALSO either puts the image into Excel, OR puts any marker whatsoever (an S maybe?). I see in the HTML that it says..... ;img src= '../images/Icons/S.gif' alt='This indicates STATUS'.....What does the alt= part mean? Could the code simply pull the alt part? Maybe wherever this image appears the code would say 'This indicates STATUS' inside that cell?

Bonus points if anybody can come up with a better way to update the status bar (I am currently just guessing that there will be a bit over 500 rows so using a static 5.3% rows complete...

Thanks for any and all assistance!
 
Last edited:

Excel Facts

Quick Sum
Select a range of cells. The total appears in bottom right of Excel screen. Right-click total to add Max, Min, Count, Average.
I feel like I made this question too wordy again, and over edited things. Let me try to consolidate this question.

I have this code which scrapes data tables from a web page.

Code:
Set ws = ThisWorkbook.Worksheets("WORKSHEET NAME")

            For Each tbl In doc.getElementsByTagName("TABLE")
                tabno = tabno + 1
                nextrow = nextrow + 1
                Set rng = ws.Range("B" & nextrow)
                rng.Offset(, -1) = "Table " & tabno
                For Each rw In tbl.Rows
                    For Each cl In rw.Cells
                        rng.Value = cl.outerText
                        Set rng = rng.Offset(, 1)
                        I = I + 1
                    Next cl
                    nextrow = nextrow + 1
                    Set rng = rng.Offset(1, -I)
                    I = 0
                'below I am trying to insert a status bar update for every new row
                Application.StatusBar = "Approx. " & nextrow / 5.3 & "% complete."

                
                Next rw
                
            Next tbl

The second column of the HTML data table on the web page I am scraping is NOT working properly. It is supposed to have an image on it. Neither the image, nor any indication of the fact that an image was supposed to be there, appears in Excel. I NEED some indication of this image.

I will post a snippet of a row's HTML code from the webpage I am trying to scrape. The image only appears on some of the rows, not all. I need to have knowledge of where this image appears in the table and where it does not in Excel. Here is the HTML (note I am replacing all of the "<"s with the word" OPENARROWHEAD" because it shows up weird in this forum if I leave them in normal:

Code:
OPENARROWHEADTR bgColor="#FFFFFF">
		
		
		
		OPENARROWHEADTD colspan=1 align="center" class="USUALCLASS" NOWRAP BGCOLOR="WHITE">			
			OPENARROWHEADFONT Face='Arial,Verdana,Helvetica' Size=1 COLOR="BLACK">3OPENARROWHEAD/FONT>
		OPENARROWHEAD/TD>		
		OPENARROWHEADTD colspan=1 align="RIGHT" class="USUALCLASS" NOWRAP BGCOLOR="WHITE">			
			OPENARROWHEADFONT Face='Arial,Verdana,Helvetica' Size=1 COLOR="BLACK"> OPENARROWHEADimg src= '../images/Icons/S.gif' alt='This indicates STATUS' width=10 align=middle>OPENARROWHEAD/FONT>
		OPENARROWHEAD/TD>		
	
				OPENARROWHEADTD colspan=1 align="left" class="USUALCLASS" NOWRAP  BGCOLOR="WHITE">
					OPENARROWHEADFONT Face='Arial,Verdana,Helvetica' Size=1 COLOR="BLACK">OOOPENARROWHEAD/FONT>		
				OPENARROWHEAD/TD>

Could anybody assist with a code modification which would allow for Excel to scrape the fact that a row has this in it? I do not care what letters/text/pictures/etc... Excel puts in the cell, I just need some indication. As it stands now, the code just makes the cell which is supposed to have the image blank. (Or sometimes it will contain the letter "G", based on this column sometimes having a G, but it will never have any indication of the image that is supposed to be there.) Anyway, any help would be greatly appreciated, thanks!
 
Upvote 0

Forum statistics

Threads
1,214,965
Messages
6,122,499
Members
449,089
Latest member
Raviguru

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top