HTML formatted tables into Excel

EconSean

Board Regular
Joined
Apr 21, 2002
Messages
129
Greetings everyone,

I have roughly 100 HTML files, each containing a single table of formatted data (created with the SAS statistical software package).

I realize that I can open an HTML file with Excel, do what I need (e.g., copy, format, save as an Excel file), but I would like to try to automate the process, if at all possible, even if it's simply open the HTML file and save it as Excel. That would save me quite a few clicks it seems.

Ideally I could pass a directory name to Excel (via VBA) and have a series of "operations" (i.e., save as an Excel file) performed on each HTML file in the directory, ultimately resulting in a corresponding collection of "new" files (in my case, Excel files).

If this was very fancy, I would love to take each HTML file and copy the one table in each, and paste into a separate worksheet in a new Excel file (so 100 HTML files would result in one Excel file with 100 worksheets).

I did try to generate a macro while using the Import External Data "tool", thinking that might get me pointed in the right direction, but didn't have luck with that (although I am able to get the tool to work, just not within a macro environment).

I must admit that I haven't written any VBA code in quite a while, so any insights that you might have would be greatly appreciated.

Thanks in advance for your time.

Regards,

Sean
 

Excel Facts

Square and cube roots
The =SQRT(25) is a square root. For a cube root, use =125^(1/3). For a fourth root, use =625^(1/4).
Hi

This code might do what you need. It basically loops through all files in the specified folder and imports them into a new workbook (1 worksheet for each file). I just did a quick test on HTML files and it seems to work ok. This is some old code that I had which used Application.Filesearch (and therefore would not work in Excel 2007/2010) so I've changed it to use the MS Scripting object.

Let me know how you get on.

HTH
DK

Code:
Sub ImportMultipleFilesintoOneWorkbook()
    Dim sFoldername As String
    Dim shtDest As Worksheet
    Dim wbDest As Workbook
    Dim wbSource As Workbook
    Dim oFSObj As Object
    Dim oFSFolder As Object
    Dim oFSFile As Object


    On Error GoTo ErrHandler


    sFoldername = InputBox("Enter folder path")
    If Len(sFoldername) = 0 Then Exit Sub


    'These objects are required to search a particular folder for files
    Set oFSObj = CreateObject("Scripting.FilesystemObject")
    Set oFSFolder = oFSObj.GetFolder(sFoldername)


    'This is the workbook that will contain all imported files
    Set wbDest = Workbooks.Add(xlWBATWorksheet)


    For Each oFSFile In oFSFolder.Files


        Set shtDest = wbDest.Worksheets.Add

        Set wbSource = Application.Workbooks.Open(oFSFile.Path)

        wbSource.Sheets(1).UsedRange.Copy Destination:=shtDest.Range("A1")

        wbSource.Close


    Next oFSFile



ExitSub:
    Set oFSFile = Nothing
    Set oFSFolder = Nothing
    Set oFSObj = Nothing
    Exit Sub

ErrHandler:
    MsgBox "An error occurred: " & Err.Description
    Resume ExitSub


End Sub
 
Upvote 0
DK,

Thank you so much, this is terrific, works wonderfully, and I am grateful for the assistance.

Can I now be greedy and ask for your thoughts on a tweak? Do you have any ideas on a method for taking the HTML file name (omitting the HTML extension) and applying the name to the corresponding tab in the Excel file?

Sean

Hi

This code might do what you need. It basically loops through all files in the specified folder and imports them into a new workbook (1 worksheet for each file). I just did a quick test on HTML files and it seems to work ok. This is some old code that I had which used Application.Filesearch (and therefore would not work in Excel 2007/2010) so I've changed it to use the MS Scripting object.

Let me know how you get on.

HTH
DK

Code:
Sub ImportMultipleFilesintoOneWorkbook()
    Dim sFoldername As String
    Dim shtDest As Worksheet
    Dim wbDest As Workbook
    Dim wbSource As Workbook
    Dim oFSObj As Object
    Dim oFSFolder As Object
    Dim oFSFile As Object


    On Error GoTo ErrHandler


    sFoldername = InputBox("Enter folder path")
    If Len(sFoldername) = 0 Then Exit Sub


    'These objects are required to search a particular folder for files
    Set oFSObj = CreateObject("Scripting.FilesystemObject")
    Set oFSFolder = oFSObj.GetFolder(sFoldername)


    'This is the workbook that will contain all imported files
    Set wbDest = Workbooks.Add(xlWBATWorksheet)


    For Each oFSFile In oFSFolder.Files


        Set shtDest = wbDest.Worksheets.Add

        Set wbSource = Application.Workbooks.Open(oFSFile.Path)

        wbSource.Sheets(1).UsedRange.Copy Destination:=shtDest.Range("A1")

        wbSource.Close


    Next oFSFile



ExitSub:
    Set oFSFile = Nothing
    Set oFSFolder = Nothing
    Set oFSObj = Nothing
    Exit Sub

ErrHandler:
    MsgBox "An error occurred: " & Err.Description
    Resume ExitSub


End Sub
 
Upvote 0
I do think that I nearly have a solution for extracting the HTML file names and putting them on the tabs when the tables are drawn into Excel.

I had asked DK about this after the "solution" was posted. Howver, I think that I am close to solving this, and am happy to share; I don't want anyone wasting their time on my question when I've nearly got it (I think/hope).

I'll paste what I have come up with soon.

Thanks,

Sean
 
Upvote 0
Here is what I've been able to come up with. File names should be 31 or fewer characters; other tab naming characteristics and rules apply (e.g., no "strange" characters).

Sub ImportMultipleFilesintoOneWorkbook()

Dim sFoldername As String
Dim shtDest As Worksheet
Dim wbDest As Workbook
Dim wbSource As Workbook
Dim oFSObj As Object
Dim oFSFolder As Object
Dim oFSFile As Object
Dim shtName As String

On Error GoTo ErrHandler

sFoldername = InputBox("Enter folder path")

If Len(sFoldername) = 0 Then Exit Sub

'These objects are required to search a particular folder for files

Set oFSObj = CreateObject("Scripting.FilesystemObject")

Set oFSFolder = oFSObj.GetFolder(sFoldername)

'This is the workbook that will contain all imported files

Set wbDest = Workbooks.Add(xlWBATWorksheet)

For Each oFSFile In oFSFolder.Files

Set shtDest = wbDest.Worksheets.Add
Set wbSource = Application.Workbooks.Open(oFSFile.Path)

wbSource.Sheets(1).UsedRange.Copy Destination:=shtDest.Range("A4")

shtName = Left(oFSFile.Name, Len(oFSFile.Name) - 5) 'sheet name is the name of the HTML file after taking off .html extension

If Len(shtName) > 31 Then 'Excel tab names can be maximum 31 characters so truncate if necessary (will crash if this causes duplicate tab names)

shtName = Left(shtName, 31)

End If

shtDest.Name = shtName 'Set the tab name to the HTML file name

wbSource.Close

Next oFSFile

ExitSub:

Set oFSFile = Nothing
Set oFSFolder = Nothing
Set oFSObj = Nothing

Exit Sub

ErrHandler:
MsgBox "An error occurred: " & Err.Description
Resume ExitSub

End Sub
 
Upvote 0
I erroneously started a new thread about this...here is what I just wrote (I will delete that thread):

Greetings everyone,

I am trying to import data tables contained in HTML files into Excel, but have hit a bit of a snag. I have roughly 100 distinct files stored in a directory and need to take the data in each and put on a separate worksheet in Excel, adding the HTML file's name as the label on the worksheet tab.

The code below worked fine in an earlier iteration of this project. In short, the macro takes the name of the folder where the "input" files are located, then opens each file in that folder, one by one, copies the data over to a tab in Excel, changes the tab name to match the name of the HTML file, and then moves on to the next file until all in the folder have been "processed."

However, the macro now crashes, giving me an error. Specifically, "The information cannot be pasted because the Copy area and the paste area are not the same size and shape. Try one of the following: Click a single cell and then paste. Select a rectangle that's the same size and shape and then paste."

The earlier HTML files only contained one table, while the newer files, which are resulting in the crash, have two table stacked on top of each other (and of different dimensions). I assume that this is the issue, although I honestly can't tell why this is causing a problem.

From HTML file to file, the dimensions of the data tables can and do differ, so I can't see that I simply select a range, copy that, and paste into Excel (or if I can, I just don't see how to accomplish that).

Any ideas? Many thanks in advance for any thoughts that you might be able to share.

Sean


Sub ImportMultipleFilesintoOneWorkbook()

Dim sFoldername As String
Dim shtDest As Worksheet
Dim wbDest As Workbook
Dim wbSource As Workbook
Dim oFSObj As Object
Dim oFSFolder As Object
Dim oFSFile As Object
Dim shtName As String

On Error GoTo ErrHandler

sFoldername = InputBox("Enter folder path")

If Len(sFoldername) = 0 Then Exit Sub

'These objects are required to search a particular folder for files
Set oFSObj = CreateObject("Scripting.FilesystemObject")
Set oFSFolder = oFSObj.GetFolder(sFoldername)

'This is the workbook that will contain all imported files
Set wbDest = Workbooks.Add(xlWBATWorksheet)

For Each oFSFile In oFSFolder.Files
Set shtDest = wbDest.Worksheets.Add
Set wbSource = Application.Workbooks.Open(oFSFile.Path)
wbSource.Sheets(1).UsedRange.Copy Destination:=shtDest.Range("A1")

shtName = Left(oFSFile.Name, Len(oFSFile.Name) - 5) 'sheet name is the name of the HTML file after taking off .html extension
If Len(shtName) > 31 Then 'Excel tab names can be maximum 31 characters so truncate if necessary (will crash if this causes duplicate tab names)
shtName = Left(shtName, 31)
End If
shtDest.Name = shtName 'Set the tab name to the HTML file name

wbSource.Close

Next oFSFile

ExitSub:
Set oFSFile = Nothing
Set oFSFolder = Nothing
Set oFSObj = Nothing
Exit Sub

ErrHandler:
MsgBox "An error occurred: " & Err.Description
Resume ExitSub
End Sub
 
Upvote 0
Hi mate

Can you post an example of your HTML data and I'll see if I can help. If it's sensitive then just take out the actual data values (replace with x's or something) - as long as the structure of the data is the same that's fine.

DK

EDIT: i don't come here much any more as I don't do a huge amount of VBA/Excel stuff these days - just saw my "Join Date" - can't believe I joined MrExcel over 11 years ago!
 
Upvote 0
DK,

Do you mean upload an example HTML file, or do you have something else in mind?

Thanks,

Sean
 
Upvote 0
If you can just paste the data into code tags I think that should work as long as it's not too large but can you turn off HTML (it's in the Additional Options when you Go Advanced - set to HTML Off). For example:

Code:
<html><body><table>
<tr>
<td>Name</td>
<td>Age</td>
</tr>
<tr>
<td>Steve</td>
<td>23</td>
</tr>
<td>Diana</td>
<td>28</td>
</tr>
</table>
</body>
</html>
 
Upvote 0

Forum statistics

Threads
1,224,594
Messages
6,179,792
Members
452,942
Latest member
VijayNewtoExcel

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top