VBA to extract text from multiple .txt files in subfolders

mintz

Board Regular
Joined
Aug 5, 2015
Messages
129
I'm looking for a VBA macro to do the following:
- given a specific path, search all folders and subfolders and extract text from all files named "contact.txt"
- for each .txt file insert the content into a new row under column A and its path into column B
These solutions the closest to what I'm trying to achieve
 

Excel Facts

Can a formula spear through sheets?
Use =SUM(January:December!E7) to sum E7 on all of the sheets from January through December

John_w

MrExcel MVP
Joined
Oct 15, 2007
Messages
7,128
Try this macro, changing the start folder path as required.
VBA Code:
Public Sub Import_All_Contacts()
       
    With ActiveSheet
        Import_Contacts "C:\path\to\folder\", .Cells(.Rows.Count, "A").End(xlUp).Offset(1)
    End With
    
    MsgBox "Done"
       
End Sub


Private Function Import_Contacts(folderPath As String, destCell As Range) As Long

    Static FSO As Object
    Dim thisFolder As Object, subfolder As Object
    Dim thisFile As Object, ts As Object
    Dim n As Long
    
    If FSO Is Nothing Then Set FSO = CreateObject("Scripting.FileSystemObject")
    
    Set thisFolder = FSO.GetFolder(folderPath)
    
    n = 0
    For Each thisFile In thisFolder.Files
        If LCase(thisFile.Name) = "contact.txt" Then
            Set ts = FSO.OpenTextFile(thisFile, ForReading)
            destCell.Offset(n, 0).Value = ts.ReadAll
            destCell.Offset(n, 1).Value = thisFile.Path
            ts.Close
            n = n + 1
        End If
    Next
    
    'Look in subfolders
    
    For Each subfolder In thisFolder.SubFolders
        n = n + Import_Contacts(subfolder.Path, destCell.Offset(n))
    Next
    
    Import_Contacts = n

End Function
 
Solution

mintz

Board Regular
Joined
Aug 5, 2015
Messages
129
Try this macro, changing the start folder path as required.
Just a little fix, I forgot about having some .txt with Chinese characters, how do I read them into the sheet?

I tried this solution with enabling Microsoft ADO 6.1 Library in References:

VBA Code:
Dim adoStream As ADODB.Stream
Set adoStream = New ADODB.Stream

adoStream.Charset ="UTF-8" 'set the correct charset
adoStream.Open
adoStream.LoadFromFile FilePath

LstStr = adoStream.ReadText

Cells(1,1) = LstStr

adoStream.Close
Set adoStream = Nothing

The result output:
���^$X���V
 

John_w

MrExcel MVP
Joined
Oct 15, 2007
Messages
7,128

ADVERTISEMENT

The result output:
���^$X���V
That probably means the file isn't UTF-8 encoded, although it contains UTF-8 or Unicode characters.

What is the file encoding? If it's Unicode encoded, try this. The only change is Format:=TristateTrue and some FSO constants defined.
VBA Code:
Public Sub Import_All_Contacts_LB()
       
    With ActiveSheet
        Import_Contacts "C:\path\to\folder\", .Cells(.Rows.Count, "A").End(xlUp).Offset(1)
    End With
    
    MsgBox "Done"
       
End Sub


Private Function Import_Contacts(folderPath As String, destCell As Range) As Long

    Static FSO As Object
    Dim thisFolder As Object, subfolder As Object
    Dim thisFile As Object, ts As Object
    Dim n As Long
    Const ForReading = 1
    Const TristateTrue = -1
    
    If FSO Is Nothing Then Set FSO = CreateObject("Scripting.FileSystemObject")
    
    Set thisFolder = FSO.GetFolder(folderPath)
    
    n = 0
    For Each thisFile In thisFolder.Files
        If LCase(thisFile.Name) = "contact.txt" Then
            Set ts = FSO.OpenTextFile(thisFile, ForReading, Format:=TristateTrue)
            destCell.Offset(n, 0).Value = ts.ReadAll
            destCell.Offset(n, 1).Value = thisFile.Path
            ts.Close
            n = n + 1
        End If
    Next
    
    'Look in subfolders
    
    For Each subfolder In thisFolder.SubFolders
        n = n + Import_Contacts(subfolder.Path, destCell.Offset(n))
    Next
    
    Import_Contacts = n

End Function
 
Last edited:

mintz

Board Regular
Joined
Aug 5, 2015
Messages
129
That probably means the file isn't UTF-8 encoded, although it contains UTF-8 or Unicode characters.
Yes that was exactly the issue, the file is UTF-16 encoded rather than UTF-8, I set the correct charset and that did the trick
 

mintz

Board Regular
Joined
Aug 5, 2015
Messages
129

ADVERTISEMENT

Yes that was exactly the issue, the file is UTF-16 encoded rather than UTF-8, I set the correct charset and that did the trick
I just tested multiple files and turns out I have multiple encodings, some files are UTF-8 with English only characters, other files are UTF-16 with both English and Chinese characters
If I set the charset to UTF-16, the UTF-8 files come out wrong, and vice versa
Is it possible to process variable encodings?
 

Marc L

Well-known Member
Joined
Apr 5, 2021
Messages
1,769
Office Version
  1. 2010
Platform
  1. Windows

It could be by reading the 'encoding signature' within the text file …​
 

John_w

MrExcel MVP
Joined
Oct 15, 2007
Messages
7,128
I just tested multiple files and turns out I have multiple encodings, some files are UTF-8 with English only characters, other files are UTF-16 with both English and Chinese characters
If I set the charset to UTF-16, the UTF-8 files come out wrong, and vice versa
Is it possible to process variable encodings?
I don't know how to determine the encoding.

Here is my macro modified to read each .txt file twice and output both conversions to separate rows. The first read is the same as the previous macro (Format:=TristateTrue) and the second read converts the UTF-8 bytes to a Unicode (VBA) string. It is then up to you to look at the results and delete the rows which are incorrect.
VBA Code:
Option Explicit

#If VBA7 Then
    'Maps a multi-byte character string to a wide-character (Unicode) string
    Private Declare PtrSafe Function MultiByteToWideChar Lib "kernel32" _
        (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As LongPtr, ByVal cchMultiByte As Long, ByVal lpWideCharStr As LongPtr, ByVal cchWideChar As Long) As Long
#Else
    Private Declare Function MultiByteToWideChar Lib "kernel32" _
        (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
#End If

'UTF-8 code page
Private Const CP_UTF8 = 65001


Public Sub Import_All_Contacts()
      
    With ActiveSheet
        Import_Contacts "C:\path\to\folder\", .Cells(.Rows.Count, "A").End(xlUp).Offset(1)
    End With
   
    MsgBox "Done"
      
End Sub


Private Function Import_Contacts(folderPath As String, destCell As Range) As Long

    Static FSO As Object
    Dim thisFolder As Object, subfolder As Object
    Dim thisFile As Object, ts As Object
    Dim n As Long
    Const ForReading = 1
    Const TristateTrue = -1
   
    If FSO Is Nothing Then Set FSO = CreateObject("Scripting.FileSystemObject")
   
    Set thisFolder = FSO.GetFolder(folderPath)
   
    n = 0
    For Each thisFile In thisFolder.Files
        If LCase(thisFile.Name) = "contact.txt" Then
            Set ts = FSO.OpenTextFile(thisFile, ForReading, Format:=TristateTrue)
            destCell.Offset(n, 0).Value = ts.ReadAll
            destCell.Offset(n, 1).Value = thisFile.Path
            ts.Close
            n = n + 1
            Set ts = FSO.OpenTextFile(thisFile, ForReading)
            destCell.Offset(n, 0).Value = UTF8StringToVBAString(ts.ReadAll)
            destCell.Offset(n, 1).Value = thisFile.Path
            ts.Close
            n = n + 1
        End If
    Next
   
    'Look in subfolders
   
    For Each subfolder In thisFolder.SubFolders
        n = n + Import_Contacts(subfolder.Path, destCell.Offset(n))
    Next
   
    Import_Contacts = n

End Function


'Use MultiByteToWideChar API to convert a string of multi-byte characters to a wide-character (Unicode) VBA string
Private Function UTF8StringToVBAString(ByRef UTF8string As String) As String
   
    Dim UTF8bytes() As Byte
    Dim bufferSize As Long
   
    If UTF8string <> vbNullString Then
   
        'Convert string to bytes array
        UTF8bytes = StrConv(UTF8string, vbFromUnicode)
       
        'Get required size of output string
        bufferSize = MultiByteToWideChar(CP_UTF8, 0, VarPtr(UTF8bytes(0)), UBound(UTF8bytes) + 1, 0, 0)
       
        'Allocate output string
        UTF8StringToVBAString = String$(bufferSize, 0)
       
        'Convert UTF8 bytes array to Unicode output string
        MultiByteToWideChar CP_UTF8, 0, VarPtr(UTF8bytes(0)), UBound(UTF8bytes) + 1, StrPtr(UTF8StringToVBAString), bufferSize
    
    Else
   
        UTF8StringToVBAString = ""
       
    End If

End Function
 
Master Excel Bundle

Excel contains over 450 functions, with more added every year. That’s a huge number, so where should you start? Right here with this bundle.

Forum statistics

Threads
1,152,044
Messages
5,767,800
Members
425,437
Latest member
blaix

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top