VBA to extract text from multiple .txt files in subfolders

mintz

Board Regular
Joined
Aug 5, 2015
Messages
129
I'm looking for a VBA macro to do the following:
- given a specific path, search all folders and subfolders and extract text from all files named "contact.txt"
- for each .txt file insert the content into a new row under column A and its path into column B
These solutions the closest to what I'm trying to achieve
 

Excel Facts

How to change case of text in Excel?
Use =UPPER() for upper case, =LOWER() for lower case, and =PROPER() for proper case. PROPER won't capitalize second c in Mccartney
Try this macro, changing the start folder path as required.
VBA Code:
Public Sub Import_All_Contacts()
       
    With ActiveSheet
        Import_Contacts "C:\path\to\folder\", .Cells(.Rows.Count, "A").End(xlUp).Offset(1)
    End With
    
    MsgBox "Done"
       
End Sub


Private Function Import_Contacts(folderPath As String, destCell As Range) As Long

    Static FSO As Object
    Dim thisFolder As Object, subfolder As Object
    Dim thisFile As Object, ts As Object
    Dim n As Long
    
    If FSO Is Nothing Then Set FSO = CreateObject("Scripting.FileSystemObject")
    
    Set thisFolder = FSO.GetFolder(folderPath)
    
    n = 0
    For Each thisFile In thisFolder.Files
        If LCase(thisFile.Name) = "contact.txt" Then
            Set ts = FSO.OpenTextFile(thisFile, ForReading)
            destCell.Offset(n, 0).Value = ts.ReadAll
            destCell.Offset(n, 1).Value = thisFile.Path
            ts.Close
            n = n + 1
        End If
    Next
    
    'Look in subfolders
    
    For Each subfolder In thisFolder.SubFolders
        n = n + Import_Contacts(subfolder.Path, destCell.Offset(n))
    Next
    
    Import_Contacts = n

End Function
 
Upvote 0
Solution
Try this macro, changing the start folder path as required.
Just a little fix, I forgot about having some .txt with Chinese characters, how do I read them into the sheet?

I tried this solution with enabling Microsoft ADO 6.1 Library in References:

VBA Code:
Dim adoStream As ADODB.Stream
Set adoStream = New ADODB.Stream

adoStream.Charset ="UTF-8" 'set the correct charset
adoStream.Open
adoStream.LoadFromFile FilePath

LstStr = adoStream.ReadText

Cells(1,1) = LstStr

adoStream.Close
Set adoStream = Nothing

The result output:
���^$X���V
 
Upvote 0
The result output:
���^$X���V
That probably means the file isn't UTF-8 encoded, although it contains UTF-8 or Unicode characters.

What is the file encoding? If it's Unicode encoded, try this. The only change is Format:=TristateTrue and some FSO constants defined.
VBA Code:
Public Sub Import_All_Contacts_LB()
       
    With ActiveSheet
        Import_Contacts "C:\path\to\folder\", .Cells(.Rows.Count, "A").End(xlUp).Offset(1)
    End With
    
    MsgBox "Done"
       
End Sub


Private Function Import_Contacts(folderPath As String, destCell As Range) As Long

    Static FSO As Object
    Dim thisFolder As Object, subfolder As Object
    Dim thisFile As Object, ts As Object
    Dim n As Long
    Const ForReading = 1
    Const TristateTrue = -1
    
    If FSO Is Nothing Then Set FSO = CreateObject("Scripting.FileSystemObject")
    
    Set thisFolder = FSO.GetFolder(folderPath)
    
    n = 0
    For Each thisFile In thisFolder.Files
        If LCase(thisFile.Name) = "contact.txt" Then
            Set ts = FSO.OpenTextFile(thisFile, ForReading, Format:=TristateTrue)
            destCell.Offset(n, 0).Value = ts.ReadAll
            destCell.Offset(n, 1).Value = thisFile.Path
            ts.Close
            n = n + 1
        End If
    Next
    
    'Look in subfolders
    
    For Each subfolder In thisFolder.SubFolders
        n = n + Import_Contacts(subfolder.Path, destCell.Offset(n))
    Next
    
    Import_Contacts = n

End Function
 
Last edited:
Upvote 0
That probably means the file isn't UTF-8 encoded, although it contains UTF-8 or Unicode characters.
Yes that was exactly the issue, the file is UTF-16 encoded rather than UTF-8, I set the correct charset and that did the trick
 
Upvote 0
Yes that was exactly the issue, the file is UTF-16 encoded rather than UTF-8, I set the correct charset and that did the trick
I just tested multiple files and turns out I have multiple encodings, some files are UTF-8 with English only characters, other files are UTF-16 with both English and Chinese characters
If I set the charset to UTF-16, the UTF-8 files come out wrong, and vice versa
Is it possible to process variable encodings?
 
Upvote 0

It could be by reading the 'encoding signature' within the text file …​
 
Upvote 0
I just tested multiple files and turns out I have multiple encodings, some files are UTF-8 with English only characters, other files are UTF-16 with both English and Chinese characters
If I set the charset to UTF-16, the UTF-8 files come out wrong, and vice versa
Is it possible to process variable encodings?
I don't know how to determine the encoding.

Here is my macro modified to read each .txt file twice and output both conversions to separate rows. The first read is the same as the previous macro (Format:=TristateTrue) and the second read converts the UTF-8 bytes to a Unicode (VBA) string. It is then up to you to look at the results and delete the rows which are incorrect.
VBA Code:
Option Explicit

#If VBA7 Then
    'Maps a multi-byte character string to a wide-character (Unicode) string
    Private Declare PtrSafe Function MultiByteToWideChar Lib "kernel32" _
        (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As LongPtr, ByVal cchMultiByte As Long, ByVal lpWideCharStr As LongPtr, ByVal cchWideChar As Long) As Long
#Else
    Private Declare Function MultiByteToWideChar Lib "kernel32" _
        (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
#End If

'UTF-8 code page
Private Const CP_UTF8 = 65001


Public Sub Import_All_Contacts()
      
    With ActiveSheet
        Import_Contacts "C:\path\to\folder\", .Cells(.Rows.Count, "A").End(xlUp).Offset(1)
    End With
   
    MsgBox "Done"
      
End Sub


Private Function Import_Contacts(folderPath As String, destCell As Range) As Long

    Static FSO As Object
    Dim thisFolder As Object, subfolder As Object
    Dim thisFile As Object, ts As Object
    Dim n As Long
    Const ForReading = 1
    Const TristateTrue = -1
   
    If FSO Is Nothing Then Set FSO = CreateObject("Scripting.FileSystemObject")
   
    Set thisFolder = FSO.GetFolder(folderPath)
   
    n = 0
    For Each thisFile In thisFolder.Files
        If LCase(thisFile.Name) = "contact.txt" Then
            Set ts = FSO.OpenTextFile(thisFile, ForReading, Format:=TristateTrue)
            destCell.Offset(n, 0).Value = ts.ReadAll
            destCell.Offset(n, 1).Value = thisFile.Path
            ts.Close
            n = n + 1
            Set ts = FSO.OpenTextFile(thisFile, ForReading)
            destCell.Offset(n, 0).Value = UTF8StringToVBAString(ts.ReadAll)
            destCell.Offset(n, 1).Value = thisFile.Path
            ts.Close
            n = n + 1
        End If
    Next
   
    'Look in subfolders
   
    For Each subfolder In thisFolder.SubFolders
        n = n + Import_Contacts(subfolder.Path, destCell.Offset(n))
    Next
   
    Import_Contacts = n

End Function


'Use MultiByteToWideChar API to convert a string of multi-byte characters to a wide-character (Unicode) VBA string
Private Function UTF8StringToVBAString(ByRef UTF8string As String) As String
   
    Dim UTF8bytes() As Byte
    Dim bufferSize As Long
   
    If UTF8string <> vbNullString Then
   
        'Convert string to bytes array
        UTF8bytes = StrConv(UTF8string, vbFromUnicode)
       
        'Get required size of output string
        bufferSize = MultiByteToWideChar(CP_UTF8, 0, VarPtr(UTF8bytes(0)), UBound(UTF8bytes) + 1, 0, 0)
       
        'Allocate output string
        UTF8StringToVBAString = String$(bufferSize, 0)
       
        'Convert UTF8 bytes array to Unicode output string
        MultiByteToWideChar CP_UTF8, 0, VarPtr(UTF8bytes(0)), UBound(UTF8bytes) + 1, StrPtr(UTF8StringToVBAString), bufferSize
    
    Else
   
        UTF8StringToVBAString = ""
       
    End If

End Function
 
Upvote 0

Forum statistics

Threads
1,213,504
Messages
6,114,016
Members
448,543
Latest member
MartinLarkin

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top