VBA to extract text from multiple .txt files in subfolders

mintz

Board Regular
Joined
Aug 5, 2015
Messages
129
I'm looking for a VBA macro to do the following:
- given a specific path, search all folders and subfolders and extract text from all files named "contact.txt"
- for each .txt file insert the content into a new row under column A and its path into column B
These solutions the closest to what I'm trying to achieve
 

Marc L

Well-known Member
Joined
Apr 5, 2021
Messages
1,301
Office Version
  1. 2010
Platform
  1. Windows
How can I read the encoding signature of the text file?
Should be just with some VBA basics …​
For further help zip a couple of text files - one for each encoding format - and link it on a files host website like Dropbox for example …​
 

Excel Facts

Return population for a City
If you have a list of cities in A2:A100, use Data, Geography. Then =A2.Population and copy down.

mintz

Board Regular
Joined
Aug 5, 2015
Messages
129
Should be just with some VBA basics …​
For further help zip a couple of text files - one for each encoding format - and link it on a files host website like Dropbox for example …​

Here are two files, one in UTF-8 and the other one in UTF-16

 

Marc L

Well-known Member
Joined
Apr 5, 2021
Messages
1,301
Office Version
  1. 2010
Platform
  1. Windows
There is no BOM in your UTF-8 text file so encoded as a classic ASCII file, seems to not need some encoding conversion​
as it does not contain any specific character like can be found in some UTF-8 file with BOM …​
The UTF-16 file contains a BOM which well matches one of the two UTF-16 encoding formats.​
Before to go further on my side let's try on your side the '_autodetect' charset with ADODB.Stream (or even '_autodetect_all') …​
 

Marc L

Well-known Member
Joined
Apr 5, 2021
Messages
1,301
Office Version
  1. 2010
Platform
  1. Windows
If I set the charset to UTF-16, the UTF-8 files come out wrong, and vice versa
Is it possible to process variable encodings?
According to post #10 attachment ADODB.Stream with its '_autodetect_all' Charset well works​
like in this VBA demonstration to paste to the top of the worksheet module :​
VBA Code:
Dim Obj(1) As Object, R&, S$()

Sub ScanFolder(ByVal FOLD$)
      Const C = "Contact.txt"
        Dim oFold As Object
    If Obj(1).FileExists(FOLD & C) Then
        R = R + 1
        Obj(0).Open
        Obj(0).LoadFromFile FOLD & C
        S(R, 1) = Obj(0).ReadText
        Obj(0).Close
        S(R, 2) = FOLD
    End If
        For Each oFold In Obj(1).GetFolder(FOLD).SubFolders:  ScanFolder oFold.Path & "\":  Next
End Sub

Sub Demo1()
    Dim P$
        P = "D:\Tests4Noobs\":  If Dir(P, 16) <> "." Then Beep: Exit Sub
        R = 0
        UsedRange.Clear
        ReDim S(1 To Rows.Count, 1 To 2)
        Set Obj(0) = CreateObject("ADODB.Stream")
            Obj(0).Charset = "_autodetect_all"
        Set Obj(1) = CreateObject("Scripting.FileSystemObject")
        ScanFolder P
    If R Then
            Application.ScreenUpdating = False
        With [A1:B1].Resize(R)
            .Columns(2).VerticalAlignment = xlCenter
            .Value2 = S
            .Columns.AutoFit
        End With
            Application.ScreenUpdating = True
    End If
        Erase Obj, S
End Sub
 

Marc L

Well-known Member
Joined
Apr 5, 2021
Messages
1,301
Office Version
  1. 2010
Platform
  1. Windows

ADVERTISEMENT


Addendum : previous post demonstration does not need to activate any additional reference …​
 

mintz

Board Regular
Joined
Aug 5, 2015
Messages
129
According to post #10 attachment ADODB.Stream with its '_autodetect_all' Charset well works​
like in this VBA demonstration to paste to the top of the worksheet module :​
This works out on my side with correct output in 90% of the files , some others come up with this result:
Untitled.png


This file shows as UTF-8 on my side:
Untitled2.png

They might be different encodings than what's showing?
Here is the file upload:
 

Marc L

Well-known Member
Joined
Apr 5, 2021
Messages
1,301
Office Version
  1. 2010
Platform
  1. Windows

ADVERTISEMENT

So I can detect if a file is encoded in UTF-16LE and if not it should be an UTF-8 file as the solution of this thread​
but the question is all your text files use only this two formats or more ?​
 

mintz

Board Regular
Joined
Aug 5, 2015
Messages
129
So I can detect if a file is encoded in UTF-16LE and if not it should be an UTF-8 file as the solution of this thread​
but the question is all your text files use only this two formats or more ?​
Yes, based on the output I got from the file in previous post (#16), the encoding seems to be neither UTF-8 nor UTF-16 for that particular file
That particular file shows UTF-8 on Notepad++ but I still got the wrong output for it, unsure what could be the reason
 

Marc L

Well-known Member
Joined
Apr 5, 2021
Messages
1,301
Office Version
  1. 2010
Platform
  1. Windows
No, the attachment in post #16 is a real UTF-8 no BOM encoded file and I can read it and fill the expected result into an Excel sheet without any issue …​
So your 10% files with issue are all under UTF-8 or other encoding formats are necessary ?​
 

mintz

Board Regular
Joined
Aug 5, 2015
Messages
129
No, the attachment in post #16 is a real UTF-8 no BOM encoded file and I can read it and fill the expected result into an Excel sheet without any issue …​
So your 10% files with issue are all under UTF-8 or other encoding formats are necessary ?​
Yes, in case it is under UTF-8, then I assume all the other files with issue are also under UTF-8 (I haven't found other encodings other than UTF-8 or UTF-16)
 

Forum statistics

Threads
1,141,863
Messages
5,709,063
Members
421,611
Latest member
Lisa W

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top