Converting all files that can be read by MS Word into TXT files

Dr. Demento

Well-known Member
Joined
Nov 2, 2010
Messages
618
Office Version
  1. 2019
  2. 2016
Platform
  1. Windows
The goal is to convert any file that can be read by MS Word and convert it directly (and hopefully quickly) into a text file. I swear I had this working at one point but it's certainly not working anymore. I don't recall where I got the text in blue, but I don't know if I have the syntax wrong (red line) or if I was just up too late; I've certainly never seen anything similar.

If the blue section that supposedly writes the Word file directly to the text file is garbage, a direction forward would be much appreciated.

Thanks y'all.

Code:
Global Const WordExtensions As String = _
  ".docx|.dotx|.dotm|.doc|" & _
  ".dot|.txt|.rtf|.htm|" & _
  ".html|.mht|.mhtml|.xml|" & _
  ".wps|.pdf|.xps|.odt"

Sub convert_Word()
' ~~ Import Word documents into Excel
' https://www.excelguru.ca/forums/showthread.php?8900-Help-with-VBA-to-extract-data-from-Word-to-Excel#2

Dim FSO As FileSystemObject
  Set FSO = New Scripting.FileSystemObject

Dim oFile As Variant

Dim var As Variant, _
    arr As Variant, _
    arrExt As Variant
  arrExt = Split(WordExtensions, "|")

Dim str As String, _
    strFile As String, _
    strFolder As String, _
    strExt As String
  strFolder = "C:\Test\"
  
Dim i As Long, _
    k As Long

  For k = LBound(arrExt) To UBound(arrExt)
    str = "cmd /c Dir " & Chr(34) & strFolder & "*" & arrExt(k) & Chr(34) & " /b/s "
    arr = Split(CreateObject("wscript.shell").exec(str).StdOut.ReadAll, vbCrLf)
    
    If UBound(arr) > 0 Then
      For i = 0 To UBound(arr) - 1
      
        With GetObject(arr(i))

          ' ~~ Output directly to TXT file
	 [COLOR="#0000FF"] strFile = Split(arr(i), arrExt(k))(0)
          Set oFile = FSO.CreateTextFile(strFile & ".txt")
          [COLOR="#FF0000"]oFile.Write arr(i).value & " "[/COLOR]
          oFile.Close[/COLOR]
          
          ' ~~ Output to Excel
    '      For Each var In .Tables
    '        var.Range.Copy
    '        Sheets.Add(, Sheets(Sheets.Count)).Paste Cells(1)
    '      Next var
          
          .Close 0
        End With
      Next i
    End If 'Ubound(arr)
  Next k
  
End Sub
 

Excel Facts

Highlight Duplicates
Home, Conditional Formatting, Highlight Cells, Duplicate records, OK to add pink formatting to any duplicates in selected range.
WHen I debug your code then the line:
Rich (BB code):
oFile.Write arr(i).value & " "
errors out.

This is because arr(i) is just a string, it is not a file. You will need to open the file with that name first, and then convert the contents to text.


There is another problem with your code:

Your code first converts files with a number of extensions. And it creates the .txt files from these _in the same directory as the source files_.
Then it comes to the .txt extension and so it will try to create output files for the outputfiles it just created. That is not going to work.

So either you have to process the .txt files first, by putting them upfront in your WordExtensions variable, or you need another output directory. The last would be preferable to avoid conflicts.

But to get that organised you need to run the cmd command without the /s option (this adds the full pathname to each directorylisting)

So all in all a bit more work to do for you.
 
Upvote 0

Forum statistics

Threads
1,213,536
Messages
6,114,202
Members
448,554
Latest member
Gleisner2

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top