Converting all files that can be read by MS Word into TXT files

Dr. Demento

Well-known Member
The goal is to convert any file that can be read by MS Word and convert it directly (and hopefully quickly) into a text file. I swear I had this working at one point but it's certainly not working anymore. I don't recall where I got the text in blue, but I don't know if I have the syntax wrong (red line) or if I was just up too late; I've certainly never seen anything similar.

If the blue section that supposedly writes the Word file directly to the text file is garbage, a direction forward would be much appreciated.

Thanks y'all.

Code:
Global Const WordExtensions As String = _
  ".docx|.dotx|.dotm|.doc|" & _
  ".dot|.txt|.rtf|.htm|" & _
  ".html|.mht|.mhtml|.xml|" & _
  ".wps|.pdf|.xps|.odt"

Sub convert_Word()
' ~~ Import Word documents into Excel
' https://www.excelguru.ca/forums/showthread.php?8900-Help-with-VBA-to-extract-data-from-Word-to-Excel#2

Dim FSO As FileSystemObject
  Set FSO = New Scripting.FileSystemObject

Dim oFile As Variant

Dim var As Variant, _
    arr As Variant, _
    arrExt As Variant
  arrExt = Split(WordExtensions, "|")

Dim str As String, _
    strFile As String, _
    strFolder As String, _
    strExt As String
  strFolder = "C:\Test\"
  
Dim i As Long, _
    k As Long

  For k = LBound(arrExt) To UBound(arrExt)
    str = "cmd /c Dir " & Chr(34) & strFolder & "*" & arrExt(k) & Chr(34) & " /b/s "
    arr = Split(CreateObject("wscript.shell").exec(str).StdOut.ReadAll, vbCrLf)
    
    If UBound(arr) > 0 Then
      For i = 0 To UBound(arr) - 1
      
        With GetObject(arr(i))

          ' ~~ Output directly to TXT file
	 [COLOR="#0000FF"] strFile = Split(arr(i), arrExt(k))(0)
          Set oFile = FSO.CreateTextFile(strFile & ".txt")
          [COLOR="#FF0000"]oFile.Write arr(i).value & " "[/COLOR]
          oFile.Close[/COLOR]
          
          ' ~~ Output to Excel
    '      For Each var In .Tables
    '        var.Range.Copy
    '        Sheets.Add(, Sheets(Sheets.Count)).Paste Cells(1)
    '      Next var
          
          .Close 0
        End With
      Next i
    End If 'Ubound(arr)
  Next k
  
End Sub
 

sijpie

Well-known Member
WHen I debug your code then the line:
Rich (BB code):
oFile.Write arr(i).value & " "
errors out.

This is because arr(i) is just a string, it is not a file. You will need to open the file with that name first, and then convert the contents to text.


There is another problem with your code:

Your code first converts files with a number of extensions. And it creates the .txt files from these _in the same directory as the source files_.
Then it comes to the .txt extension and so it will try to create output files for the outputfiles it just created. That is not going to work.

So either you have to process the .txt files first, by putting them upfront in your WordExtensions variable, or you need another output directory. The last would be preferable to avoid conflicts.

But to get that organised you need to run the cmd command without the /s option (this adds the full pathname to each directorylisting)

So all in all a bit more work to do for you.
 

Some videos you may like

This Week's Hot Topics

Top