Need to parse text from Word into Sheets

Jmayo

New Member
Joined
Mar 28, 2020
Messages
3
Office Version
  1. 2019
Platform
  1. MacOS
I have been given the task of taking almost 30 Word docs and extracting the information and compiling it into Excel. I began doing this manually by using Find/Replace which is how I was able to complete the first 3 sheets, however, this is cumbersome and time consuming and these will need updating on a regular basis. I have seen multiple threads on this topic but I am not proficient enough with VBA to modify the code to work for my needs. I have attached examples of both the Excel spreadsheet and samples of the Word docs I'll be using. The Excel workbook already has several sheets filled in as an example of what the completed project should look like. Basically, the macro needs to open up the document correlating to the name of the sheet (each one is unique), extract the UID, Abbreviation, Alias, Name, and Units and place them on the same row under their respective column starting on ROW 5 then move to the next row and do the same. Once all the data has been extracted (and each sheet has a unique number of rows) then the document is to be closed and the sheet formatted as in the example sheets to highlight the data. Thanks for any help or guidance you can provide.

Here is the link to the files on OneDrive:

Also asked here
Macro to extract text from Word doc
 
Last edited by a moderator:

Excel Facts

Spell Check in Excel
Press F7 to start spell check in Excel. Be careful, by default, Excel does not check Capitalized Werds (whoops)
Try the following Excel macro:
VBA Code:
Sub GetWordData()
'Note: this code requires a reference to the Word object model. See under the VBE's Tools|References.
Application.ScreenUpdating = False
Dim wdApp As New Word.Application, wdDoc As Word.Document, c As Long, r As Long
Dim strFolder As String, strFile As String, WkBk As Workbook, WkSht As Worksheet
strFolder = GetFolder: If strFolder = "" Then GoTo ErrExit
Set WkBk = ActiveWorkbook
strFile = Dir(strFolder & "\*.docx", vbNormal)
While strFile <> ""
  Set wdDoc = wdApp.Documents.Open(Filename:=strFolder & "\" & strFile, AddToRecentFiles:=False, Visible:=False)
  Set WkSht = WkBk.Sheets.Add: r = 4
  WkSht.Name = Split(strFile, ".doc")(0)
  WkBk.Sheets(1).Range.Copy
  WkSht.Paste
  WkSht.Range("A2").Value = WkSht.Name
  With wdDoc
    With .Range
      With .Find
        .ClearFormatting
        .Replacement.ClearFormatting
        'Find blocks of text of interest
        .Text = "Uid:*Units:*^13"
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindStop
        .MatchWildcards = True
        .Execute
      End With
      Do While .Find.Found
        r = r + 1
        'Parse & write the text to Excel
        For c = 1 To 4
          WkSht.Cells(r, c).Value = Trim(Split(Split(.Text, vbCr)(c - 1), ":")(1))
        Next
        .Collapse wdCollapseEnd
        .Find.Execute
      Loop
    End With
    .Close SaveChanges:=False
  End With
  strFile = Dir()
Wend
ErrExit:
wdApp.Quit
Set wdDoc = Nothing: Set wdApp = Nothing: Set WkSht = Nothing: Set WkBk = Nothing
Application.ScreenUpdating = True
End Sub
 
Function GetFolder() As String
    Dim oFolder As Object
    GetFolder = ""
    Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, "Choose a folder", 0)
    If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path
    Set oFolder = Nothing
End Function
As coded, the macro assumes your workbook's first worksheet is formatted with the required layout and contains just the first four rows that are to appear on all worksheets.
 
Upvote 0
Hi Paul. Thanks for the quick response and appreciate your help but I'm a true novice when it comes to VBA coding as you'll come to realize by my following questions.
  • What exactly do I need to do to accomplish this "this code requires a reference to the Word object model"? I found the References under Tools–do I need to create or find a specific Reference other than those that are checked? I saw the Microsoft Word Object Library and selected it in addition to what was already checked.
  • What changes to your code do I need to make to tailor it so as to find the referenced Word documents? Where do I need to paste in the file/directory path to the specified files? Sorry for being such a rookie at this.
Here's what I've tried so far:​
  • Copied the code into the workbook's editor.
  • Went into Tools/References and selected/added Microsoft Word 16.0 Object Library
  • Changed/designated the files' folder by modifying the GetFolder = ""
    • Function GetFolder() As String
      Dim oFolder As Object
      GetFolder = "my file/directory path"
  • Went into Debug and began stepping through the code. Got this error "Run-time error '429': ActiveX component can't create object" when it tried to execute Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, "Choose a folder", 0)
Again, my deepest apologies for being so green but I appreciate your expertise. I know enough to know this can only be accomplished using VBA but don't have the skills to pull it off.
 
Upvote 0
You have apparently set the Word reference correctly.

You should not have messed with the GetFolder function. The function exists to allow you to choose a source folder when you run the macro. If you want to hard-code the path, you can delete the function entirely and change:
strFolder = GetFolder: If strFolder = "" Then GoTo ErrExit
to:
strFolder = "my file/directory path"
 
Upvote 0
Unfortunately, those changes didn't allow the macro to run either. I tried pasting the macro in as written and then stepping through the function, however, the browser didn't pop up to locate the folder where the Word files are located. Instead, I got an error message stating that "ActiveX component can't create object". So, I then tried hard-coding the path. Apparently that worked fine as a pop-up window appeared and said I had to grant permission for the file to be accessed (which I did). I continued to step through the macro but a multitude of steps were jumped over as the highlight went from While strFile <> "" directly to wdApp.Quit. So, not sure how the macro was supposed to behave, I stepped again and Word launched as well as another error message appeared this time saying "out of memory".

I'm sure a lot of these issues are arising because this is all being done on a Mac and I am trying to incorporate a Word document. So, with your indulgence, I'd like to propose a slight change to the approach I'm taking. I am also working via a remote Windows desktop and incorporated into the application is Excel 2013 which has VBE. I've already transferred most of the project over to it and I tried your code as written in the new worksheet. The function to find the folder worked - SOLVED. The biggest change is the file format. I will save the documents on the remote desktop as text files (.txt) and the application available to read them is Notepad. Would it be too much trouble to modify the code with these
changes?

Thanks for your patience and help.
 
Upvote 0
I'm sure a lot of these issues are arising because this is all being done on a Mac and I am trying to incorporate a Word document.
Well, the fact you're using a Mac makes a fundamental difference. You should have said so at the outset. The code I provided is for a Windows PC; it will not work on a Mac.
So, with your indulgence, I'd like to propose a slight change to the approach I'm taking. I am also working via a remote Windows desktop and incorporated into the application is Excel 2013 which has VBE.
The code should work just fine there.
I've already transferred most of the project over to it and I tried your code as written in the new worksheet. The function to find the folder worked - SOLVED. The biggest change is the file format. I will save the documents on the remote desktop as text files (.txt) and the application available to read them is Notepad. Would it be too much trouble to modify the code with these changes?
Another crucial change. Notepad cannot be automated this way. Why, if you're using .txt files did you supply .docx files for testing? The processing can still be done with Word, however. Simply change:
strFile = Dir(strFolder & "\*.docx", vbNormal)
to:
strFile = Dir(strFolder & "\*.txt", vbNormal)
and change:
WkSht.Name = Split(strFile, ".doc")(0)
to:
WkSht.Name = Split(strFile, ".txt")(0)
 
Upvote 0
Some further code revisions for you to implement -
Change:
VBA Code:
  Set WkSht = WkBk.Sheets.Add: r = 4
  WkSht.Name = Split(strFile, ".doc")(0)
  WkBk.Sheets(1).Range.Copy
  WkSht.Paste
  WkSht.Range("A2").Value = WkSht.Name
to:
VBA Code:
  With WkBk
    Set WkSht = .Sheets.Add
    .Sheets("Sheet1").UsedRange.Copy
  End With
  With WkSht
    .Paste
    .Name = Split(strFile, ".doc")(0)
    .Range("A2").Value = WkSht.Name
    For c = 1 To 5
      .Columns(c).ColumnWidth = Worksheets("Sheet1").Columns(c).ColumnWidth
    Next
    For r = 1 To 4
      .Rows(r).RowHeight = Worksheets("Sheet1").Rows(r).RowHeight
    Next
  End With
  r = 4
Be aware that, given the amount of data to process in your samples, you should expect the code to take 20-30 seconds per file.
Note, too, that the revised code expects the 'master' sheet to be named Sheet1.
 
Upvote 0

Forum statistics

Threads
1,213,489
Messages
6,113,947
Members
448,534
Latest member
benefuexx

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top