MSWord Delete duplicate words in selected text

mole999

Well-known Member
Joined
Oct 23, 2004
Messages
10,524
Office Version
  1. 2019
  2. 2016
  3. 2013
Platform
  1. Windows
As per the title, I'm looking for a bit of code so that selected text will have all duplicate words removed ONLY in the selection, effectively left with a word cloud, even better would to be have an extensible static list in the VBA to delete whole words, and any word less than three characters so (the, if, as) would all go as well, icing on the cake, then sort the words alphabetically so similar words can be identified quickly

not asking much am I, nothing I have seen on the net gets close

ONLY SELECTED
LESS THAN FOUR CHARACTERS
EXCLUDED WORDS
SORTED
 

Excel Facts

When did Power Query debut in Excel?
Although it was an add-in in Excel 2010 & Excel 2013, Power Query became a part of Excel in 2016, in Data, Get & Transform Data.
So I have made a fudge to this, via splitting and sorting as paragraphs

I want to loop a set of redundant words

How best to build an array for How And Its, with a space either side to feed into the code as a loop

Code:
With Selection.Find
        .Text = " How "
        .Replacement.Text = " "
        .Forward = True
        .Wrap = wdFindAsk
        .Format = False
        .MatchCase = True
        .MatchWholeWord = True
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll


    With Selection.Find
        .Text = " its "
        .Replacement.Text = " "
        .Forward = True
        .Wrap = wdFindAsk
        .Format = False
        .MatchCase = True
        .MatchWholeWord = True
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
 
Last edited:
Upvote 0
My hashed attempt at resolving my issue, using macro recorder and some items from the web
1. Manually copy all the text I want to look at to a new document
2. Select the story, change case to proper, turn off alerts, set the font and size, remove bullets etc
3. Remove punctuation
4. Replace all spaces with a paragraph mark ^p
5. sort by paragraph
6. DeleteDuplicateParagraphs

Code:
Sub DeleteDuplicateParagraphs()
    'PURPOSE: Remove Duplicate Paragraphs Throughout the Entire Word Document
    'SOURCE: www.TheSpreadsheetGuru.com/the-code-vault
    Dim p1 As Paragraph
    Dim p2 As Paragraph
    Dim DupCount As Long

    DupCount = 0

    For Each p1 In ActiveDocument.Paragraphs
        If p1.Range.Text <> vbCr Then    'Ignore blank paragraphs

            For Each p2 In ActiveDocument.Paragraphs
                If p1.Range.Text = p2.Range.Text Then
                    DupCount = DupCount + 1
                    If p1.Range.Text = p2.Range.Text And DupCount > 1 Then p2.Range.Delete
                End If
            Next p2

        End If

        'Reset Duplicate Counter
        DupCount = 0

    Next p1

End Sub

7. remove ^p and replace with space
8. Delete high volume words "And It Was When" etc (need a loop to be efficient)
9. Remove double spaces

nick it, use it, improve it
 
Upvote 0
As per the title, I'm looking for a bit of code so that selected text will have all duplicate words removed ONLY in the selection, effectively left with a word cloud, even better would to be have an extensible static list in the VBA to delete whole words, and any word less than three characters so (the, if, as) would all go as well, icing on the cake, then sort the words alphabetically so similar words can be identified quickly

not asking much am I, nothing I have seen on the net gets close

ONLY SELECTED
LESS THAN FOUR CHARACTERS
EXCLUDED WORDS
SORTED
Deleting the duplicates is trivial, as is compiling a sorted list, but which words go into the list - all of the selection, the duplicates, something else? Where does the list go?
 
Upvote 0
I want something like,
the words to be removed will be added to the code "Been, By, Call, But" type stuff

I lifted a bit of yours from msofficeforums for dealing with punctuation back in 2011

MyArray

With Selection.Find
.Text = MyArray
.Replacement.Text = " "
.Forward = True
.Wrap = wdFindAsk
.Format = False
.MatchCase = True
.MatchWholeWord = True
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll

The list will be static (in the VBA) until I update what is in the list
 
Last edited:
Upvote 0
To delete a series of words you could use something like:
Code:
Sub Demo()
Application.ScreenUpdating = False
Dim StrFnd As String, StrRep As String, i As Long
StrFnd = "One,Two,Three,Four,Five,Six"
With ActiveDocument.Range.Find
  .ClearFormatting
  .Replacement.ClearFormatting
  .Forward = True
  .Wrap = wdFindContinue
  .MatchCase = True
  .MatchWholeWord = True
  .MatchWildcards = False
  .MatchSoundsLike = False
  .MatchAllWordForms = False
  .Replacement.Text = ""
  For i = 0 To UBound(Split(StrFnd, ","))
    .Text = Split(StrFnd, ",")(i)
    .Execute Replace:=wdReplaceAll
  Next
End With
Application.ScreenUpdating = True
End Sub
But I can't see what this has to do with duplicates or adding items to a list.
 
Upvote 0
I'm trying to take

This book is provided on the worldwide web as a service to the community of practitioners and students. Reproduction for educational purposes is permitted with appropriate citation. If you find this work helpful or have suggestions for additions or corrections, please email Chris Hendrickson: cth@cmu.edu. A hardcopy Instructor's Manual with problem solutions is available for a fee of $ 10 to cover reproduction, mailing and handling. Send a check made out to Carnegie Mellon University to Ms. Patty Langer, Department of Civil and Environmental Engineering, Carnegie Mellon University, Pittsburgh, PA 15213.

This book develops a specific viewpoint in discussing the participants, the processes and the techniques of project management for construction. This viewpoint is that of owners who desire completion of projects in a timely, cost effective fashion. Some profound implications for the objectives and methods of project management result from this perspective:
•The "life cycle" of costs and benefits from initial planning through operation and disposal of a facility are relevant to decision making. An owner is concerned with a project from the cradle to the grave. Construction costs represent only one portion of the overall life cycle costs.
•Optimizing performance at one stage of the process may not be beneficial overall if additional costs or delays occur elsewhere. For example, saving money on the design process will be a false economy if the result is excess construction costs.
•Fragmentation of project management among different specialists may be necessary, but good communication and coordination among the participants is essential to accomplish the overall goals of the project. New information technologies can be instrumental in this process, especially the Internet and specialized Extranets.
•Productivity improvements are always of importance and value. As a result, introducing new materials and automated construction processes is always desirable as long as they are less expensive and are consistent with desired performance.
•Quality of work and performance are critically important to the success of a project since it is the owner who will have to live with the results.

to

Accomplish Additional Additions Always Among Appropriate Are Automated Available Beneficial Benefits Book Can Carnegie Check Chris Citation. Civil Communication Community Completion Concerned Consistent Construction Construction. Coordination Corrections, Cost Costs Costs. Cover Cradle Critically Cth@Cmu.Edu. Cycle Cycle" Decision Delays Department Design Desirable Desire Desired Develops Different Discussing Disposal Economy Educational Effective Elsewhere. Email Engineering, Environmental Especially Essential Example, Excess Expensive Extranets. Facility False Fashion. Fee Find Fragmentation Goals Good Grave. Handling. Hardcopy Helpful Hendrickson: If Implications Importance Important Improvements Information Initial Instructor's Instrumental Internet Introducing Is Langer, Less Life Live Long Made Mailing Making. Management Manual Materials May Mellon Methods Money Ms. Necessary, New Objectives Occur One Only Operation Optimizing Out Overall Owner Owners PA Participants Participants, Patty Performance Performance. Permitted Perspective: Pittsburgh, Planning Please Portion Practitioners Problem Process Process, Processes Productivity Profound Project Project. Projects Provided Purposes Quality Relevant Represent Reproduction Reproduction, Result Result, Results. Saving Send Service Since Solutions Some Specialists Specialized Specific Stage Students. Success Suggestions Techniques Technologies Through Timely, University University, Value. Viewpoint Web Who Will Work Worldwide You

The list will remove "You,Out,Who,Will" etcetera
 
Upvote 0
So, do you want to delete all words of 3 characters or less, or only words from a predefined list?
 
Upvote 0
I would like to know how to delete three characters or less within the module, thanx
 
Upvote 0
Try the following - I've included code for both deleting items from an exclusion list and deleting words of 3 characters or less. There's some overlap, but you can delete/comment-out whichever portions you don't need.
Code:
Sub ListBuilder()
Application.ScreenUpdating = False
Dim StrIn As String, StrTmp As String, StrExcl As String, StrOut() As String
Dim i As Long, j As Long
'Define the exlusions list
StrExcl = "A,Am,An,And,Are,As,At,B,Be,But,By,C,Can,Cm,D,Did," & _
          "Do,Does,E,Eg,En,Eq,Etc,F,For,G,Get,Go,Got,H,Has,Have," & _
          "He,Her,Him,How,I,Ie,If,In,Into,Is,It,Its,J,K,L,M,Me," & _
          "Mi,Mm,My,N,Na,Nb,No,Not,O,Of,Off,Ok,On,One,Or,Our,Out," & _
          "P,Q,R,Re,S,She,So,T,The,Their,Them,They,This,T,To,U,V," & _
          "Via,Vs,W,Was,We,Were,Who,Will,With,Would,X,Y,Yd,You,Your,Z"
With ActiveDocument.Range
  'Delete words of 3 characters or less
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = "<[! ]{1,3}>"
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindContinue
    .Format = False
    .MatchWildcards = True
    .Execute Replace:=wdReplaceAll
  End With
  'Capitalise the first letter of each word
  .Case = wdTitleWord
  'Get the document's text
  StrIn = .Text
  'Strip out unwanted characters. Amongst others, hyphens and formatted single quotes are retained at this stage
  For i = 1 To 255
    Select Case i
      'To strip out unwanted characters
      Case 1 To 38, 40 To 63, 91 To 96, 123 To 144, 147 To 149, 152 To 171, 174 To 191, 247
      StrIn = Replace(StrIn, Chr(i), " ")
    End Select
  Next
  'Convert smart single quotes to plain single quotes & delete any at the start/end of a word
  StrIn = Replace(Replace(Replace(Replace(StrIn, Chr(145), "'"), Chr(146), "'"), "' ", " "), " '", " ")
  'Process the exclusions list
  For i = 0 To UBound(Split(StrExcl, ","))
    While InStr(StrIn, " " & Split(StrExcl, ",")(i) & " ") > 0
      StrIn = Replace(StrIn, " " & Split(StrExcl, ",")(i) & " ", " ")
    Wend
  Next
  'Clean up any duplicate spaces
  While InStr(StrIn, "  ") > 0
    StrIn = Replace(StrIn, "  ", " ")
  Wend
  StrIn = " " & Trim(StrIn) & " "
  j = UBound(Split(StrIn, " "))
  For i = 1 To j
    StrTmp = Split(StrIn, " ")(1)
    While InStr(StrIn, " " & StrTmp & " ") > 0
      StrIn = Replace(StrIn, " " & StrTmp & " ", " ")
    Wend
    'Update the output array
    ReDim Preserve StrOut(i - 1)
    StrOut(i - 1) = StrTmp
    If UBound(Split(StrIn, " ")) = 1 Then Exit For
  Next
  WordBasic.SortArray StrOut()
  .Text = Join(StrOut, ", ")
End With
Application.ScreenUpdating = True
End Sub
 
Upvote 0

Forum statistics

Threads
1,214,914
Messages
6,122,211
Members
449,074
Latest member
cancansova

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top