Convert Formula to Sub : Macro to remove duplicate text within a cell

abramo

New Member
Joined
May 14, 2013
Messages
31
I have kindly been provided the following Formulas, each intended to remove duplicate text within a individual cells in a selected column. I would appreciate if these were incorporated in Sub macro form, which I know how to use (insert in Module and access and access and use through the Macro menu).


More specifically, issue is as follows:
Cells in a selected column contain duplicate text that needs to be removed as follows, :

- apple apple other text banana banana banana other text banana other text apple
Result after Macro should be:
- apple other text banana other text banana other text apple


This means Macro needs to delete ONLY all duplicate text that is located consequent (next) to each other, being repetitive text by error, rather than delete all duplicate text in the cell. Meaning, if text is not a repetitive duplicate in sequence (such as: boat hull, boat hull) then it should be left as is, despite found multiple times within the cell.


Function 1 expected to do this, is as follows:


Code:
Option Explicit
Private colOutput As New Collection
Public Function split_cell(str As String) As String
Dim strInput() As String
Dim iArrayCount As Integer
Dim iCollectionCount As Integer
Dim strWord As String
strInput = Split(Range("a1"), Chr(32))
For iArrayCount = 0 To UBound(strInput)
    '   Get the word
    strWord = strInput(iArrayCount)
    If Not KeyExist(strWord) Then
        If colOutput.Count = 0 Then
            colOutput.Add strWord, strWord
        Else
            colOutput.Add strWord, strWord, , colOutput.Count
        End If
    End If
Next iArrayCount
For iCollectionCount = 1 To colOutput.Count
    If iCollectionCount = 1 Then
        split_cell = split_cell & colOutput(iCollectionCount)
    Else
        split_cell = split_cell & Chr(32) & colOutput(iCollectionCount)
    End If
Next iCollectionCount
End Function
Public Function KeyExist(strKeyName As String) As Boolean
Dim i As Integer
KeyExist = False
If Not colOutput Is Nothing Then
    If colOutput.Count = 0 Then Exit Function
    For i = 1 To colOutput.Count
        If colOutput.Item(i) = strKeyName Then
            KeyExist = True
            Exit For
        End If
    Next i
End If
End Function


Function 2 expected to do the same job:


Code:
Function UniTerms(rngString As Range, Optional strDelim As String = " ") As String
vW = Split(rngString, strDelim)
U = vW(0)
For j = 1 To UBound(vW)
  If vW(j) <> vW(j - 1) Then U = U & strDelim & vW(j)
Next
UniTerms = U
End Function

Well, sorry to say I did not manage to test and use any of the functions above. I am a simple user that only knows how to insert a macro in a module and use that macro from the Macro menu to process data in cells of a selected column. The above functions, as useful as I trust they are, they are of no use to me. They have to be incorporated in a Sub, which thereafter I know how to use.

If someone "out there" can spare some time to apply these functions within a macro I can use, this, needless to say, will be of great help to me and to many other "non-experts" looking for such a solution. Thanks.
.
 

Excel Facts

How to total the visible cells?
From the first blank cell below a filtered data set, press Alt+=. Instead of SUM, you will get SUBTOTAL(9,)
Select the data you want to convert and the run this macro.

Code:
[COLOR=darkblue]Sub[/COLOR] Adjacent_UniTerms()
    
    [COLOR=darkblue]Dim[/COLOR] vData [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Variant[/COLOR], v [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Variant[/COLOR]
    [COLOR=darkblue]Dim[/COLOR] i [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR], j [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR], k [COLOR=darkblue]As[/COLOR] Long
    [COLOR=darkblue]Const[/COLOR] strDelimiter [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]String[/COLOR] = " "
    
    [COLOR=darkblue]If[/COLOR] Selection.Count > 100000 [COLOR=darkblue]Then[/COLOR]
        [COLOR=darkblue]If[/COLOR] MsgBox(Format(Selection.Count, "#,000") & " cells selected. " & vbLf & vbLf & _
                "Do you really want to continue? ", vbExclamation + vbYesNo, _
                "Large Selection") = vbNo [COLOR=darkblue]Then[/COLOR] [COLOR=darkblue]Exit[/COLOR] [COLOR=darkblue]Sub[/COLOR]
    [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]If[/COLOR]
    
    [COLOR=darkblue]If[/COLOR] Selection.Count = 1 [COLOR=darkblue]Then[/COLOR]
        [COLOR=darkblue]ReDim[/COLOR] vData(1 [COLOR=darkblue]To[/COLOR] 1, 1 [COLOR=darkblue]To[/COLOR] 1)
        vData(1, 1) = Selection.Value
    [COLOR=darkblue]Else[/COLOR]
        vData = Selection.Value
    [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]If[/COLOR]
        
    [COLOR=darkblue]For[/COLOR] i = [COLOR=darkblue]LBound[/COLOR](vData, 1) [COLOR=darkblue]To[/COLOR] [COLOR=darkblue]UBound[/COLOR](vData, 1)
        [COLOR=darkblue]For[/COLOR] j = [COLOR=darkblue]LBound[/COLOR](vData, 2) [COLOR=darkblue]To[/COLOR] [COLOR=darkblue]UBound[/COLOR](vData, 2)
            [COLOR=darkblue]If[/COLOR] InStr(vData(i, j), strDelimiter) [COLOR=darkblue]Then[/COLOR]
                v = Split(vData(i, j), strDelimiter)
                vData(i, j) = v(0)
                [COLOR=darkblue]For[/COLOR] k = [COLOR=darkblue]LBound[/COLOR](v) To [COLOR=darkblue]UBound[/COLOR](v) - 1
                    [COLOR=darkblue]If[/COLOR] Trim(v(k)) <> Trim(v(k + 1)) [COLOR=darkblue]Then[/COLOR]
                        vData(i, j) = vData(i, j) & strDelimiter & v(k + 1)
                    [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]If[/COLOR]
                [COLOR=darkblue]Next[/COLOR] k
            [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]If[/COLOR]
    [COLOR=darkblue]Next[/COLOR] j, i
    
    Selection = vData
    
[COLOR=darkblue]End[/COLOR] [COLOR=darkblue]Sub[/COLOR]
 
Upvote 0
@AlphaFrog : you are my hero! suggested macro works perfect.


Dare I now introduce some "nuances" to the required solution?
(OP now wearing helmet as precaution)

- duplicate text term not necessarily followed by space but alternatively by some punctuation mark (semi colon, dot, etc)
- duplicate text term in plural (followed by "s") example:

Code:
nice apple nice apples;

whereby we keep the first instance irrespective if it is single or plural (if first instance is plural we keep plural)


(OP now in hiding, still wearing helmet as precaution)
.
 
Last edited:
Upvote 0
@AlphaFrog : you are my hero! suggested macro works perfect.
Here is an much shorter alternate to AlphaFrog's macro (I think it should be faster as well, but that is just an untested guess)...
Code:
Sub RemoveAdjacentDuplicateWords()
  Dim X As Long, V As Variant, Rng As Range, Text As String, Txt() As String
  For Each Rng In Selection
    Text = Rng.Value
    Txt = Split(Text)
    For X = 0 To UBound(Txt)
      For Each V In Array(121, 13, 5, 3, 3, 2)
        Text = Replace(Text, Txt(X) & " " & Txt(X), Txt(X))
      Next
    Next
    Rng = Text
  Next
End Sub


Dare I now introduce some "nuances" to the required solution?
(OP now wearing helmet as precaution)

- duplicate text term not necessarily followed by space but alternatively by some punctuation mark (semi colon, dot, etc)
I'll look into modifying my code for this (should not be too difficult)... I'll be back shortly.

Dare I now introduce some "nuances" to the required solution?
(OP now wearing helmet as precaution)

- duplicate text term in plural (followed by "s") example:

Code:
nice apple nice apples;
This one is problematic because not all plurals are formed by adding an 's' to the end of a word (for example, country/countries and glass/glasses). Assuming you don't care about those kinds of plurals and want to restrict it to just 's' words, would it be possible for the first occurrence to be either singular or plural?
 
Last edited:
Upvote 0
Dare I now introduce some "nuances" to the required solution?
(OP now wearing helmet as precaution)

- duplicate text term not necessarily followed by space but alternatively by some punctuation mark (semi colon, dot, etc)
I'll look into modifying my code for this (should not be too difficult)... I'll be back shortly.
In thinking about this, I think we need more direction from you. Consider the following...

apple, apple other text (banana) banana banana; other text banana other text apple

Do we keep the comma attached to the first "apple"? What about the parentheses around the first "banana"? And what should happen to the semi-colon attached to the third "banana" in a row after that banana is removed (if it stays, then do we remove the space in front of it)?
 
Last edited:
Upvote 0
@Rick : thumbs up for a truly kosher solution!
Needless to say, works as advertised.

Hope it will be possible to apply this to adjacent identical "short phrases", punctuation marks and plurals?

Code:
nice apple, nice apples;

Knowing one cannot have it all in life (life being a b****) let's have the best we can do. Thanks!

EDIT: This post written while not having seen Rick's last post : reply coming soonest!
 
Last edited:
Upvote 0
@Rick : thanks -- you got me there, but it would not spoil a vast eternal plan if we kept the first instance, in whatever form it happens to come -- in real life, on "average" the first instance of the adjacent duplicates is usually the most "desirable" of duplicates.

.
 
Upvote 0
Hope it will be possible to apply this to adjacent identical "short phrases", punctuation marks and plurals?

Code:
nice apple, nice apples;
You will be addressing the punctuation marks, so I'll wait for that response; however removing duplicate short phrases... that will be a challenge I think. The program would need some way to search one, two, three four, etc. words for the same occurrence following it... and it would have to do that from each word in the "sentence". Sounds like that would be a major slowdown to me... plus I am not sure how to do that at the moment... and you want to complicate things by adding internal punctuations on top of all that. I kind of think you are asking more than we might be able to produce for you. I'll think on it (after I see your response about the punctuations).
 
Upvote 0
@Rick : thanks -- you got me there, but it would not spoil a vast eternal plan if we kept the first instance, in whatever form it happens to come -- in real life, on "average" the first instance of the adjacent duplicates is usually the most "desirable" of duplicates.
That still complicates things quite a bit. What about if the text that was returned to you had no punctuations at all... only letters and spaces? That would be relatively easy to do.
 
Upvote 0
@Rick : Thanks -- sure, suggested solution (without keeping any punctuation) will do fine for this application, as it is not so "demanding" in this respect.

.
 
Upvote 0

Forum statistics

Threads
1,214,957
Messages
6,122,466
Members
449,086
Latest member
kwindels

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top