Duplicate Text Removal

euanh

New Member
Joined
Jan 7, 2004
Messages
20
I have a list of addresses that I get sent from an external source in Excel and the list has some duplicates in it which I need to remove.

For example and entry might be : J Bloggs 1 any street Any Town London Town London Town (the duplicate being London Town)

I have a macro that can remove single word duplicates, ie: London London, but I need one that will remove duplicates of more than one word as in the example above.

The duplicates always occur next to each other so the cell won't start London Town with text inbetween the duplicate entries.

Does anyone know whether there is a macro/UDF that I could use to remove these entries ?

Ordinarily I would go back to the source and ask them to stop duplicating the entry however this will take weeks as it's an external company.

Any help greatly appreciated.
Euan :unsure:
 

Excel Facts

Copy a format multiple times
Select a formatted range. Double-click the Format Painter (left side of Home tab). You can paste formatting multiple times. Esc to stop

pgc01

MrExcel MVP
Joined
Apr 25, 2006
Messages
19,884
Hi Euan

Would a macro that removes any duplicate words in a text do?

Won't that be a problem for

Jack London 1 any street Any Town London Town London Town

results in

Jack London 1 any street Any Town

Since London already appears 1 time in the name, it won't show in the address.

Kind regards
PGC
 

euanh

New Member
Joined
Jan 7, 2004
Messages
20
Hi,

The function I am using at the moment is as follows.


Function RemoveDup(str1)

Dim regEx

Set regEx = CreateObject("vbscript.regexp")

regEx.Global = False

regEx.Pattern = "\b(\w+)\b\s+\1"

regEx.IgnoreCase = True

RemoveDup = regEx.Replace(str1, "$1")

End Function



=RemoveDup(A1)

As for removing words that appear twice in say the address but also in the name, this could cause a problem so if I can get a macro to do this by looking for duplicated words that apear is sequence this would be better.

Thanks
Euan
 

pgc01

MrExcel MVP
Joined
Apr 25, 2006
Messages
19,884
Hi Euan

I replaced the pattern.

Please try

Code:
Function RemoveDup(sS)

With CreateObject("vbscript.regexp")
  .Pattern = "\b(\w+)\b\s+\b(\w+)\b\s+\1\s+\2"
  .IgnoreCase = True
  .Global = False
   RemoveDup = .Replace(sS, "$1 $2")
End With

End Function

Hope this helps
PGC

EDIT: simplified the pattern
Book2
ABCDE
1JBloggs1anystreetAnyTownLondonTownLondonTownJBloggs1anystreetAnyTownLondonTown
2MyohmyohmyMyohmy
3repeat1repeat2repeat2repeat1repeat2
4arepeatarepeatarepeat
5 
6
Sheet3
 

Forum statistics

Threads
1,136,926
Messages
5,678,610
Members
419,776
Latest member
mikelowski

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top