60000 Rows of 1 sentence each. Remove duplicates of words (Case-Sensitive), keep 1.

Singh_Edm

New Member
Joined
Dec 18, 2013
Messages
30
Hi there, Good day!
I searched the forum for 'duplicates' and read about 20 posts from the 6 pages of results but I couldn't find an answer. I think the code I am requesting assistance for might be a very short one.

I have an excel file with 60,000 rows and 1 column. Each row has a sentence.

For e.g.
I like the sunshine.
Bill also likes the Sunshine.

The desired output is a column that has all the words with duplicates removed (case-sensitive). For e.g.
I
like
the
sunshine
also
likes
Sunshine

Please help. Thanks.
 

Excel Facts

Remove leading & trailing spaces
Save as CSV to remove all leading and trailing spaces. It is faster than using TRIM().
Singh_Edm,

1. What version of Excel, and, Windows are you using?

2. Are you using a PC or a Mac?

Here is a macro solution for you to consider, to get you started in the right direction, based on the flat text you have displayed.

Sample raw data, and, results:


Excel 2007
ABC
1I like the sunshine.I
2Bill also likes the Sunshine.like
3the
4sunshine
5Bill
6also
7likes
8Sunshine
9
Sheet1


Please TEST this FIRST in a COPY of your workbook (always make a backup copy before trying new code, you never know what you might lose).

Code:
Sub GetUniqueWords()
' hiker95, 11/13/2015, ME902034
Dim rng As Range, c As Range, s, i As Long
Application.ScreenUpdating = False
Set rng = Range("A1:A" & Range("A" & Rows.Count).End(xlUp).Row)
With CreateObject("Scripting.Dictionary")
  For Each c In rng
    If c <> "" Then
      s = Split(Trim(c), " ")
      For i = LBound(s) To UBound(s)
        If Not .Exists(s(i)) Then
          .Add s(i), s(i)
        End If
      Next i
    End If
  Next c
  Range("C1").Resize(.Count) = Application.Transpose(.Keys)
End With
With Range("C1:C" & Range("C" & Rows.Count).End(xlUp).Row)
  .Value = Evaluate("IF(ROW(),SUBSTITUTE(" & .Address & ",""."",""""),"""")")
End With
Application.ScreenUpdating = True
End Sub

Before you use the macro with Excel 2007 or newer, save your workbook, Save As, a macro enabled workbook with the file extension .xlsm, and, answer the "do you want to enable macros" question as "yes" or "OK" (depending on the button label for your version of Excel) the next time you open your workbook.

Then run the GetUniqueWords macro.
 
Upvote 0
Hi hiker95,
IT WORKED!!!!!!!!!!!!
You won't believe it. It took less than 2 seconds for it to go through 60624 rows, each with a sentence of around 7-15 words and then it gave the desired output in Column C which is of 29677 rows! Less than 2 seconds!!!!!!! Thanks

I have another excel column which has words where Remove Duplicates function was used 2 years back but it wasn't case sensitive so it is about 26000 rows. How can I compare the two columns and spit out in another column those words that are not present in the 26000 but they are in the 29677 column?
 
Upvote 0
Hi hiker95,
IT WORKED!!!!!!!!!!!!
You won't believe it. It took less than 2 seconds for it to go through 60624 rows, each with a sentence of around 7-15 words and then it gave the desired output in Column C which is of 29677 rows! Less than 2 seconds!!!!!!! Thanks

Singh_Edm,

Thanks for the feedback.

You are very welcome. Glad I could help.

And, come back anytime.
 
Upvote 0
I have another excel column which has words where Remove Duplicates function was used 2 years back but it wasn't case sensitive so it is about 26000 rows. How can I compare the two columns and spit out in another column those words that are not present in the 26000 but they are in the 29677 column?

Singh_Edm,

In order to continue, and, so that I can get it right the first time, I will have to see your actual workbook/worksheets.

To start, you can post your workbook/worksheets to the following free site (sensitive data changed), and provide us with a link to your workbook:

https://dropbox.com


If you are not able to provide the above, then:

Click on the Reply to Thread button, and just put the word BUMP in the thread. Then, click on the Post Quick Reply button, and someone else will assist you.
 
Upvote 0
BUMP
Hi
I will provide a simple case.
Column1 is
Tom
tom
Jill
jill
Joe

<tbody>
</tbody>

Column2 is
Tom
Jill
Suzy

<tbody>
</tbody>


I want to compare both and then spill out what's absent in Column 2 but present in Column 1
tom
jill
Joe

<tbody>
</tbody>

Thanks,
 
Last edited:
Upvote 0
I want to compare both and then spill out what's absent in Column 2 but present in Column 1
Try

Rich (BB code):
Sub FindMissing()
  Dim d As Object
  Dim a As Variant, e As Variant
  
  Set d = CreateObject("scripting.dictionary")
  a = Range("A1", Range("A" & Rows.Count).End(xlUp)).Value
  For Each e In a
    d(e) = 1
  Next e
  a = Range("B1", Range("B" & Rows.Count).End(xlUp)).Value
  For Each e In a
    If d.exists(e) Then d.Remove (e)
  Next e
  If d.Count Then Range("C1").Resize(d.Count).Value = Application.Transpose(d.keys)
End Sub
 
Upvote 0

Forum statistics

Threads
1,215,129
Messages
6,123,217
Members
449,091
Latest member
jeremy_bp001

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top