Is there a way (formulaic or vba solution) to create a listing of all words in a column and the number

Galena1

Active Member
Joined
Jun 10, 2010
Messages
293
of occurrences for each word?

I have a column of text. There are 500 or so records. What I what is to create a list (on a separate tab) of every unique word in that column and the number of times it occurs. I know this list would be exhaustive, but I'm sure Excel can handle it. I also know that words such as 'the' and 'and' would have the highest number of counts (occurrences), but I can chop them off manually. What I want to end up with is a Top 20 Listing of Unique Words that are not generic.

Anyone have the solution for this?

An even more robust approach would be to search all 50 columns in my spreadsheet tracker and return every unique string and a count of the number of occurrences.

Thanks in advance for thinking about my challenge.

:)
 

Some videos you may like

Excel Facts

Copy PDF to Excel
Select data in PDF. Paste to Microsoft Word. Copy from Word and paste to Excel.

JoeMo

MrExcel MVP
Joined
May 26, 2009
Messages
16,747
By "records" do you mean strings? If so, is each word in a string separated from the next by a space? Is there punctuation like . or , or ; or : .....?

Can you post a small sample of the contents of a typical cell in the column?
 

Akashwani

Well-known Member
Joined
Mar 14, 2009
Messages
2,911
Hi,

If you only have one word per cell, try this......

How to extract a unique distinct list from a column in excel | Get Digital Help - Microsoft Excel resource

Sample data and example formulas......

Excel Workbook
ABCDE
1DataUnique ListCount
2Apple
Apple
3
3OrangeOrange1
4PearPear1
5AppleBanana2
6BananaGrapes2
7BananaPlum1
8GrapesMelon2
9AppleCherry1
10GrapesLemon1
11Plum
12Melon
13Cherry
14Lemon
15Melon
16
Sheet3


The formula in C2 needs entering with ctrl shift enter NOT just enter, you can then copy it down.
You will obviously need to change the cell references to suit your layout.

I hope that helps.

Ak
 

Galena1

Active Member
Joined
Jun 10, 2010
Messages
293
By "records" do you mean strings? If so, is each word in a string separated from the next by a space? Is there punctuation like . or , or ; or : .....?

Can you post a small sample of the contents of a typical cell in the column?
On 22-JAN-2013, PV Systems confirmed to Global Medical Safety (GMS), PV Operations (PV Ops) that the “The Request Investigation Comment” Icon in SCEPTRE used for Automated Referral to Complaint Vigilance (CV), was not functioning as designed. If the case did not generate a change log in Compliant Vigilance’s system due to an issue on the product Quality Management System (PQMS) side of the feed, the automated process will not trigger. Due to this issue, section 7 and section 8 of WI-06466 must be followed for consumer cases.

They are huge strings of text. Problem statements. No 2 are alike. They are all unique. There are no delimiters as it is conversational annotation (e.g. free form)
 

Galena1

Active Member
Joined
Jun 10, 2010
Messages
293
Not one word. Entire textual paragraphs. Free form. No delimiters. Each cell is totally random and contains up the the Excel limit of cell characters allowed, in some cases.

On 22-JAN-2013, PV Systems confirmed to Global Medical Safety (GMS), PV Operations (PV Ops) that the “The Request Investigation Comment” Icon in SCEPTRE used for Automated Referral to Complaint Vigilance (CV), was not functioning as designed. If the case did not generate a change log in Compliant Vigilance’s system due to an issue on the product Quality Management System (PQMS) side of the feed, the automated process will not trigger. Due to this issue, section 7 and section 8 of WI-06466 must be followed for consumer cases.
 

JoeMo

MrExcel MVP
Joined
May 26, 2009
Messages
16,747
Not one word. Entire textual paragraphs. Free form. No delimiters. Each cell is totally random and contains up the the Excel limit of cell characters allowed, in some cases.

On 22-JAN-2013, PV Systems confirmed to Global Medical Safety (GMS), PV Operations (PV Ops) that the “The Request Investigation Comment” Icon in SCEPTRE used for Automated Referral to Complaint Vigilance (CV), was not functioning as designed. If the case did not generate a change log in Compliant Vigilance’s system due to an issue on the product Quality Management System (PQMS) side of the feed, the automated process will not trigger. Due to this issue, section 7 and section 8 of WI-06466 must be followed for consumer cases.
I'll willing to take a coarse swing at it, treating acronyms as words, things like WI-06466 as words , .... if you think it might be useful.
 

Akashwani

Well-known Member
Joined
Mar 14, 2009
Messages
2,911
Hi,

Does this help?....

Excel udf: Word frequency | Get Digital Help - Microsoft Excel resource

Example results from the above solution......

Excel Workbook
ABCDE
1
2On 22-JAN-2013, PV Systems confirmed to Global Medical Safety (GMS), PV Operations (PV Ops) that the The Request Investigation Comment Icon in SCEPTRE used for Automated Referral to Complaint Vigilance (CV), was not functioning as designed. If the case did not generate a change log in Compliant Vigilances system due to an issue on the product Quality Management System (PQMS) side of the feed, the automated process will not trigger. Due to this issue, section 7 and section 8 of WI-06466 must be followed for consumer cases.On1
322-JAN-2013,1
4PV2
5Systems1
6confirmed1
7to4
8Global1
9Medical1
10Safety1
11(GMS),1
12Operations1
13(PV1
14Ops)1
15that1
16the5
17The1
18Request1
19Investigation1
20Comment1
21Icon1
22in2
23SCEPTRE1
24used1
25for2
26Automated1
27Referral1
28Complaint1
29Vigilance1
30(CV),1
31was1
32not3
33functioning1
34as1
35designed.1
36If1
37case1
38did1
39generate1
40a1
41change1
42log1
43Compliant1
44Vigilances1
45system1
46due1
47an1
48issue1
49on1
50product1
51Quality1
52Management1
53System1
54(PQMS)1
55side1
56of2
57feed,1
58automated1
59process1
60will1
61trigger.1
62Due1
63this1
64issue,1
65section2
6671
67and1
6881
69WI-064661
70must1
71be1
72followed1
73consumer1
74cases.1
Sheet3


I do not write or understand VBA, so I cannot help you further with this, sorry.
I guess the VBA code will need to be edited to resolve the issue with punctuation.

I hope that helps.

Ak
 

JoeMo

MrExcel MVP
Joined
May 26, 2009
Messages
16,747
Not one word. Entire textual paragraphs. Free form. No delimiters. Each cell is totally random and contains up the the Excel limit of cell characters allowed, in some cases.

On 22-JAN-2013, PV Systems confirmed to Global Medical Safety (GMS), PV Operations (PV Ops) that the “The Request Investigation Comment” Icon in SCEPTRE used for Automated Referral to Complaint Vigilance (CV), was not functioning as designed. If the case did not generate a change log in Compliant Vigilance’s system due to an issue on the product Quality Management System (PQMS) side of the feed, the automated process will not trigger. Due to this issue, section 7 and section 8 of WI-06466 must be followed for consumer cases.
Here's a module you can try. I have done some cursory testing by filling 12,500 cells in column A with your sample data quoted above (run time approximately 3 minutes on my machine). Limitations are described in a header for the module. Note that the data are assumed to be in Column A starting in A1, and the procedure must be run with the data sheet active.
Code:
'---------------------------------------------------------------------------------------
' Module    : CountUniqueWordsInRange
' Author    : JoeMo
' Date      : 3/28/2013
' Purpose   : Run from the activesheet. Assumes all data are in column A of the activesheet.
'             Requires that words are separated by the space character.
'             Returns all unique words with a count of the number of occurrences of
'             each word in column A to a new sheet named "Unique Words".
'             Limit on total word count that can be handled
'             is 17,179,869,184 (Excel 2007 or later versions). Limit on number of
'             unique words that can be handled is 1,048,575 (Excel 2007 or later versions).
'             Treats numbers and acronyms as words.
'---------------------------------------------------------------------------------------

Sub CountUniqueWordsInRange()
Dim rS As Range, sSht As Worksheet, dSht As Worksheet, aSht As Worksheet
Dim Punc As Variant, lRs As Long, lRd As Long, c As Range
Dim totWords As Long, colCt As Long, vA As Variant, vO() As Variant
Dim i As Long, j As Long, k As Long, Ct As Long, n As Long
'define source range
Set sSht = ActiveSheet
lRs = sSht.Range("A" & Rows.Count).End(xlUp).Row
Set rS = sSht.Range("A1", "A" & lRs)
'Get total word count
totWords = CountWords(rS)
'determine how many columns needed to list all words
colCt = WorksheetFunction.RoundUp(totWords / Rows.Count, 0)
If colCt > Columns.Count Then
    MsgBox "Too many words to list in one sheet - truncate the input range and try again." & vbNewLine & "Goodbye."
    Exit Sub
End If
With Application
.ScreenUpdating = False
.Calculation = xlCalculationManual
.DisplayAlerts = False
.StatusBar = "PROCESSING YOUR DATA - PLEASE BE PATIENT"
End With
'Add sheet to list all words in source range
On Error Resume Next
Worksheets("All Words").Delete
On Error GoTo 0
ActiveWorkbook.Sheets.Add after:=sSht
Set aSht = ActiveSheet
aSht.Name = "All Words"
'list all words
Punc = Array(".", ",", ";", ":", "?", "!", "~", "@", "#", "$", _
    "(", ")", "/", Chr(34), Chr(147), Chr(148))
For Each c In rS
    If Not IsEmpty(c) Then
        vA = Split(Trim(c.Value), " ")
        Ct = Ct + UBound(vA) + 1
        ReDim Preserve vO(1 To Ct)
        For i = LBound(vA) To UBound(vA)
            For j = LBound(Punc) To UBound(Punc)
                vA(i) = Replace(vA(i), Punc(j), "")
            Next j
        Next i
        For j = LBound(vA) To UBound(vA)
            k = k + 1
            vO(k) = vA(j)
        Next j
    End If
Next c
'put all words into All Words sheet
k = 0
n = 0
For i = 1 To colCt
    Do Until n = aSht.Rows.Count Or k = UBound(vO)
        k = k + 1
        n = n + 1
        aSht.Cells(n, i).Value = vO(k)
    Loop
    n = 0
Next i
'copy all words to sheet "Unique Words" and remove duplicates
On Error Resume Next
Worksheets("Unique Words").Delete
On Error GoTo 0
aSht.Copy after:=sSht
Set dSht = ActiveSheet
dSht.Name = "Unique Words"
For i = 1 To colCt
    dSht.Range("A1").CurrentRegion.Columns(i).RemoveDuplicates Columns:=1, Header:=xlNo
Next i
'Get word count remaining after dups removal from individual columns
totWords = CountWords(dSht.Range("A1").CurrentRegion)
If totWords > dSht.Rows.Count Then
    MsgBox "Too many words remaining for a single column after first pass - Goodbye."
    Exit Sub
End If
'Consolidate columns and remove dups again
With dSht
    lRd = .Range("A" & Rows.Count).End(xlUp).Row + 1
    For i = 2 To colCt
        .Range(Cells(1, i), Cells(Rows.Count, i).End(xlUp)).Cut Destination:=Cells(lRd, 1)
    Next i
End With
'Final dups removal from the one remaining column
dSht.Range("A1").CurrentRegion.RemoveDuplicates Columns:=1, Header:=xlNo
dSht.Range("A1").EntireRow.Insert
With dSht.Range("A1:B1")
    .Value = Array("Word", "Count")
    .Font.Bold = True
End With
'Get count of each unique word
lRd = dSht.Range("A" & Rows.Count).End(xlUp).Row
dSht.Range("B2").FormulaR1C1 = "=COUNTIF('All Words'!C[-1]:C[" & colCt - 2 & "],'Unique Words'!RC[-1])"
With dSht.Range("B2", "B" & lRd)
    .FillDown
    .Calculate
    .Copy
    .PasteSpecial xlPasteValues
    Application.CutCopyMode = False
End With
With dSht.Range("A1:B1")
    .EntireColumn.AutoFit
End With
With Application
.ScreenUpdating = True
.Calculation = xlCalculationAutomatic
.DisplayAlerts = True
.StatusBar = False
End With
End Sub
Function CountWords(R As Range) As Long
Dim lChars As Long, c As Range, Ct As Long
For Each c In R
    Ct = 0
    lChars = Len(Trim(c.Value))
    If lChars = 0 Then
        Ct = 0
    Else
        Ct = Len(Trim(c.Value)) - Len(Replace(Trim(c.Value), " ", "")) + 1
    End If
    CountWords = CountWords + Ct
Next c
End Function
 

Watch MrExcel Video

Forum statistics

Threads
1,095,866
Messages
5,446,969
Members
405,424
Latest member
fidarbearing

This Week's Hot Topics

Top