Count number of occurrences of all words across sheet

Chris78

New Member
Joined
Aug 11, 2017
Messages
17
Office Version
  1. 365
Platform
  1. Windows
Hi Folks,
I'm pretty new to VBA but confident this is the right place to find some help...
I'm trying extract the list of all distinct words from a worksheet, followed by the number of occurrences the same word is found in the complete source sheet.
Only words having more than 3 characters should be considered and punctuation ignored.

Si basically, this is an example of the source and expected result
Source:
| A |
1|Humpty Dumpty sat on a wall
2|Humpty had a great fall
3|Humpty Dumpty again

Expected Result:
| A | B |
1|HUMPTY | 3 |
2|DUMPTY | 2 |
3|WALL | 1 |
4|GREAT | 1 |
5|FALL | 1 |
6|AGAIN | 1 |

I got it that far my macro generates the list of words but am struggling to:
a - list *distinct* words
b - count the number of occurrences

Actual VBA code

Sub MakeWordList()
Dim InputSheet As Worksheet
Dim WordListSheet As Worksheet
Dim PuncChars As Variant, x As Variant
Dim i As Long, r As Long
Dim txt As String
Dim wordCnt As Long
Dim MinLen As Long
Dim AllWords As Range
Dim PC As PivotCache
Dim PT As PivotTable

Application.ScreenUpdating = False
Set InputSheet = ActiveSheet
Set WordListSheet = Worksheets.Add(after:=Worksheets(Sheets.Count))
WordListSheet.Range("A1") = "All Words"
WordListSheet.Range("A1").Font.Bold = True
InputSheet.Activate
wordCnt = 2
MinLen = 3
PuncChars = Array(".", ",", ";", ":", "'", "!", "#", _
"$", "%", "&", "(", ")", " - ", "_", "--", "+", _
"=", "~", "/", "", "{", "}", "[", "]", """", "?", "*")
r = 1


' Loop until blank cell is encountered
Do While Cells(r, 1) <> ""
' covert to UPPERCASE
txt = UCase(Cells(r, 1))
' Remove punctuation
For i = 0 To UBound(PuncChars)
txt = Replace(txt, PuncChars(i), "")
Next i
' Remove excess spaces
txt = WorksheetFunction.Trim(txt)
' Extract the words
x = Split(txt)
For i = 0 To UBound(x)
If Len(x(i)) > MinLen Then
WordListSheet.Cells(wordCnt, 1) = x(i)
'WordListSheet.Cells(wordCnt, 2) = ?here would come the occurrence-count?
wordCnt = wordCnt + 1

End If
Next i
r = r + 1
Loop
End Sub
 
The easiest way would be to remove the words from the dictionary after they have been put in and just before you post them to your worksheet. This would be easily adaptable to Rick's code as well. You'll want to make sure that the words in your list will be in uppercase.

Code:
Sub WordCounts()
  Dim Dict As Object, Data As Variant, Regex As Object, Match As Object[COLOR=#ff0000], Word As Variant[/COLOR]
  Set Regex = CreateObject("VBScript.Regexp")
  Regex.Pattern = "[A-Z0-9](?:[A-Z0-9]|-(?=[A-Z0-9])){2,}[A-Z0-9]"
  Regex.Global = True
  Set Dict = CreateObject("Scripting.Dictionary")
  Data = UCase(Join(Application.Transpose(Range("A1", Cells(Rows.Count, "A").End(xlUp)))))
  For Each Match In Regex.Execute(Data)
    Dict.Item(Match.Value) = Dict.Item(Match.Value) + 1
  Next Match
[COLOR=#ff0000]  For Each Word In Array("WORD1", "WORD2", "WORD3", _[/COLOR]
[COLOR=#ff0000]                          "WORD4", "WORD5", "WORD6")[/COLOR]
[COLOR=#ff0000]     If Dict.Exists(Word) Then Call Dict.Remove(Word)[/COLOR]
[COLOR=#ff0000]   Next Word[/COLOR]
  [C1].Resize(Dict.Count, 2) = Application.Transpose(Array(Dict.Keys, Dict.Items))
End Sub
 
Upvote 0

Excel Facts

Last used cell?
Press Ctrl+End to move to what Excel thinks is the last used cell.

Forum statistics

Threads
1,214,833
Messages
6,121,868
Members
449,053
Latest member
Mesh

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top