Count Word and Random Phrase Frequency

Sleeplol

Board Regular
Joined
Apr 10, 2019
Messages
194
Office Version
  1. 365
  2. 2019
Platform
  1. Windows
Hello Everyone,
I'm attempting to count the word frequency as well as two and three word phrases from a column. I'm stuck and would really appreciate some help.

Getting error:
Run-time error 1004.png


On this line:
ErrorLastLine.png


Setup:
A macro populates column A with values from AP101:AP50000 (formulas combining additional row cells into a single cell).
(Problem) Macro populates the count of B,E,H

Runs fine though throttles for about 2 min for 30000ish rows; then bugs out after populating the results.
I've tried referencing the sheet with Worksheets("DashTest").Range on all three, but it still errors.

Below is a quick shot of what it looks like, as well as the entire script.
Thanks for any help

WordCountDisplay.png



VBA Code:
Sub test()
Dim a, e, s, Ignore As String, temp, x
Dim dic As Object, dic2 As Object, dic3 As Object
Set dic = CreateObject("Scripting.Dictionary")
dic.CompareMode = 1
Set dic2 = CreateObject("Scripting.Dictionary")
dic2.CompareMode = 1
Set dic3 = CreateObject("Scripting.Dictionary")
dic3.CompareMode = 1
With Range("m1").CurrentRegion.Offset(1)
Ignore = Join(Application.Transpose(.Resize(.Rows.Count - 1).Value), Chr(2))
End With
a = Range("a1", Range("a" & Rows.Count).End(xlUp)).Value
With CreateObject("VBScript.RegExp")
.Global = True: .IgnoreCase = True
.Pattern = "([\$\(\)\-\^\|\\\[\]\*\+\?\.])"
Ignore = "[^\w ]|\b(" & Replace(.Replace(Ignore, "\$1"), Chr(2), "|") & ")\b"
For Each e In a
If e <> "" Then
x = GetSentense(e, 2)
If IsArray(x) Then
For Each s In x
dic2(s) = dic2(s) + 1
Next
End If
x = GetSentense(e, 3)
If IsArray(x) Then
For Each s In x
If s <> "" Then dic3(s) = dic3(s) + 1
Next
End If
.Pattern = Ignore
temp = Application.Trim(.Replace(e, ""))
For Each s In Split(temp)
If s <> "" Then dic(s) = dic(s) + 1
Next
End If
Next
End With
Range("b2").Resize(dic.Count, 2).Value = _
Application.Transpose(Array(dic.keys, dic.items))
Range("e2").Resize(dic2.Count, 2).Value = _
Application.Transpose(Array(dic2.keys, dic2.items))
Range("h2").Resize(dic3.Count, 2).Value = _
Application.Transpose(Array(dic3.keys, dic3.items))
End Sub
Function GetSentense(ByVal txt As String, myStep)
Dim i As Long, ii As Long, temp, x
On Error Resume Next
x = Split(txt): ReDim temp(UBound(x) - myStep)
If Err Then GetSentense = Empty: Exit Function
On Error GoTo 0
For i = 0 To UBound(x) - myStep - 1
For ii = 1 To myStep
temp(i) = Trim$(temp(i) & " " & x(i + ii - 1))
Next
Next
GetSentense = temp
End Function
 

Excel Facts

Workdays for a market open Mon, Wed, Friday?
Yes! Use "0101011" for the weekend argument in NETWORKDAYS.INTL or WORKDAY.INTL. The 7 digits start on Monday. 1 means it is a weekend.
How does the code it 'bug out'?
 
Upvote 0
Transpose function has a limit of 65536 item to process, so maybe the items in dict3 is above that limit?
I have a macro to generate word\phrase frequency, let me know if you're interested.
 
Upvote 0
That makes perfect sense Akuini. If you're referring to individual words/phrases as items; then yes, the transpose function would definitely be tapped out.
Thanks for the explanation.
If your script manages more items; I am very interested :)
 
Upvote 0
Ok, try this:
  • The code will generate word/phrase frequency
  • Data must be in column A, start at A1.
  • Data can't be more than 65536 rows, because it's using Application.Transpose.
  • sNumber = "1,2,3" means it generates 3 frequency list: single word & 2 word phrase & 3 word phrase, you can change that to suit.
  • Word with apostrophe such as "you're" is counted as one word.
  • Word with underscore such as "aa_bb" is counted as one word.
  • Tested on text (from a novel) with 161K words with sNumber = "1,2,3", it took 10.5 seconds
VBA Code:
Sub regexPhraseFrequency1()
'generate word/phrase frequency
'Data must be in column A, start at A1
'sNumber = "1,2,3"  means it generates 3 frequency list: single word & 2 word phrase & 3 word phrase
'tested on text (from a novel) with 161K words with sNumber = "1,2,3", it took 10.5 seconds


Dim i As Long
Dim sNumber As String, txa As String
Dim z, T
Dim obj As New DataObject

T = Timer
Application.ScreenUpdating = False
Range("C:Z").ClearContents

txa = Join(Application.Transpose(Range("A1", Cells(Rows.Count, "A").End(xlUp))), vbLf)

'--- CHANGE sNumber VALUE TO SUIT -----------------------------------
sNumber = "1,2,3"   'list each number of words in a phrase, change to suit
                    'sNumber = "1"  will generate 1 frequency list of single word
                    'sNumber = "1,2"  will generate 2 frequency list: single word & 2 word phrase

z = Split(sNumber, ",")
  
    For i = LBound(z) To UBound(z)
        Call toProcess(CLng(z(i)), txa)
    Next

Range("C:Z").Columns.AutoFit
Application.ScreenUpdating = True

Debug.Print Timer - T

End Sub

Sub toProcess(n As Long, tx As String)
'phrase frequency

Dim regEx As Object, matches As Object, x As Object, d As Object
Dim i As Long, rc As Long
Dim va, q
Dim txb As String

    tx = Replace(tx, "'", "___") 'replace apostrophe with "___", so it will match pattern "\w+"
  
        Set regEx = CreateObject("VBScript.RegExp")
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
        End With
      
      
If n > 1 Then
      
        regEx.Pattern = "( ){2,}"
  
        If regEx.test(tx) Then
           tx = regEx.Replace(tx, " ") 'remove excessive space
        End If
      
        tx = Trim(tx)
        tx = Replace(tx, " ", "_____") 'replace space with "_____", , so it won't match pattern "\W+"

        regEx.Pattern = "\W+" 'non words character, Matches any character that is not
                              'a word character (alphanumeric & underscore). Equivalent to [^A-Za-z0-9_]
 
        If regEx.test(tx) Then
           tx = regEx.Replace(tx, vbLf) 'replace non words character with new line char
        End If
      
        tx = Replace(tx, "_____", " ") ' change it back to space
        tx = Replace(tx, vbLf & " ", vbLf & "") 'remove space in the beginning of every line
      
End If

  
    Set d = CreateObject("scripting.dictionary")
    d.CompareMode = vbTextCompare

    regEx.Pattern = Trim(WorksheetFunction.Rept("\w+ ", n)) 'match n words (the phrase) separated by a space
      
            Set matches = regEx.Execute(tx)
          
            For Each x In matches
                d(CStr(x)) = d(CStr(x)) + 1 'get phrase frequency
            Next

For i = 1 To n - 1
      
        regEx.Pattern = "^\w+ "
        txb = tx
        If regEx.test(txb) Then
           txb = regEx.Replace(txb, "")   'remove first word in each line to get different combination of n words (phrase)
         
            regEx.Pattern = Trim(WorksheetFunction.Rept("\w+ ", n))
            Set matches = regEx.Execute(txb)
          
            For Each x In matches
                d(CStr(x)) = d(CStr(x)) + 1     'get phrase frequency
            Next

        End If

Next

If d.Count = 0 Then MsgBox "Nothing with " & n & " word phrase found": Exit Sub

'Range("D:E").ClearContents

rc = Cells(1, Columns.Count).End(xlToLeft).Column
'put the result in col D:E

With Cells(2, rc + 2).Resize(d.Count, 2)
  
    If d.Count < 65536 Then 'Transpose function has a limit of 65536 item to process
      
        .Value = Application.Transpose(Array(d.Keys, d.items))
      
    Else
      
        ReDim va(1 To d.Count, 1 To 2)
        i = 0
            For Each q In d.Keys
                i = i + 1
                va(i, 1) = q: va(i, 2) = d(q)
            Next
        .Value = va
      
    End If
  
    'get the apostrophe back
    .Replace What:="___", Replacement:="'", lookAt:=xlPart, SearchFormat:=False, ReplaceFormat:=False
    .Sort Key1:=.Cells(1, 2), Order1:=xlDescending, Key2:=.Cells(1, 1), Order2:=xlAscending, Header:=xlNo
  
  
End With


Cells(1, rc + 2) = n & " WORD"
Cells(1, rc + 3) = "FREQUENCY"

End Sub

EXAMPLE:
Book1
ABCDEFGHIJ
1x x1 WORDFREQUENCY2 WORDFREQUENCY3 WORDFREQUENCY
2xBoone5Boone Boone2Keenan Boone Boone3
3Kaleb Owen BlaiseBlaise3Kaleb Owen2in a mood2
4Boone x3Keenan Boone2Kaleb Owen Blaise2
5Iker Isaiah a2Owen Blaise2a Keenan Boone1
6Axl Blaise ParkerI'm2a Keenan1Axl Blaise Parker1
7Kaleb Owen BlaiseKaleb2a mood1you're in a1
8I'm sorry,Keenan2Axl Blaise1
9"I'm tired, too.Owen2Blaise Parker1
10 you're in a moodaa_bb1Iker Isaiah1
11 youAxl1I'm sorry1
12a Keenan Boone Boonec221I'm tired1
13Keenan Boone BooneIker1in a1
14aa_bbin1x x1
15c22Isaiah1you're in1
16mood1
17Parker1
18sorry1
19tired1
20too1
21you1
22you're1
example
 
Upvote 0
Hi Akuini,
This macro is extraordinary! I was curious why you clear up to column Z, now i know; to make room for longer phrases. Incredible.
Many thanks for offering this to me.

Quick question though, with this macro, is it possible to have a column, let's say AA (or what-not), that is used as a words to ignore list, that only extracts words from the single word column without affecting the 2 word and up phrase columns??

BTW, your macro just ran about 30K rows with a total French/English unique word count of over 900K in under 2min (WITH NO ERROR) I count that as a WIN!!
 
Upvote 0
Quick question though, with this macro, is it possible to have a column, let's say AA (or what-not), that is used as a words to ignore list, that only extracts words from the single word column without affecting the 2 word and up phrase columns??
I don't understand: that is used as a words to ignore list, . Can you give me an example?
 
Upvote 0
A column, for instance "BA" that could be populated with items that would be auto extracted from the single word list and not the other phrase lists.

In the below photo, the word "to" is added; this means that it would be extracted from the 1 Word list in Column "D". However, it would not affect any of the multi-word phrases, for instance
"to the floor" would still be in the three word list.

Words to ignore.png
to.png
 
Upvote 0
Hi Akuini,
Actually, I can figure out the "ignore list" as a separate macro.

However, how would i run your Frequency macro from a separate sheet.
I would "paste" the items into A1 of worksheet "Frequency" , then need to run the macro from worksheet "Dash".

How would I be able to run the macro from a different sheet?
 
Upvote 0

Forum statistics

Threads
1,216,075
Messages
6,128,665
Members
449,462
Latest member
Chislobog

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top