Akuini

VBA Macro to create Word & Phrase Frequency

@wiscochris
I'll try to change the code according to your new requirement, maybe this weekend, I'm a bit busy at the moment.
 

Excel Facts

Format cells as time
Select range and press Ctrl+Shift+2 to format cells as time. (Shift 2 is the @ sign).
@wiscochris
Try this:

Put the list starting at A1 in sheet2, you may change it in this part:
VBA Code:
    'if you have stop words list then put the list starting at A1 in sheet2
    With Sheets("Sheet2")
        Set rSW = .Range("A1", .Cells(.Rows.Count, "A").End(xlUp))
    End With

The example has three 1 WORD (G, I, K), 2 WORD (M, O, Q), and 3 WORD (S, U, W)columns. It would be great if I could change that from 3 to 5 or 20 or whatever, depending upon my needs.
You can change the number in this part:
VBA Code:
Const NOC As Long = 4  'number of column as result

THE CODE;
VBA Code:
Option Explicit
Const sNumber As String = "1,2,3"  '"1,2,3"
'sNumber = "1"  will generate 1 word frequency list
'sNumber = "1,2,3"  will generate 1 word, 2 word & 3 word frequency list

Const xPattern As String = "A-Z0-9_'"
'define the word characters, the above pattern will include letter, number, underscore & apostrophe as word character
'word with apostrophe such as "you're" counts as one word.
'word with underscore such as "aa_bb" counts as one word.


Const NOC As Long = 4  'number of column as result

Const xCol As String = "G:XFD" 'columns to clear
Const BF As String = "B:F" 'DATA
Const HC As Long = 10000  'helper column


Dim VBX
Dim SN As Long
Dim rSW As Range


Sub Word_Phrase_Frequency_v1x()

Dim i As Long, j As Long, n As Long, k As Long
Dim txa As String
Dim z, t, s

t = Timer
Application.ScreenUpdating = False
Range(xCol).Clear

'if there are errors, remove them
On Error Resume Next
Range(BF).SpecialCells(xlCellTypeFormulas, xlErrors).ClearContents
Range(BF).SpecialCells(xlConstants, xlErrors).ClearContents
On Error GoTo 0

j = Range("A" & Rows.Count).End(xlUp).Row

For i = 2 To j ' i is row number

    txa = Join(Application.Transpose(Application.Transpose(Cells(i, "B").Resize(1, 5))), " ")
    z = Split(sNumber, ",")
    SN = (UBound(z) + 1)
    ReDim VBX(1 To 1, 1 To SN * NOC * 2)
        
    'if you have stop words list then put the list starting at A1 in sheet2
    With Sheets("Sheet2")
        Set rSW = .Range("A1", .Cells(.Rows.Count, "A").End(xlUp))
    End With
    
    If rSW.Cells(1) <> "" Then Call stopWord(xPattern, txa)
        
        'TO PROCESS
        For k = LBound(z) To UBound(z)
            Call toProcessY(CLng(z(k)), txa, xPattern, i)
        Next
    
    Cells(i, "G").Resize(1, UBound(VBX, 2)) = VBX
Next


For Each s In z
    For k = 1 To NOC
        n = n + 1
        VBX(1, n) = s & " WORD"
        n = n + 1
        VBX(1, n) = "count"
    Next
Next
Cells(1, "G").Resize(1, UBound(VBX, 2)) = VBX


Range(xCol).Columns.AutoFit
Columns(HC).Resize(, 2).Clear
Application.ScreenUpdating = True

Debug.Print "It's done in:  " & Timer - t & " seconds"

End Sub


Sub stopWord(xP As String, tx As String)
Dim n As Long
Dim stW, x
Dim regEx As Object
stW = rSW.Value
        Set regEx = CreateObject("VBScript.RegExp")
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
        End With
tx = " " & tx
For Each x In stW
        regEx.Pattern = "[^" & xP & "]" & x & "[^" & xP & "]"
        If regEx.Test(tx) Then
           tx = regEx.Replace(tx, "|")   'replace stop word with "|"
        End If
Next

End Sub

Sub toProcessY(n As Long, ByVal tx As String, xP As String, rn As Long)
'phrase frequency

Dim regEx As Object, matches As Object, x As Object, d As Object
Dim i As Long, rc As Long
Dim va, q
Static y As Long

        Set regEx = CreateObject("VBScript.RegExp")
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
        End With

If n > 1 Then

        regEx.Pattern = "( ){2,}"

        If regEx.Test(tx) Then
           tx = regEx.Replace(tx, " ") 'remove excessive space
        End If
        
        tx = Trim(tx)
               
'        regEx.Pattern = "[^A-Z0-9_' ]+"
        regEx.Pattern = "[^" & xP & " ]+" 'exclude xp and space
        If regEx.Test(tx) Then
           tx = regEx.Replace(tx, vbLf) 'replace non words character (excluding space) with new line char (vbLf)
        End If
        
        tx = Replace(tx, vbLf & " ", vbLf & "") 'remove space in the beginning of every line

End If

    Set d = CreateObject("scripting.dictionary")
    d.CompareMode = vbTextCompare

'    regEx.Pattern = Trim(WorksheetFunction.Rept("[A-Z0-9_']+ ", n)) 'match n words (the phrase) separated by a space
    regEx.Pattern = Trim(WorksheetFunction.Rept("[" & xP & "]+ ", n)) 'match n words (the phrase) separated by a space
            Set matches = regEx.Execute(tx)
            
            For Each x In matches
                d(CStr(x)) = d(CStr(x)) + 1 'get phrase frequency
            Next
 
For i = 1 To n - 1
        
        regEx.Pattern = "^[" & xP & "]+ "
        If regEx.Test(tx) Then
           tx = regEx.Replace(tx, "")   'remove first word in each line to get different combination of n words (phrase)

'            regEx.Pattern = Trim(WorksheetFunction.Rept("[A-Z0-9_']+ ", n))
            regEx.Pattern = Trim(WorksheetFunction.Rept("[" & xP & "]+ ", n))
            
            Set matches = regEx.Execute(tx)
            
            For Each x In matches
                d(CStr(x)) = d(CStr(x)) + 1     'get phrase frequency
            Next

        End If
Next

If d.Count = 0 Then y = y + (NOC * 2): Exit Sub


With Cells(2, HC).Resize(d.Count, 2)
    
    Select Case d.Count
    Case Is < 65536 'Transpose function has a limit of 65536 item to process
        
        .Value = Application.Transpose(Array(d.Keys, d.Items))
        
    Case Is <= 1048500
        
        ReDim va(1 To d.Count, 1 To 2)
        i = 0
            For Each q In d.Keys
                i = i + 1
                va(i, 1) = q: va(i, 2) = d(q)
            Next
        .Value = va
    
    Case Else
        
        MsgBox "Process is canceled, the result is more than 1048500 rows"
    
    End Select
    
    .Sort Key1:=.Cells(1, 2), Order1:=xlDescending, Key2:=.Cells(1, 1), Order2:=xlAscending, Header:=xlNo
    va = .Resize(NOC, 2).Value
    Columns(HC).Resize(, 2).Clear
End With

If y >= UBound(VBX, 2) Then y = 0

For i = 1 To UBound(va, 1)
    y = y + 1
    VBX(1, y) = va(i, 1)
    y = y + 1
    VBX(1, y) = va(i, 2)
Next


End Sub
 
This is (again), perfect. Thank you SO much. It is so fun to read the 2 versions of your code to understand and learn more about VBA. I very much appreciate you taking the time to do this and to add such detailed comments.
 
Mine request is simple as I have Division in Column A and have Answers to Question 1 (text1) in Column B and Answers to Question 2 (text2) in Column C:

DivisionQuestion 1 AnswersQuestion 2 Answers
AWe are one
BAwesome
AThis is done
BNo question
CWhen you arrve
DNo changes
ENo question
CWhen you arrve
DNo changes
ENo question
AWhen you arrve
BNo changes

I want count of words available in column B and Column C as per Division Level for example for Column B:

Question 1 Answers
DivisionWordsCount
AThis
1​
is
1​
done
1​
We
1​
are
1​
one
1​
BAwesome
1​
No
1​
question
1​
CWhen
1​
you
1​
arrve
1​
DNo
1​
changes
1​

If stop words can also be excluded in the output then it would be cherry on the cake.

I understand that this forum is old; however, I am optimistic that the reply would be done. Thank you in advance!!
 
@wiscochris
Try this:

Put the list starting at A1 in sheet2, you may change it in this part:
VBA Code:
    'if you have stop words list then put the list starting at A1 in sheet2
    With Sheets("Sheet2")
        Set rSW = .Range("A1", .Cells(.Rows.Count, "A").End(xlUp))
    End With


You can change the number in this part:
VBA Code:
Const NOC As Long = 4  'number of column as result

THE CODE;
VBA Code:
Option Explicit
Const sNumber As String = "1,2,3"  '"1,2,3"
'sNumber = "1"  will generate 1 word frequency list
'sNumber = "1,2,3"  will generate 1 word, 2 word & 3 word frequency list

Const xPattern As String = "A-Z0-9_'"
'define the word characters, the above pattern will include letter, number, underscore & apostrophe as word character
'word with apostrophe such as "you're" counts as one word.
'word with underscore such as "aa_bb" counts as one word.


Const NOC As Long = 4  'number of column as result

Const xCol As String = "G:XFD" 'columns to clear
Const BF As String = "B:F" 'DATA
Const HC As Long = 10000  'helper column


Dim VBX
Dim SN As Long
Dim rSW As Range


Sub Word_Phrase_Frequency_v1x()

Dim i As Long, j As Long, n As Long, k As Long
Dim txa As String
Dim z, t, s

t = Timer
Application.ScreenUpdating = False
Range(xCol).Clear

'if there are errors, remove them
On Error Resume Next
Range(BF).SpecialCells(xlCellTypeFormulas, xlErrors).ClearContents
Range(BF).SpecialCells(xlConstants, xlErrors).ClearContents
On Error GoTo 0

j = Range("A" & Rows.Count).End(xlUp).Row

For i = 2 To j ' i is row number

    txa = Join(Application.Transpose(Application.Transpose(Cells(i, "B").Resize(1, 5))), " ")
    z = Split(sNumber, ",")
    SN = (UBound(z) + 1)
    ReDim VBX(1 To 1, 1 To SN * NOC * 2)
       
    'if you have stop words list then put the list starting at A1 in sheet2
    With Sheets("Sheet2")
        Set rSW = .Range("A1", .Cells(.Rows.Count, "A").End(xlUp))
    End With
   
    If rSW.Cells(1) <> "" Then Call stopWord(xPattern, txa)
       
        'TO PROCESS
        For k = LBound(z) To UBound(z)
            Call toProcessY(CLng(z(k)), txa, xPattern, i)
        Next
   
    Cells(i, "G").Resize(1, UBound(VBX, 2)) = VBX
Next


For Each s In z
    For k = 1 To NOC
        n = n + 1
        VBX(1, n) = s & " WORD"
        n = n + 1
        VBX(1, n) = "count"
    Next
Next
Cells(1, "G").Resize(1, UBound(VBX, 2)) = VBX


Range(xCol).Columns.AutoFit
Columns(HC).Resize(, 2).Clear
Application.ScreenUpdating = True

Debug.Print "It's done in:  " & Timer - t & " seconds"

End Sub


Sub stopWord(xP As String, tx As String)
Dim n As Long
Dim stW, x
Dim regEx As Object
stW = rSW.Value
        Set regEx = CreateObject("VBScript.RegExp")
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
        End With
tx = " " & tx
For Each x In stW
        regEx.Pattern = "[^" & xP & "]" & x & "[^" & xP & "]"
        If regEx.Test(tx) Then
           tx = regEx.Replace(tx, "|")   'replace stop word with "|"
        End If
Next

End Sub

Sub toProcessY(n As Long, ByVal tx As String, xP As String, rn As Long)
'phrase frequency

Dim regEx As Object, matches As Object, x As Object, d As Object
Dim i As Long, rc As Long
Dim va, q
Static y As Long

        Set regEx = CreateObject("VBScript.RegExp")
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
        End With

If n > 1 Then

        regEx.Pattern = "( ){2,}"

        If regEx.Test(tx) Then
           tx = regEx.Replace(tx, " ") 'remove excessive space
        End If
       
        tx = Trim(tx)
              
'        regEx.Pattern = "[^A-Z0-9_' ]+"
        regEx.Pattern = "[^" & xP & " ]+" 'exclude xp and space
        If regEx.Test(tx) Then
           tx = regEx.Replace(tx, vbLf) 'replace non words character (excluding space) with new line char (vbLf)
        End If
       
        tx = Replace(tx, vbLf & " ", vbLf & "") 'remove space in the beginning of every line

End If

    Set d = CreateObject("scripting.dictionary")
    d.CompareMode = vbTextCompare

'    regEx.Pattern = Trim(WorksheetFunction.Rept("[A-Z0-9_']+ ", n)) 'match n words (the phrase) separated by a space
    regEx.Pattern = Trim(WorksheetFunction.Rept("[" & xP & "]+ ", n)) 'match n words (the phrase) separated by a space
            Set matches = regEx.Execute(tx)
           
            For Each x In matches
                d(CStr(x)) = d(CStr(x)) + 1 'get phrase frequency
            Next
 
For i = 1 To n - 1
       
        regEx.Pattern = "^[" & xP & "]+ "
        If regEx.Test(tx) Then
           tx = regEx.Replace(tx, "")   'remove first word in each line to get different combination of n words (phrase)

'            regEx.Pattern = Trim(WorksheetFunction.Rept("[A-Z0-9_']+ ", n))
            regEx.Pattern = Trim(WorksheetFunction.Rept("[" & xP & "]+ ", n))
           
            Set matches = regEx.Execute(tx)
           
            For Each x In matches
                d(CStr(x)) = d(CStr(x)) + 1     'get phrase frequency
            Next

        End If
Next

If d.Count = 0 Then y = y + (NOC * 2): Exit Sub


With Cells(2, HC).Resize(d.Count, 2)
   
    Select Case d.Count
    Case Is < 65536 'Transpose function has a limit of 65536 item to process
       
        .Value = Application.Transpose(Array(d.Keys, d.Items))
       
    Case Is <= 1048500
       
        ReDim va(1 To d.Count, 1 To 2)
        i = 0
            For Each q In d.Keys
                i = i + 1
                va(i, 1) = q: va(i, 2) = d(q)
            Next
        .Value = va
   
    Case Else
       
        MsgBox "Process is canceled, the result is more than 1048500 rows"
   
    End Select
   
    .Sort Key1:=.Cells(1, 2), Order1:=xlDescending, Key2:=.Cells(1, 1), Order2:=xlAscending, Header:=xlNo
    va = .Resize(NOC, 2).Value
    Columns(HC).Resize(, 2).Clear
End With

If y >= UBound(VBX, 2) Then y = 0

For i = 1 To UBound(va, 1)
    y = y + 1
    VBX(1, y) = va(i, 1)
    y = y + 1
    VBX(1, y) = va(i, 2)
Next


End Sub
Amazing code. I am a new user and have requested for a ew code. Hope you would be able to help with it. Look forward to your time investment and patronage:
 
Amazing code. I am a new user and have requested for a ew code. Hope you would be able to help with it. Look forward to your time investment and patronage:
I tried it; however, did not able to get success as I want the output specific to division level. Please check if you can help with the code.

P.S: I would add stop words in sheet 2 in Column A and would love that functionality too.

Question 1 Answers
DivisionWordsCount
AThis1
is1
done1
We1
are1
one1
BAwesome1
No1
question1
CWhen1
you1
arrve1
DNo1
changes1
 
Sorry, for the late reply:
  • I assumed you only need 1 word frequency
  • Data start at row 2, you can change it in this part: For i = 2 To UBound(va, 1) 'data start at row 2
  • If you have stop words list then put the list starting at A1 in sheet2. The stop words will be removed from the list.
  • You need to manually sort data by col A (Divison)
  • Run Sub Word_Phrase_Frequency_v1, you will be asked to put the cursor at the proper column, so if you want to get frequency of "Question 1 Answers" (col B) then put the cursor in col B.
  • The result will be at 2 column on the right of the last column with data

VBA Code:
Option Explicit
Dim LC As Long 'last column
Dim CX As Long 'active cell column
Dim rSW As Range


Sub Word_Phrase_Frequency_v1()

'The code will generate word/phrase frequency
'How to use:
'1. Add reference to "Microsoft VBScript Regular Expressions 5.5" (you need to do it once only):
'   In Visual Basic Editor menu, select Tools –> References, then select Microsoft VBScript Regular Expressions 5.5, then click OK.
'2. Data must be in column A, start at A1
'3. Run Word_Phrase_Frequency_v1


'--- CHANGE sNumber & xPattern VALUE TO SUIT -----------------------------------

Const sNumber As String = "1,2,3"  '"1,2,3"
'sNumber = "1"  will generate 1 word frequency list
'sNumber = "1,2,3"  will generate 1 word, 2 word & 3 word frequency list

Const xPattern As String = "A-Z0-9_'"
'define the word characters, the above pattern will include letter, number, underscore & apostrophe as word character
'word with apostrophe such as "you're" counts as one word.
'word with underscore such as "aa_bb" counts as one word.


'Const xCol As String = "C:ZZ" 'columns to clear
Dim i As Long, j As Long, k As Long, h As Long
Dim txa As String, txq As String
Dim z, t, va, ary, arz
Dim rngA As Range
    
    Set rngA = Application.Selection
    Set rngA = Application.InputBox("Put the cursor at the proper column", "", rngA.Address, Type:=8)
    rngA.Activate

t = Timer

Application.ScreenUpdating = False

CX = ActiveCell.Column
LC = ActiveSheet.Cells.Find("*", SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column
'if there are errors, remove them
On Error Resume Next
Columns(CX).SpecialCells(xlCellTypeFormulas, xlErrors).ClearContents
Columns(CX).SpecialCells(xlConstants, xlErrors).ClearContents
On Error GoTo 0

Cells(1, LC + 2) = Cells(1, CX)
Cells(2, LC + 2) = "Division"
Cells(2, LC + 3) = "WORDS"
Cells(2, LC + 4) = "COUNT"

j = Range("A" & Rows.Count).End(xlUp).Row

va = Range("A1:A" & j)

For i = 2 To UBound(va, 1)  'data start at row 2
 k = i
    Do
        i = i + 1
        If i > UBound(va, 1) Then Exit Do
    Loop While va(i, 1) = va(i - 1, 1)
    i = i - 1
'    Debug.Print j & " : " & i
     txq = txq & va(i, 1) & ":" & k & ":" & i & ","

Next

'Debug.Print txq
        ary = Split(txq, ",")
        
        For h = 0 To UBound(ary) - 1
            arz = Split(ary(h), ":")

                txa = Join(Application.Transpose(Range(Cells(arz(1), CX), Cells(arz(2), CX))), " ")
                'if you have stop words list then put the list starting at A1 in sheet2
                With Sheets("Sheet2")
                    Set rSW = .Range("A1", .Cells(.Rows.Count, "A").End(xlUp))
                End With

                If rSW.Cells(1) <> "" Then Call stopWord(xPattern, txa)
                Debug.Print txa
            Call toProcessY(1, txa, xPattern, CStr(arz(0)))
        Next


'Range(xCol).Columns.AutoFit
Application.ScreenUpdating = True
Cells(1, LC + 2).Activate
Debug.Print "It's done in:  " & Timer - t & " seconds"

End Sub

Sub toProcessY(n As Long, ByVal tx As String, xP As String, div As String)
'phrase frequency

Dim regEx As Object, matches As Object, x As Object, d As Object
Dim i As Long, LR As Long
Dim va, q

        Set regEx = CreateObject("VBScript.RegExp")
        With regEx
            .Global = True
            .MultiLine = True
            .ignorecase = True
        End With


    Set d = CreateObject("scripting.dictionary")
    d.CompareMode = vbTextCompare

'    regEx.Pattern = Trim(WorksheetFunction.Rept("[A-Z0-9_']+ ", n)) 'match n words (the phrase) separated by a space
    regEx.Pattern = Trim(WorksheetFunction.Rept("[" & xP & "]+ ", n)) 'match n words (the phrase) separated by a space
            Set matches = regEx.Execute(tx)
            
            For Each x In matches
                d(CStr(x)) = d(CStr(x)) + 1 'get phrase frequency
            Next
 
For i = 1 To n - 1
        
        regEx.Pattern = "^[" & xP & "]+ "
        If regEx.Test(tx) Then
           tx = regEx.Replace(tx, "")   'remove first word in each line to get different combination of n words (phrase)

'            regEx.Pattern = Trim(WorksheetFunction.Rept("[A-Z0-9_']+ ", n))
            regEx.Pattern = Trim(WorksheetFunction.Rept("[" & xP & "]+ ", n))
            
            Set matches = regEx.Execute(tx)
            
            For Each x In matches
                d(CStr(x)) = d(CStr(x)) + 1     'get phrase frequency
            Next

        End If
Next

'If d.Count = 0 Then MsgBox "Nothing with " & n & " word phrase found": Exit Sub
If d.Count = 0 Then Exit Sub

LR = Cells(Rows.Count, LC + 3).End(xlUp).Row + 1

'put the result
With Cells(LR, LC + 3).Resize(d.Count, 2)
    .Value = Application.Transpose(Array(d.Keys, d.Items))
    .Sort Key1:=.Cells(1, 2), Order1:=xlDescending, Key2:=.Cells(1, 1), Order2:=xlAscending, Header:=xlNo
    .Cells(1, 1).Offset(, -1) = div
End With




End Sub


Sub stopWord(xP As String, tx As String)
Dim n As Long
Dim stW, x
Dim regEx As Object
stW = rSW.Value
        Set regEx = CreateObject("VBScript.RegExp")
        With regEx
            .Global = True
            .MultiLine = True
            .ignorecase = True
        End With
tx = " " & tx
For Each x In stW
        regEx.Pattern = "[^" & xP & "]" & x & "[^" & xP & "]"
        If regEx.Test(tx) Then
           tx = regEx.Replace(tx, " ")   'replace stop word with " "
        End If
Next

End Sub

I've changed the example a bit:
darkbunty - word frequency.xlsm
ABCDEFGH
1DivisionQuestion 1 AnswersQuestion 2 AnswersQuestion 1 Answers
2AWe are oneDivisionWORDSCOUNT
3AThis is doneAone2
4Athis oneWhen you arrveThis2
5BAwesomedone1
6BNo questionWe1
7BNo changesBAwesome1
8CWhen you arrveNo1
9CWhen you arrvequestion1
10DNo changesCarrve1
11DNo changesWhen1
12ENo questionyou1
13ENo questionDchanges1
14No1
15
Sheet1


stop words:
darkbunty - word frequency.xlsm
AB
1and
2is
3are
4
5
Sheet2
 

Forum statistics

Threads
1,214,591
Messages
6,120,429
Members
448,961
Latest member
nzskater

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top