Mining Text Fields

Rob P.

New Member
Joined
Feb 23, 2002
Messages
29
How would I group and count the occurences of specific words in a text field? This is my first step in trying to develop a scoring method based on our notes.

If the sentences above were in the text field, I'd expect the words "a" and "in" to come back with a count of two while all other words would come back with a count of 1.

Rob
 

Excel Facts

Easy bullets in Excel
If you have a numeric keypad, press Alt+7 on numeric keypad to type a bullet in Excel.
Rob P.,

If I understand you correctly.

Excel Workbook
ABC
1How would I group and count the occurences of specific words in a text field? This is my first step in trying to develop a scoring method based on our notes.22
Sheet1



Have a great day,
Stan
 
Upvote 0
Hello,

Stan is on the right lines, but I think his formulas would be incorrect, if words end in a or start with in.

Try in B3

=(LEN($A$1)-LEN(SUBSTITUTE($A$1," "&A3&" "," ")))/LEN(A3)

where A3 = a

and copy down with other words in col A

just realizees mine would be wrong if the word is at the beginning or at the end of the cell.
 
Upvote 0
What about this...

Excel Workbook
EFG
22How would I group and banana the occurences of specific words in a text field? This is my first step in trying to develop a scoring method based on our notes.22
Average
 
Upvote 0
Hello,

jrb has the same problem I had, it doesn't count the start or end words.

try

=IF(LEFT($A$1,LEN(A3))=A3,1,0)+IF(RIGHT($A$1,LEN(A3))=A3,1,0)+(LEN($A$1)-LEN(SUBSTITUTE($A$1," "&A3&" "," ")))/LEN(A3)

i'm sure there is a slicker formula out there.

=(LEN($A$1)+2-LEN(SUBSTITUTE(SUBSTITUTE($A$1,$A$1," "&$A$1&" ")," "&A3&" "," ")))/LEN(A3)

is about as good as I can get it
 
Last edited:
Upvote 0
Hi,

Are you looking to analyze every single word in the text field to determine how many times they occur, or only looking for specific words? This is a little messy, but this will take the text field and tell you how many times every single word in the box occurs... you should be able to modify it from there to take exactly what information you want from it.

Code:
Private Sub CommandButton1_Click()
  ' Clear form and setup variables
    Range("A3:C10001").ClearContents
    myVal = ""
    myLen = 0
    
  ' Copy the original text to a temp range (so you don't mess up the original)
    Range("B3").Value = Range("A1").Value
    
  ' Remove all non-standard characters (punctuation, etc.), so they aren't counted as part of a word
    myVal = Range("B3").Value
    myVal = Replace(myVal, ",", "")
    myVal = Replace(myVal, ".", "")
    myVal = Replace(myVal, ";", "")
    myVal = Replace(myVal, ":", "")
    myVal = Replace(myVal, "'", "")
    myVal = Replace(myVal, "!", "")
    myVal = Replace(myVal, "?", "")
    myVal = Replace(myVal, "#", "")
    myVal = Replace(myVal, "$", "")
    myVal = Replace(myVal, "%", "")
    myVal = Replace(myVal, "^", "")
    myVal = Replace(myVal, "&", "")
    myVal = Replace(myVal, "*", "")
    myVal = Replace(myVal, "(", "")
    myVal = Replace(myVal, ")", "")
    myVal = Replace(myVal, "-", "")
    myVal = Replace(myVal, "_", "")
    myVal = Replace(myVal, "+", "")
    myVal = Replace(myVal, "=", "")
    myVal = Replace(myVal, "[", "")
    myVal = Replace(myVal, "]", "")
    myVal = Replace(myVal, "{", "")
    myVal = Replace(myVal, "}", "")
    myVal = Replace(myVal, "\", "")
    myVal = Replace(myVal, "|", "")
    myVal = Replace(myVal, "/", "")
    myVal = Replace(myVal, "?", "")
    myVal = Replace(myVal, "<", "")
    myVal = Replace(myVal, ">", "")
    myVal = Replace(myVal, "`", "")
    myVal = Replace(myVal, "~", "")
    Range("B3").Value = myVal
    
  ' Determine how many words there are in the statement
    myLen = Len(myVal) - Len(Replace(myVal, " ", "")) + 1
    
  ' Make sure more than one word was entered in A1
    If myLen <= 1 Then Exit Sub
    
  ' Break the text apart into individual words
    myRow = 4
    For cnt = 1 To myLen
        Range("A" & myRow).Formula = "=IF(ISERROR(LEFT(B" & myRow - 1 & ",FIND("" "",B" & myRow - 1 & ")-1)),B" & myRow - 1 & ",LEFT(B" & myRow - 1 & ",FIND("" "",B" & myRow - 1 & ")-1))"
        Range("B" & myRow).Formula = "=IF(ISERROR(RIGHT(B" & myRow - 1 & ",LEN(B" & myRow - 1 & ")-LEN(A" & myRow & ")-1)),"""",RIGHT(B" & myRow - 1 & ",LEN(B" & myRow - 1 & ")-LEN(A" & myRow & ")-1))"
        Range("C" & myRow).Value = myRow
        myRow = myRow + 1
    Next cnt
    
  ' Convert to text
    Range("A4:A" & myRow).Value = Range("A4:A" & myRow).Value
    Range("B3:B" & myRow).ClearContents
    
  ' Count up instances of each word
    myRow = 4
    For cnt = 1 To myLen
        Range("B" & myRow).Formula = "=COUNTIF(A:A,A" & myRow & ")"
        myRow = myRow + 1
    Next cnt
    
  ' Finish calculations, clean up form
    Range("B4:B" & myRow).Value = Range("B4:B" & myRow).Value
    
    myRow = 4
    For cnt = 1 To myLen
        myWord = Range("A" & myRow).Value
        If Evaluate("=countif(A$3:A" & myRow - 1 & ",""" & myWord & """)") > 0 Then
            Range("A" & myRow & ":C" & myRow).ClearContents
        End If
        myRow = myRow + 1
    Next cnt
    
    Application.Wait (Now + TimeValue("00:00:00"))
    Range("A4:C" & myRow).Sort Key1:=Range("C4"), Order1:=xlAscending, Header:=xlNo, OrderCustom:=1, _
        MatchCase:=False, Orientation:=xlTopToBottom, DataOption1:=xlSortNormal
    Range("C4:C" & myRow).ClearContents
End Sub

To use this, just start a new workbook and stick a command button on it, and put this code into the command button's code area. Then put the text you want to analyze into cell A1, and hit the command button to execute the code.

Let me know if this is helpful at all. I tested it and it seems to work fine...
 
Upvote 0
Rob P.,

Before the macro:

Excel Workbook
ABC
1How would I group and count the occurences of specific words in a text field? This is my first step in trying to develop a scoring method based on our notes.
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
Sheet1



After the macro:


Excel Workbook
ABC
1How would I group and count the occurences of specific words in a text field? This is my first step in trying to develop a scoring method based on our notes.ListNbr of
2How1
3would1
4I1
5group1
6and1
7count1
8the1
9occurences1
10of1
11specific1
12words1
13in2
14a2
15text1
16field1
17This1
18is1
19my1
20first1
21step1
22trying1
23to1
24develop1
25scoring1
26method1
27based1
28on1
29our1
30notes1
Sheet1




Please TEST this FIRST in a COPY of your workbook (always make a backup copy before trying new code, you never know what you might lose).

Press and hold down the 'ALT' key, and press the 'F11' key.

On the 'Insert' menu, click 'Module'.

Copy the below code, and paste it into the Module (on the right pane).


Code:
Option Explicit
Sub CreateWordList()
Dim NR As Long, i As Long, LR As Long
Dim Hold As String
Application.ScreenUpdating = False
Cells(1, 2) = "List"
NR = 2: Hold = ""
For i = 1 To Len(Cells(1, 1)) Step 1
  If Mid(Cells(1, 1), i, 1) <> " " And Mid(Cells(1, 1), i, 1) <> "." And Mid(Cells(1, 1), i, 1) <> "?" Then
    Hold = Hold + Mid(Cells(1, 1), i, 1)
  Else
    If Hold <> "" Then
      Cells(NR, 2) = Hold
      Hold = ""
      NR = NR + 1
    End If
  End If
Next i
Range("B1:B" & NR).AdvancedFilter Action:=xlFilterCopy, CopyToRange:=Range("C1"), Unique:=True
Range("D1") = "Nbr of"
LR = Cells(Rows.Count, 3).End(xlUp).Row
With Range("D2:D" & LR)
  .FormulaR1C1 = "=COUNTIF(R2C2:R" & NR & "C2,RC3)"
  .Value = .Value
End With
Range("B1").EntireColumn.Delete
Range("B1:C" & LR).Columns.AutoFit
Application.ScreenUpdating = True
End Sub


Then run the "CreateWordList" macro.


Have a great day,
Stan
 
Upvote 0
Another option.

In the VBEditor in Tools->References enable the references to the libraries "Microsoft VBScript Regular Expressions" and "Microsoft Scripting Runtime".

Assuming your text in A1, run:

Code:
Sub WordsFreq()
Dim oMatches As MatchCollection, oMatch As Match
 
With New RegExp
    .Pattern = "[a-zA-Z]+"
    .Global = True
    Set oMatches = .Execute(Range("A1"))
End With
 
With New Dictionary
    For Each oMatch In oMatches
        .Item(oMatch.Value) = .Item(oMatch.Value) + 1
    Next oMatch
    
    Range("C2").Resize(.Count, 2).Value = Application.Transpose(Array(.Keys, .Items))
End With
End Sub
 
Upvote 0
Ossuary, thanks for the code, it's functionally what I want. I can spend a lot of time figuring out the next step or I can just ask...
I would like this to be more flexible. I'm going to be analyzing thousands of notes and I'd like to be able to import all the notes into excel horizontally and have the independent word counts fill in horizontally accross the page.

I have an example of how the output would look. I'll try and remember how to post it...

Rob
 
Upvote 0

Forum statistics

Threads
1,213,557
Messages
6,114,293
Members
448,564
Latest member
ED38

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top