# Mining Text Fields

#### Rob P.

##### New Member
How would I group and count the occurences of specific words in a text field? This is my first step in trying to develop a scoring method based on our notes.

If the sentences above were in the text field, I'd expect the words "a" and "in" to come back with a count of two while all other words would come back with a count of 1.

Rob

### Excel Facts

How to find 2nd largest value in a column?
MAX finds the largest value. =LARGE(A:A,2) will find the second largest. =SMALL(A:A,3) will find the third smallest

#### stanleydgromjr

##### Banned
Rob P.,

If I understand you correctly.

Excel Workbook
ABC
1How would I group and count the occurences of specific words in a text field? This is my first step in trying to develop a scoring method based on our notes.22
Sheet1

Have a great day,
Stan

##### Well-known Member
Hello,

Stan is on the right lines, but I think his formulas would be incorrect, if words end in a or start with in.

Try in B3

=(LEN(\$A\$1)-LEN(SUBSTITUTE(\$A\$1," "&A3&" "," ")))/LEN(A3)

where A3 = a

and copy down with other words in col A

just realizees mine would be wrong if the word is at the beginning or at the end of the cell.

#### jeffreybrown

##### Well-known Member

Excel Workbook
EFG
22How would I group and banana the occurences of specific words in a text field? This is my first step in trying to develop a scoring method based on our notes.22
Average

##### Well-known Member
Hello,

jrb has the same problem I had, it doesn't count the start or end words.

try

=IF(LEFT(\$A\$1,LEN(A3))=A3,1,0)+IF(RIGHT(\$A\$1,LEN(A3))=A3,1,0)+(LEN(\$A\$1)-LEN(SUBSTITUTE(\$A\$1," "&A3&" "," ")))/LEN(A3)

i'm sure there is a slicker formula out there.

=(LEN(\$A\$1)+2-LEN(SUBSTITUTE(SUBSTITUTE(\$A\$1,\$A\$1," "&\$A\$1&" ")," "&A3&" "," ")))/LEN(A3)

is about as good as I can get it

Last edited:

#### ossuary

##### Active Member
Hi,

Are you looking to analyze every single word in the text field to determine how many times they occur, or only looking for specific words? This is a little messy, but this will take the text field and tell you how many times every single word in the box occurs... you should be able to modify it from there to take exactly what information you want from it.

Code:
``````Private Sub CommandButton1_Click()
' Clear form and setup variables
Range("A3:C10001").ClearContents
myVal = ""
myLen = 0

' Copy the original text to a temp range (so you don't mess up the original)
Range("B3").Value = Range("A1").Value

' Remove all non-standard characters (punctuation, etc.), so they aren't counted as part of a word
myVal = Range("B3").Value
myVal = Replace(myVal, ",", "")
myVal = Replace(myVal, ".", "")
myVal = Replace(myVal, ";", "")
myVal = Replace(myVal, ":", "")
myVal = Replace(myVal, "'", "")
myVal = Replace(myVal, "!", "")
myVal = Replace(myVal, "?", "")
myVal = Replace(myVal, "#", "")
myVal = Replace(myVal, "\$", "")
myVal = Replace(myVal, "%", "")
myVal = Replace(myVal, "^", "")
myVal = Replace(myVal, "&", "")
myVal = Replace(myVal, "*", "")
myVal = Replace(myVal, "(", "")
myVal = Replace(myVal, ")", "")
myVal = Replace(myVal, "-", "")
myVal = Replace(myVal, "_", "")
myVal = Replace(myVal, "+", "")
myVal = Replace(myVal, "=", "")
myVal = Replace(myVal, "[", "")
myVal = Replace(myVal, "]", "")
myVal = Replace(myVal, "{", "")
myVal = Replace(myVal, "}", "")
myVal = Replace(myVal, "\", "")
myVal = Replace(myVal, "|", "")
myVal = Replace(myVal, "/", "")
myVal = Replace(myVal, "?", "")
myVal = Replace(myVal, "<", "")
myVal = Replace(myVal, ">", "")
myVal = Replace(myVal, "`", "")
myVal = Replace(myVal, "~", "")
Range("B3").Value = myVal

' Determine how many words there are in the statement
myLen = Len(myVal) - Len(Replace(myVal, " ", "")) + 1

' Make sure more than one word was entered in A1
If myLen <= 1 Then Exit Sub

' Break the text apart into individual words
myRow = 4
For cnt = 1 To myLen
Range("A" & myRow).Formula = "=IF(ISERROR(LEFT(B" & myRow - 1 & ",FIND("" "",B" & myRow - 1 & ")-1)),B" & myRow - 1 & ",LEFT(B" & myRow - 1 & ",FIND("" "",B" & myRow - 1 & ")-1))"
Range("B" & myRow).Formula = "=IF(ISERROR(RIGHT(B" & myRow - 1 & ",LEN(B" & myRow - 1 & ")-LEN(A" & myRow & ")-1)),"""",RIGHT(B" & myRow - 1 & ",LEN(B" & myRow - 1 & ")-LEN(A" & myRow & ")-1))"
Range("C" & myRow).Value = myRow
myRow = myRow + 1
Next cnt

' Convert to text
Range("A4:A" & myRow).Value = Range("A4:A" & myRow).Value
Range("B3:B" & myRow).ClearContents

' Count up instances of each word
myRow = 4
For cnt = 1 To myLen
Range("B" & myRow).Formula = "=COUNTIF(A:A,A" & myRow & ")"
myRow = myRow + 1
Next cnt

' Finish calculations, clean up form
Range("B4:B" & myRow).Value = Range("B4:B" & myRow).Value

myRow = 4
For cnt = 1 To myLen
myWord = Range("A" & myRow).Value
If Evaluate("=countif(A\$3:A" & myRow - 1 & ",""" & myWord & """)") > 0 Then
Range("A" & myRow & ":C" & myRow).ClearContents
End If
myRow = myRow + 1
Next cnt

Application.Wait (Now + TimeValue("00:00:00"))
Range("A4:C" & myRow).Sort Key1:=Range("C4"), Order1:=xlAscending, Header:=xlNo, OrderCustom:=1, _
MatchCase:=False, Orientation:=xlTopToBottom, DataOption1:=xlSortNormal
Range("C4:C" & myRow).ClearContents
End Sub``````

To use this, just start a new workbook and stick a command button on it, and put this code into the command button's code area. Then put the text you want to analyze into cell A1, and hit the command button to execute the code.

Let me know if this is helpful at all. I tested it and it seems to work fine...

#### stanleydgromjr

##### Banned
Rob P.,

Before the macro:

Excel Workbook
ABC
1How would I group and count the occurences of specific words in a text field? This is my first step in trying to develop a scoring method based on our notes.
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
Sheet1

After the macro:

Excel Workbook
ABC
1How would I group and count the occurences of specific words in a text field? This is my first step in trying to develop a scoring method based on our notes.ListNbr of
2How1
3would1
4I1
5group1
6and1
7count1
8the1
9occurences1
10of1
11specific1
12words1
13in2
14a2
15text1
16field1
17This1
18is1
19my1
20first1
21step1
22trying1
23to1
24develop1
25scoring1
26method1
27based1
28on1
29our1
30notes1
Sheet1

Please TEST this FIRST in a COPY of your workbook (always make a backup copy before trying new code, you never know what you might lose).

Press and hold down the 'ALT' key, and press the 'F11' key.

On the 'Insert' menu, click 'Module'.

Copy the below code, and paste it into the Module (on the right pane).

Code:
``````Option Explicit
Sub CreateWordList()
Dim NR As Long, i As Long, LR As Long
Dim Hold As String
Application.ScreenUpdating = False
Cells(1, 2) = "List"
NR = 2: Hold = ""
For i = 1 To Len(Cells(1, 1)) Step 1
If Mid(Cells(1, 1), i, 1) <> " " And Mid(Cells(1, 1), i, 1) <> "." And Mid(Cells(1, 1), i, 1) <> "?" Then
Hold = Hold + Mid(Cells(1, 1), i, 1)
Else
If Hold <> "" Then
Cells(NR, 2) = Hold
Hold = ""
NR = NR + 1
End If
End If
Next i
Range("B1:B" & NR).AdvancedFilter Action:=xlFilterCopy, CopyToRange:=Range("C1"), Unique:=True
Range("D1") = "Nbr of"
LR = Cells(Rows.Count, 3).End(xlUp).Row
With Range("D2:D" & LR)
.FormulaR1C1 = "=COUNTIF(R2C2:R" & NR & "C2,RC3)"
.Value = .Value
End With
Range("B1").EntireColumn.Delete
Range("B1:C" & LR).Columns.AutoFit
Application.ScreenUpdating = True
End Sub``````

Then run the "CreateWordList" macro.

Have a great day,
Stan

#### pgc01

##### MrExcel MVP
Another option.

In the VBEditor in Tools->References enable the references to the libraries "Microsoft VBScript Regular Expressions" and "Microsoft Scripting Runtime".

Assuming your text in A1, run:

Code:
``````Sub WordsFreq()
Dim oMatches As MatchCollection, oMatch As Match

With New RegExp
.Pattern = "[a-zA-Z]+"
.Global = True
Set oMatches = .Execute(Range("A1"))
End With

With New Dictionary
For Each oMatch In oMatches
.Item(oMatch.Value) = .Item(oMatch.Value) + 1
Next oMatch

Range("C2").Resize(.Count, 2).Value = Application.Transpose(Array(.Keys, .Items))
End With
End Sub``````

#### stanleydgromjr

##### Banned
PGC,

This site is such a great learning environment.

Thanks, and have a great day,
Stan

#### Rob P.

##### New Member
Ossuary, thanks for the code, it's functionally what I want. I can spend a lot of time figuring out the next step or I can just ask...
I would like this to be more flexible. I'm going to be analyzing thousands of notes and I'd like to be able to import all the notes into excel horizontally and have the independent word counts fill in horizontally accross the page.

I have an example of how the output would look. I'll try and remember how to post it...

Rob

Replies
9
Views
307
Replies
8
Views
438
Replies
23
Views
514
Replies
6
Views
650
Replies
5
Views
1K

1,190,697
Messages
5,982,352
Members
439,776
Latest member
mathewduffy

### We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.

### Which adblocker are you using?

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

### Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

### Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back