Finding the most common words or phrases in a field

DoctorZoom

New Member
Joined
Jan 19, 2016
Messages
13
Hi, I'm doing troubleshooting on customer trouble tickets and without having to read through thousands of them, I'd like to be able to search for common words or phrases with a count in a particular field of the trouble ticket. All the tickets are in a excel spreadsheet and I want to do this for one field (column) called "customer narrative" Help would be greatly appreciated! Thanks
 

Excel Facts

Formula for Yesterday
Name Manager, New Name. Yesterday =TODAY()-1. OK. Then, use =YESTERDAY in any cell. Tomorrow could be =TODAY()+1.
Hi, I'm doing troubleshooting on customer trouble tickets and without having to read through thousands of them, I'd like to be able to search for common words or phrases with a count in a particular field of the trouble ticket. All the tickets are in a excel spreadsheet and I want to do this for one field (column) called "customer narrative" Help would be greatly appreciated! Thanks
Hi DoctorZoom, welcome to the boards.

This can be achieved with an array formula (entered with CTRL+SHIFT+ENTER, not just ENTER) as follows:

=INDEX(A1:A100,MODE(MATCH(A1:A100,A1:A100,0)))

You would obviously need to amend the ranges it is looking at, and the formula should not go in a cell which falls inside the range you are looking at to prevent circular referencing. If you have successfully entered the formula using CTRL+SHIFT+ENTER Excel will automatically put curly brackets around it in the formula bar like this:

{=INDEX(A1:A100,MODE(MATCH(A1:A100,A1:A100,0)))}

Please note: Do not add the curly brackets in manually, that will not have the same effect as CTRL+SHIFT+ENTER
 
Upvote 0
Hi DoctorZoom, welcome to the boards.

This can be achieved with an array formula (entered with CTRL+SHIFT+ENTER, not just ENTER) as follows:

=INDEX(A1:A100,MODE(MATCH(A1:A100,A1:A100,0)))

You would obviously need to amend the ranges it is looking at, and the formula should not go in a cell which falls inside the range you are looking at to prevent circular referencing. If you have successfully entered the formula using CTRL+SHIFT+ENTER Excel will automatically put curly brackets around it in the formula bar like this:

{=INDEX(A1:A100,MODE(MATCH(A1:A100,A1:A100,0)))}

Please note: Do not add the curly brackets in manually, that will not have the same effect as CTRL+SHIFT+ENTER

Thanks very much!

I tried it with a test sample of movie titles. I used same range in formula and these sample cells repeated until I filled out the full A1:A100 range.

When I enter the formula, in a cell outside of column A, I see that the whole range of a1:a100 highlighted.

These 13 cells are repeated until a100
Straight Outta Compton
Spare Parts
Room
Trumbo
The Martian
Spotlight
Self/less
Child 44
Steve Jobs
Dark Places
Irrational Man
Uncanny
The Creed

<colgroup><col></colgroup><tbody>
</tbody>

When I enter the formula using CTRL SHFT ENTER - it does put correct brackets around formula but the formula returns only the 1st entry of "Straight Outta Compton"??

What I am looking to do is output the most frequent words along with the count of the words and it would be great if I could output common phrases and counts - i.e., "hit the refresh button" may appear 500 times in my narrative field - I'd like to see the actual phrase outputted along with a count. Maybe something like outputting the most common word and counts and then most common 2 words and count, etc. Or most common few words that appear in order within a cell and the count.

I need to be able to look for a pattern of issues our customers may be reporting based on the text entered by our agents (free form) - given the above type of formula - I can quickly get a read on what customers are complaining about the most within a higher level category we have for our trouble tickets. For example, I can filter the data by customer education questions but would need a good formula to pull out the most common questions.

Sorry for the long reply - appreciate you help!
John
 
Upvote 0
Thanks very much!

I tried it with a test sample of movie titles. I used same range in formula and these sample cells repeated until I filled out the full A1:A100 range.

When I enter the formula, in a cell outside of column A, I see that the whole range of a1:a100 highlighted.

These 13 cells are repeated until a100
Straight Outta Compton
Spare Parts
Room
Trumbo
The Martian
Spotlight
Self/less
Child 44
Steve Jobs
Dark Places
Irrational Man
Uncanny
The Creed

<tbody>
</tbody>

When I enter the formula using CTRL SHFT ENTER - it does put correct brackets around formula but the formula returns only the 1st entry of "Straight Outta Compton"??

What I am looking to do is output the most frequent words along with the count of the words and it would be great if I could output common phrases and counts - i.e., "hit the refresh button" may appear 500 times in my narrative field - I'd like to see the actual phrase outputted along with a count. Maybe something like outputting the most common word and counts and then most common 2 words and count, etc. Or most common few words that appear in order within a cell and the count.

I need to be able to look for a pattern of issues our customers may be reporting based on the text entered by our agents (free form) - given the above type of formula - I can quickly get a read on what customers are complaining about the most within a higher level category we have for our trouble tickets. For example, I can filter the data by customer education questions but would need a good formula to pull out the most common questions.

Sorry for the long reply - appreciate you help!
John
Hi John,

The problem you will find using your testing method is that every film in your list up as far as Steve Jobs will all be repeated exactly 8 times, meaning that in truth there are 8 values that are all the "most common" value. Obviously I appreciate that in your real data there will obviously be many duplicates of many different entries (that is kinda the point of this after all), but I would think that there will most likely be an outright "most common" value.

That said, assuming the list of films are in A1:A100, in B1 use the following array formula entered with CTRL+SHIFT+ENTER to show the "most common" value:

=IFERROR(INDEX(A1:A100,MODE(MATCH(A1:A100,A1:A100,0)+{0,0})),"")

Then, in B2 enter the following array formula, again entered with CTRL+SHIFT+ENTER to show the second "most common" value:

=IFERROR(INDEX(A$1:A$100,MODE(IF(COUNTIF(B$1:B1,A$1:A$100)=0,MATCH(A$1:A$100,A$1:A$100,0)+{0,0}))),"")

You can drag-fill this down from B2 to increase how many items you want to find, so you could find the third, forth, fifth or whatever most common value. This will obviously work better with your real data than with the list of films unless the list were more diverse and not with 8 values all equally common.

Finally, with regards to having a count of the values, that would simply need a different formula in the adjacent cell to the "most common" result (we will assume this was in B1) which would be =COUNTIF($A$1:$A$100,B1) and drag-fill it down the length of the list in column B to see how many of each value there were.

Is that any better for you?
 
Upvote 0
Hi John,

The problem you will find using your testing method is that every film in your list up as far as Steve Jobs will all be repeated exactly 8 times, meaning that in truth there are 8 values that are all the "most common" value. Obviously I appreciate that in your real data there will obviously be many duplicates of many different entries (that is kinda the point of this after all), but I would think that there will most likely be an outright "most common" value.

That said, assuming the list of films are in A1:A100, in B1 use the following array formula entered with CTRL+SHIFT+ENTER to show the "most common" value:

=IFERROR(INDEX(A1:A100,MODE(MATCH(A1:A100,A1:A100,0)+{0,0})),"")

Then, in B2 enter the following array formula, again entered with CTRL+SHIFT+ENTER to show the second "most common" value:

=IFERROR(INDEX(A$1:A$100,MODE(IF(COUNTIF(B$1:B1,A$1:A$100)=0,MATCH(A$1:A$100,A$1:A$100,0)+{0,0}))),"")

You can drag-fill this down from B2 to increase how many items you want to find, so you could find the third, forth, fifth or whatever most common value. This will obviously work better with your real data than with the list of films unless the list were more diverse and not with 8 values all equally common.

Finally, with regards to having a count of the values, that would simply need a different formula in the adjacent cell to the "most common" result (we will assume this was in B1) which would be =COUNTIF($A$1:$A$100,B1) and drag-fill it down the length of the list in column B to see how many of each value there were.

Is that any better for you?


Thanks again!

We are getting much closer :)

Formulas worked great when field I am searching has fixed values - for example, I tested using type of films (doc, drama, comedy, etc) and got a list of most frequent types (most to least) and the counts.

Issue I have is that the field I am looking at in our trouble tickets is free form and not one will exactly be the same. I put a few free form text fields into the array and the formula pulls each one out as a separate instance with all the words in the field - so you are getting a duplicate of all the fields (since they are unique and have a count of 1)

That's why I am trying to see the most common "words" within the field I am looking at and also get a list and count of 2 or 3 or 4, etc. words that appear within the same cell.

For example:

4 agents would enter the following text for similar problems from 4 customers:

CX states that when they press menu button on remote nothing happens
press menu on remote doesn't work
program remote problems
menu no go

I would need to see from this small sample that "menu" appears in three cells (count =3)
"press" appears in 2 cells (count=2)
But what I need to know is how many cells have both "press" and "menu" in them?
If it is too difficult to determine sets of words appearing in cells - then it would still be helpful if we could pull out similar phrases / text strings (i.e., "press menu") appears in 2 cells (count=2)

Hence you can see my predicament :(

Thanks again!
 
Upvote 0
Hmm, that is another matter entirely, and one which I am not sure formulas alone will be able to handle.

I found the following useful thread here over on the OzGrid forums which has some VBA that might help.

To test it out I created a new workbook, then copied your example from your last post into column A of the new document. Next I copy / pasted the following VBA code into a standard module and ran it, which resulted in column B showing each separate word used in column A, and column C showing how many times each of those words appears in column A. I suspect this may be the closest in definition to what you are trying to record:

Code:
Sub HTH()
    Dim vArray As Variant
    Dim lLoop As Long
    Dim rCell As Range
     
    With CreateObject("Scripting.Dictionary")
        For Each rCell In Range("A1", Cells(Rows.Count, "A").End(xlUp))
            vArray = Split(rCell.Value, " ")
            For lLoop = LBound(vArray) To UBound(vArray)
                If Not .exists(vArray(lLoop)) Then
                    .Add vArray(lLoop), 1
                Else
                    .Item(vArray(lLoop)) = .Item(vArray(lLoop)) + 1
                End If
            Next lLoop
        Next rCell
        Range("B1").Resize(.Count).Value = Application.Transpose(.keys)
        Range("C1").Resize(.Count).Value = Application.Transpose(.items)
    End With
End Sub

For a better explanation of what it is doing and / or how it works, read through the link I posted above to see what the author of the code himself has to say.
 
Upvote 0
Hmm, that is another matter entirely, and one which I am not sure formulas alone will be able to handle.

I found the following useful thread here over on the OzGrid forums which has some VBA that might help.

To test it out I created a new workbook, then copied your example from your last post into column A of the new document. Next I copy / pasted the following VBA code into a standard module and ran it, which resulted in column B showing each separate word used in column A, and column C showing how many times each of those words appears in column A. I suspect this may be the closest in definition to what you are trying to record:

Code:
Sub HTH()
    Dim vArray As Variant
    Dim lLoop As Long
    Dim rCell As Range
     
    With CreateObject("Scripting.Dictionary")
        For Each rCell In Range("A1", Cells(Rows.Count, "A").End(xlUp))
            vArray = Split(rCell.Value, " ")
            For lLoop = LBound(vArray) To UBound(vArray)
                If Not .exists(vArray(lLoop)) Then
                    .Add vArray(lLoop), 1
                Else
                    .Item(vArray(lLoop)) = .Item(vArray(lLoop)) + 1
                End If
            Next lLoop
        Next rCell
        Range("B1").Resize(.Count).Value = Application.Transpose(.keys)
        Range("C1").Resize(.Count).Value = Application.Transpose(.items)
    End With
End Sub

For a better explanation of what it is doing and / or how it works, read through the link I posted above to see what the author of the code himself has to say.

Just want to thank your for all your help - I appreciate the time you took to try and help me!

Found this site that does exactly what I need but is limited and can't be used but it gets my point across of what is needed.
I just need to do it within excel for thousands of cells :(
[FONT=&quot]http://www.online-utility.org/<wbr>text/analyzer.jsp[/FONT]
 
Upvote 0
Just want to thank your for all your help - I appreciate the time you took to try and help me!

Found this site that does exactly what I need but is limited and can't be used but it gets my point across of what is needed.
I just need to do it within excel for thousands of cells :(
http://www.online-utility.org/<wbr>text/analyzer.jsp
Hi John,

No worries, I am happy to help if and where I can.

The closest I can come to that is the code I provided last time, which looks down column A (you can obviously change which column it looks at) and breaks each entry down to individual words and counts how many times each word occurred. In theory you could then add more code to the end which tried to order the list by the count so the most common appear at the top working down to the least common.

Would that change to the code I previously linked be any better for you? If so I can see what tweaks can be made to achieve that goal.
 
Upvote 0
Hi John,

No worries, I am happy to help if and where I can.

The closest I can come to that is the code I provided last time, which looks down column A (you can obviously change which column it looks at) and breaks each entry down to individual words and counts how many times each word occurred. In theory you could then add more code to the end which tried to order the list by the count so the most common appear at the top working down to the least common.

Would that change to the code I previously linked be any better for you? If so I can see what tweaks can be made to achieve that goal.

Thanks

But I need to capture phrases like the web site link I sent you.
I did find this:
The Spreadsheet Page Excel Downloads: Generate A Word Frequency List

This does single word analysis but still falls short of what I need.

I'm checking if my company has a copy of clarabridge - sounds like this SW would do the trick...

Bottom line - anything I do in Excel would have to have same output as:
Text Analyzer - Text analysis Tool - Counts Frequencies of Words, Characters, Sentences and Syllables

thx
 
Upvote 0
Thanks

But I need to capture phrases like the web site link I sent you.
I did find this:
The Spreadsheet Page Excel Downloads: Generate A Word Frequency List

This does single word analysis but still falls short of what I need.

I'm checking if my company has a copy of clarabridge - sounds like this SW would do the trick...

Bottom line - anything I do in Excel would have to have same output as:
Text Analyzer - Text analysis Tool - Counts Frequencies of Words, Characters, Sentences and Syllables

thx
Hi again John,

Unfortunately that degree of detail is beyond my current coding capabilities as really I am only just learning this myself. As the moderator Peter mentioned in your other thread, something as complicated as this will take one of the real expert gurus to help you and it is quite possible that you may struggle to get that level of help and time commitment from a free forum.

I apologise that my suggestions were not suitable, but I think that this is probably as far as I can go.

I wish you luck with finding a solution.

Regards,

Fishboy
 
Upvote 0

Forum statistics

Threads
1,215,514
Messages
6,125,273
Members
449,220
Latest member
Excel Master

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top