Decomposing Text Strings

DoctorZoom

New Member
Joined
Jan 19, 2016
Messages
13
Hi folks,

I'm looking to search a specific field in from our trouble tickets - which is a free form text field entered by our agents and is always unique since they are writing in their own words what the customer is reporting.

I have the ability to break these tickets into high level categories but there are still thousands of tickets per day per category.

When trying to determine why we have a spike in tickets or when trying to do a deeper dive into a specific category, I need to be able to find the most common text strings within the field.

I was working with Fishboy but we only got so far.

http://www.mrexcel.com/forum/excel-...t-common-words-phrases-field.html#post4402083

He pointed out some code at:
Find/Count Most Common Words & Phrases In List

But my issue is similar to the 2nd post there that was never answered:

How do I break down the row into larger chunks than just the individual words?

For example, if my row was A B C - I want to return:

A - 1
B - 1
C - 1
A B - 1
B C - 1
A B C - 1

(AC could work but if that is too complicated, then just words that are consecutive)

Or if the row was "The Small Red Fox" - I'm looking to return the following 10 rows:

The
Small
Red
Fox
The Small
Small Red
Red Fox
The Small Red
Small Red Fox
The Small Red Fox

This would help tremendously in finding the most common customer problems from the free form text field entered by our agents.

Ideally a solution that doesn't use VBA would be best but if I need to use, that will be fine but I need to take what exists in the post (in link) to do this further step of breaking out "phrases"

Thanks for your help!
John
 

Excel Facts

What is =ROMAN(40) in Excel?
The Roman numeral for 40 is XL. Bill "MrExcel" Jelen's 40th book was called MrExcel XL.
Looks to me like Fishboy is still trying to help you. I think you should stick to that thread.

Dom
 
Upvote 0
Workaround: you can perform a similar search for two consecutive words if you delete the 1st, 3rd etc. then the 2nd, 4th etc. spaces or replace them with another character.
 
Upvote 0
.. then just words that are consecutive
For that, and assuming ..
- original data in column A starting at row 2
- nothing to the right of column A that needs preserving
.. then try this in a copy of your workbook.

Rich (BB code):
Sub split_text()
  Dim a, b, bits
  Dim i As Long, j As Long, k As Long, c As Long, sze As Long, uba As Long, ubb As Long, wrds As Long
  Dim s As String

  a = Range("A2", Range("A" & Rows.Count).End(xlUp)).Value
  ActiveSheet.UsedRange.Offset(, 1).ClearContents
  uba = UBound(a)
  ReDim b(1 To uba, 0 To 0)
  For i = 1 To uba
    c = 0
    bits = Split(a(i, 1))
    wrds = UBound(bits) + 1
    If (wrds + 1) * wrds / 2 > ubb Then
      ubb = (wrds + 1) * wrds / 2
      ReDim Preserve b(1 To uba, 1 To ubb)
    End If
    For sze = 1 To wrds
      For j = 0 To wrds - sze
        s = vbNullString
        For k = 1 To sze
          s = s & " " & bits(j + k - 1)
        Next k
        c = c + 1
        b(i, c) = Mid(s, 2)
      Next j
    Next sze
  Next i
  With Range("B2").Resize(uba, ubb)
    .Value = b
    .EntireColumn.AutoFit
  End With
End Sub

Sample results:

Excel Workbook
ABCDEFGHIJK
1Original Data
2The Small Red FoxTheSmallRedFoxThe SmallSmall RedRed FoxThe Small RedSmall Red FoxThe Small Red Fox
3BlueBlue
4
5I like applesIlikeapplesI likelike applesI like apples
Sheet1
 
Upvote 0
Hi Peter,

What I need to do is look through thousands of cells and find the most common phrases.
This site provides a great example of the output I am looking for... only I need to do this within excel for thousands of cells :(

[FONT="][URL="http://www.online-utility.org/text/analyzer.jsp"]http://www.online-utility.org/<wbr>text/analyzer.jsp[/URL][/FONT]
Is this a further question, or have we done what you want?

If a further question, can you elaborate and provide small example(s)?
 
Upvote 0
Is this a further question, or have we done what you want?

If a further question, can you elaborate and provide small example(s)?

Still haven't solved the problem.

Please go to the site I provided and put a paragraph or two of text (from anywhere) and you will see the output. That is the exact output I am looking for but I need to do this in Excel and for thousands of cells.

Thanks!
 
Upvote 0
Is this a further question, or have we done what you want?

If a further question, can you elaborate and provide small example(s)?


This is sample output after copying and pasting text from an article:
Note: It will also show you top phrases of more than 3 words if they exist.

Some top phrases containing 3 words (without punctuation marks) Occurencies
why he is 2
push and push 2

<tbody>
</tbody>

Some top phrases containing 2 words (without punctuation marks) Occurencies
about his 2
in the 2
and push 2
chelsea does 2
why he 2
he is 2
push and 2

<tbody>
</tbody>
Unfiltered wordcount :
Order Unfiltered wordcount Occurrences Percentage
1. and 6 3.8217
2. to 5 3.1847
3. he 4 2.5478
4. in 4 2.5478
5. she 4 2.5478
6. was 4 2.5478
7. the 4 2.5478
8. a 3 1.9108
9. is 3 1.9108
10. about 3 1.9108
11. that 3 1.9108
12. push 3 1.9108
13. i 2 1.2739
14. no 2 1.2739
15. of 2 1.2739
16. 2 1.2739
17. father 2 1.2739
18. episode 2 1.2739
19. chelsea 2 1.2739
20. her 2 1.2739
21. him 2 1.2739
22. his 2 1.2739
23. does 2 1.2739
24. for 2 1.2739
25. people 2 1.2739
26. he's 2 1.2739
27. who 2 1.2739
28. why 2 1.2739
29. like 2 1.2739
30. it's 1 0.6369
31. countries 1 0.6369
32. cities 1 0.6369
33. talked 1 0.6369
34. ex 1 0.6369
35. by 1 0.6369
36. be 1 0.6369
37. at 1 0.6369
38. as 1 0.6369
39. al 1 0.6369
40. an 1 0.6369
41. me 1 0.6369
42. my 1 0.6369
43. so 1 0.6369
44. happening 1 0.6369
45. mouthpiece 1 0.6369
46. sexual 1 0.6369
47. boyfriend 1 0.6369
48. wasn't 1 0.6369
49. make 1 0.6369
50. around 1 0.6369
51. african 1 0.6369
52. most 1 0.6369
53. mine 1 0.6369
54. reassure 1 0.6369
55. position 1 0.6369
56. encountered 1 0.6369
57. israelis 1 0.6369
58. unfortunately 1 0.6369
59. jewish 1 0.6369
60. with 1 0.6369
61. especially 1 0.6369
62. knows 1 0.6369
63. what 1 0.6369
64. weird 1 0.6369
65. appetite 1 0.6369
66. jews 1 0.6369
67. since 1 0.6369
68. schisms 1 0.6369
69. across 1 0.6369
70. says 1 0.6369
71. wanted 1 0.6369
72. such 1 0.6369
73. sharpton 1 0.6369
74. there 1 0.6369
75. tackles 1 0.6369
76. talk 1 0.6369
77. also 1 0.6369
78. they 1 0.6369
79. talking 1 0.6369
80. likes 1 0.6369
81. americans 1 0.6369
82. israel 1 0.6369
83. tricky 1 0.6369
84. length 1 0.6369
85. are 1 0.6369
86. realize 1 0.6369
87. racism 1 0.6369
88. pushy 1 0.6369
89. pleasure 1 0.6369
90. god 1 0.6369
91. fun 1 0.6369
92. going 1 0.6369
93. wonder 1 0.6369
94. other 1 0.6369
95. sex 1 0.6369
96. have 1 0.6369
97. not 1 0.6369
98. marriage 1 0.6369
99. america 1 0.6369
100. you 1 0.6369
101. handler 1 0.6369
102. problem 1 0.6369
103. life 1 0.6369
104. struck 1 0.6369
105. personal 1 0.6369
106. interviews 1 0.6369

<tbody>
</tbody>
 
Upvote 0

Forum statistics

Threads
1,215,011
Messages
6,122,677
Members
449,092
Latest member
tayo4dgacorbanget

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top