Excel Formula - Comma Separated List of all Words Occurring over 5 times in a column

Steve_Smith_1123

New Member
Joined
May 6, 2022
Messages
10
Office Version
  1. 365
Platform
  1. Windows
  2. MacOS
Hello,
Hope you're all doing well. I think it is more because I am not phrasing the question the right way, but I can't find a way to get a list of all string values (case agnostic) that occur more than a certain number of times. The data set consists of over 4,000 lines.

Thanks in advance for any help you can provide

What are you looking for exactly Steve?:
  1. A list of values, either in comma separated or in a range.
  2. All string values that occur in the column more than four times (case agnostic).
Can you provide a sample of how the data looks?:



French Calls Getting Closed Message
New Queue FRL_AIG UL/ROP/GP95. - AIG - 5
RoadsideAssistance_Bot_prod Lex Tuning - EN-23
TopBox metadata in CTR - TRN-92
Agency Sales Mailbox VM coming in as Raw Text
Timed ACW does not appear to be working on some agents
AWS Connect IVR enhancements for Customer Support
798720: Post Go-Live Report Attributes: PI2 Priority 2 of 7
RoadsideAssistance_Bot_prod Intent expansion - NewReservation_prod
 

Excel Facts

Formula for Yesterday
Name Manager, New Name. Yesterday =TODAY()-1. OK. Then, use =YESTERDAY in any cell. Tomorrow could be =TODAY()+1.
Can you provide desired outcome of this dataset?
 
Upvote 0
Those words occur just once..
 
Upvote 0
I was just giving an example of how the data set would look. That's what you wanted right?
 
Upvote 0
Only for office 365

Book1
ABC
1DataOutput
2French Calls Getting Closed MessageFrench
3New Queue FRL_AIG UL/ROP/GP95. - AIG - 5Calls
4RoadsideAssistance_Bot_prod Lex Tuning - EN-23
5TopBox metadata in CTR - TRN-92
6Agency Sales Mailbox VM coming in as Raw Text
7Timed ACW does not appear to be working on some agents
8AWS Connect IVR enhancements for Customer Support
9798720: Post Go-Live Report Attributes: PI2 Priority 2 of 7
10RoadsideAssistance_Bot_prod Intent expansion - NewReservation_prod
11French toast
12blabla TopBox
13as;ldkfjasl;k
14adskfja;sdf
15Report like
16the calls are getting boring
17boring calls
18French cheese
19its timed well
20blabla timed
21french timed toast
22TopBox is fine
23Report
24Calls
25Calls
26French
Sheet3
Cell Formulas
RangeFormula
C2:C3C2=LET(r,REDUCE("",A2:A26,LAMBDA(a,b,VSTACK(a,TEXTSPLIT(b,," ")))),UNIQUE(FILTER(r,(LEN(r)>1)*BYROW(r,LAMBDA(x,COUNTA(FILTER(r,r=x))))>4)))
Dynamic array formulas.
 
Upvote 0
Or

Excel Formula:
=LET(r,TEXTSPLIT(TEXTJOIN(" ",,A2:A26),," "),UNIQUE(FILTER(r,(LEN(r)>1)*BYROW(r,LAMBDA(x,COUNTA(FILTER(r,r=x))))>4)))
 
Upvote 0
Or

Excel Formula:
=LET(r,TEXTSPLIT(TEXTJOIN(" ",,A2:A26),," "),UNIQUE(FILTER(r,(LEN(r)>1)*BYROW(r,LAMBDA(x,COUNTA(FILTER(r,r=x))))>4)))
That one would almost certainly fail due to the length limit of TEXTJOIN given the OP's data size
The data set consists of over 4,000 lines.

The first formula took almost a minute to calculate on my machine, given 4,000 rows like your sample repeated.

@Steve_Smith_1123
Could you consider a user-defined function like below (though I believe that it will not work in your MacOS environment)?

VBA Code:
Function WordList(rng As Range, MinCount As Long) As Variant
  Dim d As Object
  Dim a As Variant, itm As Variant
  Dim i As Long
  
  Set d = CreateObject("Scripting.Dictionary")
  d.CompareMode = 1
  a = rng.Value
  For i = 1 To UBound(a)
    For Each itm In Split(a(i, 1))
      d(itm) = d(itm) + 1
    Next itm
  Next i
  For Each itm In d.Keys
    If d(itm) < MinCount Or Len(itm) < 2 Then d.Remove itm
  Next itm
  WordList = Application.Transpose(d.Keys)
End Function

Steve_Smith_1123.xlsm
AB
1Data
2French Calls Getting Closed MessageFrench
3New Queue FRL_AIG UL/ROP/GP95. - AIG - 5Calls
4RoadsideAssistance_Bot_prod Lex Tuning - EN-23
5TopBox metadata in CTR - TRN-92
6Agency Sales Mailbox VM coming in as Raw Text
7Timed ACW does not appear to be working on some agents
8AWS Connect IVR enhancements for Customer Support
9798720: Post Go-Live Report Attributes: PI2 Priority 2 of 7
10RoadsideAssistance_Bot_prod Intent expansion - NewReservation_prod
11French toast
12blabla TopBox
13as;ldkfjasl;k
14adskfja;sdf
15Report like
16the calls are getting boring
17boring calls
18French cheese
19its timed well
20blabla timed
21french timed toast
22TopBox is fine
23Report
24Calls
25Calls
26French
Sheet3
Cell Formulas
RangeFormula
B2:B3B2=WordList(A2:A26,5)
Dynamic array formulas.


The above function replicates @JEC's formula in that it does not report 1-letters "words" (like "-" in your data).
It also has this flexibility requested.
occur more than a certain number of times.

Tested on the same 4,000 row data as above, this UDF took about 0.04 seconds.
 
Upvote 0
Dictionary indeed not working on Macs.
 
Upvote 0

Forum statistics

Threads
1,215,330
Messages
6,124,308
Members
449,152
Latest member
PressEscape

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top