Macro for finding most frequent occuring strings in a list

nn992

New Member
Joined
Jul 28, 2016
Messages
47
Hi everyone,

I have a project where I am working with really large datasets, and I have to use excel for it.

One of the tasks to do is to extract most frequently occuring string, second most frequent string, third etc...

I know that there is a formula to do this, but I would like to keep it simpler and do it with macro..
Any experts here who can help?

Thanks in advance
 

Excel Facts

What is the shortcut key for Format Selection?
Ctrl+1 (the number one) will open the Format dialog for whatever is selected.
Here's one way of doing it.


Code:
Sub print_unique()


Const oprange = "H1"


' Lists DISTINCT list of items on spreadsheet using columns H:I in order found in source data; Col H = Item, Col I =  Frequency it appears in the source data
    
    Dim v
    
    v = getUniqueArray(Range("b2:b30"))
    If IsArray(v) Then
       Range(oprange).Resize(UBound(v), 2) = v
    End If


End Sub


Sub test()


' Takes the source data and asks for the highest 10 frequent things.
' Output is shown as a text string showing: [Item Name] is in position [Order of Frequency] (i.e. Highest = 1, Second Highest = 2 and so on) with a count of [No of times found]
' IMPORTANT NOTE: Because I used a RANGE to sort the data (rather than, say, a sort routine) the GetMFOS routine this calls uses range K:L


' This can be avoided by adding a sort routine to the array and not using the range object. Additionally, this means a UDF (User Defined Function can be used!)


    Const TempRange = "K1"
    Const oprange = "F"


    Dim OP As String
    Dim FreqCntr As Integer
    
    For FreqCntr = 1 To 10
        GetMFOS Range("b2:b30"), FreqCntr, OP, TempRange
        Range(oprange & FreqCntr).Value = OP
    Next FreqCntr




End Sub




Sub GetMFOS(ByVal StrRange As Range, ByVal FreqSought As Integer, ByRef OutputStr As String, ByVal TempRange As String)


    Dim v
    Dim r As Range
    
    v = getUniqueArray(StrRange)
    If IsArray(v) Then
    
        If FreqSought > UBound(v) Then
            OutputStr = "There are only " & UBound(v) & " items. Cannot find item at position " & FreqSought
            Exit Sub
        End If
    
    
        Set r = Range(TempRange).Resize(UBound(v), 2)
    
        For cntr = 1 To UBound(v)
            r.Cells(cntr, 1).Value = v(cntr, 1)
            r.Cells(cntr, 2).Value = v(cntr, 2)
        Next cntr
        
        r.Sort Key1:=r.Cells(1, 2), order1:=xlDescending, Header:=xlNo
    
        OutputStr = r.Cells(FreqSought, 1) & " is in position " & FreqSought & " with a count of " & r.Cells(FreqSought, 2) & " Items Found"
    


    End If




End Sub






Public Function getUniqueArray(inputRange As Range, _
                                Optional skipBlanks As Boolean = True, _
                                Optional matchCase As Boolean = True, _
                                Optional prepPrint As Boolean = True _
                                ) As Variant
               
Dim vDic As Object
Dim tArea As Range
Dim tArr As Variant, tVal As Variant, tmp As Variant
Dim noBlanks As Boolean
Dim cnt As Long
Dim CntArray()
                      
On Error GoTo exitFunc:
If inputRange Is Nothing Then GoTo exitFunc


With inputRange
    If .Cells.Count < 2 Then
        ReDim tArr(1 To 1, 1 To 1)
        tArr(1, 1) = .Value2
        getUniqueArray = tArr
        GoTo exitFunc
    End If


    Set vDic = CreateObject("scripting.dictionary")
    If Not matchCase Then vDic.compareMode = vbTextCompare
    
    noBlanks = True
    
    For Each tArea In .Areas
        tArr = tArea.Value2
        For Each tVal In tArr
            If tVal <> vbNullString Then
                vDic.Item(tVal) = vDic.Item(tVal) + 1
                If vDic.Exists(tVal) Then Debug.Print vDic(tVal)
            ElseIf noBlanks Then
                noBlanks = False
            End If
        Next
    Next
End With


If Not skipBlanks Then If Not noBlanks Then vDic.Item(vbNullString) = Empty


If prepPrint Then
    ReDim tmp(1 To vDic.Count, 1 To 2)
    For Each tVal In vDic.Keys
        cnt = cnt + 1
        tmp(cnt, 1) = tVal
    Next
    cnt = 0
    For Each tVal In vDic.items
        cnt = cnt + 1
        tmp(cnt, 2) = tVal
    Next
    getUniqueArray = tmp


End If


exitFunc:
Set vDic = Nothing
End Function




Excel 2010
ABCDEFGHIJKL
1DateReasonLocationDog Ate Postman is in position 1 with a count of 9 Items FoundExplosion6Dog Ate Postman9
222/01/2017ExplosionSaturns RingsExplosion is in position 2 with a count of 6 Items FoundDog Ate Postman9Explosion6
317/10/2017Dog Ate PostmanJoe Blogs EngineeringLate Delivery is in position 3 with a count of 5 Items FoundLate Delivery5Late Delivery5
420/09/2017ExplosionEuropaOrder Lost is in position 4 with a count of 4 Items FoundOrder Lost4Order Lost4
501/05/2017Late DeliverySaturns RingsWrong Item is in position 5 with a count of 3 Items FoundWrong Item3Wrong Item3
609/03/2017Dog Ate PostmanJoe Blogs EngineeringWrong Planet is in position 6 with a count of 2 Items FoundWrong Planet2Wrong Planet2
724/08/2017ExplosionEuropaThere are only 6 items. Cannot find item at position 7
821/06/2017Dog Ate PostmanEuropaThere are only 6 items. Cannot find item at position 8
908/09/2017Order LostBetty CaféThere are only 6 items. Cannot find item at position 9
1001/05/2017Order LostMoonThere are only 6 items. Cannot find item at position 10
1111/05/2017Order LostJoe Blogs Engineering
1225/08/2017Wrong ItemJoe Blogs Engineering
1314/05/2017Wrong PlanetMars Station
1403/04/2017Late DeliveryBetty Café
1516/05/2017ExplosionBetty Café
1625/02/2017ExplosionMoon
1721/11/2017Dog Ate PostmanSaturns Rings
1828/11/2017Wrong ItemSaturns Rings
1923/10/2017Dog Ate PostmanMars Station
2015/12/2017ExplosionSaturns Rings
2108/12/2017Wrong ItemMars Station
2213/11/2017Wrong PlanetSaturns Rings
2305/01/2017Dog Ate PostmanEuropa
2414/03/2017Dog Ate PostmanMars Station
2519/12/2017Dog Ate PostmanEuropa
2605/10/2017Late DeliveryJoe Blogs Engineering
2721/02/2017Late DeliveryEuropa
2825/05/2017Order LostEuropa
2911/05/2017Dog Ate PostmanSaturns Rings
3003/07/2017Late DeliveryMoon

<tbody>
</tbody>







There's data in columns A:C

Find the frequency of the items in column B


Run the print_unique macro and you get the output data in H:I showing a DISTINCT list of entries and the number of times each was found.



Run the test macro and you get output data in K:L (a byproduct of using a range to sort the data).. and the output (asks for ten highest frequencies), outputs in column F





Much Kudos goes to CHIRP from just over 5 years ago... whose distinct list code I amended in order to get the frequency and quantity.

I did add a link to the original code but the forum won't let me post - something about [FONT=&quot]Chrome detected unusual code on this page and blocked it to protect your personal information (for example, passwords, phone numbers and credit cards).[/FONT]
 
Upvote 0

Forum statistics

Threads
1,215,223
Messages
6,123,715
Members
449,118
Latest member
MichealRed

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top