Frequency of values in a huge asymmetrical range/ table

sauronbaggins

New Member
Joined
May 7, 2020
Messages
5
Office Version
2016
Platform
Windows, MacOS
Hi guys,

I have tried googling and the things that I know. However, I am not getting the solution. I have an asymmetrical data set of names. It has approx 20000 rows. Column A is never empty for any of the rows. The farthest column that holds a value in it is "HZ" (not all rows have values until HZ, meaning a LOT of the columns are empty. I made it a "Table" and tried to summarize with a pivot table but that doesn't work. What's the best way to find frequency of all the names in such a huge range?

To give you an idea of how the data looks, please see this table.

GameColumn 1Column 2Column 3
TennisBadmintonChess
Golf
TennisBaseball
BasketballTable Tennis
TennisGolf
Hockey

The output I want is this:

GameCount
Tennis3
Golf2
Hockey1
Badminton1
Baseball1
Basketball1
Chess1
Table Tennis1

If there is a way to consolidate/ transpose all the values across this range in a single column, that might also work. I do not have the list of all the unique "sport" values. The unique values run in a few thousands, so it is not possible to do it manually. I first need to create a list of unique values and then find the frequency of each of those across the range.
 

Some videos you may like

Excel Facts

Can a formula spear through sheets?
Use =SUM(January:December!E7) to sum E7 on all of the sheets from January through December

Eric W

MrExcel MVP
Joined
Aug 18, 2015
Messages
9,751
Welcome to the forum!

You'll need a macro to do this. Try this:

Open a copy of your workbook. Press Alt-F11 to open the VBA editor. Press Alt-IM to Insert a Module. Paste the following code into the window that opens:

Rich (BB code):
Sub GetUniques()
Dim ur As Variant, op() As Variant, r As Long, c As Long
Dim dict As Object, dicta As Variant, dictb As Variant

    ur = Sheets("Sheet1").UsedRange.Value
    Set dict = CreateObject("Scripting.Dictionary")
    
    For r = 1 To UBound(ur)
        For c = 1 To UBound(ur, 2)
            If ur(r, c) <> "" Then
                dict(ur(r, c)) = dict(ur(r, c)) + 1
            End If
        Next c
    Next r
    
    ReDim op(1 To dict.Count, 1 To 2)
    dicta = dict.keys
    dictb = dict.items
    For r = 1 To UBound(op)
        op(r, 1) = dicta(r - 1)
        op(r, 2) = dictb(r - 1)
    Next r
    
    Sheets("Sheet2").Range("A1").Resize(dict.Count, 2) = op
    
End Sub
Change the sheet name in red to the sheet with the data, and the sheet name and cell in blue to where you want the results to go. Press Alt-Q to close the editor.

In Excel, press Alt-F8 to open the macro selector. Select GetUniques and click Run. That should do it. Take a look and see what you think. We can ignore a header row if there is one, and we can sort the results alphabetically or by frequency too. Hope this helps.
 

sauronbaggins

New Member
Joined
May 7, 2020
Messages
5
Office Version
2016
Platform
Windows, MacOS
Welcome to the forum!

You'll need a macro to do this. Try this:

Open a copy of your workbook. Press Alt-F11 to open the VBA editor. Press Alt-IM to Insert a Module. Paste the following code into the window that opens:

Rich (BB code):
Sub GetUniques()
Dim ur As Variant, op() As Variant, r As Long, c As Long
Dim dict As Object, dicta As Variant, dictb As Variant

    ur = Sheets("Sheet1").UsedRange.Value
    Set dict = CreateObject("Scripting.Dictionary")
   
    For r = 1 To UBound(ur)
        For c = 1 To UBound(ur, 2)
            If ur(r, c) <> "" Then
                dict(ur(r, c)) = dict(ur(r, c)) + 1
            End If
        Next c
    Next r
   
    ReDim op(1 To dict.Count, 1 To 2)
    dicta = dict.keys
    dictb = dict.items
    For r = 1 To UBound(op)
        op(r, 1) = dicta(r - 1)
        op(r, 2) = dictb(r - 1)
    Next r
   
    Sheets("Sheet2").Range("A1").Resize(dict.Count, 2) = op
   
End Sub
Change the sheet name in red to the sheet with the data, and the sheet name and cell in blue to where you want the results to go. Press Alt-Q to close the editor.

In Excel, press Alt-F8 to open the macro selector. Select GetUniques and click Run. That should do it. Take a look and see what you think. We can ignore a header row if there is one, and we can sort the results alphabetically or by frequency too. Hope this helps.

This worked perfectly! Thanks so much! :)
 

mse330

Well-known Member
Joined
Oct 18, 2007
Messages
646
Office Version
365
Platform
Windows
This is an alternative code that is shorter & might be slightly faster. This will ignore header row data ...

VBA Code:
Sub CntUnique()
Dim a: a = [A1].CurrentRegion.Offset(1)
    With CreateObject("scripting.dictionary")
        For Each i In a
            If Not IsEmpty(i) Then If Not .exists(i) Then .Add i, 1 Else .Item(i) = .Item(i) + 1
        Next
        Sheets.Add(, ActiveSheet).[A1].Resize(.Count, 2) = Application.Transpose(Array(.keys, .items))
    End With
End Sub
 

Eric W

MrExcel MVP
Joined
Aug 18, 2015
Messages
9,751
This is an alternative code that is shorter & might be slightly faster. This will ignore header row data ...
It's true that your code is somewhat shorter, but it appears that the biggest difference is your use of the Application.Transpose function. I don't know if you're aware, but that function has a limit of 32K values. Given that the OP said that the sheet has over 20K rows, with multiple entries per row, I considered it possible that the 32K limit would be exceeded. That's why I eschewed the Transpose function and used a longer but more robust method.
 

mse330

Well-known Member
Joined
Oct 18, 2007
Messages
646
Office Version
365
Platform
Windows
Valid point Eric, thanks for pointing that out (y)
 

mse330

Well-known Member
Joined
Oct 18, 2007
Messages
646
Office Version
365
Platform
Windows
I don't know if you're aware, but that function has a limit of 32K values.
Hi Eric, I was just experimenting to see how would excel behave if I have data with over 32k records but it was working fine 🤔. I did some search & it seems that the limit is 65,536 which once exceeded, excel stars to act weird as it shows some records until some point (way before the 65K limit) then it shows #N/A. Just wanted to share :)
 

sauronbaggins

New Member
Joined
May 7, 2020
Messages
5
Office Version
2016
Platform
Windows, MacOS
It's true that your code is somewhat shorter, but it appears that the biggest difference is your use of the Application.Transpose function. I don't know if you're aware, but that function has a limit of 32K values. Given that the OP said that the sheet has over 20K rows, with multiple entries per row, I considered it possible that the 32K limit would be exceeded. That's why I eschewed the Transpose function and used a longer but more robust method.
Good point. For sure it would have exceeded 32k. The total unique values I got was ~51k.
 

sauronbaggins

New Member
Joined
May 7, 2020
Messages
5
Office Version
2016
Platform
Windows, MacOS
What about if we want to extract the last 2 comma separated values from right from each row?

For example, of the table is like the one below:

GameColumn 1Column 2Column 3
TennisBadmintonChess
Golf
TennisBaseball
BasketballTennisTable Tennis
TennisGolf
Hockey

The number of comma separated values in the rows is not constant, i.e., some row might have as many as 100 comma separated values and other might only have 1. Most have more than 1 though.

Now, I need to extract the first 2 (from left) and the last 2 (from right) comma separated values for all rows. Once I have that for all the rows, I will once again do a frequency calculation.

Getting the 1st 2 values is fairly easy - just "distribute to columns" and copy the 1st 2 columns. How do I get the last 2 ones from the right (given that the number of comma separated values is not constant)?
 

Watch MrExcel Video

Forum statistics

Threads
1,099,156
Messages
5,466,983
Members
406,513
Latest member
t0ny84

This Week's Hot Topics

Top