Extracting unique data from Large Datasets - Excel Magic Trick 698

Chris Pucknell

New Member
Joined
Oct 31, 2014
Messages
11
Hey

I couldn't find an answer for this so thought I'd sign up!

As you can guess, i've been trying out the method in this video: https://www.youtube.com/watch?v=IhuURsu0jdI

A series of formulas for extracting unique data, compiling it into a list, and then using this list to make a dynamicly updating DV List.

It works just great. The trouble is, when it is applied to large data sets, it seriously ties up the cpu.

My question is how can I reduce or eliminate this problem? What elements can be altered or removed to improve performance?

The below formula extracts a list of unique names in sequence. From there it's fairly simple to make the DV List.

=IF(ROWS(B$13:B13)>$C$13,"",INDEX(User_Name_Extract,
SMALL(IF(FREQUENCY(IF(User_Name_Extract<>"",
MATCH(User_Name_Extract,User_Name_Extract&"",0)),
ROW(User_Name_Extract)-ROW('Data Source'!$A$2)+1),ROW(User_Name_Extract)-ROW('Data Source'!$A$2)+1),ROWS(B$13:B13))))

$C$13 contains a unique count formula.

B$13:B13 is the position on the list (so if there were only 10 unique names you wanted to extract, you could get away with dragging it to B$13:B23).

"User_Name_Extract" is the range, which in my case is 32000 rows xD

You can probably get a clearer understanding from the video than I can give. Is there any solution when applying this to such large datasets? And what particularly is causing the trouble?

Any help would be grand :)
 
Oh don't worry I'm getting my hands dirty!

I'm trying out the first one first, but I'll do both and time them against eachother on this low spec work computer.

Have run into trouble.

I've defined User_Name_Extract:
Code:
='Data Source'!A2:INDEX('Data Source'!A2:A32001,COUNTA('Data Source'!A2:A32001))

[...]

So where have I gone wrong here? I'm guessing something to do with the User_Name_Extract.

If Data Source!A2:A32001 is text, define User_Name_Extract as referring to:

='Data Source'!$A$2:INDEX('Data Source'!$A:$A,MATCH(REPT("z",255),'Data Source'!$A:$A))

And follow the set up from here on.

Note that the 2nd scheme will be way faster.
 
Upvote 0

Excel Facts

Format cells as time
Select range and press Ctrl+Shift+2 to format cells as time. (Shift 2 is the @ sign).
@Chris Pucknell

1.

Better to define Ivec by means of Formulas | Name Manager as referring to:
Rich (BB code):
=ROW(User_Name_Extract)-ROW(INDEX(User_Name_Extract,1,1))+1


[/code]

Hi, this is a teriffic thread (which i'm trying hard to follow and absorb) however, I have become stuck at Ivec.... can somebody please explain Ivec as i'm not sure what on earth it is or where it fits in to all this as i've never heard this term before!

Many thanks
 
Upvote 0
Hi, this is a teriffic thread (which i'm trying hard to follow and absorb) however, I have become stuck at Ivec.... can somebody please explain Ivec as i'm not sure what on earth it is or where it fits in to all this as i've never heard this term before!

Many thanks

It defines an integer vector, running from 1 to the size of the range of interest. Given that User_Name_Extract already refers to a range of interest, we can define Ivec as:

=ROW(User_Name_Extract)-ROW(INDEX(User_Name_Extract,1,1))+1

which will refer to integer vector that runs

{1;2;3;...;SIZEOF[User_Name_Extract]}.

See post #16, where it is used in a formula set up to list the unique entries from the target rage User_Name_Extract.
 
Upvote 0
Tried your fix Aladdin, works fine! and as you predicted method 1. doesn't seem to shave a lot of time off the original setup from the video.

that part REPT("z",255)... this is a way of matching text?

Will have to give #2 a bash when I get some time!

Cheers
 
Upvote 0
Tried your fix Aladdin, works fine! and as you predicted method 1. doesn't seem to shave a lot of time off the original setup from the video.

Just for the record: I'm the author of the set up.

that part REPT("z",255)... this is a way of matching text?

If the data has no numeric range, the fore going is applied to a text range in order to determine the last row where data is.

Will have to give #2 a bash when I get some time!

Also a set up I devised. This trades off memory against speed, i.e., spend memory in order to get better temporal performance.

Cheers[/QUOTE]
 
Upvote 0
If speed is crucial:

Code:
Sub M_unique()
    sn = Columns(1).SpecialCells(2)
    
    With CreateObject("scripting.dictionary")
        For j = 1 To UBound(sn)
          x0 = .Item(sn(j, 1))
        Next
        Cells(1, 2).Resize(.Count) = Application.Transpose(.keys)
    End With
End Sub
 
Upvote 0

Forum statistics

Threads
1,216,099
Messages
6,128,820
Members
449,469
Latest member
Kingwi11y

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top