Any clever solutions to speed up a 50,000 row MATCH formula?

d0rian

Active Member
Joined
May 30, 2015
Messages
313
Office Version
  1. 365
I have a data set of 50,000 text entries in column A. Every day a few hundred entries are added to the data set, and I paste the NEW set of entries (e.g. let's say a total of 50,275, which includes both the 50,000 from the prior day and 275 new ones) in column B.
I want the fastest way to identify the new 275 entries (which exist in column B, but not A).
The simplest method I know of is to paste a simple MATCH formula in column C, e.g. =IF(ISNUMBER(MATCH(B1,A:A,0)),"","n")
But that takes 10-12 seconds to run on 50,000+ rows.

Can anyone think of any way to accomplish this faster?
 
Thanks for the feedback. I agree data integrity is important and you don't want issues further down the track. Glad we were able to help.
 
Upvote 0

Excel Facts

Which lookup functions find a value equal or greater than the lookup value?
MATCH uses -1 to find larger value (lookup table must be sorted ZA). XLOOKUP uses 1 to find values greater and does not need to be sorted.
How a simple question turns out to reveal a lot of details on how functions actually work in Excel. And again how much knowledge is available on this forum by all the members who simply care to help out complete strangers. Hat off for that.
 
Last edited:
Upvote 0
Follow-up: while initially this solution appeared to be a God-send -- it does indeed speed up many of my functions -- my workbooks have now begun regularly crashing, generating a generic 'not enough resources' error. While I've seen this error before, it's now an almost daily occurrence. I can't think of anything that has changed other than me having substituted this new LOOKUP syntax for the previous much slower INDEX/MATCH...
Nothing about this change strikes me as the sort of thing that would render any formula more volatile or resource-intensive.....can anyone think of any reason why this binary search methodology would produce more crashes?
 

Attachments

  • resources.JPG
    resources.JPG
    19.3 KB · Views: 5
Upvote 0
Upvote 0
Any recent addins installed ??
Does it happen in ALL workbooks / sheets ?
Have you considered using a VBA solution to avoid using the formula ?

I implemented the LOOKUP in 2 of my 3 workbooks...those two workbooks now generate that resources error...the 3rd (where I have yet to make the INDEX/MATCH > LOOKUP switch does not.) Hard to say conclusively whether that's the cause because I've seen this error before...but usually no more than 1-2x / week. Now it's daily. No recent add-ins installed. (Haven't considered a VBA solution because I have a ton of somewhat complicated formulas that I'm always tinkering with and don't know VBA will enough to go that route.)
But if there's nothing that screams resource-intensiveness about the LOOKUP function to anyone on this board, I'll have to do some A/B testing and such...
 
Upvote 0
Have you considered using Power Query ?
I'm using Excel 2007, which doesn't have that, I don't believe.
Unable to solve this, I reverted to the previous INDEX/MATCH formulation, which was a bit slower, but unfortunately the insufficient-resources bug has stayed...so it's "good" in that it suggests the LOOKUP formula wasn't the cause of this, but also troubling in that I now have a weird bug I didn't have before and don't know what's causing it.

Does the following observation help diagnose this at all >> I have 6 macros that I run to import/clean data. I created a 7th "parent macro" that just runs macros 1 through 6 so I only have to click one thing. This insufficient-resources crash almost always happens with macro #6 (which is just a simple copy/paste-down of a formula, granted to 50,000 rows, but it's not THAT big a task and never used to be a problem.)

My not-so-great workaround is this: I run macros #1 through #5 (because if I ran 1-through-6 in one shot it would crash when it got to #6), save and close the Excel file, then re-open it and I can run #6 without generating the error/crash. IOW, I can run #6 when it's the first thing I do after opening the file, but if it's anything other than that, #6 generates the crash. That behavior makes me think it's some sort of memory-leak or something...does that help diagnose this?
 
Upvote 0
I have 6 macros that I run to import/clean data.
Why not use a macro to identify the new entries too?
For 50,000 rows this took 0.2 seconds for me.

VBA Code:
Sub FindNew()
  Dim d As Object
  Dim a As Variant, b As Variant
  Dim i As Long

  Set d = CreateObject("Scripting.Dictionary")
  a = Range("A1", Range("A" & Rows.Count).End(xlUp)).Value
  For i = 1 To UBound(a)
    d(a(i, 1)) = Empty
  Next i
  a = Range("B1", Range("B" & Rows.Count).End(xlUp)).Value
  ReDim b(1 To UBound(a), 1 To 1)
  For i = 1 To UBound(a)
    If Not d.exists(a(i, 1)) Then b(i, 1) = "n"
  Next i
  Range("C1").Resize(UBound(b)).Value = b
End Sub
 
Upvote 0

Forum statistics

Threads
1,216,098
Messages
6,128,812
Members
449,468
Latest member
AGreen17

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top