Comparing 2 arrays without looping trough their individual elements !

Jaafar Tribak

Well-known Member
Joined
Dec 5, 2002
Messages
9,638
Office Version
  1. 2016
Platform
  1. Windows
Any good ideas on how to determine if 2 arrays are identical w/o a loop. This is in order to make the comparison as quick as possible.

Regards.
 

Excel Facts

What is the shortcut key for Format Selection?
Ctrl+1 (the number one) will open the Format dialog for whatever is selected.
Found a nice trick to determine on the fly if 2 Arrays are identical w/o needing to iterate each array element.

Here is a simple custom Boolean Function that compares 2 (1 Dimentional ) arrays and return True if identical.

Code:
Private Function AreArraysIdentical _
(Arg1 As Variant, Arg2 As Variant) As Boolean
 
    Dim s1, s2 As String
    
    s1 = Join(Arg1, "")
    s2 = Join(Arg2, "")
    
    AreArraysIdentical = s1 = s2
 
End Function

Here is a Test :

Code:
Sub Test()
 
    Dim Array1, Array2, Array3 As Variant
    Array1 = Array(1, 2, 3, 4, 5, 6)
    Array2 = Array(1, 2, 3, 4, 5, 6)
    Array3 = Array(10, 2, 30, 40, 5, 65)
    
    MsgBox AreArraysIdentical(Array1, Array2)
    MsgBox AreArraysIdentical(Array1, Array3)
 
End Sub

This doesn't work for multi-dimentional arrays and doesn't return the culprit items but it can be very useful in checking if two long worksheet Ranges or even entire Columns/Rows are identical or not which AFAIK (correct me if i am wrong) can't be done w/o looping trough each Cell .

This worked for me (it returns True)

Code:
Sub Test2()
 
    Dim Array1, Array2 As Variant
    
    Range("a1:a50000") = "abcdefghijk"
    Range("b1:b50000") = "abcdefghijk"
 
    With Application.WorksheetFunction
       Array1 = .Transpose(Range("a1:a50000"))
       Array2 = .Transpose(Range("b1:b50000"))
    End With
    
    
    MsgBox AreArraysIdentical(Array1, Array2)
 
End Sub
Now just go back to the sheet and change a single letter in a random cell in one of the above ranges and if you run the above code it will immediatly detect the missing letter & return False.

Regards.
 
Upvote 0
Hi Jafaar

entire Columns/Rows are identical or not which AFAIK (correct me if i am wrong) can't be done w/o looping trough each Cell
Did you test your function on an entire column? Transposing might cause an issue?


A couple of thoughts -

You can insert formulas from VBA into the worksheet to check if the elements are identical without looping?


Or, entirely within VBA I'd be tempted with Evaluate(), something like:
Code:
Sub Test()
 
    Dim blnResult As Boolean
 
    blnResult = Sheet1.Evaluate("AND(" & _
                            Sheet1.Range("A1:A50000").Address(external:=True) & "=" & _
                            Sheet1.Range("B1:B50000").Address(external:=True) & ")")
 
    Debug.Print blnResult
 
End Sub
which is subject to CSE formula restrictions.
 
Upvote 0
Hi Jaafar,

This may be wrong, as I am WAY past when I should have hit the rack. You didn't mention having an answer and/or the one-dimension limitation, so apologies for the sloppy coding...

Again, may well be way wrong, as I am dog poop at formulas, but maybe CORREL for an "are these cols/rows identical" kind of check?

Anyways, my laptop is awfully sickly, so won't handle big transposes - but even discounting the time that takes, my results were:

Setup:
<font face=Courier New><SPAN style="color:#00007F">Option</SPAN> <SPAN style="color:#00007F">Explicit</SPAN><br>    <br><SPAN style="color:#00007F">Sub</SPAN> Setup()<br>    <SPAN style="color:#00007F">With</SPAN> ActiveSheet<br>        .Range("A1:A60000").Formula = "=ROUND(RAND()*1299,0)"<br>        .Range("A1:A60000").Value = .Range("A1:A60000").Value<br>        .Range("B1:B60000").Value = .Range("A1:A60000").Value<br>        .Range("B59999").Value = .Range("A59999").Value - 10<br>    <SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN><br><SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">Sub</SPAN><br>    </FONT>

Tests:

<font face=Courier New><SPAN style="color:#00007F">Sub</SPAN> exaCORREL()<br><SPAN style="color:#00007F">Dim</SPAN> ary_1, ary_2, bolMisMatch <SPAN style="color:#00007F">As</SPAN> <SPAN style="color:#00007F">Boolean</SPAN>, Start <SPAN style="color:#00007F">As</SPAN> <SPAN style="color:#00007F">Single</SPAN>, i <SPAN style="color:#00007F">As</SPAN> <SPAN style="color:#00007F">Long</SPAN>, _<br>dblAnswer <SPAN style="color:#00007F">As</SPAN> <SPAN style="color:#00007F">Double</SPAN>, jaafar_1, jaafar_2<br>    <br><SPAN style="color:#00007F">Const</SPAN> U_LIMIT <SPAN style="color:#00007F">As</SPAN> <SPAN style="color:#00007F">Long</SPAN> = 60000<br>    <br>    <SPAN style="color:#00007F">With</SPAN> ActiveSheet<br>        ary_1 = .Range("A1:A" & U_LIMIT).Value<br>        ary_2 = .Range("B1:B" & U_LIMIT).Value<br>    <SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN><br>    <br>    Start = Timer<br>    <SPAN style="color:#00007F">For</SPAN> i = <SPAN style="color:#00007F">LBound</SPAN>(ary_1) <SPAN style="color:#00007F">To</SPAN> <SPAN style="color:#00007F">UBound</SPAN>(ary_1)<br>        <SPAN style="color:#00007F">If</SPAN> <SPAN style="color:#00007F">Not</SPAN> ary_1(i, 1) = ary_2(i, 1) <SPAN style="color:#00007F">Then</SPAN><br>            bolMisMatch = <SPAN style="color:#00007F">True</SPAN><br>            <SPAN style="color:#00007F">Exit</SPAN> <SPAN style="color:#00007F">For</SPAN><br>        <SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">If</SPAN><br>    <SPAN style="color:#00007F">Next</SPAN><br>    Debug.Print "Mismatch= " & bolMisMatch & " at: " & i & "  using loop thru array: " & Timer - Start<br>    <br>    Start = Timer<br>    Application.ScreenUpdating = <SPAN style="color:#00007F">False</SPAN><br>    Application.Calculation = xlCalculationManual<br>    <SPAN style="color:#00007F">With</SPAN> ActiveSheet<br>        .Range("E1:E" & U_LIMIT).Value = ary_1<br>        .Range("F1:F" & U_LIMIT).Value = ary_2<br>    <SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN><br>    <br>    bolMisMatch = <SPAN style="color:#00007F">Not</SPAN> Evaluate("=CORREL(E1:E" & U_LIMIT & ",F1:F" & U_LIMIT & ")") = 1#<br>    <br>    <SPAN style="color:#00007F">With</SPAN> ActiveSheet<br>        .Range("E1:F" & U_LIMIT).Clear<br>       <SPAN style="color:#007F00">'.Range("F1:F" & U_LIMIT).Clear</SPAN><br>    <SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN><br>    <br>    Application.ScreenUpdating = <SPAN style="color:#00007F">True</SPAN><br>    Application.Calculation = xlCalculationAutomatic<br>    <br>    Debug.Print "Mismatch = " & bolMisMatch & " using Evaluate and CORREL: " & Timer - Start<br>    <br>    Start = Timer<br>    Application.ScreenUpdating = <SPAN style="color:#00007F">False</SPAN><br>    Application.Calculation = xlCalculationManual<br>    <SPAN style="color:#00007F">With</SPAN> ActiveSheet<br>        .Range("E1:E" & U_LIMIT).Value = ary_1<br>        .Range("F1:F" & U_LIMIT).Value = ary_2<br>        .Range("G1").Formula = "=CORREL(E1:E" & U_LIMIT & ",F1:F" & U_LIMIT & ")"<br>        bolMisMatch = .Range("G1").Value = 1<br>        .Range("E1:G" & U_LIMIT).Clear<br>        <br>    <SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN><br>    <br>    Application.ScreenUpdating = <SPAN style="color:#00007F">True</SPAN><br>    Application.Calculation = xlCalculationAutomatic<br>    <br>    Debug.Print "Mismatch= " & bolMisMatch & " using CORREL alone: " & Timer - Start<br>    <br>    <SPAN style="color:#00007F">ReDim</SPAN> jaafar_1(1 <SPAN style="color:#00007F">To</SPAN> U_LIMIT)<br>    <SPAN style="color:#00007F">ReDim</SPAN> jaafar_2(1 <SPAN style="color:#00007F">To</SPAN> U_LIMIT)<br>    <br>    <SPAN style="color:#00007F">For</SPAN> i = 1 <SPAN style="color:#00007F">To</SPAN> <SPAN style="color:#00007F">UBound</SPAN>(ary_1)<br>        jaafar_1(i) = ary_1(i, 1)<br>        jaafar_2(i) = ary_2(i, 1)<br>    <SPAN style="color:#00007F">Next</SPAN><br>        <br>    <SPAN style="color:#00007F">Dim</SPAN> Answer <SPAN style="color:#00007F">As</SPAN> <SPAN style="color:#00007F">Boolean</SPAN><br>    <br>    Start = Timer<br>    bolMisMatch = AreArraysIdentical(jaafar_1, jaafar_2)<br>    Debug.Print "Mismatch= " & bolMisMatch & " using AreArraysIdentical W/O allowing for TRanspose time: "; Timer - Start<br>    <br><SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">Sub</SPAN><br>    <br><SPAN style="color:#00007F">Private</SPAN> <SPAN style="color:#00007F">Function</SPAN> AreArraysIdentical _<br>(Arg1 <SPAN style="color:#00007F">As</SPAN> <SPAN style="color:#00007F">Variant</SPAN>, Arg2 <SPAN style="color:#00007F">As</SPAN> <SPAN style="color:#00007F">Variant</SPAN>) <SPAN style="color:#00007F">As</SPAN> <SPAN style="color:#00007F">Boolean</SPAN><br>     <br>    <SPAN style="color:#00007F">Dim</SPAN> s1, s2 <SPAN style="color:#00007F">As</SPAN> <SPAN style="color:#00007F">String</SPAN><br>    <br>    s1 = Join(Arg1, "")<br>    s2 = Join(Arg2, "")<br>    <br>    AreArraysIdentical = s1 = s2<br><SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">Function</SPAN><br></FONT>

Results:

Mismatch= True at: 59999 using loop thru array: 0.4101563
Mismatch = True using Evaluate and CORREL: 2.34375
Mismatch= False using CORREL alone: 2.052734
Mismatch= False using AreArraysIdentical W/O allowing for TRanspose time: 3.025391

Again, on a deathly ill laptop, and of course differentiation may well not be linear, but does this disagree with your results?

Will have to check tomorrow, dead beat.

Have a geat day,

Mark
 
Upvote 0
ACK! If I typed slower, I'd be a seal! (yeh, the animal w/the flippers)

Hi Colin, will try yours tomorrow (or actually later today...) :)

Mark
 
Upvote 0
Triple ACK!

Sorry all, they of course all return a mismatch - I forgot a couple of NOT's.

Code:
        bolMisMatch = Not .Range("G1").Value = 1
        .Range("E1:G" & U_LIMIT).Clear
        
    End With
    
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
    
    Debug.Print "Mismatch= " & bolMisMatch & " using CORREL alone: " & Timer - Start
    
    ReDim jaafar_1(1 To U_LIMIT)
    ReDim jaafar_2(1 To U_LIMIT)
    
    For i = 1 To UBound(ary_1)
        jaafar_1(i) = ary_1(i, 1)
        jaafar_2(i) = ary_2(i, 1)
    Next
        
    Dim Answer As Boolean
    
    Start = Timer
    bolMisMatch = Not AreArraysIdentical(jaafar_1, jaafar_2)
    Debug.Print "Mismatch= " & bolMisMatch & " using AreArraysIdentical W/O allowing for TRanspose time: "; Timer - Start

...resulted in...

Mismatch= True at: 59999 using loop thru array: 0.6210938
Mismatch = True using Evaluate and CORREL: 4.546875
Mismatch= True using CORREL alone: 2.322266
Mismatch= True using AreArraysIdentical W/O allowing for TRanspose time: 2.945313
At my admittedly laymen's level (amatuer, sportsman, whatever you wish), it still seems that the "inefficient" looping of an array is markedly quicker, and, can return wherein (at least the first) glitch is...

Again, sorry for the bad test...

Mark
 
Upvote 0
Thanks Colin and Mark for the followup.

Colin:

Nice the use of the Evaluate Method. This Evaluate Method can achieve amazing things when passed string values !

I tested the Transpose function on the entire Column and yes the Transpose function causes an error but the same goes when I tried passing an entire column to your Evaluate approach ! Why would that be ?!

Also, Do you know if a similar approach (ie using the Evaluate Method) would work for pure Arrays as opposed to Ranges ?

Mark:

Never used this cool Correl Function before :)

I implemented your code which gave me this :

Mismatch= Vrai at: 59999 using loop thru array: 0,03125
Mismatch = Vrai using Evaluate and CORREL: 0,1113281
Mismatch= Vrai using CORREL alone: 0,0625
Mismatch= Vrai using AreArraysIdentical W/O allowing _
for TRanspose time: 0,1582031

I didn't expect the approch that stores the Range values in arrays and then loops through the arrays to be the fastest method compared to the other ones !

I believe the line Start = Timer should logically be placed right before the Correl Function in both 2 and 3 approaches and not before the line Application.ScreenUpdating and the processings are even faster.

Again thank you both for the input and for reminding me of the power of already existing built-in funtions/Methods which i sometimes tend to overlook.

Regards.
 
Upvote 0
Easy to say after the event of course, but as I was reading through this thread I didn't imagine any of the alternatives presented would work out quicker. Take, for instance, the Join method. Behind the scenes, the Join function has to go through each and every element of each array and make a string out of it (which is a tricky process), meaning even if the very first elements of each array are non-equal, you still have to loop through each and every element of the arrays before you can tell whether they're equal or not. So that's not going to be a time saver. Likewise (in a slightly different sense) the formula methods.

Looping isn't necessarily slow. You just have to make sure you're not using it when it's not necessary.
 
Upvote 0
Hi Jafaar
Thanks Colin and Mark for the followup.

Colin:

Nice the use of the Evaluate Method. This Evaluate Method can achieve amazing things when passed string values !

I tested the Transpose function on the entire Column and yes the Transpose function causes an error but the same goes when I tried passing an entire column to your Evaluate approach ! Why would that be ?!

Also, Do you know if a similar approach (ie using the Evaluate Method) would work for pure Arrays as opposed to Ranges ?
It's effectively evaluating this CSE formula:
{=AND(A1:A60000=B1:B60000)}

So it's subject to the same restrictions as CSE formulas - you can't use entire columns prior to Excel 2007, maximum of 2 dimensions etc...




Hi James,
Easy to say after the event of course, but as I was reading through this thread I didn't imagine any of the alternatives presented would work out quicker.
I think the loop approach is the best because it's not subject to the restrictions of the others. It's not necessarily faster than the evaluate method I posted though. If the function is built to simply determine whether or not the two ranges match then the loop can be optimised (like Mark did) to Exit once a mismatch is found (of course this means that it will not tell you where all of the differences are, only the first difference). This means that a worst case scenario for the loop will be when the data matches entirely because it will loop all the way through; the best case scenario would be that the 1st element in each array is different. The evaluate approach will be pretty much constant because it always checks all of the cells. This made a difference when I tested over 10 iterations - the Evaluate method came in consistently at ~ 0.87 secs whereas the looping method varied between 0.73 secs (1st element different) through to 1.35 secs (all elements the same).
 
Upvote 0

Forum statistics

Threads
1,215,832
Messages
6,127,151
Members
449,366
Latest member
reidel

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top