vba for identifying strings of data and moving large sets of data

tony0217

Board Regular
Joined
Aug 31, 2012
Messages
134
i have this vba code that ive been working on since last year and i cant seem to run it without excel crashing when i add more data to it.
is there any way to tidy this up or make it run more efficiently (faster)?

i have 100 rows x 6 columns but sometimes up to 15 columns wide.

I just cant seem to speed it up. please help. thanks in advance.




Code:
Sub Permute()
Application.ScreenUpdating = False
Dim ix(100, 1) As Long, rc As Long, m As Long, br As Long, md As Variant, i As Long, r As Long
Dim str1 As String, r1 As Long, c1 As Long, element(100) As Variant


    rc = Cells(1, Columns.Count).End(xlToLeft).Column
    m = 0
    For i = 1 To rc
        br = Cells(Rows.Count, i).End(xlUp).Row
        If br > m Then m = br
        ix(i, 0) = br
        ix(i, 1) = 1
    Next i
    md = Range(Cells(1, 1), Cells(m, rc)).Value
    r = 0
Incr:
    str1 = ""
    For i = 1 To rc
        str1 = str1 & md(ix(i, 1), i)
        element(i) = md(ix(i, 1), i)
    Next i


MyCode:
    If Mid(element(1), 1, 1) = Mid(element(2), 1, 1) And _
    Mid(element(1), 1, 1) = Mid(element(3), 1, 1) And _
    Mid(element(2), 1, 1) = Mid(element(3), 1, 1) And _
    Mid(element(1), 2, 1) = Mid(element(4), 1, 1) And _
    Mid(element(1), 2, 1) = Mid(element(5), 1, 1) And _
    Mid(element(4), 1, 1) = Mid(element(5), 1, 1) And _
    Mid(element(2), 2, 1) = Mid(element(4), 2, 1) And _
    Mid(element(2), 2, 1) = Mid(element(6), 1, 1) And _
    Mid(element(4), 2, 1) = Mid(element(6), 1, 1) And _
    Mid(element(3), 2, 1) = Mid(element(5), 2, 1) And _
    Mid(element(3), 2, 1) = Mid(element(6), 2, 1) And _
    Mid(element(5), 2, 1) = Mid(element(6), 2, 1) Then _


        r = r + 1
        r1 = ((r - 1) Mod Rows.Count) + 1
        c1 = Int((r - 1) / Rows.Count) + 1
        Sheets("Sheet2").Cells(r1, c1) = str1
    End If
    
    For i = rc To 1 Step -1
        ix(i, 1) = ix(i, 1) + 1
        If ix(i, 1) <= ix(i, 0) Then Exit For
        ix(i, 1) = 1
    Next i
    If i > 0 Then GoTo Incr:
    
End Sub
 
Your latest explanation does clarify a bit, but I have some more questions (though I am not confident that I will be able to speed this significantly)

1. There seems to be no pattern about which character positions have to match, so when you say your data could be up to 15 columns wide,how would we know what characters had to match if the data was 7 columns wide or 15 columns wide or 2 columns wide?

2. "the o's are special characters separate from the zeroes".
a) Could cells that contain that character still form part of a successful permutation? For example, still with 6 columns, would "o7o2o6727626" count as a successful permutation?
b) Is it possible that a cell can contain 2 of those characters (eg "oo") or a digit first then that character (eg "7o")?

3. "and for some reason i could never seem to trim enough time off of it"
Did my suggestion trim a significant amount of time off it? Is it trimmed "enough"?
 
Upvote 0

Excel Facts

Who is Mr Spreadsheet?
Author John Walkenbach was Mr Spreadsheet until his retirement in June 2019.
1. There seems to be no pattern about which character positions have to match, ...
Actually I think I can see the pattern now, but your answer to this question (& the other two) could help confirm or clarify my thoughts.
 
Last edited:
Upvote 0
Your latest explanation does clarify a bit, but I have some more questions (though I am not confident that I will be able to speed this significantly)

1. There seems to be no pattern about which character positions have to match, so when you say your data could be up to 15 columns wide,how would we know what characters had to match if the data was 7 columns wide or 15 columns wide or 2 columns wide?

2. "the o's are special characters separate from the zeroes".
a) Could cells that contain that character still form part of a successful permutation? For example, still with 6 columns, would "o7o2o6727626" count as a successful permutation?
b) Is it possible that a cell can contain 2 of those characters (eg "oo") or a digit first then that character (eg "7o")?

3. "and for some reason i could never seem to trim enough time off of it"
Did my suggestion trim a significant amount of time off it? Is it trimmed "enough"?



1. I know how to scale it when i add rows, so for now we can just use the 6 column wide version.
2a. i realized that the zeros were all supposed to be changed to O's. this is because i cant get the cells to format correctly.
2b. it should not, no.

3. i ran the code on my end and it is faster but takes about the same amount of time. it took my system 12 minutes.
 
Upvote 0
1. I know how to scale it ... so for now we can just use the 6 column wide version.
Hmm, we'll see. :biggrin:
Given that you didn't provide details of how the matching would be done for any greater (or lesser) number of columns, I'm not even sure whether the approach I have taken below would be easily scaleable or not.


3. i ran the code on my end and it is faster but takes about the same amount of time. it took my system 12 minutes.
OK, I think some significant progress in relation to speed!
The code below processed the post #14 data in 0.016 seconds.
I generated 100 rows x 6 columns of random numbers from 10 to 99 and the code processed that (finding about 11,000 successful strings) in less than 6 seconds.

Also note that this test code ..
- Produces the results on the same sheet in column J and of course if satisfied you could easily change the output location to Sheet2
- Includes in parentheses after the successful string the row numbers that the values came from. For example, for the post #14 data it produces
272226727626 (7|5|5|4|9|7)
reflecting the source rows, column by column, of the successful string. I used that for checking & can easily be removed (blue code).
- If multiple identical successful strings are found, includes them all in the results. I don't know if it is possible with your data and, if so, whether you would want such a string listed multiple times or only once. Fairly easy to include a fast check and only list once if required. Post back if you need this)
- Will fail if there are more than about 65,000 successful strings found. If you think that might be possible, again post back for a modification.

Anyway, give this a try with some sample data.
Rich (BB code):
Sub permut_v4()
  Dim lrc(1 To 6) As Long
  Dim a As Long, b As Long, c As Long, d As Long, e As Long, f As Long, y As Long, z As Long, lr As Long
  Dim Data As Variant, Result As Variant
  Dim RX As Object
  Dim s As String
  
  Const P2 As String = "(.)(.)\1(.)"
  Const P3 As String = "(.)(.)\1(.)\1(.)"
  Const P4 As String = "(.)(.)\1(.)\1(.)\2\3"
  Const P5 As String = "(.)(.)\1(.)\1(.)\2\3\2\4"
  Const P6 As String = "(.)(.)\1(.)\1(.)\2\3\2\4\3\4"

  Set RX = CreateObject("VBScript.Regexp")
  lr = Columns("A:F").Find(What:="*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
  Data = Range("A1:F" & lr).Value
  For y = 1 To UBound(Data, 2)
    lrc(y) = Cells(Rows.Count, y).End(xlUp).Row
  Next y
  ReDim Result(0 To 0)
  For a = 1 To lrc(1)
    s = Data(a, 1)
    RX.Pattern = P2
    For b = 1 To lrc(2)
      If RX.test(s & Data(b, 2)) Then
        s = s & Data(b, 2)
        RX.Pattern = P3
        For c = 1 To lrc(3)
          If RX.test(s & Data(c, 3)) Then
            s = s & Data(c, 3)
            RX.Pattern = P4
            For d = 1 To lrc(4)
              If RX.test(s & Data(d, 4)) Then
                s = s & Data(d, 4)
                RX.Pattern = P5
                For e = 1 To lrc(5)
                  If RX.test(s & Data(e, 5)) Then
                    s = s & Data(e, 5)
                    RX.Pattern = P6
                    For f = 1 To lrc(6)
                      If RX.test(s & Data(f, 6)) Then
                        z = z + 1
                        ReDim Preserve Result(1 To z)
                        Result(z) = s & Data(f, 6) & " (" & Join(Array(a, b, c, d, e, f), "|") & ")"
                      End If
                    Next f
                    s = Left(s, 8)
                    RX.Pattern = P5
                  End If
                Next e
                s = Left(s, 6)
                RX.Pattern = P4
              End If
            Next d
            s = Left(s, 4)
            RX.Pattern = P3
          End If
        Next c
        s = Left(s, 2)
        RX.Pattern = P2
      End If
    Next b
  Next a
  If z > 0 Then
    Range("J2").Resize(z).Value = Application.Transpose(Result)
  End If
End Sub
 
Last edited:
Upvote 0
first of all I must say, YIKES! that is super fast, sir. perfecly executed!


so now that i have that out of the way... how can i scale it to add a column? for the 10 column wide one?


still 100 rows, but 10 columns wide.


the matches would be as follows..














ABACADAEBCBDBECDCEDE
columnmatch 1st halfmatch 2nd halfcolumnletter matched
AB1leftleft2ACA
AB1leftleft3ADA
AB1leftleft4AEA
AC2leftleft3ADA
AC2leftleft4AEA
AD3leftleft4AEA
AB1rightleft5BCB
AB1rightleft6BDB
AB1rightleft7BEB
BC5leftleft6BDB
BC5leftleft7BEB
BD6leftleft7BEB
AC2rightright5BCC
AC2rightleft8CDC
AC2rightleft9CEC
BC5rightleft8CDC
BC5rightleft9CEC
CD8leftleft9CEC
BD6rightright8CDD
BD6rightleft10DED
CD8rightleft10DED
AD3rightright6BDD
AD3rightright8CDD
AD3rightleft10DED
AE4rightright7BEE
AE4rightright9CEE
AE4rightright10DEE
BE7rightright9CEE
BE7rightright10DEE
CE9rightright10DEE

<colgroup><col width="64" span="10" style="width:48pt"> </colgroup><tbody>
</tbody>
 
Upvote 0
first of all I must say, YIKES! that is super fast, sir. perfecly executed!
Thank you. :)
I assume then that you got similar timing results to me?


... how can i scale it to add a column? for the 10 column wide one?
still 100 rows, but 10 columns wide.
Yes, I thought the scaling question would come. ;)

I can't see a way to make the scaling really simple. It took me a reasonable amount of manual changes which require great attention to detail - I made a number of mistakes along the way and had to revisit the code to find the errors & fix. However, I will give you some guidelines.
1. There are some fairly obvious changes like changing 6 to 10, columns A:F to A:J, introducing more variables (a-j instead of only a-f for the column loops)
2. You need to add/remove sections like the blue ones below, being very careful with the variables and numbers (highlighted red)
3. You need to add/remove 'Const' lines which are the Regular Expression patterns. Could be tricky to explain, but let's try. In those patterns, in relation to this problem, (.) represents the first time (position) that one of the matching characters appears so, looking at the P5 pattern as an example:
"(.)(.)\1(.)\1(.)\1(.)\2\3" is like
" A B \1 C ..."
A backslash followed by a number refers to the relevant earlier occurrence. So \1 refers back to the 1st (.), or A in our case. \3 refers back to the 3rd (.) or C in our case.
Altogether then, this P5, which checks a string formed by a number from each of the first five columns, is really saying
"ABACADAEBC"
Hopefully that would be enough information for you if you needed to build up different patterns.


Other notes:
a) This increase in columns means (potentially) a massive increase in the number of checks required and hence the (potential) running time of the code.
For example, if there are a lot of "near misses" caused by finding successful strings from, say, the first 8 columns but then failing, the code has had to do a lot more work that if a "fail" occurs early.
Take the small sample data below, which has just one successful string from the cells highlighted green.
If we start at the top left and look at the first 2 cells in that data: "8611". That fails the very first test for the "A" position so there is no need to worry about that combination with the other 8 columns. That is a saving of up to 390625 (5^8) tests.
With 100 rows, a fail with any col A value and col B value saves testing up to about a further 10,000,000,000,000,000 combinations!!
b) Sample timing. With this code and 100 rows of random 0-99, the code found about 202,000 successful strings in about 3 minutes.


Book1
ABCDEFGHIJ
186118535211560757013
245873580676521414315
338384244262314131350
420473925132916231824
544242829172916122938
6
Sheet10 (2)


Anyway, here is my 10-col code which I have also modified to fix the problem if more than 65,000 results are found. It now would return the results to the worksheet in blocks of 1,000,000 rows.

Rich (BB code):
Sub permut_v5_10_Cols()
  Dim lrc(1 To 10) As Long
  Dim a As Long, b As Long, c As Long, d As Long, e As Long, f As Long, g As Long, h As Long, i As Long, j As Long
  Dim y As Long, z As Long, lr As Long
  Dim Data As Variant, Temp As Variant, Result As Variant
  Dim RX As Object
  Dim s As String
  
  Const P2 As String = "(.)(.)\1(.)"
  Const P3 As String = "(.)(.)\1(.)\1(.)"
  Const P4 As String = "(.)(.)\1(.)\1(.)\1(.)"
  Const P5 As String = "(.)(.)\1(.)\1(.)\1(.)\2\3"
  Const P6 As String = "(.)(.)\1(.)\1(.)\1(.)\2\3\2\4"
  Const P7 As String = "(.)(.)\1(.)\1(.)\1(.)\2\3\2\4\2\5"
  Const P8 As String = "(.)(.)\1(.)\1(.)\1(.)\2\3\2\4\2\5\3\4"
  Const P9 As String = "(.)(.)\1(.)\1(.)\1(.)\2\3\2\4\2\5\3\4\3\5"
  Const P10 As String = "(.)(.)\1(.)\1(.)\1(.)\2\3\2\4\2\5\3\4\3\5\4\5"

  Set RX = CreateObject("VBScript.Regexp")
  lr = Columns("A:J").Find(What:="*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
  Data = Range("A1:J" & lr).Value
  For y = 1 To UBound(Data, 2)
    lrc(y) = Cells(Rows.Count, y).End(xlUp).Row
  Next y
  ReDim Temp(0 To 0)
  
  For a = 1 To lrc(1)
    s = Data(a, 1)
    RX.Pattern = P2
    For b = 1 To lrc(2)
      If RX.Test(s & Data(b, 2)) Then
        s = s & Data(b, 2)
        RX.Pattern = P3
        For c = 1 To lrc(3)
          If RX.Test(s & Data(c, 3)) Then
            s = s & Data(c, 3)
            RX.Pattern = P4
            For d = 1 To lrc(4)
              If RX.Test(s & Data(d, 4)) Then
                s = s & Data(d, 4)
                RX.Pattern = P5
                For e = 1 To lrc(5)
                  If RX.Test(s & Data(e, 5)) Then
                    s = s & Data(e, 5)
                    RX.Pattern = P6
                    For f = 1 To lrc(6)
                      If RX.Test(s & Data(f, 6)) Then
                        s = s & Data(f, 6)
                        RX.Pattern = P7
                        For g = 1 To lrc(7)
                          If RX.Test(s & Data(g, 7)) Then
                            s = s & Data(g, 7)
                            RX.Pattern = P8
                            For h = 1 To lrc(8)
                              If RX.Test(s & Data(h, 8)) Then
                                s = s & Data(h, 8)
                                RX.Pattern = P9
                                For i = 1 To lrc(9)
                                  If RX.Test(s & Data(i, 9)) Then
                                    s = s & Data(i, 9)
                                    RX.Pattern = P10
                                    For j = 1 To lrc(10)
                                      If RX.Test(s & Data(j, 10)) Then
                            
                                        z = z + 1
                                        ReDim Preserve Temp(1 To z)
                                        Temp(z) = s & Data(j, 10) & " (" & Join(Array(a, b, c, d, e, f, g, h, i, j), "|") & ")"
                                        
                                      End If
                                    Next j
                                    s = Left(s, 16)
                                    RX.Pattern = P9
                                  End If
                                Next i
                                s = Left(s, 14)
                                RX.Pattern = P8
                              End If
                            Next h
                            s = Left(s, 12)
                            RX.Pattern = P7
                          End If
                        Next g
                        s = Left(s, 10)
                        RX.Pattern = P6
                      End If
                    Next f
                    s = Left(s, 8)
                    RX.Pattern = P5
                  End If
                Next e
                s = Left(s, 6)
                RX.Pattern = P4
              End If
            Next d
            s = Left(s, 4)
            RX.Pattern = P3
          End If
        Next c
        s = Left(s, 2)
        RX.Pattern = P2
      End If
    Next b
  Next a
 
  If z > 0 Then
    ReDim Result(1 To 1000000, 1 To Int(z / 1000000) + 1)
    a = 1: b = 1
    For y = 1 To z
      Result(a, b) = Temp(y)
      a = a + 1
      If b > UBound(Result) Then
        a = 1: b = b + 1
      End If
    Next y
    Range("N2").Resize(UBound(Result, 1), UBound(Result, 2)).Value = Result
  Else
    Range("N2").Value = "No results"
  End If
End Sub
 
Last edited:
Upvote 0
This code is truly a work of art.
I dont think I could ever thank you enough, sir.
can I just ask you, How long have you been working with these sorts of code?
I only ask because I, someday would like to be as good as you are with this.

Thank you again for taking the time to make this possible. :)
 
Upvote 0
You are very welcome. Thanks for your kind comments. :)
It has been an interesting problem.

You will see from the details at the left of my post that I joined the forum in 2005. The reason that I joined was that I had at that time just been introduced to vba, knew nothing about it, but wanted to learn. Most of what I now know about vba has come from my participation in this forum over the intervening 13 years.

In the interest of tidiness I am posting yet another version of the code. I didn't like how, to cope with the 65,000+ limit on Application.Transpose, I had put all the results in a 1-dimensional array and then had to rearrange into a 2-dimensional array.
This code puts the results straight into a 2-D array, saving that rearrangement. The main changes are highlighted but there are a few other small consequential changes near the top of the code as well.
Rich (BB code):
Sub permut_v6_10_Cols()
  Dim lrc(1 To 10) As Long
  Dim a As Long, b As Long, c As Long, d As Long, e As Long, f As Long, g As Long, h As Long, i As Long, j As Long
  Dim rw As Long, col As Long, y As Long, lr As Long
  Dim Data As Variant, Result As Variant
  Dim RX As Object
  Dim s As String
  
  Const P2 As String = "(.)(.)\1(.)"
  Const P3 As String = "(.)(.)\1(.)\1(.)"
  Const P4 As String = "(.)(.)\1(.)\1(.)\1(.)"
  Const P5 As String = "(.)(.)\1(.)\1(.)\1(.)\2\3"
  Const P6 As String = "(.)(.)\1(.)\1(.)\1(.)\2\3\2\4"
  Const P7 As String = "(.)(.)\1(.)\1(.)\1(.)\2\3\2\4\2\5"
  Const P8 As String = "(.)(.)\1(.)\1(.)\1(.)\2\3\2\4\2\5\3\4"
  Const P9 As String = "(.)(.)\1(.)\1(.)\1(.)\2\3\2\4\2\5\3\4\3\5"
  Const P10 As String = "(.)(.)\1(.)\1(.)\1(.)\2\3\2\4\2\5\3\4\3\5\4\5"

  Set RX = CreateObject("VBScript.Regexp")
  lr = Columns("A:J").Find(What:="*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
  Data = Range("A1:J" & lr).Value
  ReDim Result(1 To 1000000, 1 To 1)
  For y = 1 To UBound(Data, 2)
    lrc(y) = Cells(Rows.Count, y).End(xlUp).Row
  Next y
  col = 1
  For a = 1 To lrc(1)
    s = Data(a, 1)
    RX.Pattern = P2
    For b = 1 To lrc(2)
      If RX.Test(s & Data(b, 2)) Then
        s = s & Data(b, 2)
        RX.Pattern = P3
        For c = 1 To lrc(3)
          If RX.Test(s & Data(c, 3)) Then
            s = s & Data(c, 3)
            RX.Pattern = P4
            For d = 1 To lrc(4)
              If RX.Test(s & Data(d, 4)) Then
                s = s & Data(d, 4)
                RX.Pattern = P5
                For e = 1 To lrc(5)
                  If RX.Test(s & Data(e, 5)) Then
                    s = s & Data(e, 5)
                    RX.Pattern = P6
                    For f = 1 To lrc(6)
                      If RX.Test(s & Data(f, 6)) Then
                        s = s & Data(f, 6)
                        RX.Pattern = P7
                        For g = 1 To lrc(7)
                          If RX.Test(s & Data(g, 7)) Then
                            s = s & Data(g, 7)
                            RX.Pattern = P8
                            For h = 1 To lrc(8)
                              If RX.Test(s & Data(h, 8)) Then
                                s = s & Data(h, 8)
                                RX.Pattern = P9
                                For i = 1 To lrc(9)
                                  If RX.Test(s & Data(i, 9)) Then
                                    s = s & Data(i, 9)
                                    RX.Pattern = P10
                                    For j = 1 To lrc(10)
                                      If RX.Test(s & Data(j, 10)) Then
                            
                                        rw = rw + 1
                                        If rw > UBound(Result) Then
                                          rw = 1: col = col + 1
                                          ReDim Preserve Result(1 To UBound(Result), 1 To col)
                                        End If
                                        Result(rw, col) = s & Data(j, 10) & " (" & Join(Array(a, b, c, d, e, f, g, h, i, j), "|") & ")"
                                        
                                      End If
                                    Next j
                                    s = Left(s, 16)
                                    RX.Pattern = P9
                                  End If
                                Next i
                                s = Left(s, 14)
                                RX.Pattern = P8
                              End If
                            Next h
                            s = Left(s, 12)
                            RX.Pattern = P7
                          End If
                        Next g
                        s = Left(s, 10)
                        RX.Pattern = P6
                      End If
                    Next f
                    s = Left(s, 8)
                    RX.Pattern = P5
                  End If
                Next e
                s = Left(s, 6)
                RX.Pattern = P4
              End If
            Next d
            s = Left(s, 4)
            RX.Pattern = P3
          End If
        Next c
        s = Left(s, 2)
        RX.Pattern = P2
      End If
    Next b
  Next a
 
  If rw > 0 Then
    Range("N2").Resize(UBound(Result, 1), UBound(Result, 2)).Value = Result
  Else
    Range("N2").Value = "No results"
  End If
End Sub
 
Upvote 0

Forum statistics

Threads
1,215,106
Messages
6,123,124
Members
449,097
Latest member
mlckr

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top