Randomize a Column & Identify a Word

rwmill9716

Active Member
Joined
May 20, 2006
Messages
493
Office Version
  1. 2013
Platform
  1. Windows
I have a set of letters in Column A. That Column has been randomized in Column B where the word 'Ric' from Cell C1 is found. I need a macro that will randomize Column A 1000 times and count how many times 'Ric' is spelled out.
 

Attachments

  • Random.jpg
    Random.jpg
    15.8 KB · Views: 17
A2:A4993 consists of 192 sets of A-Z, sequentially arranged as each set goes from A to Z and then repeats (referred to as the "List").
I tested with n=100 iterations. Adjustable in code.
For each test iteration:
  • This "List" will shuffle in position, but all items remains unchanged (192 A's, 192 B's,..., 192 Z's).
  • Afterward, it looks for sequences of R, I, C, or C, I, R (a condition sequence of any length specified in cell F1).
After n test iterations, the results will be in the range E:H.
If you want to know the specific results of each test iteration, they can be found in the columns to the right, starting from column M.


VBA Code:
Option Explicit
Sub words()
Dim lr&, i&, j&, k&, r, rng, arr(), res(), word As String, n&, c&, t&, st As String, count&
Dim dic As Object
Set dic = CreateObject("Scripting.Dictionary")
word = Range("F1").Value ' word criteria
n = 100 ' running time
ReDim res(1 To n, 1 To 3)
lr = Cells(Rows.count, "A").End(xlUp).Row
rng = Range("A2:A" & lr).Value ' given list in col A
Range("F3:H100000").ClearContents
Range("N2:XX100000").ClearContents
Randomize
Do
    ReDim arr(1 To UBound(rng), 1 To 1)
    c = c + 1: k = 0: count = 0
    Do
        r = Int(Rnd * UBound(rng)) + 1
        If Not dic.exists(r) Then
            dic.Add r, ""
            k = k + 1: arr(k, 1) = rng(r, 1)
        End If
    Loop Until k >= UBound(rng)
    dic.RemoveAll
    For i = 1 To UBound(arr)
        st = ""
        If arr(i, 1) = Left(word, 1) Then
            On Error Resume Next
            For j = 0 To Len(word) - 1
                st = st & arr(i + j, 1)
            Next
            On Error GoTo 0
            If st = word Then count = count + 1
        End If
        If arr(i, 1) = Right(word, 1) Then
            On Error Resume Next
            For j = Len(word) - 1 To 0 Step -1
                st = st & arr(i - j, 1)
            Next
            On Error GoTo 0
            If st = word Then count = count + 1
        End If
        If st = word Then res(c, 1) = count: res(c, 2) = st: res(c, 3) = i
    Next
    If count > 0 Then
        t = t + 1
        With Range("XX3").End(xlToLeft).Offset(0, 1)
            .Offset(-2, 0).Value = t
            .Offset(-1, 0).Value = count
            .Resize(UBound(arr), 1).Value = arr
        End With
    End If
Loop Until c >= n
Range("F3").Resize(n, 3).Value = res
MsgBox "Finish! match found: " & t
End Sub

Capture.JPG
 
Upvote 1
Solution

Excel Facts

Does the VLOOKUP table have to be sorted?
No! when you are using an exact match, the VLOOKUP table can be in any order. Best-selling items at the top is actually the best.
A2:A4993 consists of 192 sets of A-Z, sequentially arranged as each set goes from A to Z and then repeats (referred to as the "List").
I tested with n=100 iterations. Adjustable in code.
For each test iteration:
  • This "List" will shuffle in position, but all items remains unchanged (192 A's, 192 B's,..., 192 Z's).
  • Afterward, it looks for sequences of R, I, C, or C, I, R (a condition sequence of any length specified in cell F1).
After n test iterations, the results will be in the range E:H.
If you want to know the specific results of each test iteration, they can be found in the columns to the right, starting from column M.


VBA Code:
Option Explicit
Sub words()
Dim lr&, i&, j&, k&, r, rng, arr(), res(), word As String, n&, c&, t&, st As String, count&
Dim dic As Object
Set dic = CreateObject("Scripting.Dictionary")
word = Range("F1").Value ' word criteria
n = 100 ' running time
ReDim res(1 To n, 1 To 3)
lr = Cells(Rows.count, "A").End(xlUp).Row
rng = Range("A2:A" & lr).Value ' given list in col A
Range("F3:H100000").ClearContents
Range("N2:XX100000").ClearContents
Randomize
Do
    ReDim arr(1 To UBound(rng), 1 To 1)
    c = c + 1: k = 0: count = 0
    Do
        r = Int(Rnd * UBound(rng)) + 1
        If Not dic.exists(r) Then
            dic.Add r, ""
            k = k + 1: arr(k, 1) = rng(r, 1)
        End If
    Loop Until k >= UBound(rng)
    dic.RemoveAll
    For i = 1 To UBound(arr)
        st = ""
        If arr(i, 1) = Left(word, 1) Then
            On Error Resume Next
            For j = 0 To Len(word) - 1
                st = st & arr(i + j, 1)
            Next
            On Error GoTo 0
            If st = word Then count = count + 1
        End If
        If arr(i, 1) = Right(word, 1) Then
            On Error Resume Next
            For j = Len(word) - 1 To 0 Step -1
                st = st & arr(i - j, 1)
            Next
            On Error GoTo 0
            If st = word Then count = count + 1
        End If
        If st = word Then res(c, 1) = count: res(c, 2) = st: res(c, 3) = i
    Next
    If count > 0 Then
        t = t + 1
        With Range("XX3").End(xlToLeft).Offset(0, 1)
            .Offset(-2, 0).Value = t
            .Offset(-1, 0).Value = count
            .Resize(UBound(arr), 1).Value = arr
        End With
    End If
Loop Until c >= n
Range("F3").Resize(n, 3).Value = res
MsgBox "Finish! match found: " & t
End Sub

View attachment 97964
BEBO,
Thanks for your excellent work here.

I have a few questions for clarification:

1. What does the count in Column F and Row 2 (Cols N to DI mean? Why is it always an even number? In column N, for instance, I see only one occurrence, yet the count is 2.
2. I assume that each column (M and beyond) is a randomization of Column A, one that contains a match? Could I increase the size of Column A? Say I wanted to double it; what changes would be necessary in the code?
3. In your test case above, exactly how many randomized columns were analyzed? As I read your example, would I have found "ALL" or "LLA" 18 times in 100 randomizations 5000 letters, Correct? That would be: the 3 letter word "ALL" or its inverse in a random sequence of 500,000 letters.
4. When I run the program, the spinning wheel continues after your message pops up the count of matches. If I hit "Ok" for that message, the spinning wheel stops. I assume there's no reason to just wait until it stops spinning.
 
Upvote 0
BEBO,
Thanks for your excellent work here.

I have a few questions for clarification:

1. What does the count in Column F and Row 2 (Cols N to DI mean? Why is it always an even number? In column N, for instance, I see only one occurrence, yet the count is 2.
2. I assume that each column (M and beyond) is a randomization of Column A, one that contains a match? Could I increase the size of Column A? Say I wanted to double it; what changes would be necessary in the code?
3. In your test case above, exactly how many randomized columns were analyzed? As I read your example, would I have found "ALL" or "LLA" 18 times in 100 randomizations 5000 letters, Correct? That would be: the 3 letter word "ALL" or its inverse in a random sequence of 500,000 letters.
4. When I run the program, the spinning wheel continues after your message pops up the count of matches. If I hit "Ok" for that message, the spinning wheel stops. I assume there's no reason to just wait until it stops spinning.
I see that the count is counting both the word and its inverse, i.e., the pair will always be there.
 
Upvote 0
I've made some adjustments to the code, specifically in the word reversal part, and it's now working fine.

Column E represents the test number, ranging from 1 to 100.Column F counts the occurrences of either RIC or CIR in that specific test.Column G contains the found word. If there are two or more words, it belongs to the 1st found. Column H indicates the row number where the word was 1st found.

Starting from column N, there's a detailed breakdown of the random characters in each test where the word was discovered.

In the file, I'm currently testing for a random occurrence in test 11. It's found twice (E13:H13) on row 2486.The details of the character sets are in column O. Column M contains the string concatenation formula.

You can filter column M with two conditions: contains RIC or CIR, which will give you the two rows with RIC or CIR. (manually mark with "x" in column L)

 
Upvote 0
BEBO,
Thanks for your excellent work here.

I have a few questions for clarification:
Hello

1. What does the count in Column F and Row 2 (Cols N to DI mean? Why is it always an even number? In column N, for instance, I see only one occurrence, yet the count is 2.
Even number: Fixed in previous post.

2. I assume that each column (M and beyond) is a randomization of Column A, one that contains a match?
Yes.

- Could I increase the size of Column A? Say I wanted to double it; what changes would be necessary in the code?
Just copy/ paste/ or manual input more into column A
In the code, "lr" (Last row) to update last row col A automatically

3. In your test case above, exactly how many randomized columns were analyzed?
in example, would be 18 within 100 test

- As I read your example, would I have found "ALL" or "LLA" 18 times in 100 randomizations 5000 letters, Correct?
Correct. 192 A's, 192 B's,..., 192 A's
= 192*26=4992

That would be: the 3 letter word "ALL" or its inverse in a random sequence of 500,000 letters.
ALL or LLA, if found then count

4. When I run the program, the spinning wheel continues after your message pops up the count of matches. If I hit "Ok" for that message, the spinning wheel stops. I assume there's no reason to just wait until it stops spinning.
Jsu remove the msgbox in the last row in code script.
 
Upvote 0
BEBO,

Thanks again, this new version works very well.

Could you adapt this program to find sequences that skip letters? For instance, R I C (C I R) where the skips could be from 0 to 5 spaces where the spaces between letters is the same. Each space would be filled with any letter including R, I, C. For example, using 5 skips, I might get: XHRRTXIPIOBNKCCP... The bold letters skip 5, meeting the search criteria, while the underlined letters have an unbalanced skip 2 between R & I and 6 between I & C. If you prefer, I could write this in a new question?
 
Upvote 0
Of course, I can assist you with this new request. However, I assume that you have understood how my code works. If you do, please go ahead and customize it for your new requirements, alright? I'll be ready to help you step by step.
 
Upvote 0
Of course, I can assist you with this new request. However, I assume that you have understood how my code works. If you do, please go ahead and customize it for your new requirements, alright? I'll be ready to help you step by step.
Bebo,
I've come across a problem with your macro: different letter sequences give me different results. In column A, you have 192 copies of our 26 letter alphabet which you randomize 100 times and look for a 3-letter sequence. Under those conditions, I would expect no differences among any 3-letter combinations. Here's a sample where I've run each letter sequence 5 times through your latest version.

RIC: 36, 36, 34, 35, 37
ABC: 23, 25, 23, 24,22
XYZ: 39, 40, 38, 38, 39
MNO: 42, 40, 39, 43, 40

I would like to adapt this program and am studying yours's to see if I can do it.
I would like to enter my word as: RIC, R I C, R I C, R I C, R I C, R I C, three letters with zero to 5 spaces which can be filled with any of the other letters, including the ones in the sequence that I'm looking for. Fo example, RRIZC would be RIC with a one space skip. One way I tried to do this was to change all the letters to X except the three used in the sequence that I'm looking for, but the problem is that the Xs wouldn't include other letters in the sequence: RXXXIXXXC for example would be a 3 letter skip but none of the Xs would include members from the 192 RIC letters.

Ric
 
Upvote 0
Thanks, Bebo, this works great. My next step will be to load a book letter by letter to replace the 192 copies of the alphabet, and then look for the frequency of occurrences of certain words in the book itself, as well as the randomized versions, which your program does. I assume that there will be no problem with changing the length of Col A (say 500 to 10000 letters). To accomplish the first (look for the word sequence in the nonrandomized column), what changes do I need to make to the code to eliminate the randomization. Note, this will be a 2nd program, and I don't care if the 100 columns are the same. I tried eliminating Randomize command in the code and that didn't work, also change the definition of r=INT(Rnd*... to r=INT(1... but neither worked.
 
Upvote 0

Forum statistics

Threads
1,215,394
Messages
6,124,683
Members
449,180
Latest member
kfhw720

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top