Need help in pairing of sorted data

j12345

New Member
Joined
Jun 17, 2015
Messages
6
I am trying to pair (~60k lines) measurement data within a sorted list. if sorting can be automated that would be great but is not needed as I can do the sorting prior to comparing the data. (sorted by group1, then set, then measurement). I'd like to be able to pair two consecutive rows (within group and set) and label them when measurements are within 20 points of each other. see example before sorting(first set) and after sorting and labeling for pairs(last set).
I hope the question is clear. I tried to make the sample dataset as generic as possible without losing the intent.
Any help would be appreciated.

group1setmeasurements
1z2
4a32
4a5
1a3
1a4
2z5
2z65
3a1
1a1
5a6
5z65
1a2
1z3
4a43
3a12
3z3
4a56
4z6
4z456
5z654
2z3
2z3
5a65
5z354
5a345

<tbody>
</tbody>



group1setmeasurementspair within 20 points in group1 in each set
1a1pair1
1a2pair1
1a3pair2
1a4pair2
1z2pair3
1z3pair3
2z3pair4
2z3pair4
2z5null
2z65null
3a1pair5
3a12pair5
3z3null
4a5null
4a32pair6
4a43pair6
4a56null
4z6null
4z456null
5a6null
5a65null
5a345null
5z65null
5z354null
5z654null

<tbody>
</tbody>
 

Excel Facts

Do you hate GETPIVOTDATA?
Prevent GETPIVOTDATA. Select inside a PivotTable. In the Analyze tab of the ribbon, open the dropown next to Options and turn it off
Hi and welcome to the MrExcel Message Board,

To process 60,000 rows you need something quite quick. Unfortunately, that means it is harder to understand.

The following code uses an ADODB Recordset. This is a bit like a database and allows the data to be read from Excel, sorted and copied into an array.

The array is "the wrong way round" so has to be transposed. I use another array for the output. The data is transposed row by row and the pairing logic is applied straight after.

The final step is to write the processed data with the extra column back to a second worksheet.
You probably could over-write the first one but that makes it harder to test and if there is an error anywhere you have lost your data.

The code below needs to be pasted into a macro Module in Excel.
Also, while in the VB Editor, you need to set a reference to "Microsoft ActiveX Data Objects Library 6.0" (or whatever your highest version is).
The workbook also needs to be saved before you run the macro.

Code:
'Requires: Tools--> References--> Microsoft ActiveX Data Objects Library 6.0
Sub CopyFromRecordset_To_Range()

    Const ptDiff As Long = 20
    
    Dim sSQLQry As String
    Dim arrIn As Variant
    Dim arrOut As Variant
    Dim Conn As New ADODB.Connection
    Dim rs As New ADODB.Recordset
    Dim DBPath As String, sconnect As String
    Dim SQLString As String
    Dim pNo As Long
    Dim prevWasPair As Boolean
    Dim i As Long
    Dim ws1 As Worksheet
    Dim ws2 As Worksheet
    
    Set ws1 = ThisWorkbook.Worksheets("Sheet1")
    Set ws2 = ThisWorkbook.Worksheets("Sheet2")
    
    DBPath = ThisWorkbook.FullName
    sconnect = "Provider=MSDASQL.1;DSN=Excel Files;DBQ=" & DBPath & ";HDR=Yes';"
    Conn.Open sconnect
    SQLString = "SELECT * From [" & ws1.Name & "$] Order By [group1], [set], [measurements]"
    rs.Open SQLString, Conn
    arrIn = rs.GetRows
    rs.Close
    Conn.Close
    
    pNo = 1
    prevWasPair = True
    ReDim arrOut(1 To UBound(arrIn, 2) + 1, 1 To 4)
    For i = 1 To UBound(arrIn, 2) + 1
        ' Transpose and change array base from 0 to 1
        arrOut(i, 1) = arrIn(0, i - 1)
        arrOut(i, 2) = arrIn(1, i - 1)
        arrOut(i, 3) = arrIn(2, i - 1)
        arrOut(i, 4) = "null"

        ' Verify that previous row was NOT paired
        If prevWasPair = False Then
            ' Check if same grouping
            If (arrOut(i, 1) = arrOut(i - 1, 1)) And (arrOut(i, 2) = arrOut(i - 1, 2)) Then
                ' Check if previous record is withing limits
                If arrOut(i, 3) - arrOut(i - 1, 3) < ptDiff Then
                    arrOut(i - 1, 4) = "Pair" & pNo
                    arrOut(i, 4) = "Pair" & pNo
                    pNo = pNo + 1
                    prevWasPair = True
                Else
                    prevWasPair = False
                End If
            Else
                prevWasPair = False
            End If
        Else
            prevWasPair = False
        End If
    Next
    
    With ws2
        .Cells.Clear
        .Range("A1:D1") = Array("group1", "set", "measurements", "pair")
        .Range("A2").Resize(UBound(arrOut, 1), UBound(arrOut, 2)) = arrOut
        .Columns("A:D").AutoFit
        .Columns("A:D").HorizontalAlignment = xlCenter
    End With
    
End Sub

The sheet names can be changed by overtyping the names in the Set commands.
The points difference can be changed by changing ptDiff at the top of the macro.
 
Upvote 0
Thanks for the reply.
i tried your code. and am getting user defined type error when compiling. (i am a complete newbie to macros so i am going to ask for your patience). it stops at Dim Conn as New ADODB line.
i want to also clear up one thing. if the DB response is in here because of the size of the dataset i presented in the original post, i would like to know if the answer would be different if the data only had a few thousand lines. (daily/weekly data vs. running historical data). i can work with weekly data which is a smaller subset. I don't know if that makes a difference in the approach (using a DB export and re import solution you provided).
 
Last edited:
Upvote 0
Hi,

When I started I thought I was going to be able to do more processing with the recordset that I managed to achieve. I am still learning, too. So I created another version today which does not use ADODB at all, it uses the standard Excel sort procedure and I read the results into an array.

Yes, the design was because you might have 60,000 rows, Writing back and forth to the worksheet can take an age.

Please try this version. It is similar to the previous one but is more low tech.

Code:
Sub PairData()

    Const ptDiff As Long = 20
    
    Dim arrIn As Variant
    Dim arrOut As Variant
    Dim pNo As Long
    Dim prevWasPair As Boolean
    Dim i As Long
    Dim ws1 As Worksheet
    
    Set ws1 = ThisWorkbook.Worksheets("Sheet1")
    
    With ws1
        .UsedRange.Sort key1:=.Range("A1"), key2:=.Range("B1"), key3:=.Range("C1"), Header:=xlYes
        arrIn = .UsedRange.Columns("A:C")
    End With
    
    pNo = 1
    prevWasPair = False
    ReDim arrOut(1 To UBound(arrIn, 1), 1 To 1)
    arrOut(1, 1) = "pair"   ' Set title for the pair column
    arrOut(2, 1) = "null"
    For i = 3 To UBound(arrIn, 1)

        arrOut(i, 1) = "null"

        ' Verify that previous row was NOT paired
        If (prevWasPair = False) And _
              (arrIn(i, 1) = (arrIn(i - 1, 1)) And _
              (arrIn(i, 2) = arrIn(i - 1, 2)) And _
              arrIn(i, 3) - arrIn(i - 1, 3) < ptDiff) Then
            arrOut(i - 1, 1) = "Pair" & pNo
            arrOut(i, 1) = "Pair" & pNo
            pNo = pNo + 1
            prevWasPair = True
        Else
            prevWasPair = False
        End If

    Next
    
    With ws1
        .Columns("D:D").Clear
        .Range("D1").Resize(UBound(arrOut, 1), 1) = arrOut
        .Columns("A:D").AutoFit
        .Columns("A:D").HorizontalAlignment = xlCenter
    End With
    
End Sub
 
Upvote 0
Thanks RickXL,
That worked like a charm!
And as I am getting greedy, I would like for your help on customizing this code for my actual spreadsheet and ask how I might change the code for the following.
1. if the data has columns A through F,
2. Sorting groups are located in column (A: group, C:set, F:Measurements), and I'd like the pair/null output column to be in column G.
I tried to modify your code and am having trouble getting it to work properly for the above two reasons, (Mainly)....
I am sure there are other things that can be done to customize it further but for now the two items above would work to just copy and paste the code and run the macros in my spreadsheet without further adjustment.

Another question is: I assume the data has to be complete? as in no missing data point in any of the measurement lines, for instance. I wanted to confirm on an error it says about "can't execute code in break mode".
Just as an FYI, i ran your code on ~70k lines and it worked with no problem!
Thanks for your help.
 
Upvote 0
Hi,

Here is the rtevised macro:
Rich (BB code):
Sub PairData()

    Const ptDiff As Long = 20
    
    Dim arrIn As Variant
    Dim arrOut As Variant
    Dim pNo As Long
    Dim prevWasPair As Boolean
    Dim i As Long
    Dim ws1 As Worksheet
    
    Set ws1 = ThisWorkbook.Worksheets("Sheet1")
    
    With ws1
        .UsedRange.Sort key1:=.Range("A1"), key2:=.Range("C1"), key3:=.Range("F1"), Header:=xlYes
        arrIn = .UsedRange.Columns("A:F")
    End With
    
    pNo = 1
    prevWasPair = False
    ReDim arrOut(1 To UBound(arrIn, 1), 1 To 1)
    arrOut(1, 1) = "pair"   ' Set title for the pair column
    arrOut(2, 1) = "null"
    For i = 3 To UBound(arrIn, 1)

        arrOut(i, 1) = "null"

        ' Verify that previous row was NOT paired
        If (prevWasPair = False) And _
              (arrIn(i, 1) = (arrIn(i - 1, 1)) And _
              (arrIn(i, 3) = arrIn(i - 1, 3)) And _
              arrIn(i, 6) - arrIn(i - 1, 6) < ptDiff) Then
            arrOut(i - 1, 1) = "Pair" & pNo
            arrOut(i, 1) = "Pair" & pNo
            pNo = pNo + 1
            prevWasPair = True
        Else
            prevWasPair = False
        End If

    Next
    
    With ws1
        .Columns("G:G").Clear
        .Range("G1").Resize(UBound(arrOut, 1), 1) = arrOut
        .Columns("A:G").AutoFit
        .Columns("A:G").HorizontalAlignment = xlCenter
    End With
    
End Sub


When you say "missing data", which bit is missing, exactly?
 
Upvote 0
RickXL,
It doesn't compile if there is a gap in the datapoint. And one more peculiar thing I noticed is when I copy and paste subset of data into new sheet and run the macro, the pairing runs to the end of the sheet so it will just pair blank cells all the way down to the end. Just something peculiar I noticed. It doesn't affect my analysis but I am guessing there is a way to stop at the end of the list.
 
Upvote 0
Hi,

The previous versions used UsedRange to work out the data range. It does work properly if you replace a lot of data with less data. It remembers how big it used to be. So I have changed that.

Also, the new code skips blank data in column F.

Code:
Sub PairData()

    Const ptDiff As Long = 20
    
    Dim arrIn As Variant
    Dim arrOut As Variant
    Dim pNo As Long
    Dim prevWasPair As Boolean
    Dim i As Long
    Dim ws1 As Worksheet
    Dim r As Range
    
    Set ws1 = ActiveSheet       'ThisWorkbook.Worksheets("Sheet1")
    
    With ws1
        Set r = .Range("A1:F" & .Cells(.Rows.Count, "A").End(xlUp).Row)
        r.Sort key1:=.Range("A1"), key2:=.Range("C1"), key3:=.Range("F1"), Header:=xlYes
        arrIn = r.Value
    End With
    
    pNo = 1
    prevWasPair = False
    ReDim arrOut(1 To UBound(arrIn, 1), 1 To 1)
    arrOut(1, 1) = "pair"   ' Set title for the pair column
    arrOut(2, 1) = "null"

    For i = 3 To UBound(arrIn, 1)

        arrOut(i, 1) = "null"

        ' Verify that previous row was NOT paired
        If (prevWasPair = False) And _
                (arrIn(i, 1) = (arrIn(i - 1, 1)) And _
                (arrIn(i, 3) = arrIn(i - 1, 3)) And _
                arrIn(i, 6) - arrIn(i - 1, 6) < ptDiff) And _
                arrIn(i, 6) <> "" Then
                arrOut(i - 1, 1) = "Pair" & pNo
                arrOut(i, 1) = "Pair" & pNo
                pNo = pNo + 1
            prevWasPair = True
        Else
            prevWasPair = False
        End If

    Next
    
    With ws1
        .Columns("G:G").Clear
        .Range("G1").Resize(UBound(arrOut, 1), 1) = arrOut
        .Columns("A:G").AutoFit
        .Columns("A:G").HorizontalAlignment = xlCenter
    End With
    
End Sub
 
Upvote 0

Forum statistics

Threads
1,203,663
Messages
6,056,623
Members
444,878
Latest member
SoupLaura

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top