Best method of extracting data from large dataset

Joined
Aug 7, 2017
Messages
4
Hi

Apologies if something similar has been posted before, but I was hoping for some help with a problem I've been having with a project I'm working on.

I'm working with a large dataset (approx 200,000 rows), looking at a large number of petitions (approx 10,000). which a sample of people (approx 1,000) have signed.

The problem I have is that I need to produce a list of which people have signed the same petitions as each other. The data is in the below format (each row a signature, with columns including the name of respondent, name of petition etc).


Respondent namePetition namevar2var3
Person 11
Person 31
Person 61
Person 12
Person 22
Person 13
Person 173

<tbody>
</tbody>

Eventually, I want to be able to have a matrix showing the respondents as columns & rows, with the cells showing the no. of petitions signed by both respondents. However, I'm struggling to know how to get this information without taking months and months!

Any help would be greatly appreciated, please let me know if you need anything else which I've missed out.

Many thanks

Andrew
 

Excel Facts

Show numbers in thousands?
Use a custom number format of #,##0,K. Each comma after the final 0 will divide the displayed number by another thousand
ABCDEFGHIJKL
1Respondent name
Petition name
var2
var3
Person 1
Person 2
Person 3
Person 4
Person 6
Person 17
2Person 1
1
Person 1
1
1
2
1
1
3Person 3
1
Person 2
1
0
1
0
0
4Person 6
1
Person 3
1
0
1
1
0
5person 4
1
Person 4
2
1
1
1
0
6Person 1
2
Person 6
1
0
1
1
0
7Person 2
2
Person 17
1
0
0
0
0
8person 4
2
9Person 1
3
10Person 17
3

<tbody>
</tbody>

Try this VBA code you will need to change ranges and the do until loop to match your data. I only do the loop until it is > 3 since your sample data goes to 3.

Code:
Sub survay()
Dim lrl As Long
Dim lrm As Long
Dim lc As Long
lrl = Cells(Rows.Count, 1).End(xlUp).Row
lrm = Cells(Rows.Count, 6).End(xlUp).Row
lc = Cells(1, Columns.Count).End(xlToLeft).Column
samep = 0
For x = 2 To lrm
    For y = 7 To lc
        If Cells(x, 6) = Cells(1, y) Then
            Cells(x, y) = ""
        Else
            pnum = 1
            Do Until pnum > 3
                If Application.WorksheetFunction.SumIfs(Range("B2:B" & lrl), Range("A2:A" & lrl), Cells(x, 6), Range("B2:B" & lrl), pnum) > 0 And Application.WorksheetFunction.SumIfs(Range("B2:B" & lrl), Range("A2:A" & lrl), Cells(1, y), Range("B2:B" & lrl), pnum) > 0 Then
                    samep = samep + 1
                    pnum = pnum + 1
                Else
                pnum = pnum + 1
                End If
            
            Loop
        
            Cells(x, y) = samep
            samep = 0
                        
        End If
    
    Next y
Next x
End Sub
 
Upvote 0
Hi

Thanks so much for this, that is exactly the format that I need my data to be in!

I'm pretty unfamiliar with VBA, so could I ask a couple of quick follow ups to make sure I'm making the right amendments?

1. There are 9692 individual petitions, but their IDs are not ordered in this way (all defined as a random 5 digit number). Does this mean I need to do the loop until '> the number of petitions', or '> the highest number in the list'?

2. When it comes to editing the ranges, I assume that if I was to reduce my table to only 2 columns, then the above code would work without changes? Failing that, I can see that I would obviously need to put in "Range("C2:C" & lrl)" etc, but I'm not sure how I would go about ordering this within the wider script that you've given. Does that make sense?

Thanks again for your help

Andrew
 
Upvote 0
1. Pnum is used to increment the petition number as to check so you want to use the highest number on the list.

2. The code only looks at Respondent name and Petition name any other column in the table has no effect on the code. The ranges the code looks at are Respondent name, Petition name , the list of names in the matrix both down and across (F2:F7 and G1:L1 in my example data).
 
Upvote 0
Sub survay()Dim lrl As Long
Dim lrm As Long
Dim lc As Long
lrl = Cells(Rows.Count, 1).End(xlUp).Row
lrm = Cells(Rows.Count, 6).End(xlUp).Row
lc = Cells(1, Columns.Count).End(xlToLeft).Column
samep = 0
For x = 2 To lrm
For y = 7 To lc
If Cells(x, 6) = Cells(1, y) Then
Cells(x, y) = ""
Else
pnum = 1
Do Until pnum > 50701
If Application.WorksheetFunction.SumIfs(Range("B2:B" & lrl), Range("A2:A" & lrl), Cells(x, 6), Range("B2:B" & lrl), pnum) > 0 And Application.WorksheetFunction.SumIfs(Range("B2:B" & lrl), Range("A2:A" & lrl), Cells(1, y), Range("B2:B" & lrl), pnum) > 0 Then
samep = samep + 1
pnum = pnum + 1
Else
pnum = pnum + 1
End If

Loop

Cells(x, y) = samep
samep = 0

End If

Next y
Next x
End Sub

Hi

So if I understand correctly, all I need to do then is amend the 'Do until pnum > 3' section, like in the above quote, then the script should produce the matrix?

I'm afraid I've just run the script and it doesn't appear to be working (no error message, just seemingly no activity). I don't suppose you might have any idea why this might be the case? As I say, I'm pretty unfamiliar with this, so there's a pretty good chance that it's something I'm not doing rather than the script itself. I'm trying to use Youtube tutorial vidoes, but not managed to come across anything which helps so far.

Thanks again for your help

Andrew
 
Upvote 0
The original code did not create the list of unique names for the matrix, they had to be manually created. I have added code to create it. The code will delete the old matrix if it exists.

Code:
Sub survay()
Dim lrl As Long
Dim lrm As Long
Dim lc As Long
lrl = Cells(Rows.count, 1).End(xlUp).Row 'find last row of list
Application.ScreenUpdating = False
samep = 0
lrm = Cells(Rows.count, 6).End(xlUp).Row 'finds last row of matrix
Range(Cells(1, 6), Cells(lrm, 6)).ClearContents 'deletes old matrix

 Range("A2:A" & lrl).Copy Range("F2") 'copys list of names to create the matrix
   Application.CutCopyMode = False
    lrm = Cells(Rows.count, 6).End(xlUp).Row 'finds last row of matrix
    ActiveSheet.Range("$F$2:$F$" & lrm).RemoveDuplicates Columns:=1, Header:=xlNo 'removes duplicate names
    lrm = Cells(Rows.count, 6).End(xlUp).Row 'finds new last row of matrix
    ActiveWorkbook.Worksheets("Sheet1").Sort.SortFields.Clear 'sorts the names in the matrix
    ActiveWorkbook.Worksheets("Sheet1").Sort.SortFields.Add Key:=Range("F2"), _
        SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
    With ActiveWorkbook.Worksheets("Sheet1").Sort
        .SetRange Range("F2:F" & lrm)
        .Header = xlNo
        .MatchCase = False
        .Orientation = xlTopToBottom
        .SortMethod = xlPinYin
        .Apply
    End With
    Range("F2:F" & lrm).Copy 'this part copys the names and pasts them across the top
    Range("G1").Select
    Selection.PasteSpecial Paste:=xlPasteAll, Operation:=xlNone, SkipBlanks:= _
        False, Transpose:=True
        Application.CutCopyMode = False
lc = Cells(1, Columns.count).End(xlToLeft).Column 'finds last column of matrix
For x = 2 To lrm
    For y = 7 To lc
        If Cells(x, 6) = Cells(1, y) Then
            Cells(x, y) = ""
        Else
            pnum = 1
            Do Until pnum > 3
                If Application.WorksheetFunction.SumIfs(Range("B2:B" & lrl), Range("A2:A" & lrl), Cells(x, 6), Range("B2:B" & lrl), pnum) > 0 And Application.WorksheetFunction.SumIfs(Range("B2:B" & lrl), Range("A2:A" & lrl), Cells(1, y), Range("B2:B" & lrl), pnum) > 0 Then
                    samep = samep + 1
                    pnum = pnum + 1
                Else
                pnum = pnum + 1
                End If
            
            Loop
        
            Cells(x, y) = samep
            samep = 0
                        
        End If
    
    Next y
Next x
Application.ScreenUpdating = True
End Sub
 
Upvote 0
Use this instead I made a mistake in the above code.

Code:
Sub survay()
Dim lrl As Long
Dim lrm As Long
Dim lc As Long
lrl = Cells(Rows.count, 1).End(xlUp).Row 'find last row of list
Application.ScreenUpdating = False
samep = 0
lc = Cells(1, Columns.count).End(xlToLeft).Column 'finds last column of matrix
If lc < 6 Then lc = 6
lrm = Cells(Rows.count, 6).End(xlUp).Row 'finds last row of matrix
Range(Cells(1, 6), Cells(lrm, lc)).ClearContents 'deletes old matrix

Range("A2:A" & lrl).Copy Range("F2") 'copys list of names to create the matrix
   Application.CutCopyMode = False
    lrm = Cells(Rows.count, 6).End(xlUp).Row 'finds last row of matrix
    ActiveSheet.Range("$F$2:$F$" & lrm).RemoveDuplicates Columns:=1, Header:=xlNo 'removes duplicate names
    lrm = Cells(Rows.count, 6).End(xlUp).Row 'finds new last row of matrix
    ActiveWorkbook.Worksheets("Sheet1").Sort.SortFields.Clear 'sorts the names in the matrix
    ActiveWorkbook.Worksheets("Sheet1").Sort.SortFields.Add Key:=Range("F2"), _
        SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
    With ActiveWorkbook.Worksheets("Sheet1").Sort
        .SetRange Range("F2:F" & lrm)
        .Header = xlNo
        .MatchCase = False
        .Orientation = xlTopToBottom
        .SortMethod = xlPinYin
        .Apply
    End With
    Range("F2:F" & lrm).Copy 'this part copys the names and pasts them across the top
    Range("G1").Select
    Selection.PasteSpecial Paste:=xlPasteAll, Operation:=xlNone, SkipBlanks:= _
        False, Transpose:=True
        Application.CutCopyMode = False
lc = Cells(1, Columns.count).End(xlToLeft).Column 'finds last column of matrix
For x = 2 To lrm
    For y = 7 To lc
        If Cells(x, 6) = Cells(1, y) Then
            Cells(x, y) = ""
        Else
            pnum = 1
            Do Until pnum > 3
                If Application.WorksheetFunction.SumIfs(Range("B2:B" & lrl), Range("A2:A" & lrl), Cells(x, 6), Range("B2:B" & lrl), pnum) > 0 And Application.WorksheetFunction.SumIfs(Range("B2:B" & lrl), Range("A2:A" & lrl), Cells(1, y), Range("B2:B" & lrl), pnum) > 0 Then
                    samep = samep + 1
                    pnum = pnum + 1
                Else
                pnum = pnum + 1
                End If
            
            Loop
        
            Cells(x, y) = samep
            samep = 0
                        
        End If
    
    Next y
Next x
Application.ScreenUpdating = True
End Sub
 
Upvote 0
Thanks, it appears to be working now.

That said, Excel has been 'not responding' for several hours since I first entered the code. I assume that this can just be put down to the volume of data that is being processed?
 
Upvote 0
Yes, it will take some time to run since the code is looping though many petition numbers even if they do not exist on your list and the size of your list.
It would probable best to put the petition numbers in an array and loop though the array. It would still take some time if you have a lot of data. I do not have time to write and test the code right now.
 
Last edited:
Upvote 0
This should run a little bit faster as it will only test for petition numbers in your list.

Code:
Sub survay()
Dim lrl As Long
Dim lrm As Long
Dim lc As Long
lrl = Cells(Rows.Count, 1).End(xlUp).Row 'find last row of list
Application.ScreenUpdating = False
samep = 0
lc = Cells(1, Columns.Count).End(xlToLeft).Column 'finds last column of matrix
If lc < 6 Then lc = 6
lrm = Cells(Rows.Count, 6).End(xlUp).Row 'finds last row of matrix
Range(Cells(1, 6), Cells(lrm, lc)).ClearContents 'deletes old matrix
lrb = Cells(Rows.Count, 2).End(xlUp).Row 'creates array
Range("B2:B" & lrb).Copy Range("F2")
ActiveSheet.Range("$F$2:$F$" & lrb).RemoveDuplicates Columns:=1, Header:=xlNo 'removes duplicate names
lrb = Cells(Rows.Count, 2).End(xlUp).Row
Dim parray As Variant
parray = Range("F2:F" & lrb).Value
Range("F2:F" & lrb).ClearContents
Range("A2:A" & lrl).Copy Range("F2") 'copys list of names to create the matrix
   Application.CutCopyMode = False
    lrm = Cells(Rows.Count, 6).End(xlUp).Row 'finds last row of matrix
    ActiveSheet.Range("$F$2:$F$" & lrm).RemoveDuplicates Columns:=1, Header:=xlNo 'removes duplicate names
    lrm = Cells(Rows.Count, 6).End(xlUp).Row 'finds new last row of matrix
    ActiveWorkbook.Worksheets("Sheet1").Sort.SortFields.Clear 'sorts the names in the matrix
    ActiveWorkbook.Worksheets("Sheet1").Sort.SortFields.Add Key:=Range("F2"), _
        SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
    With ActiveWorkbook.Worksheets("Sheet1").Sort
        .SetRange Range("F2:F" & lrm)
        .Header = xlNo
        .MatchCase = False
        .Orientation = xlTopToBottom
        .SortMethod = xlPinYin
        .Apply
    End With
    Range("F2:F" & lrm).Copy 'this part copys the names and pasts them across the top
    Range("G1").Select
    Selection.PasteSpecial Paste:=xlPasteAll, Operation:=xlNone, SkipBlanks:= _
        False, Transpose:=True
        Application.CutCopyMode = False
lc = Cells(1, Columns.Count).End(xlToLeft).Column 'finds last column of matrix
For x = 2 To lrm
    For y = 7 To lc
        If Cells(x, 6) = Cells(1, y) Then
            Cells(x, y) = ""
        Else
            For Each element In parray
        pnum = (element)
            If Application.WorksheetFunction.SumIfs(Range("B2:B" & lrl), Range("A2:A" & lrl), Cells(x, 6), Range("B2:B" & lrl), pnum) > 0 And Application.WorksheetFunction.SumIfs(Range("B2:B" & lrl), Range("A2:A" & lrl), Cells(1, y), Range("B2:B" & lrl), pnum) > 0 Then
                    samep = samep + 1
            End If
        
    Next element
                 
            Cells(x, y) = samep
            samep = 0
                        
        End If
    
    Next y
Next x
Application.ScreenUpdating = True
End Sub
 
Upvote 0

Forum statistics

Threads
1,215,729
Messages
6,126,525
Members
449,316
Latest member
sravya

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top