Hello all,
I have a Master list of 230k IDs that correspond to different values depending on the raw data source. I have 30 Source sheets with ~140k IDs each, the sort and number of which are unique to each sheet. The Source sheets contain 2 columns, an ID column and a value ("Bucket") column (excuse all the tables but I understand it might be easier with a visual):
Master
[TABLE="width: 100"]
<tbody>[TR]
[TD]ID[/TD]
[TD]Source1[/TD]
[TD]Source2[/TD]
[TD]Source3[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]5[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]6[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]8[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]9[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]10[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]
Source1
[TABLE="width: 100"]
<tbody>[TR]
[TD]ID[/TD]
[TD]Bucket[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]A[/TD]
[/TR]
[TR]
[TD]5[/TD]
[TD]B[/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD]A[/TD]
[/TR]
[TR]
[TD]8[/TD]
[TD]C[/TD]
[/TR]
</tbody>[/TABLE]
Source2
[TABLE="width: 100"]
<tbody>[TR]
[TD]ID[/TD]
[TD]Bucket[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]B[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]B[/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD]C[/TD]
[/TR]
[TR]
[TD]9[/TD]
[TD]D[/TD]
[/TR]
[TR]
[TD]10[/TD]
[TD]A[/TD]
[/TR]
</tbody>[/TABLE]
Source3
[TABLE="width: 100"]
<tbody>[TR]
[TD]ID[/TD]
[TD]Bucket[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]C[/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD]D[/TD]
[/TR]
[TR]
[TD]6[/TD]
[TD]A[/TD]
[/TR]
[TR]
[TD]9[/TD]
[TD]C[/TD]
[/TR]
</tbody>[/TABLE]
Ordinarily, one could do a VLOOKUP on the ID column in the master sheet and return the value in source sheet, but due to the size of these files, VLOOKUPs are no longer an option. I would like to create a scripting dictionary for each source which then uses the ID column in the master to populate the Bucket value. The outcome would look something like this:
Master
[TABLE="width: 100"]
<tbody>[TR]
[TD]ID[/TD]
[TD]Source1[/TD]
[TD]Source2[/TD]
[TD]Source3[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]A[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD][/TD]
[TD]B[/TD]
[TD]C[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD][/TD]
[TD]B[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD][/TD]
[TD][/TD]
[TD]D[/TD]
[/TR]
[TR]
[TD]5[/TD]
[TD]B[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]6[/TD]
[TD][/TD]
[TD][/TD]
[TD]A[/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD]A[/TD]
[TD]C[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]8[/TD]
[TD]C[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]9[/TD]
[TD][/TD]
[TD]D[/TD]
[TD]C[/TD]
[/TR]
[TR]
[TD]10[/TD]
[TD][/TD]
[TD]A[/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]
Here is the code that I have so far. Credit goes to the excellent walk through by matthewspatrick at Experts Exchange which can be found here:
If I've done this correctly, this should create one of the 30 dictionaries. What I am stuck on is how to then call the values from this dictionary on the Master sheet. Ideally the above "create" code would be in each Source sheet, and the "call" code in the Master file would run all 30 dictionaries one after another.
I am also open to other ideas of how to solve this problem if I've tried to create too elegant a solution and made more work for myself.
Thank you for your help!
Kirsten
I have a Master list of 230k IDs that correspond to different values depending on the raw data source. I have 30 Source sheets with ~140k IDs each, the sort and number of which are unique to each sheet. The Source sheets contain 2 columns, an ID column and a value ("Bucket") column (excuse all the tables but I understand it might be easier with a visual):
Master
[TABLE="width: 100"]
<tbody>[TR]
[TD]ID[/TD]
[TD]Source1[/TD]
[TD]Source2[/TD]
[TD]Source3[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]5[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]6[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]8[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]9[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]10[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]
Source1
[TABLE="width: 100"]
<tbody>[TR]
[TD]ID[/TD]
[TD]Bucket[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]A[/TD]
[/TR]
[TR]
[TD]5[/TD]
[TD]B[/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD]A[/TD]
[/TR]
[TR]
[TD]8[/TD]
[TD]C[/TD]
[/TR]
</tbody>[/TABLE]
Source2
[TABLE="width: 100"]
<tbody>[TR]
[TD]ID[/TD]
[TD]Bucket[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]B[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]B[/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD]C[/TD]
[/TR]
[TR]
[TD]9[/TD]
[TD]D[/TD]
[/TR]
[TR]
[TD]10[/TD]
[TD]A[/TD]
[/TR]
</tbody>[/TABLE]
Source3
[TABLE="width: 100"]
<tbody>[TR]
[TD]ID[/TD]
[TD]Bucket[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]C[/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD]D[/TD]
[/TR]
[TR]
[TD]6[/TD]
[TD]A[/TD]
[/TR]
[TR]
[TD]9[/TD]
[TD]C[/TD]
[/TR]
</tbody>[/TABLE]
Ordinarily, one could do a VLOOKUP on the ID column in the master sheet and return the value in source sheet, but due to the size of these files, VLOOKUPs are no longer an option. I would like to create a scripting dictionary for each source which then uses the ID column in the master to populate the Bucket value. The outcome would look something like this:
Master
[TABLE="width: 100"]
<tbody>[TR]
[TD]ID[/TD]
[TD]Source1[/TD]
[TD]Source2[/TD]
[TD]Source3[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]A[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD][/TD]
[TD]B[/TD]
[TD]C[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD][/TD]
[TD]B[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD][/TD]
[TD][/TD]
[TD]D[/TD]
[/TR]
[TR]
[TD]5[/TD]
[TD]B[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]6[/TD]
[TD][/TD]
[TD][/TD]
[TD]A[/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD]A[/TD]
[TD]C[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]8[/TD]
[TD]C[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]9[/TD]
[TD][/TD]
[TD]D[/TD]
[TD]C[/TD]
[/TR]
[TR]
[TD]10[/TD]
[TD][/TD]
[TD]A[/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]
Here is the code that I have so far. Credit goes to the excellent walk through by matthewspatrick at Experts Exchange which can be found here:
Code:
Sub Rosetta()
Dim Contents As Variant
Dim r As Long
Dim dic As Object
Dim ID As String
Dim bucket As String
Dim Keys As Variant
With Application
.Calculation = xlCalculationManual
.ScreenUpdating = False
End With
With ThisWorkbook.Worksheets("Buckets")
.Range("c1").Resize(1, .Columns.Count - 2).EntireColumn.Delete
Contents = .Range("a2", .Cells(.Rows.Count, "B").End(xlUp)).Value
Set dic = CreateObject("Scripting.Dictionary")
For r = 1 To UBound(Contents, 1)
ID = Contents(r, 1)
bucket = Contents(r, 2)
If dic.Exists(ID) Then
If bucket < dic.Item(ID) Then dic.Item(ID) = bucket
Else
dic.Add ID, bucket
End If
Next
End With
With Application
.Calculation = xlCalculationAutomatic
.ScreenUpdating = True
End With
MsgBox "Done"
End Sub
If I've done this correctly, this should create one of the 30 dictionaries. What I am stuck on is how to then call the values from this dictionary on the Master sheet. Ideally the above "create" code would be in each Source sheet, and the "call" code in the Master file would run all 30 dictionaries one after another.
I am also open to other ideas of how to solve this problem if I've tried to create too elegant a solution and made more work for myself.
Thank you for your help!
Kirsten