I have what I *think* is just a complicated IF/INDEX/MATCH problem that I have not quite been able to figure out. Hoping someone here can help.
I have two data sets which I am trying to merge. Dataset 1 contains a series of cases defined by a 9-digit serial number, a 4-digit ID number (this is needed as a unique identifier because the same serial # may appear multiple times in the set), a date, and a few columns of additional data. Dataset 2 contains 9-digit serial numbers, dates, and again, a few columns of (different) data. In this set, there are again multiple entries per serial number, but there is no unique identifying 4-digit ID number. The sets overlap completely in that they contain the same 9-digit serial numbers, but there are far more rows/entries in dataset 2.
To properly merge these sets, what I need to do is assign the right 4-digit ID number (along with corresponding date and data) from set 1 to the 9-digit serial numbers in set 2. Usually, for this type of problem, INDEX/MATCH is the solution I employ. However, what makes this a bit more complicated is that there are multiple unique 4-digit ID numbers for many of the 9-digit serial numbers (with the differentiating variable being a date), but simple INDEX/MATCH will only result in assigning one ID number to a given serial number. The right choice of ID number is a function of serial number AND date, and I cannot figure out how to create a formula to incorporate both of these properly.
Below I have provided some example data, including desired output. Basically, I want to match on serial number and merge "pet" data from dataset 1 with dataset 2 provided that the corresponding date for that entry in dataset 1 occurred before the date in for the ID number in dataset 2.
Example Set 1:
<tbody>
</tbody>
Example Set 2:
<tbody>
</tbody>
Ideal merged table appearance (though other forms of merge would be ok if this is too difficult - I can make data more 'wide' later):
<tbody>
</tbody>
Any help from experts or ambitious amateurs would be appreciated. Thanks!
I have two data sets which I am trying to merge. Dataset 1 contains a series of cases defined by a 9-digit serial number, a 4-digit ID number (this is needed as a unique identifier because the same serial # may appear multiple times in the set), a date, and a few columns of additional data. Dataset 2 contains 9-digit serial numbers, dates, and again, a few columns of (different) data. In this set, there are again multiple entries per serial number, but there is no unique identifying 4-digit ID number. The sets overlap completely in that they contain the same 9-digit serial numbers, but there are far more rows/entries in dataset 2.
To properly merge these sets, what I need to do is assign the right 4-digit ID number (along with corresponding date and data) from set 1 to the 9-digit serial numbers in set 2. Usually, for this type of problem, INDEX/MATCH is the solution I employ. However, what makes this a bit more complicated is that there are multiple unique 4-digit ID numbers for many of the 9-digit serial numbers (with the differentiating variable being a date), but simple INDEX/MATCH will only result in assigning one ID number to a given serial number. The right choice of ID number is a function of serial number AND date, and I cannot figure out how to create a formula to incorporate both of these properly.
Below I have provided some example data, including desired output. Basically, I want to match on serial number and merge "pet" data from dataset 1 with dataset 2 provided that the corresponding date for that entry in dataset 1 occurred before the date in for the ID number in dataset 2.
Example Set 1:
Serial # | Pet | Date of Pet purchase |
002241081 | cat | 3/23/2012 |
002241081 | dinosaur | 3/13/2013 |
002241081 | cow | 8/24/2010 |
002241081 | bird | 9/24/2013 |
002241081 | frog | 10/13/2010 |
002241081 | snake | 4/6/2012 |
<tbody>
</tbody>
Example Set 2:
ID # | Serial # | Date of score | Score |
1098 | 002241081 | 6/21/2012 | 2R |
2056 | 002241081 | 3/6/2013 | 1R |
1115 | 002241081 | 6/18/2013 | 2R |
2251 | 002241081 | 7/25/2014 | 2R |
<tbody>
</tbody>
Ideal merged table appearance (though other forms of merge would be ok if this is too difficult - I can make data more 'wide' later):
ID # | Serial # | Date of score | Score | Pet 1 | Pet 2 | Pet 3 | Pet 4 | Pet 5 | Pet 6 |
1098 | 002241081 | 6/21/2012 | 2R | cow | frog | cat | snake | ||
2056 | 002241081 | 3/6/2013 | 1R | cow | frog | cat | snake | ||
1115 | 002241081 | 6/18/2013 | 2R | cow | frog | cat | snake | dinosaur | |
2251 | 002241081 | 7/25/2014 | 2R | cow | frog | cat | snake | dinosaur | bird |
<tbody>
</tbody>
Any help from experts or ambitious amateurs would be appreciated. Thanks!