Hello... I have a spreadsheet of nearly 100,000 rows of data for work, from two spreadsheets combined into one. I'm new to Excel, but enjoy using it, not TOO incompetent - I've figured out how to answer most of my questions and have learned some great new formulas. But I'm stuck, and need an answer to a complex one! I have a feeling I'm making it more complicated than I am, so forgive the length of this. I don't know how to really explain it, but I'll try.
See below - I have three tables, and the info I need is from Table 3. I'm changing the facts slightly to protect my job, but the substance is the same. (I don't really work for Major League Baseball)
Table 1: (5 Columns), A: Last Name, B: First Name, C: Middle Name, D: Shirt Number (#s), and E: Triple A team played for (a Triple A team will only ever be one of two teams: A12 or B12)
Table 2: (5 Columns), A: Last Name, B: First Name, C: Middle Name, D: Shirt Number (all #s), and E: Just shows that a Triple A player from Table 1 actually went pro with a single digit, "X" and it doesn't change
Table 3:(5 Columns) a single table comprised of all data from Table 1 and Table 2, and the table from which I have to get my answers. I can't get my answers from Table 1 or Table 2 individually, because I have to identify the commonalities between the two tables combined.
The Question I have to answer: I am identifying individuals, that is - from Table 3, how many players from Triple A (Table 1) actually went Pro (Table 2)?
Seems pretty straightforward. Of the nearly 100k rows of data I have in Table 3, we expect probably 10% of players in Table 3 at most will be found in BOTH Table 1 and Table 2. In other words, tons of people played Triple A, and tons more went pro - but our list of pro includes thousands of people who went pro that were never in Triple A. These are all real people, but the problem is actually identifying them and counting them. And I'm limited to the data, which came from a stat program, and it can't be changed: shirt number, triple A team, the digit showing they went pro, and three names...
Here is the realities with the data:
- Shirt #: good news: your shirt # is unique. Every name has it's own shirt number and it doesn't duplicate. Bad news: about 13k of the names from Triple A and Pros do not have shirt #s and must be identified by name.
- Triple A Team: good news: if you're in Triple A, the Team you played for is either A12 or B12 (Column 5). That's it, no exceptions.
- Pro Team: if you went pro, you're identified by the digit X (Column 5) - that's it, no exceptions.
- First, Middle, Last Names: Identifying by first, middle, and last name only is not enough, because due facts out of my control, the same individual may have their middle name included in Table 1, but not in Table 2; or the names are spelled slightly different. So even if you have the right person, you can't really search by first middle or last name in Table 3. But I will have to deal with this data limitation for the names without shirt numbers that went pro by trying to run a match by maybe just last name.
I hope this makes sense, it's been a long day! With everything above, is there any way to find the people who were in triple A that went Pro from Table 3? Or is it even possible? It's basically identifying the number of unique repeats over a series of columns over two rows (for the same person who was in triple A and went pro), and counting that "once." It would be really easy if it were just a few dozen lines to I... I think my strongest values are the Shirt #s, which identify individuals, and whether a shirt number that started as Team A12 or Team B12 became X at some point in Column 5?
Help! Much obliged in advance!
See below - I have three tables, and the info I need is from Table 3. I'm changing the facts slightly to protect my job, but the substance is the same. (I don't really work for Major League Baseball)
Table 1: (5 Columns), A: Last Name, B: First Name, C: Middle Name, D: Shirt Number (#s), and E: Triple A team played for (a Triple A team will only ever be one of two teams: A12 or B12)
Table 2: (5 Columns), A: Last Name, B: First Name, C: Middle Name, D: Shirt Number (all #s), and E: Just shows that a Triple A player from Table 1 actually went pro with a single digit, "X" and it doesn't change
Table 3:(5 Columns) a single table comprised of all data from Table 1 and Table 2, and the table from which I have to get my answers. I can't get my answers from Table 1 or Table 2 individually, because I have to identify the commonalities between the two tables combined.
The Question I have to answer: I am identifying individuals, that is - from Table 3, how many players from Triple A (Table 1) actually went Pro (Table 2)?
Seems pretty straightforward. Of the nearly 100k rows of data I have in Table 3, we expect probably 10% of players in Table 3 at most will be found in BOTH Table 1 and Table 2. In other words, tons of people played Triple A, and tons more went pro - but our list of pro includes thousands of people who went pro that were never in Triple A. These are all real people, but the problem is actually identifying them and counting them. And I'm limited to the data, which came from a stat program, and it can't be changed: shirt number, triple A team, the digit showing they went pro, and three names...
Here is the realities with the data:
- Shirt #: good news: your shirt # is unique. Every name has it's own shirt number and it doesn't duplicate. Bad news: about 13k of the names from Triple A and Pros do not have shirt #s and must be identified by name.
- Triple A Team: good news: if you're in Triple A, the Team you played for is either A12 or B12 (Column 5). That's it, no exceptions.
- Pro Team: if you went pro, you're identified by the digit X (Column 5) - that's it, no exceptions.
- First, Middle, Last Names: Identifying by first, middle, and last name only is not enough, because due facts out of my control, the same individual may have their middle name included in Table 1, but not in Table 2; or the names are spelled slightly different. So even if you have the right person, you can't really search by first middle or last name in Table 3. But I will have to deal with this data limitation for the names without shirt numbers that went pro by trying to run a match by maybe just last name.
I hope this makes sense, it's been a long day! With everything above, is there any way to find the people who were in triple A that went Pro from Table 3? Or is it even possible? It's basically identifying the number of unique repeats over a series of columns over two rows (for the same person who was in triple A and went pro), and counting that "once." It would be really easy if it were just a few dozen lines to I... I think my strongest values are the Shirt #s, which identify individuals, and whether a shirt number that started as Team A12 or Team B12 became X at some point in Column 5?
Help! Much obliged in advance!