I am doing a research project using downloadable U.S. Census data.
So far I’ve been able to rely primarily on Excel. I am about to start a more complicated analysis, concerning childcare options used by parents. Census has suggested I need statistical software, maybe to learn R, which I understand has a pretty steep learning curve.
I want to describe what I need below and ask advice as to whether I can accomplish it with Excel formulas, whether learning VBA would be a good route to go here. I’ve got no experience with VBA either at this point.
Here is a much simplified spreadsheet showing data for two households:
Excel 2007
<tbody>
</tbody>
I am interested in percentage of children in various childcare arrangements. The census associates the data as to childcare with a parent’s record (at this point it appears to me that in a two-parent household they associate the childcare data with the mother’s record, not with both mother and father, so as to avoid duplication). But I need to tag onto the parent’s record certain data about the children, specifically the “person weight” associated with each child.
I did something similar earlier that was much easier, with married couples where I had one row for each spouse. Since there were always two rows for each couple, I could easily tag information, for example, about husband’s employment status onto the wife’s row.
But here the number of rows for each family is variable. Determining which data to tag onto the parent’s row will require matching up the household ID, and also looking at the numbers indicating relationships so that each child's person weight is associated with the correct childcare data.
Aside from the fact that the real data could obviously include more than two children, it will also involve far more variables – yes or no on a whole list of specific childcare options. I may also want to tag information about the husband onto the wife’s record.
Would much appreciate advice about what direction to go in software-wise.
<tbody>
</tbody>
So far I’ve been able to rely primarily on Excel. I am about to start a more complicated analysis, concerning childcare options used by parents. Census has suggested I need statistical software, maybe to learn R, which I understand has a pretty steep learning curve.
I want to describe what I need below and ask advice as to whether I can accomplish it with Excel formulas, whether learning VBA would be a good route to go here. I’ve got no experience with VBA either at this point.
Here is a much simplified spreadsheet showing data for two households:
Excel 2007
A | B | C | D | E | F | G | H | I | J | K | |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | Household ID | PersonID | Sex | Age | Person weight | Person ID youngest child | Paid care youngest child | Person ID 2nd youngest child | Paid care 2nd youngest child | Person ID mother | Employed |
2 | 35678 | 101 | 1 | 35 | 245.67 | -1 | -1 | -1 | -1 | 9999 | 1 |
3 | 35678 | 102 | 2 | 32 | 245.67 | 104 | 1 | 103 | 2 | 9999 | 1 |
4 | 35678 | 103 | 2 | 8 | 314.53 | -1 | -1 | -1 | -1 | 102 | 2 |
5 | 35678 | 104 | 2 | 3 | 256.98 | -1 | -1 | -1 | -1 | 102 | 2 |
6 | 45962 | 101 | 2 | 24 | 236.8 | 102 | 2 | -1 | -1 | 9999 | 2 |
7 | 45962 | 102 | 1 | 1 | 302.56 | -1 | -1 | -1 | -1 | 101 | 2 |
<tbody>
</tbody>
Sheet1
I am interested in percentage of children in various childcare arrangements. The census associates the data as to childcare with a parent’s record (at this point it appears to me that in a two-parent household they associate the childcare data with the mother’s record, not with both mother and father, so as to avoid duplication). But I need to tag onto the parent’s record certain data about the children, specifically the “person weight” associated with each child.
I did something similar earlier that was much easier, with married couples where I had one row for each spouse. Since there were always two rows for each couple, I could easily tag information, for example, about husband’s employment status onto the wife’s row.
But here the number of rows for each family is variable. Determining which data to tag onto the parent’s row will require matching up the household ID, and also looking at the numbers indicating relationships so that each child's person weight is associated with the correct childcare data.
Aside from the fact that the real data could obviously include more than two children, it will also involve far more variables – yes or no on a whole list of specific childcare options. I may also want to tag information about the husband onto the wife’s record.
Would much appreciate advice about what direction to go in software-wise.
101 | 2 |
<tbody>
</tbody>