How should I organize a large data set for statistical analysis? Any suggestions? Examples? Sample control chart xls files?
Data:
- Each part undergoes 3 inspections:
visual (6 data points per part)
test1 (40 data points per part)
test2 (40 data points per part) {ex measure paint thickness in 40 locations on each part)
- one test lot of 25 parts is inspected per day.
- Therefore 25 x(6 + 40 + 40) = 2150 data points/day
Analysis:
- Individual part: plot graphs of tests 1 &2 to show variation over the surface.
- Daily Lot of 25: plot mean and std dev for the 25 parts,
- Weekly: plot control chart and other summaries. Some chart plot avg per part, others avg per lot of 25 parts.
- Monthly: plot charts with each week's test data condensed to a single point on a chart.
Current (inherited) system:
- daily data file contains all data, calculations and charts for the lot of 25.
- macro copies the summary data that was calculated in the daily file (mean, std dev, etc.) to the weekly file
- weekly file: plots summary data (contains many weeks of daily summary data)
- monthly file: manually paste one week of data onto a sheet that calculates the weeks averages, then copy that result to a new table.
My concerns:
- since the daily file pushes the results of its calculations up to the weekly file, If I fins a mistake in the daly file calc, then I need to edit 6 months or more of daily files to correct it. (already found one mistake in the monthly calc, which would be ugly to recalc with the current manual copy/paste intensive method)
- Since the weekly file does not contain the original data, (1) to drill down into the data, I need to find and open the daily files, and (2)to add any new calcs, I need to add equations to every individual file and push the data back up again.
Question:
- How should the data be organized? I am leaning towards:
1) keeping only raw data in the daily file (labelled with the date), then have a "daily analysis file" where you select the desired date, and it either uses a macro to copy the data into the file, or use indirect statements to link to the file.
2) For the weekly file, store all data, then us a pivot table to perform the calcs (but then the file will get huge) or, use a macro to import a week of data at a time, perform the stat calcs, save them and then erase the original data, or lots of indirect equations to link files (but then long recalc time)
Any suggestions? Examples? Sample control chart xls files?
Thanks,
Brian
Data:
- Each part undergoes 3 inspections:
visual (6 data points per part)
test1 (40 data points per part)
test2 (40 data points per part) {ex measure paint thickness in 40 locations on each part)
- one test lot of 25 parts is inspected per day.
- Therefore 25 x(6 + 40 + 40) = 2150 data points/day
Analysis:
- Individual part: plot graphs of tests 1 &2 to show variation over the surface.
- Daily Lot of 25: plot mean and std dev for the 25 parts,
- Weekly: plot control chart and other summaries. Some chart plot avg per part, others avg per lot of 25 parts.
- Monthly: plot charts with each week's test data condensed to a single point on a chart.
Current (inherited) system:
- daily data file contains all data, calculations and charts for the lot of 25.
- macro copies the summary data that was calculated in the daily file (mean, std dev, etc.) to the weekly file
- weekly file: plots summary data (contains many weeks of daily summary data)
- monthly file: manually paste one week of data onto a sheet that calculates the weeks averages, then copy that result to a new table.
My concerns:
- since the daily file pushes the results of its calculations up to the weekly file, If I fins a mistake in the daly file calc, then I need to edit 6 months or more of daily files to correct it. (already found one mistake in the monthly calc, which would be ugly to recalc with the current manual copy/paste intensive method)
- Since the weekly file does not contain the original data, (1) to drill down into the data, I need to find and open the daily files, and (2)to add any new calcs, I need to add equations to every individual file and push the data back up again.
Question:
- How should the data be organized? I am leaning towards:
1) keeping only raw data in the daily file (labelled with the date), then have a "daily analysis file" where you select the desired date, and it either uses a macro to copy the data into the file, or use indirect statements to link to the file.
2) For the weekly file, store all data, then us a pivot table to perform the calcs (but then the file will get huge) or, use a macro to import a week of data at a time, perform the stat calcs, save them and then erase the original data, or lots of indirect equations to link files (but then long recalc time)
Any suggestions? Examples? Sample control chart xls files?
Thanks,
Brian