Data Cleaning VBA

leeb91

New Member
Joined
May 22, 2015
Messages
20
Basically, what I am creating is a access data cleaning template which will automatically sort out information into a single standardized form for all of the excel files that I drop into the cleaning template.
I will have 200 different excel files with similar column headings, but not exactly the same (For example, "Disease Type" column in one excel sheet would correspond to "Disease" column in another excel sheet.
Some sheets will have more than usual number of columns in which case I will have to remove unnecessary column headings.
How do you go about it? I am thinking about developing macro that compares two data sheet (one standardized, one raw sheet) and take only the column headings I need and save that information on a different sheet. I would appreciate any help on this project.

Here is similar question by someone else from this forum

I am working with 2 separate files. Let's refer to them as (File 1) and (Template 1).
After opening (File 1), I need to compare the number of columns with headings to the number of columns with headings in
(Template 1). If the number of columns in (File 1) exceeds the number in (Template 1) than I must now examine each of the column headings in (File 1) to determine which of these does not exist in (Template 1). I will delete the entire column from (File 1) after determining which it is.

Also, the data has to be in a separate file (shouldn't be consolidated) since they are from many different companies. We would like to standardize and remove the unnecessary columns first and work on sorting the data within the column into a few different categories that we need to analyze the data.

One suggestion has been made which was to manually standardize the column headings for all 200 files since trying to program all of the variable possibilities does not seem practicable. Once it's done, I can use code to "Find" those headings I need and copy the columns to a new file. , then your first object should be to standardize the column headings for data which you want to capture. The most likely method to do that would be manually, since trying to program all of the variable possibilities does not seem practicable. Once you get the headers standardized, you could then use code to 'Find' those headers and copy the columns to a new excel file.

Any other ideas? Final work will be ultimately moved to SQL database.


Thanks
 

Excel Facts

Create a chart in one keystroke
Select the data and press Alt+F1 to insert a default chart. You can change the default chart to any chart type
Hi,

I agree with the manual fixing of the column headings. Exen if you use a program you will need to review the files anyway to see if it has made a good choice. So probably easier to just get that out of the way.

I am concerned about the idea of keeping the files separate when they are in the database. Databases like to keep similar things all together. I would recommend combining all the excel files into just one table. You will need to add an extra column for the attribute that will allow you to separate them virtually. So a column for a supplier key should be added. If you create an index on this column in the database then you will be able to display the data by supplier if you need to.

Also, I would be slightly surprised if your database supported grouping the data in any way.

Perhaps a recap on some differences between spreadsheets and databases might be in order. A database stores data in the order it receives it. That may well not be in the order you ever want to view it but that is no problem. Database systems split the storage part of the system from the inputting and reporting parts. Spreadsheets merge all three operations into one. That means that you are always trying to enter data into a spreadsheet in a way that will make data presentation easier. A database does not have that problem. Also, where in Excel you might use VLookup, in a database that kind of process is built-in. So you would not add the supplier name to each row just a code that points to another table that has suppliers and codes in it.

It sounds as if once the data is in the database then most of the work required will be reporting. That would include selecting by supplier, if necessary and grouping data on the fly.

At first a database will seem like a bit of a straight-jacket. You won't be able to just add a totals row to the data or type in that extra character if the field would go beyond its definition. However, as time goes by you will get to appreciate that the data will always be in the format that will work with the code you have written even if more data is added. The data storage format always remains fixed but the reporting can be as flexible as you need.
 
Upvote 0

Forum statistics

Threads
1,215,679
Messages
6,126,177
Members
449,296
Latest member
tinneytwin

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top