Cleaning up data - Fixed width + page print format

vgr

New Member
Joined
May 25, 2012
Messages
15
I recently landed up getting a whole lot of data that I wished was in a csv format. It looks like the data is ready to be printed - fixed width, page number, report name, subtotal on each page... the works.. something from my worst nightmare...
It would be great if you could possibly tell me how to tackle cleaning up this data. How do I remove blank rows, rows that do not basically fall in line with the headers...
 

Excel Facts

Square and cube roots
The =SQRT(25) is a square root. For a cube root, use =125^(1/3). For a fourth root, use =625^(1/4).

RonB1111

Well-known Member
Joined
Nov 28, 2011
Messages
2,277
While each report is different, here's some ideas:
1. First step: Add a column filled with sequential numbers so you can get easily the data back to the original sort order if necessary.
2. Usually you can find something unique about each header row, so you may be able to autofilter out some of the header rows.
3. If Autofilter doesn't clean out all unwanted rows, use a blank column and a formula to identify such rows and then autofilter or sort out those rows.
4. You didn't say if the data loads into excel entirely in 1 column or multiple columns. Use Data-TextToColumns to split out into columns if necessary.

If you post 2 pages of sample data and explain whether it loads into 1 or multiple cells, it would be easier to offer more specific suggestions.
 

RonB1111

Well-known Member
Joined
Nov 28, 2011
Messages
2,277
1. If you wish to use Excel, the file would need to be split into 7 files each with no more than 1 million lines. I've done this before but only with files less than 2 million lines. Opened in Wordpad, deleted half the rows, then saved it as a new file named ...1of2. Then reopened the original large file, deleted the other half of the rows and saved it as ....2of2. Don't know if wordpad would handle such a large file, but there may be other text editors that can.

2. I used Excel 2013 to open your sample txt file and it automatically split most into separate columns, the exception being the 6 columns beginning with CaseNo and ending with STMvalue which it placed in a single column. You could split these with Excels Text to Columns, but 1st do items 3 & 4 below.

3. The Courier Name column seems to be the only column that wraps to a 2nd row in some cases (but not all). After I numbered all rows using a blank column on the right (ie Fill - series), I inserted a row above the 1st row and then copied a header line into row 1. I then inserted a column after Courier name and entered this formula in N9 and copied down so that the full courier name would be on the main row:
Code:
=IF(AND(M10<>"",L10=""),M9&" "&M10,M9)
.

4. Then turned on autofilters to filter out the dashed lines, header rows and blanks in Col I.

5. That left data rows and subtotals beneath them. I couldn't figure out what determines the rows belonging to each subtotal. If you can, then you probably don't need the subtotals and can filter those out. If you can't, then the sequential line numbering comes in handy.

I suspect even after doing this you'll still have well over 1 million rows of data so it won't fit into a single excel worksheet. However once you've cleaned out unnecessary rows, you could upload each file to a MSAccess database for further analysis.

It's definitely a time consuming project to do it this way. Certainly worth investigating if the provider of the file could deliver it in another format. Also worth looking into a custom script to directly edit the text file.
 

Forum statistics

Threads
1,143,839
Messages
5,721,096
Members
422,339
Latest member
SHIVATVM

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top