strip special characters from imported file


Posted by WK on October 24, 2000 5:15 PM

I recently downloaded a simple data file, labeling it a '.csv' file. When I open it in excel, the last name has leading '<&LT;B+&GT;>' and trailing '<&LT;B-&GT;>', with a colon folowing the last name. The data falls in consistant row/columns but in the wrong sequence. Is there any way to strip the special characters and colon and reformat the date into a data string that makes sense? Thanks for any help!

Posted by Loren on October 25, 2000 4:57 AM


Use find &LT;&GT; and replace with ""

Posted by WK on October 25, 2000 7:39 AM

Thanks for your help. Now how am I able to reformat the data? For example, the last name is at B2, but the first name is at A3. The address is at B4, the city/state/zip is at B5. The phone number is combined with the last name at B2, as in SMITH: 677-5845. I then have a blank row and the next record is displayed. My goal is to reformat in traditional excel column/row order. Thanks for any help you can give me!

Posted by Ben O. on October 25, 2000 9:09 AM

Okay, you've got some major cleanup work to do. Each record of yours spans several row, and you want them in one row. To do that, put this equation in E2: =IF(A3&LT;&GT;"",A3,"") and autofill it all the way down. That will take the first name for each record. Don't worry about the blank rows--those can be easily deleted later.

Okay, now get the rest of the info in row 2. In F2, use =B2 (last name), in G2 use =B4 (address), and in H2 use =B5 (city, state, ZIP). Autofill those formulas all the way down. Now, select columns E through I, select Copy, the select Edit > Paste Special > Values. This will replace the formulas with static values. Now, each of your records should have one line. The next step is to get rid of the rows you don't need anymore.

To do this you'll to use the Autofilter. But before you use the autofilter make you have a header row. So E1 should be first name, F1 should be last name, etc. Give your header row special formatting, such as bold text, so the autofilter recognizes it as your header row. Now highlight column E and select Data > Filter > Autofilter. A small arrow will appear by your first name header. Click on that arrow and chooses (Blanks) from the drop-down list. The rows you don't need will be diplayed. Select all these rows and delete them. Now, remove your filter by selecting Data > Filter > Autofilter again, and you should be left with only the records you need. You can go ahead and delete columns A through D.

To avoid this mess in the future, make sure that the text file you import only contains line breaks or carriage returns between records. In Microsoft Word, you can do a find/replace to replace unwanted line breaks with an asterisk or some other symbol that you can later use as a field delimiter in Excel.

Okay, if you made it this far, the rest is easy.

To separate the last name and the phone number, use Excel's Text-to-Columns feature. First, insert a new column to the right of column B. Then highlight column B and select Text-to-Columns from the Data menu. You'll be given the choice between fixed width and delimited. Select delimeted. In the next step you'll be able to choose what what you want use as your delimeter. Choose colon. This will separate the last name from the phone number.

To parse your city/state/zip data into different columns, use Text-to-Columns again. If there's a comma between the city and state, and a space between state and ZIP code, first use a comma as your delimeter, separating city from state and ZIP, then use a space as a delimeter for you state/ZIP column, separating those two. Just remember to have empty columns to the right of whatever column you're parsing so that Excel has somewhere to put the data.

I hope this helps,

-Ben

Posted by WK on October 25, 2000 3:49 PM

Ben - great response! Thanks for all the time you spent on this. I did find one other field that needs to be moved. Right underneath First-Name in A3, I find the first names of their children (A4). Of course, when I perform the copy to E2, it also copies the childrens names. I want to copy these to a new column as all the others. Is there a way to move the data rather that copy? That might be a better approach. Thanks again!

Posted by Ben O. on October 26, 2000 9:41 AM

For a formula to move data, it would have to copy the data and then delete it from the cell it copied it from. I don't think there are an Excel functions that do this. There's probably a way to do it in VBA, but that would be too complicated for me!

Having the childrens' names under the parent's name calls for a revision of my proposed solution. We'll have to modify the formula in column E, =IF(A3&LT;&GT;"",A3,""), so it only copies the parent's names. What is unique about the parents' names that an Excel function could recognize? From the looks of it, the cells above parent's names are blank. So we'll only copy the names in column A that are beneath blank cells.

Here's the formula you should put in E2:

=IF(AND(A3&LT;&GT;"",A2=""),A3,"")

Autofill it all the way down column E and it should take only the parent's names.

Now, to get the children's names in a new column is a little trickier. I take it that the number of kids for each person varies. So if you just take the names in column A that are not beneath blank cells, you'll have more children than parents. So, you'll have to make it so that column I takes the names of kids that are directly below their parent's name, column J takes the names of the kids that are two cells beneath their parents name, etc. It might take a little trial and error to figure out. Good luck,

-Ben

Posted by WK on October 27, 2000 10:08 AM

Having the childrens' names under the parent's name calls for a revision of my proposed solution. We'll have to modify the formula in column E, =IF(A3&LT;&GT;"",A3,""), so it only copies the parent's names. What is unique about the parents' names that an Excel function could recognize? From the looks of it, the cells above parent's names are blank. So we'll only copy the names in column A that are beneath blank cells. Here's the formula you should put in E2: =IF(AND(A3&LT;&GT;"",A2=""),A3,"") Autofill it all the way down column E and it should take only the parent's names. Now, to get the children's names in a new column is a little trickier. I take it that the number of kids for each person varies. So if you just take the names in column A that are not beneath blank cells, you'll have more children than parents. So, you'll have to make it so that column I takes the names of kids that are directly below their parent's name, column J takes the names of the kids that are two cells beneath their parents name, etc. It might take a little trial and error to figure out. Good luck,


Ben - it all works great, but for the familys that have the "kid" row, the record is 5 rows deep with info, whereas the 'non-kid' family is only 4 rows deep. As a result, the address and/or city get skewed. I have tried to do a row count and make an adjustment in the formula, but no luck. Thanks again.

WK



Posted by WK on October 29, 2000 6:44 PM

figured it out

Ben - I figured this out. THanks for all your help!