Format converted data for analyzing

Apprentice

New Member
Joined
Apr 24, 2010
Messages
6

I recently converted 20 or so pdf files from nitro pdf to excel for puposes of analyzing the data. My main objective is to count how many occurences there are of certian words from a list i created. The issue I'm having is all the data after it's been converted is inconistent so my 'countif" function is not recognizing all the occurrences. Is there a way for me to alter the entire data set so that there are no irregularities and my function can do its job with complete accuracy?

Please help
 

Excel Facts

What do {} around a formula in the formula bar mean?
{Formula} means the formula was entered using Ctrl+Shift+Enter signifying an old-style array formula.
Apprentice, what irregularities do you have? Post examples of the inconsistencies and we'll have a crack at it.
 
Upvote 0
Hello milesUK,

Thanks for the help.

Well, the main issue after converting the data from pdf to excel is most of the text looks like this:
<TABLE style="WIDTH: 95pt; BORDER-COLLAPSE: collapse" cellSpacing=0 cellPadding=0 width=127 border=0 x:str><COLGROUP><COL style="WIDTH: 95pt; mso-width-source: userset; mso-width-alt: 4644" width=127><TBODY><TR style="HEIGHT: 30pt" height=40><TD class=xl63 style="BORDER-RIGHT: windowtext 0.5pt solid; BORDER-TOP: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; WIDTH: 95pt; BORDER-BOTTOM: windowtext 0.5pt solid; HEIGHT: 30pt; BACKGROUND-COLOR: transparent" width=127 height=40>Test
</TD></TR></TBODY></TABLE>
instead of this:

<TABLE style="WIDTH: 95pt; BORDER-COLLAPSE: collapse" cellSpacing=0 cellPadding=0 width=127 border=0 x:str><COLGROUP><COL style="WIDTH: 95pt; mso-width-source: userset; mso-width-alt: 4644" width=127><TBODY><TR style="HEIGHT: 15pt" height=20><TD class=xl65 style="BORDER-RIGHT: windowtext 0.5pt solid; BORDER-TOP: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; WIDTH: 95pt; BORDER-BOTTOM: windowtext 0.5pt solid; HEIGHT: 15pt; BACKGROUND-COLOR: transparent" width=127 height=20>Test</TD></TR></TBODY></TABLE>

The space below the text throws off the "countif" function and does not include that text in the count. Is there a way to format the entire data to look like the one or the other?

The second issue is a bit more elusive. Even though the text and cell formatting look exactly the same, the countif function still does not recognize the text in the cell. I literally have to copy one text a time from my list of criteria to the cell reference (range of date) attached to the formula so the countif function will add it to the count.

That's why I'm looking for a way to completely format the data range to match the criteria list without having to manually edit, which takes forever!

Thanks for any help you can provide.

apprentice
 
Upvote 0
Some things to try:
1. The TRIM function will remove any unneeded space characters that may be there.
2. The CLEAN function will remove all non-printable characters from text.
3. If these don't solve the problem, then try a FIND and REPLACE by copying one of your criteria, then place it in the "Replace With" box while having one of your imported items in the "Find What?" Box. (that may have been stated backwards, but my point is that you must have all things equal for this to work).
4. One last thing: There is a pesky character that is not visible, but may be lurking nonetheless. Do a Find And Replace specifically for the following character: ALT+0160 that is, Find: (Hold down the ALT key while pressing ON NUMERIC KEYPAD only... 0 1 6 0) and replace with... nothing!
Hope this gets you back on track.
Larry.
 
Upvote 0
The "clean text" and "trim" functions worked GREAT!!!! There were only 100 cells total out of the entire 1500 cell spread sheet that weren't inlcuded in the countIf. The remaining 100 were either mispelled or had special characters that needed to be edited.

Thank you so much! You're a life saver. Excel and the people who know how to maximize it are a beautiful thing.


Thanks again!
 
Upvote 0
indiantrix, thank you also. I would have suggested TRIM to remove leading and trailing spaces but was not aware of CLEAN nor the 0160 character. :)
 
Upvote 0

Forum statistics

Threads
1,214,515
Messages
6,119,970
Members
448,933
Latest member
Bluedbw

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top