Thanks:  0
Likes:  0

# Thread: how to run regression on data including non-numeric data?

1. > Just did a small check: TREND produces the same seq of vals on a series that is scrambled.

That's not my finding. If after sorting the #N/A errors to bottom I substitute RAND() for ROW()+A1*0 I get different TREND() results.

JUST to make sure: My small checking was with a fully numeric series (no #N/A's).

>Why not substitute the mean of the series for the #N/A's

It's not the same. Try it! Bottom line: the order and magnitude of these values affect the regression line.

WELL, I didn't claim the substitution would not affect the regression line. Of course it will. The question is whether it be sound to make substitutions (missing value --> subst mean of the series governed by the 5% rule). I believe it is not unsound and statistically defensible.

IF interpreting your #N/A's as missing values is sound and you can live the 5% rule, I'd suggest using this missing value transformation.

2. On 2002-02-25 17:02, Mark W. wrote:
That was me above. My logon must have timed out...

I also wanted to point out that AVERAGE(A1:A5) where A1:A5 contains an #N/A error will produce #N/A.
This must be your first one. I already got a few anonymous postings... New trouble.

Who needs AVERAGE

Lets say that we have your example sample

{10;20;30;#N/A;40}

I create in B1:B5 by using (20% missing vals!)

=IF(ISNUMBER(A1),A1,SUMIF(\$A\$1:\$A\$5,"<>#N/A")/COUNTIF(\$A\$1:\$A\$5,"<>#N/A"))

so I get in B:

{10;20;30;25;40}

Applying

{=TREND(B1:B5)}

in C1:C5, I get:

{12;18.5;25;31.5;38}

PS. This thread will be confusing to others, I'm afraid.

3. I'm afraid we'll have to agree to disagree on this one. I stand by my original recommendation:

You could sort the #N/A errors to the bottom of your list and have the TREND() function ignore them. For example, suppose that A1:A5 contained {10;30;20;#N/A;40}. {=TREND(A1:A5)} entered into D1:D5 produces #VALUE! errors. Instead, enter the array formula...

{=TREND(A1:OFFSET(A1,SUM(ISNUMBER(A1:A5)+0)-1,))}

...into D1:D5. Next, enter the formula, =ROW()+A1*0, into B1 and fill down to B5. Now, sort on column B. Voila!! The #N/A errors are sorted to the bottom of the list and ignored by TREND().

[ This Message was edited by: Mark W. on 2002-02-26 06:33 ]

## User Tag List

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•