# how to run regression on data including non-numeric data?

This is a discussion on how to run regression on data including non-numeric data? within the Excel Questions forums, part of the Question Forums category; Input range contains non-numeric data such as "#N/A". How can I run regression without deleting these non-numeric data one by ...

1. Input range contains non-numeric data such as "#N/A". How can I run regression without deleting these non-numeric data one by one?

2. You could sort the #N/A errors to the bottom of your list and have the TREND() function ignore them. For example, suppose that A1:A5 contained {10;30;20;#N/A;40}. {=TREND(A1:A5)} entered into D1:D5 produces #VALUE! errors. Instead, enter the array formula...

{=TREND(A1:OFFSET(A1,SUM(ISNUMBER(A1:A5)+0)-1,))}

...into D1:D5. Next, enter the formula, =ROW()+A1*0, into B1 and fill down to B5. Now, sort on column B. Voila!! The #N/A errors are sorted to the bottom of the list and ignored by TREND().

[ This Message was edited by: Mark W. on 2002-02-25 15:36 ]

3. On 2002-02-25 15:30, Mark W. wrote:
You could sort the #N/A errors to the bottom of your list and have the TREND() function ignore them. For example, suppose that A1:A5 contained {10;30;20;#N/A;40}. {=TREND(A1:A5)} entered into D1:D5 produces #VALUE! errors. Instead, enter the array formula...

{=TREND(A1:OFFSET(A1,SUM(ISNUMBER(A1:A5)+0)-1,))}

...into D1:D5. Next, enter the formula, =ROW()+A1*0, into B1 and fill down to B5. Now, sort on column B. Voila!! The #N/A errors are sorted to the bottom of the list and ignored by TREND().

[ This Message was edited by: Mark W. on 2002-02-25 15:36 ]

=TREND(IF(ISNUMBER(A1:A5),A1:A5))

array-entered in D1:D5?

4. On 2002-02-25 15:39, Aladin Akyurek wrote:
On 2002-02-25 15:30, Mark W. wrote:
You could sort the #N/A errors to the bottom of your list and have the TREND() function ignore them. For example, suppose that A1:A5 contained {10;30;20;#N/A;40}. {=TREND(A1:A5)} entered into D1:D5 produces #VALUE! errors. Instead, enter the array formula...

{=TREND(A1:OFFSET(A1,SUM(ISNUMBER(A1:A5)+0)-1,))}

...into D1:D5. Next, enter the formula, =ROW()+A1*0, into B1 and fill down to B5. Now, sort on column B. Voila!! The #N/A errors are sorted to the bottom of the list and ignored by TREND().

[ This Message was edited by: Mark W. on 2002-02-25 15:36 ]

=TREND(IF(ISNUMBER(A1:A5),A1:A5))

array-entered in D1:D5?
Aladin, using my example =TREND(IF(ISNUMBER(A1:A5),A1:A5,)) would produce {10;30;20;0;40}, and a TREND result of {14;17;20;23;26} rather than {13;21;29;37}.

[ This Message was edited by: Mark W. on 2002-02-25 15:50 ]

5. Mark,

Even worse...

Showeling down #N/A's by sorting and determining the range with OFFSET to which to apply TREND is a good idea.

By the way, although it boils down to the same thing, after sorting the data in A,in D1:D5 we can array-enter:

=TREND(OFFSET(A1,0,0,MATCH(9.99999999999999E+307,A:A),1))

It just occurred to me: Does the sorting affect the results, statistically speaking?

[ This Message was edited by: Aladin Akyurek on 2002-02-25 16:14 ]

6. On 2002-02-25 16:02, Aladin Akyurek wrote:
Mark,

Even worse...

Showeling down #N/A's by sorting and determining the range with OFFSET to which to apply TREND is a good idea.

Duh, I get it now... you were referring to the TREND results.

[ This Message was edited by: Mark W. on 2002-02-25 16:12 ]

7. On 2002-02-25 16:02, Aladin Akyurek wrote:
Mark,

Even worse...

Showeling down #N/A's by sorting and determining the range with OFFSET to which to apply TREND is a good idea.

By the way, although it boils down to the same thing, after sorting the data in A,in D1:D5 we can array-enter:

=TREND(OFFSET(A1,0,0,MATCH(9.99999999999999E+307,A:A),1))

It just occurred to me: Does the sorting affect the results, statistically speaking?

[ This Message was edited by: Aladin Akyurek on 2002-02-25 16:14 ]
>Does the sorting affect the results, statistically speaking?

This approach is no different the deleting the #N/A errors 1 by 1. In effect #N/A in this context means "no data". Since the original order of the data is unaffected by my approach I don't see how this will have a statistical impact. =TREND(OFFSET(A1,0,0,MATCH(9.99999999999999E+307,A:A),1)) works fine too.

[ This Message was edited by: Mark W. on 2002-02-25 16:22 ]

8. On 2002-02-25 16:19, Mark W. wrote:
On 2002-02-25 16:02, Aladin Akyurek wrote:
Mark,

Even worse...

Showeling down #N/A's by sorting and determining the range with OFFSET to which to apply TREND is a good idea.

By the way, although it boils down to the same thing, after sorting the data in A,in D1:D5 we can array-enter:

=TREND(OFFSET(A1,0,0,MATCH(9.99999999999999E+307,A:A),1))

It just occurred to me: Does the sorting affect the results, statistically speaking?

[ This Message was edited by: Aladin Akyurek on 2002-02-25 16:14 ]
>Does the sorting affect the results, statistically speaking?

This approach is no different the deleting the #N/A errors 1 by 1. In effect #N/A in this context means "no data". Since the original order of the data is unaffected by my approach I don't see how this will have a statistical impact. =TREND(OFFSET(A1,0,0,MATCH(9.99999999999999E+307,A:A),1)) works fine too.

[ This Message was edited by: Mark W. on 2002-02-25 16:22 ]
Just did a small check: TREND produces the same seq of vals on a series that is scrambled.

Another thing: Why not substitute the mean of the series for the #N/A's (for the missing values, so to speak) if these values do not exceed, say, 5% of the series?

If this is sound, an additional column can be created using

=IF(ISNUMBER(A1),A1,AVERAGE(A1:A5))

which allows to apply TREND to the whole set as one would do ordinarily.

What do you think?

9. On 2002-02-25 16:31, Aladin Akyurek wrote:
On 2002-02-25 16:19, Mark W. wrote:
On 2002-02-25 16:02, Aladin Akyurek wrote:
Mark,

Even worse...

Showeling down #N/A's by sorting and determining the range with OFFSET to which to apply TREND is a good idea.

By the way, although it boils down to the same thing, after sorting the data in A,in D1:D5 we can array-enter:

=TREND(OFFSET(A1,0,0,MATCH(9.99999999999999E+307,A:A),1))

It just occurred to me: Does the sorting affect the results, statistically speaking?

[ This Message was edited by: Aladin Akyurek on 2002-02-25 16:14 ]
>Does the sorting affect the results, statistically speaking?

This approach is no different the deleting the #N/A errors 1 by 1. In effect #N/A in this context means "no data". Since the original order of the data is unaffected by my approach I don't see how this will have a statistical impact. =TREND(OFFSET(A1,0,0,MATCH(9.99999999999999E+307,A:A),1)) works fine too.

[ This Message was edited by: Mark W. on 2002-02-25 16:22 ]
Just did a small check: TREND produces the same seq of vals on a series that is scrambled.

Another thing: Why not substitute the mean of the series for the #N/A's (for the missing values, so to speak) if these values do not exceed, say, 5% of the series?

If this is sound, an additional column can be created using

=IF(ISNUMBER(A1),A1,AVERAGE(A1:A5))

which allows to apply TREND to the whole set as one would do ordinarily.

What do you think?
> Just did a small check: TREND produces the same seq of vals on a series that is scrambled.

That's not my finding. If after sorting the #N/A errors to bottom I substitute RAND() for ROW()+A1*0 I get different TREND() results.

>Why not substitute the mean of the series for the #N/A's

It's not the same. Try it! Bottom line: the order and magnitude of these values affect the regression line.

10. That was me above. My logon must have timed out...

I also wanted to point out that AVERAGE(A1:A5) where A1:A5 contains an #N/A error will produce #N/A.

Page 1 of 2 12 Last

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•