Page 1 of 2 12 LastLast
Results 1 to 10 of 13

how to run regression on data including non-numeric data?

This is a discussion on how to run regression on data including non-numeric data? within the Excel Questions forums, part of the Question Forums category; Input range contains non-numeric data such as "#N/A". How can I run regression without deleting these non-numeric data one by ...

  1. #1
    Guest

    Default

    Input range contains non-numeric data such as "#N/A". How can I run regression without deleting these non-numeric data one by one?

  2. #2
    MrExcel MVP
    Join Date
    Feb 2002
    Location
    Austin, Texas USA
    Posts
    11,654

    Default

    You could sort the #N/A errors to the bottom of your list and have the TREND() function ignore them. For example, suppose that A1:A5 contained {10;30;20;#N/A;40}. {=TREND(A1:A5)} entered into D1:D5 produces #VALUE! errors. Instead, enter the array formula...

    {=TREND(A1:OFFSET(A1,SUM(ISNUMBER(A1:A5)+0)-1,))}

    ...into D1:D5. Next, enter the formula, =ROW()+A1*0, into B1 and fill down to B5. Now, sort on column B. Voila!! The #N/A errors are sorted to the bottom of the list and ignored by TREND().

    [ This Message was edited by: Mark W. on 2002-02-25 15:36 ]

  3. #3
    MrExcel MVP Aladin Akyurek's Avatar
    Join Date
    Feb 2002
    Location
    The Hague, NL
    Posts
    65,817

    Default

    On 2002-02-25 15:30, Mark W. wrote:
    You could sort the #N/A errors to the bottom of your list and have the TREND() function ignore them. For example, suppose that A1:A5 contained {10;30;20;#N/A;40}. {=TREND(A1:A5)} entered into D1:D5 produces #VALUE! errors. Instead, enter the array formula...

    {=TREND(A1:OFFSET(A1,SUM(ISNUMBER(A1:A5)+0)-1,))}

    ...into D1:D5. Next, enter the formula, =ROW()+A1*0, into B1 and fill down to B5. Now, sort on column B. Voila!! The #N/A errors are sorted to the bottom of the list and ignored by TREND().

    [ This Message was edited by: Mark W. on 2002-02-25 15:36 ]
    How about:

    =TREND(IF(ISNUMBER(A1:A5),A1:A5))

    array-entered in D1:D5?

  4. #4
    MrExcel MVP
    Join Date
    Feb 2002
    Location
    Austin, Texas USA
    Posts
    11,654

    Default

    On 2002-02-25 15:39, Aladin Akyurek wrote:
    On 2002-02-25 15:30, Mark W. wrote:
    You could sort the #N/A errors to the bottom of your list and have the TREND() function ignore them. For example, suppose that A1:A5 contained {10;30;20;#N/A;40}. {=TREND(A1:A5)} entered into D1:D5 produces #VALUE! errors. Instead, enter the array formula...

    {=TREND(A1:OFFSET(A1,SUM(ISNUMBER(A1:A5)+0)-1,))}

    ...into D1:D5. Next, enter the formula, =ROW()+A1*0, into B1 and fill down to B5. Now, sort on column B. Voila!! The #N/A errors are sorted to the bottom of the list and ignored by TREND().

    [ This Message was edited by: Mark W. on 2002-02-25 15:36 ]
    How about:

    =TREND(IF(ISNUMBER(A1:A5),A1:A5))

    array-entered in D1:D5?
    Aladin, using my example =TREND(IF(ISNUMBER(A1:A5),A1:A5,)) would produce {10;30;20;0;40}, and a TREND result of {14;17;20;23;26} rather than {13;21;29;37}.

    [ This Message was edited by: Mark W. on 2002-02-25 15:50 ]

  5. #5
    MrExcel MVP Aladin Akyurek's Avatar
    Join Date
    Feb 2002
    Location
    The Hague, NL
    Posts
    65,817

    Default

    Mark,

    Even worse...

    Showeling down #N/A's by sorting and determining the range with OFFSET to which to apply TREND is a good idea.

    Addendum:

    By the way, although it boils down to the same thing, after sorting the data in A,in D1:D5 we can array-enter:

    =TREND(OFFSET(A1,0,0,MATCH(9.99999999999999E+307,A:A),1))

    It just occurred to me: Does the sorting affect the results, statistically speaking?



    [ This Message was edited by: Aladin Akyurek on 2002-02-25 16:14 ]

  6. #6
    MrExcel MVP
    Join Date
    Feb 2002
    Location
    Austin, Texas USA
    Posts
    11,654

    Default

    On 2002-02-25 16:02, Aladin Akyurek wrote:
    Mark,

    Even worse...

    Showeling down #N/A's by sorting and determining the range with OFFSET to which to apply TREND is a good idea.

    Didn't quite follow your "Even worse..." statement.

    Duh, I get it now... you were referring to the TREND results.

    [ This Message was edited by: Mark W. on 2002-02-25 16:12 ]

  7. #7
    MrExcel MVP
    Join Date
    Feb 2002
    Location
    Austin, Texas USA
    Posts
    11,654

    Default

    On 2002-02-25 16:02, Aladin Akyurek wrote:
    Mark,

    Even worse...

    Showeling down #N/A's by sorting and determining the range with OFFSET to which to apply TREND is a good idea.

    Addendum:

    By the way, although it boils down to the same thing, after sorting the data in A,in D1:D5 we can array-enter:

    =TREND(OFFSET(A1,0,0,MATCH(9.99999999999999E+307,A:A),1))

    It just occurred to me: Does the sorting affect the results, statistically speaking?



    [ This Message was edited by: Aladin Akyurek on 2002-02-25 16:14 ]
    >Does the sorting affect the results, statistically speaking?

    This approach is no different the deleting the #N/A errors 1 by 1. In effect #N/A in this context means "no data". Since the original order of the data is unaffected by my approach I don't see how this will have a statistical impact. =TREND(OFFSET(A1,0,0,MATCH(9.99999999999999E+307,A:A),1)) works fine too.

    [ This Message was edited by: Mark W. on 2002-02-25 16:22 ]

  8. #8
    MrExcel MVP Aladin Akyurek's Avatar
    Join Date
    Feb 2002
    Location
    The Hague, NL
    Posts
    65,817

    Default

    On 2002-02-25 16:19, Mark W. wrote:
    On 2002-02-25 16:02, Aladin Akyurek wrote:
    Mark,

    Even worse...

    Showeling down #N/A's by sorting and determining the range with OFFSET to which to apply TREND is a good idea.

    Addendum:

    By the way, although it boils down to the same thing, after sorting the data in A,in D1:D5 we can array-enter:

    =TREND(OFFSET(A1,0,0,MATCH(9.99999999999999E+307,A:A),1))

    It just occurred to me: Does the sorting affect the results, statistically speaking?



    [ This Message was edited by: Aladin Akyurek on 2002-02-25 16:14 ]
    >Does the sorting affect the results, statistically speaking?

    This approach is no different the deleting the #N/A errors 1 by 1. In effect #N/A in this context means "no data". Since the original order of the data is unaffected by my approach I don't see how this will have a statistical impact. =TREND(OFFSET(A1,0,0,MATCH(9.99999999999999E+307,A:A),1)) works fine too.

    [ This Message was edited by: Mark W. on 2002-02-25 16:22 ]
    Just did a small check: TREND produces the same seq of vals on a series that is scrambled.

    Another thing: Why not substitute the mean of the series for the #N/A's (for the missing values, so to speak) if these values do not exceed, say, 5% of the series?

    If this is sound, an additional column can be created using

    =IF(ISNUMBER(A1),A1,AVERAGE(A1:A5))

    which allows to apply TREND to the whole set as one would do ordinarily.

    What do you think?

  9. #9
    Guest

    Default

    On 2002-02-25 16:31, Aladin Akyurek wrote:
    On 2002-02-25 16:19, Mark W. wrote:
    On 2002-02-25 16:02, Aladin Akyurek wrote:
    Mark,

    Even worse...

    Showeling down #N/A's by sorting and determining the range with OFFSET to which to apply TREND is a good idea.

    Addendum:

    By the way, although it boils down to the same thing, after sorting the data in A,in D1:D5 we can array-enter:

    =TREND(OFFSET(A1,0,0,MATCH(9.99999999999999E+307,A:A),1))

    It just occurred to me: Does the sorting affect the results, statistically speaking?



    [ This Message was edited by: Aladin Akyurek on 2002-02-25 16:14 ]
    >Does the sorting affect the results, statistically speaking?

    This approach is no different the deleting the #N/A errors 1 by 1. In effect #N/A in this context means "no data". Since the original order of the data is unaffected by my approach I don't see how this will have a statistical impact. =TREND(OFFSET(A1,0,0,MATCH(9.99999999999999E+307,A:A),1)) works fine too.

    [ This Message was edited by: Mark W. on 2002-02-25 16:22 ]
    Just did a small check: TREND produces the same seq of vals on a series that is scrambled.

    Another thing: Why not substitute the mean of the series for the #N/A's (for the missing values, so to speak) if these values do not exceed, say, 5% of the series?

    If this is sound, an additional column can be created using

    =IF(ISNUMBER(A1),A1,AVERAGE(A1:A5))

    which allows to apply TREND to the whole set as one would do ordinarily.

    What do you think?
    > Just did a small check: TREND produces the same seq of vals on a series that is scrambled.

    That's not my finding. If after sorting the #N/A errors to bottom I substitute RAND() for ROW()+A1*0 I get different TREND() results.

    >Why not substitute the mean of the series for the #N/A's

    It's not the same. Try it! Bottom line: the order and magnitude of these values affect the regression line.

  10. #10
    MrExcel MVP
    Join Date
    Feb 2002
    Location
    Austin, Texas USA
    Posts
    11,654

    Default

    That was me above. My logon must have timed out...

    I also wanted to point out that AVERAGE(A1:A5) where A1:A5 contains an #N/A error will produce #N/A.

Page 1 of 2 12 LastLast

Like this thread? Share it with others

Like this thread? Share it with others

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


DMCA.com