Error Checking in Excel
Thanks Thanks:  0
Likes Likes:  0
Page 2 of 2 FirstFirst 12
Results 11 to 13 of 13

Thread: how to run regression on data including non-numeric data?

  1. #11
    MrExcel MVP Aladin Akyurek's Avatar
    Join Date
    Feb 2002
    Location
    The Hague, NL
    Posts
    84,038
    Post Thanks / Like
    Mentioned
    37 Post(s)
    Tagged
    7 Thread(s)

    Default


    > Just did a small check: TREND produces the same seq of vals on a series that is scrambled.

    That's not my finding. If after sorting the #N/A errors to bottom I substitute RAND() for ROW()+A1*0 I get different TREND() results.

    JUST to make sure: My small checking was with a fully numeric series (no #N/A's).

    >Why not substitute the mean of the series for the #N/A's

    It's not the same. Try it! Bottom line: the order and magnitude of these values affect the regression line.

    WELL, I didn't claim the substitution would not affect the regression line. Of course it will. The question is whether it be sound to make substitutions (missing value --> subst mean of the series governed by the 5% rule). I believe it is not unsound and statistically defensible.

    IF interpreting your #N/A's as missing values is sound and you can live the 5% rule, I'd suggest using this missing value transformation.

    Aladin

  2. #12
    MrExcel MVP Aladin Akyurek's Avatar
    Join Date
    Feb 2002
    Location
    The Hague, NL
    Posts
    84,038
    Post Thanks / Like
    Mentioned
    37 Post(s)
    Tagged
    7 Thread(s)

    Default

    On 2002-02-25 17:02, Mark W. wrote:
    That was me above. My logon must have timed out...

    I also wanted to point out that AVERAGE(A1:A5) where A1:A5 contains an #N/A error will produce #N/A.
    This must be your first one. I already got a few anonymous postings... New trouble.

    Who needs AVERAGE

    Lets say that we have your example sample

    {10;20;30;#N/A;40}

    I create in B1:B5 by using (20% missing vals!)

    =IF(ISNUMBER(A1),A1,SUMIF($A$1:$A$5,"<>#N/A")/COUNTIF($A$1:$A$5,"<>#N/A"))

    so I get in B:

    {10;20;30;25;40}

    Applying

    {=TREND(B1:B5)}

    in C1:C5, I get:

    {12;18.5;25;31.5;38}

    This doesn't look too bad!

    PS. This thread will be confusing to others, I'm afraid.


  3. #13
    MrExcel MVP
    Join Date
    Feb 2002
    Location
    Austin, Texas USA
    Posts
    11,654
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default

    I'm afraid we'll have to agree to disagree on this one. I stand by my original recommendation:

    You could sort the #N/A errors to the bottom of your list and have the TREND() function ignore them. For example, suppose that A1:A5 contained {10;30;20;#N/A;40}. {=TREND(A1:A5)} entered into D1:D5 produces #VALUE! errors. Instead, enter the array formula...

    {=TREND(A1:OFFSET(A1,SUM(ISNUMBER(A1:A5)+0)-1,))}

    ...into D1:D5. Next, enter the formula, =ROW()+A1*0, into B1 and fill down to B5. Now, sort on column B. Voila!! The #N/A errors are sorted to the bottom of the list and ignored by TREND().

    [ This Message was edited by: Mark W. on 2002-02-26 06:33 ]

Some videos you may like

User Tag List

Like this thread? Share it with others

Like this thread? Share it with others

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •