Which histogram formula is correct?

macfuller

Board Regular
Joined
Apr 30, 2014
Messages
242
Poking around on the web I find various formulas for creating histograms. I'm getting different results for each formula and I want to know which one is accurate.

I have a table of purchase orders and I want to get the distribution of how many POs are in various dollar ranges (e.g. how many are for between $100 and $500). In the Orders table each purchase order can have multiple lines, which add up to the total value of the purchase order. So I need to sum [Extended Merch Amt] for all the lines with the same Orders[PO No.] value to get the full PO value. My base measures are:
Code:
PO Spend:=SUM ( Orders[Extended Merch Amt] )
Code:
PO Count:=DISTINCTCOUNT ( Orders[PO No.] )
The competing histogram measures are:
Code:
PO Count Distribution:=CALCULATE (    [PO Count],
    FILTER (
        Orders,
        AND (
            [PO Spend] >= MIN ( tblDollarRanges[Min] ),
            [PO Spend] < MAX ( tblDollarRanges[Max] )
        )
    )
)
Code:
PO Distribution Count:=CALCULATE (    [PO Count],
    FILTER (
        VALUES ( Orders[PO No.] ),
        COUNTROWS (
            FILTER (
                tblDollarRanges,
                [PO Spend] >= tblDollarRanges[Min]
                    && [PO Spend] < tblDollarRanges[Max]
            )
        )
    )
)
And my results are:
LabelPO Count DistributionPO Distribution Count
Up to $1089,4267,557
$10 to $50133,53634,782
$50 to $10099,60432,989
$100 to $500144,745112,692
$500 to $1,00050,10350,861
$1,000 to $5,00055,33265,079
$5,000 to $10,0008,85112,483
$10,000 to $50,0006,5339,261
$50,000 to $100,0009431,100
$100,000 to $500,0008281,021
$500,000 to $1 million107125
$1 million to $5 million116121
$5 million to $10 million2222
$10 million to $100 million99
$100 million +11
Grand Total328,103328,103

<tbody>
</tbody>

The rows for PO Count Distribution add up to more than the Grand Total so I suspect the PO Distribution Count is the accurate one since I'm doing the VALUES(Orders[PO No.]) but I like the simplicity of PO Count Distribution and am wondering if there's a way to make it work better. Any insight as to why they work as they do, and if there's a simpler solution? Thanks.
 
Last edited:

Matt Allington

MrExcel MVP
Joined
Dec 18, 2014
Messages
1,189
The first one is counting the line level items, not the aggregation of all lines in a PO. Have you tried taking the first one, and replace “orders” with “values(orders[PO no.])”
 

Forum statistics

Threads
1,077,737
Messages
5,335,914
Members
399,057
Latest member
mgabr

Some videos you may like

This Week's Hot Topics

Top