Poking around on the web I find various formulas for creating histograms. I'm getting different results for each formula and I want to know which one is accurate.
I have a table of purchase orders and I want to get the distribution of how many POs are in various dollar ranges (e.g. how many are for between $100 and $500). In the Orders table each purchase order can have multiple lines, which add up to the total value of the purchase order. So I need to sum [Extended Merch Amt] for all the lines with the same Orders[PO No.] value to get the full PO value. My base measures are:
The competing histogram measures are:
And my results are:
<tbody>
</tbody>
The rows for PO Count Distribution add up to more than the Grand Total so I suspect the PO Distribution Count is the accurate one since I'm doing the VALUES(Orders[PO No.]) but I like the simplicity of PO Count Distribution and am wondering if there's a way to make it work better. Any insight as to why they work as they do, and if there's a simpler solution? Thanks.
I have a table of purchase orders and I want to get the distribution of how many POs are in various dollar ranges (e.g. how many are for between $100 and $500). In the Orders table each purchase order can have multiple lines, which add up to the total value of the purchase order. So I need to sum [Extended Merch Amt] for all the lines with the same Orders[PO No.] value to get the full PO value. My base measures are:
Code:
PO Spend:=SUM ( Orders[Extended Merch Amt] )
Code:
PO Count:=DISTINCTCOUNT ( Orders[PO No.] )
The competing histogram measures are:
Code:
PO Count Distribution:=CALCULATE ( [PO Count],
FILTER (
Orders,
AND (
[PO Spend] >= MIN ( tblDollarRanges[Min] ),
[PO Spend] < MAX ( tblDollarRanges[Max] )
)
)
)
Code:
PO Distribution Count:=CALCULATE ( [PO Count],
FILTER (
VALUES ( Orders[PO No.] ),
COUNTROWS (
FILTER (
tblDollarRanges,
[PO Spend] >= tblDollarRanges[Min]
&& [PO Spend] < tblDollarRanges[Max]
)
)
)
)
And my results are:
Label | PO Count Distribution | PO Distribution Count |
Up to $10 | 89,426 | 7,557 |
$10 to $50 | 133,536 | 34,782 |
$50 to $100 | 99,604 | 32,989 |
$100 to $500 | 144,745 | 112,692 |
$500 to $1,000 | 50,103 | 50,861 |
$1,000 to $5,000 | 55,332 | 65,079 |
$5,000 to $10,000 | 8,851 | 12,483 |
$10,000 to $50,000 | 6,533 | 9,261 |
$50,000 to $100,000 | 943 | 1,100 |
$100,000 to $500,000 | 828 | 1,021 |
$500,000 to $1 million | 107 | 125 |
$1 million to $5 million | 116 | 121 |
$5 million to $10 million | 22 | 22 |
$10 million to $100 million | 9 | 9 |
$100 million + | 1 | 1 |
Grand Total | 328,103 | 328,103 |
<tbody>
</tbody>
The rows for PO Count Distribution add up to more than the Grand Total so I suspect the PO Distribution Count is the accurate one since I'm doing the VALUES(Orders[PO No.]) but I like the simplicity of PO Count Distribution and am wondering if there's a way to make it work better. Any insight as to why they work as they do, and if there's a simpler solution? Thanks.
Last edited: