SUMX: Count unique records in a table (ignoring blanks blank rows)

circledchicken · Dec 8, 2012

Hi PowerPivot experts,

I am beginning to learn how to use PowerPivot and this question is based on the last 'Useful' example in this post:
SUMX() – The 5-point palm, exploding fxn technique « PowerPivotPro

I have a table named Table1 that looks like this:

	A	B
1	Product	Store
2	Apples	1
3	Oranges
4	Pears	1
5	Apples	1
6
7		2
8	Apples	1

Excel 2010

I have two measures defined as follows:

Count of Stores:
=DISTINCTCOUNT(Table1[Store])
UniqueCombinations:
=SUMX(DISTINCT(Table1[Product]), [Count of Stores])

In my PivotTable, the only element is the 'UniqueCombinations' measure in the Values section. Currently the value of the measure is 5 corresponding to these rows:

	A	B
1	Product	Store
2	Apples	1
3	Oranges
4	Pears	1
6
7		2

My question is, how do I adjust the measures to exclude any rows where all the corresponding columns are blank? I would like my UniqueCombinations measure to return 4 in this case (excluding row 6).

I know I could create a calculated column for example and then count non-blanks on that (using something like =COUNTAX( DISTINCT( Table1[CalcColumn] ), Table1[CalcColumn] ). However I would like to learn how to do this using purely measures if possible.

Many thanks for your time.

miguel.escobar · Dec 11, 2012

Welcome to the party, Alberto!

I was aiming at a formula without the need for the SUMX but the iteration would need to be row by row and the easiest way to do it seems to be the sumx

Code:

=SUMX(
                    SUMMARIZE(Table1,[Product],[Store],"Sum", 
                                    IF(CONCATENATE(Table1[Product],Table1[Store])=BLANK(),
                                                             BLANK(),1)
                                                             )
                     ,[Sum]
                    )

that should read as follows, I'd like to have a SUMX of the new table made with the SUMMARIZE on the colum "Sum" and this column has either the value 1 on the every record or 0 in the event that the combination of product&store is blank.

AlbertoFerrari · Dec 11, 2012

Miguel,
From a performance point of view, putting the IF inside SUMMARIZE would result in the formula engine kicking in during evaluation and do a lot of work, because the storage engine cannot resolve IF expression. Moreover, using CONCATENATE you are preventing the optimizer to understand what you are doing, using a simpler boolean condition would make the optimizer life much easier.
I have not looked at the query plan and, in my previous formula, I was more concerned with clarity than performance, but I guess that the top performer for such a formula would be something like this:

Code:

=SUMX (
    FILTER (
        CROSSJOIN (
            VALUES (Table1[Product]),
            VALUES (Table1[Store])
        ),
        NOT (
            ISBLANK ( Table1[Store] ) && 
            ISBLANK ( Table1[Product] )
        )
    ),
    CALCULATE (
        DISTINCTCOUNT ( Table1[Store] )
    )
)

This query does not use formula engine, apart from the filter on the CROSSJOIN, which should result in a small table, and relies on code that can be executed in VertiPaq. This has a significant effect on the usage of cache too. The drawback of this code is that it is much harder to read, until you used DAX for a while. I have to say that I read much better this latter formula than the previous one.
As I said, I have not checked the query plan, so I am not 100% sure about performance, but my guess is that this latter formula would perform much better than all of the previous ones.

Alberto

miguel.escobar · Dec 11, 2012

Alberto,

That formula just throw this party out the window! REALLY good approach and yes, I agree with you that this last formula it's much easier to read than the previous one and it's basically doing what I wanted to do with the summarize, which is joining both columns and make a distinct count of those without making a count of the result where store&product is blank.

Thanks, Alberto!
Miguel

Laurent C · Dec 13, 2012

My spider-sense triggers whenever I see a DISTINCTCOUNT within a SUMX.
If we only had these rows
Product Store
Apple 1
Pears 1
The sum of the distinct number of stores for product would be 2, even though there would only one store. EDIT: Would such a measure make sense?
@Alberto: your second formula does not quite match the first one you provided. Whereas it provides the same grand-totals and sub-totals, the result will be 1 for the case Product and Store are both blank.

AlbertoFerrari · Dec 13, 2012

Laurent, I agree with you that the formula does not seem to have a lot of sense, but it looks like what was requested.

And... yes, the border case of everything blank might be different, thanks for noting this. Anyway, I wanted to show a technique, then you need to adapt it to suite your specific needs and there are always big chances that results from my formulas are not perfect, because i seldom test them when writing answers. The interesting part of the formula is the filtering of a CROSSJOIN. The first time I saw this used by the dev team members I was amazed by the power of this feature.
A formula that has more sense is the following (not tested!):

Code:

=CALCULATE (
    DISTINCTCOUNT ( Table1[Store] ),
    FILTER (
        CROSSJOIN (
            VALUES (Table1[Product]),
            VALUES (Table1[Store])
        ),
        NOT (
            ISBLANK ( Table1[Store] ) && 
            ISBLANK ( Table1[Product] )
        )
    )
)

But it computes a different number... I don't really know the business rule underneath the request, it all depends from what the original user wanted.

Finally, it is worth for anybody using FILTER over a CROSSJOIN, to learn the KEEPFILTERS function too. It is a really hard function to digest, in my personal opinion it is the hardest function in DAX, not because of its semantic but because you need to learn when to use it. I wrote a blog post some months ago (Alberto Ferrari : KEEPFILTERS: a new DAX feature to correctly compute over arbitrary shaped sets) about KEEPFILTERS, but it only covers what I learned at that point. There are more considerations that I still had not time to put in a blog post but, roughly speaking, whenever you create a filter on more than one column and then you iterate over one of the columns, context transition can really drive you crazy and KEEPFILTERS is your (only) friend at that point.

Alberto

circledchicken · Dec 13, 2012

Wow! miguel.escobar, AlbertoFerrari and Laurent C - you are awesome!

Thankyou so much for your replies, comments and suggestions. I barely understand the DAX language at the moment and your explanations are very valuable and much appreciated.

AlbertoFerrari said:
But it computes a different number... I don't really know the business rule underneath the request, it all depends from what the original user wanted.

As per post 8, I was just trying to learn and experimenting with PowerPivot, and this was a problem I was unsure of how to solve. The underlying idea was basically:

I want a measure that returns the count of unique records in a table (in this case the two-column Product and Store table), ignoring any cases where both columns are blank. I wanted to do this with a measure, but some simple equivalent ways of expressing what I was trying to get are:

Using Advanced Filter for unique records only on the whole table in normal Excel, and then counting the resulting unique entries, excluding any resulting row where both column entries were blank.
Creating a new column where both columns in the table are concatenated, then doing a unique count on that excluding any resulting blank row.

I know there shouldn't be duplicate records in the dataset in the first place and I should get rid of them first, but was just wondering if there was a way of getting a DAX measure to do this count.

Alberto - both your suggestions (post #10 and post #12) seem to do exactly what I had asked. miguel - your latest one also seems to do exactly what I had asked.

Laurent - I couldn't quite figure out the case where Alberto's second formula (post #12) would deliver a different result from the first (post #10). I tried it on my example table, with a single row where the Product and Store columns were both blank and both of the suggested formulas returned nothing as expected. They also both returned the correct amounts for various other sample entries in my table.

Many thanks again to you all for your time and for helping me!

miguel.escobar · Dec 14, 2012

some people go by the motto
"Excel is Fun" - Excelisfun

but nowadays, people are using more:
"Powerpivot is FUN!" - Powerpivot users

I hope that our post encourage you to use more Powerpivot and you can be certain that this community is extremely active.

Laurent C · Dec 14, 2012

If you put Product and Store on rows the formula in post #10 will return 1 for the combination (blank, blank). The formula in post #12 returns blank.
As already said totals and sub-totals look the same, so they only differ at the detail level.
The following part in formula from post #10 seems to cause the unexpected results:
CALCULATE (
DISTINCTCOUNT ( Table1[Store] );
NOT ( ISBLANK ( Table1[Store] ))
)
It will return the expected result (blank) when applied to the combination (Oranges, blank), but will return 1 when applied to the (blank, blank) combination.
Strangely, previous formula and the following one will both return the same result (1), whenever Product is blank:
CALCULATE (
DISTINCTCOUNT ( Table1[Store] );
ISBLANK ( Table1[Store] )
)
EDIT: the last statement is not exactly true, if you had a 3rd column (like a RowID) you could have different results depending on whether this third column is in the pivot table or not. Glitch?

Laurent C · Dec 14, 2012

So, I made some more testing and found out what the reason is. If you had another row in your data with Product = blank and Store = 3, the following measure would return the result 2
CALCULATE (
DISTINCTCOUNT ( Table1[Store] );
NOT ( ISBLANK ( Table1[Store] ))
)
This clearly indicates what the problem is: NOT ( ISBLANK ( Table1[Store] )) will overwrite the current context filter not just add to it.

AlbertoFerrari · Dec 14, 2012

Pretty cool finding, you're right.

This should make the trick.

CALCULATE (
DISTINCTCOUNT ( Table1[Store] );
NOT ( ISBLANK ( Table1[Store] )),
VALUES ( Table1[Store )
)

I still believe the other formula is much better, at least it does not suffer from these context interactions.

SUMX: Count unique records in a table (ignoring blanks blank rows)

circledchicken

Well-known Member

miguel.escobar

Active Member

Excel Facts

AlbertoFerrari

New Member

miguel.escobar

Active Member

Laurent C

Board Regular

AlbertoFerrari

New Member

circledchicken

Well-known Member

miguel.escobar

Active Member

Laurent C

Board Regular

Laurent C

Board Regular

AlbertoFerrari

New Member

Similar threads

Forum statistics

Share this page

SUMX: Count unique records in a table (ignoring blanks blank rows)

Well-known Member

Active Member

Excel Facts

New Member

Active Member

Board Regular

New Member

Well-known Member

Active Member

Board Regular

Board Regular

New Member

Similar threads

Forum statistics

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock