PowerPivot model: Duplicate rows adding up to over inflated total.

HarryR1 · Dec 8, 2014

Hi all
I am new to PowerPivot and data models, and I am querying an SSAS cube in a PowerPivot data model. I have a problem with duplicate names inflating figures. The duplicates come from a Project Server SSAS cube.

Row	Project Name	Cost
1	1004 - Parent Project	£500
2	1004 - Sub Project	£200
3	1004 - Sub Project with same name as Parent Project	£300

Rows 2 and 3 are both sub projects that make up the whole project that is totalled up in row 1. However, when looking a the results in PowerPivot, the model is adding all these three rows separately, producing a total figure of £1000.
How can I get the model to a)recognise the hierarchy from Project Server. it does if querying the cube directly in excel into a pivot chart but not in a PowerPivot model.
Is there a DAX or MDX calculation I can use to only select or count the highest figure of any duplicates rows with the same name?
I understand it's a pretty complex question but hoping someone can help.
Thanks in advance.
Harry

scottsen · Dec 8, 2014

So, in the your sample, there is just NOTHING that makes it clear there is a parent/child relationship between those rows. You have two choices... (at least

)
1) What you are referencing... create a calculated column which only has a value if it's the highest value of its grouping.
2) actually bring in some parent/child info. (You seem to have a Row# column is there a ParentRow you can bring in?)

the first should generally be easier, but ... does have some risks. Negative #'s could really mess w/ you and ... a parent w/ just 1 child... both would have the same value. Need to be careful not to include both in that scenario. (So, logic that is like... "only include this node if its the largest in the set..." would typically include both which isn't what you want).

We can probably sleeze our way around the 2nd issue.... the negative issues seems harder.
Parent 500
Kid1 600
Kid2 -100

We would choose Kid1, since it is largest.

Is that an issue for you?

HarryR1 · Dec 8, 2014

scottsen said:
So, in the your sample, there is just NOTHING that makes it clear there is a parent/child relationship between those rows. You have two choices... (at least )
1) What you are referencing... create a calculated column which only has a value if it's the highest value of its grouping.
2) actually bring in some parent/child info. (You seem to have a Row# column is there a ParentRow you can bring in?)

the first should generally be easier, but ... does have some risks. Negative #'s could really mess w/ you and ... a parent w/ just 1 child... both would have the same value. Need to be careful not to include both in that scenario. (So, logic that is like... "only include this node if its the largest in the set..." would typically include both which isn't what you want).

We can probably sleeze our way around the 2nd issue.... the negative issues seems harder.
Parent 500
Kid1 600
Kid2 -100

We would choose Kid1, since it is largest.

Is that an issue for you?

scottsen

Thanks for your reply.

The row numbers are just for illustrative purposes, there actually isn't anything else apart from the project title that identifies a project (lazy cube work in my opinion).

Creating a calculated column that only picks the highest of the rows with the same name sounds ideal, except I don't know yet how to write a formula that compares the names of a rows, identifies duplicates and only shows the value of the highest one in the set.

Could you point me in the direction of a function/formula or resource area where I can research how to do it?

Thanks

scottsen · Dec 8, 2014

Yep, I can help there

So, no issues w/ negative numbers or "just 1 sub-project" to worry about?

Something like:

Code:

=MyTable[Cost] = 
     CALCULATE(MAX(MyTable[Cost]), ALL(MyTable), MyTable[Project Name] = EARLIER(MyTable[Project Name]))

Kazlik · Dec 10, 2014

I had a similar issue last week with an event log and duplicate values for coverage_id's. I made the two Measures below.

MaxTime:=MAX(Events[TimingMinutes])
DistinctAvgTimeMinutes:=AVERAGEX(DISTINCT(Events[coverage_id]),[MaxTime])

HarryR1 · Dec 15, 2014

scottsen said:
Yep, I can help there

So, no issues w/ negative numbers or "just 1 sub-project" to worry about?

Something like:

Code:

=MyTable[Cost] = CALCULATE(MAX(MyTable[Cost]), ALL(MyTable), MyTable[Project Name] = EARLIER(MyTable[Project Name]))

Thanks a lot

I am going to give those a whirl and will update the thread with my findings. Thanks again.

HarryR1 · Dec 15, 2014

scottsen said:
Yep, I can help there

So, no issues w/ negative numbers or "just 1 sub-project" to worry about?

Something like:

Code:

=MyTable[Cost] = CALCULATE(MAX(MyTable[Cost]), ALL(MyTable), MyTable[Project Name] = EARLIER(MyTable[Project Name]))

Hi Scottsen

That worked great in that it marked a 'true' next to unique names and a 'false' next to repating project names with a lower value.

Now I just need to call these into another table but only the ones that are marked as true. Could you advise me how to do that?

BTW: This has inspired me to dive into learning as much DAX as I can

.

Thanks

Harry

HarryR1 · Dec 15, 2014

Thanks Kazlik.

HarryR1 · Dec 15, 2014

To be more specific:

I am using this solution in a table that acts as a lookup for a column in another table.

How can I amend the lookup in my target table's calculated column to only fetch the value from the project name marked with a 'True' value?

Lookup Table.
Project A: 100
Project A: 50

Target Table.
Project A: 100

Thanks again.

scottsen · Dec 15, 2014

I get pretty nervous when you start copying data around between tables. Sooome reference to that in my most recent blog post... it's a vlookup/old-skool-excel thing to do.

The theory is that you just need to apply a filter on that new column... to only include the ones that are marked true. The application of said theoy will depend on what you are trying to do

LOOKUPVALUE will probaby work for you. It can take multiple columns as input... one of which would be the "IsTheMax" column with a value of True.

PowerPivot model: Duplicate rows adding up to over inflated total.

HarryR1

New Member

Excel Facts

scottsen

Well-known Member

HarryR1

New Member

scottsen

Well-known Member

Kazlik

Board Regular

HarryR1

New Member

HarryR1

New Member

HarryR1

New Member

HarryR1

New Member

scottsen

Well-known Member

Similar threads

Forum statistics

Share this page

PowerPivot model: Duplicate rows adding up to over inflated total.

New Member

Excel Facts

Well-known Member

New Member

Well-known Member

Board Regular

New Member

New Member

New Member

New Member

Well-known Member

Similar threads

Forum statistics

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock