Multiple table relationships

denisl

New Member
Joined
Apr 3, 2013
Messages
12
I found powerpivot a couple of days ago and am just getting started using it.

I have 3 csv files that I load into powerpivot.
Here's an example -

main.csv:

Name,Value
A,100
B,200
A,300
B,400
B,500
C,600
C,700
A,800

file1.csv

Name,Value_A
A,1
B,2
C,3

file2.csv

Name,Value_B
A,1000
B,2000
C,3000

I load all three files into power pivot under "get external data - text"
I create the following two relationships:
1. main.csv "Name" to lookup table file1.csv "Name"
2. main.csv "Name" to lookup table file2.csv "Name"

Then I generate the Pivot table and have the 3 data sources to build my pivot table.
I select from main.csv - Name and Value
I select from file1.csv - Value_A
I select from file2.csv - Value_B

The resulting table looks like this:

Row LabelsSum of Value_ASum of Value_BSum of Value
A160001200
B260001100
C360001300

<tbody>
</tbody>


So my question is why does the sum of Value_B add up to the total of all the Value_B numbers in file2.csv? For A it should be 1000, B it should be 2000 and C it should be 3000 - yet it's showing 6000 (the sum of all the Value_B's in file2.csv). The Sum of Value_A and Value is working fine.

This is an example - my actual data is 3 very large CSV files with over 2M rows and about 30 columns.

Thank you
 

Some videos you may like

Excel Facts

Convert text numbers to real numbers
Select a column containing text numbers. Press Alt+D E F to quickly convert text to numbers. Faster than "Convert to Number"

Jacob Barnett

Board Regular
Joined
Dec 3, 2012
Messages
155
denisl,

Fundamentally your relationships are the wrong way round - BTW I don't believe your actual model is set up exactly as you describe because if it was then the Sum of Value_A would 6 for all 3 rows.

PowerPivot works on One to Many relationships where the lookup table is the many.

I am making the assumption based on your example that Files 1 & 2 both have a name column that contains unique values in which case the simplest way to solve this is to pick one of the two to be your 'dimension' table while the other 2 are 'fact' tables. If you chose file 1 as the dimension the two relationships would be:
- Main to File 1
- File 2 to File 1

Best practice may be to create a separate name table that could potentially feature other info about the name but that consisted of a least a column containing all unique names. If this table was called 'Name' then the relationships would be:
- Main to Name
- File 1 to Name
- File 2 to Name

Conceptually this is probably easier to get your head around and will be mandatory if the names in Files 1&2 are not unique.

HTH
Jacob
 

denisl

New Member
Joined
Apr 3, 2013
Messages
12
Thanks Jacob, still trying to connect the dots with your explanation.
My main.csv has multiple duplicate values for the names column.
I would have thought that the main.csv would be linked to file1.csv and file2.csv Name column?

Here's my sample powerpivot.
pivot1_zps09e47852.png

pivot2_zpsf64adc8a.png

pivot4_zps7c5b31ae.png

pivot3_zps4705a848.png
 

Jacob Barnett

Board Regular
Joined
Dec 3, 2012
Messages
155

ADVERTISEMENT

I would have thought that the main.csv would be linked to file1.csv and file2.csv Name column?
Negative Ghost Rider. :p

What your screenshots show is that although you've used the main table as the common link, you've actually used the name field from File 1 in your Pivot. You effectively got a situation where there is no link between Name in File 1 which is creating the filter context in the Pivot and the Name in File 2, hence why you get the sum of the entire column.

On the basis that the name in File 1 is your 'one' it stands to reason that the name in the other 2 files must be the 'many', it just becomes confusing because File 2 has unique values.

I recreated your example with the 'correct' relationships and saved here: https://docs.google.com/file/d/0Bz5yMU2oooW2QjU5d0Y3MGxFc1k/edit?usp=sharing

I also created a second example that has a separate name table that is related to all three other tables - conceptually easier and arguably a more efficient structure: https://docs.google.com/file/d/0Bz5yMU2oooW2RE12ajNZaU5wVk0/edit?usp=sharing

HTH
Jacob
 

denisl

New Member
Joined
Apr 3, 2013
Messages
12
Jacob - thank you so much for taking the time to give me some examples. The second example you provided made it finally click. I need to do some testing with my data now.

I have a second questions...
My real data has a site and a name where the name can exist in more than one site.
For example,

site_a, name_a
site_b, name_a

Can you have two fields be the lookup relation?
 

Jacob Barnett

Board Regular
Joined
Dec 3, 2012
Messages
155

ADVERTISEMENT

Its not totally clear what the goal is here but working on the basis that you want to differentiate between different sites then the answer is a separate site table in the same way that I created a name table in example 2. You thne create the same relationships and you are in business. The advantage of this approach is that your data will be totally sliceable by any combination of site and or name you wish.

This is a pretty standard way to set up your model - I work in retail and my equivalent of your 'Site' and 'Name' is 'Product' and 'Store', these 2 'dimension' tables are connected to numerous fact tables such as 'Sales' and 'Inventory' which are the equivalent of your Main/File 1/File 2
 

denisl

New Member
Joined
Apr 3, 2013
Messages
12
I'm definitely out of my comfort zone here, thank you for your patience.
To put it into context let me give you a little background as to what I'm doing.
I have data from a data center backup environment across multiple physical locations, multiple business units and various backup servers.
Each backup server manages data from distributed servers (nodes). These nodes are aligned to different business units.

My row labels need to be site, business unit, backups server, node and platform of the node. Platform is windows, linux, aix, etc.
The values I want to display come from 4 different data sources (csv files). Each csv file has site,business unit,backup server and node as the first 4 columns.
A node may exist more than once when it is backing up to more than one backup server so the unique combination is really backup server,node. Based on these two combinations we can derive the site and business unit since the backup server names are all unique and the backup server is dedicated to a business unit. However, a node may exist in more than one site.

Here's an example:
site-a,business_unit-a,backup_server-a,node-a
site-a,business_unit-a,backup_server-b,node-a
site-b,business_unit-a,backup_server-c,node-b
site-b,business_unit-b,backup_server-d,node-c
site-b,business_unit-a,backup_server-d,node-a

For each backup_server,node combination there is information that I need to extract from the other csv files.
For example,
"node" csv has site,business_unit,backup_server,node,platform
"occupancy" csv has site,business_unit,backup_server,node,total_data_mb
"summary" csv has site,business_unit,bacakup_server,node,total_mb_transferred

However, each of the data source csv files have multiple rows per site,business_unit,backup_server,node combination.
For example, for occupancy a site,business_unit,backup_server,node will have a row for each drive (windows C,D,F drives, etc).
So I need to sum up (like I would in a regular pivot table) the occupancy.total_data_mb when that value is added to my powerpivot.

Now back to the previous post/question -
In your example you have "product", "store" as 2 dimension tables.
I think I need "Site", "business unit", "backup server", "node" as my 4 dimension tables.
I created 4 linked tables with the unique Sites, Business Units, Backup Servers and Nodes.
Then linked these columns to the data source CSV's (occupancy, summary, node).

When I create the power pivot - I can select site,business unit, backup server and nodes which populate my row values. when I add in platform_name from the nodes data source it adds all the platform options under every node - where the correct value is only one of the platform types. I not understanding how to build the relationship so powerpivot looks up the platform from the site,business_unit,backup_server,node combination.

If you decide to abandon me at this point I completely understand :)
 

Jacob Barnett

Board Regular
Joined
Dec 3, 2012
Messages
155
I think you are pretty close. The question is do your 2 fact tables ('Occupancy' and 'Summary') have details of the Platform?I'm guessing the answer is yes!

At the moment you obviously don't have any relationship between the Platform in your 'Node' Dimension table and the 2 fact tables. As PowerPivot does not allow you to relate 2 tables on multiple columns you are going to have to either:

- Create a further dimension table called Platform that is related to your 2 fact tables.

- Expand your node table to include a row for each Platform for each Node, concatenate the two fields together and then relate that to a concatenated column on each fact table.

I prefer the first option as its cleaner and will probably deliver better performance.
 

denisl

New Member
Joined
Apr 3, 2013
Messages
12
The only table that has platform is the 'node' table.
I think I got it working but I don't think I solved the problem the pivotpower professional way...
Since all of my csv files contain instance,node I created a calculated column in powerpivot that concatenates the instance,node columns into a new 'instancenode' value.
The nodes table is a unique 'instancenode' per line where the 'occupancy' and 'summary' csv's have multiple lines with the same instancenode. I call this calculated column 'node-key', 'occupancy-key', and 'summary-key'.

My powerpivot relationships are for the 'occupancy-key' and 'summary-key' columns to the 'nodes-key' column.

Did I cheat? it seems to be working now.
 

Watch MrExcel Video

Forum statistics

Threads
1,123,307
Messages
5,600,869
Members
414,411
Latest member
Snowmanaus

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top