Optimal Amount of Data


New Member
Oct 27, 2016
Hi All -

I have been attempting to put together a tool and was wondering how much data and the best way to structure it for re usability.

I have several clients and am trying to measure each client employee base on how prepared they are for retirement. My initial thought was to handle this in a one-off scenario, using the individual client (company) employee data and then changing the data each time I run the analysis.

Would it make more sense to put together a database of every employee tied to each employer and then pull that data into the data model, filtering by employer to get the correct data set?

Since I am starting from scratch, I could create this in excel, access, or even on sharepoint - allowing the clients to maintain their own employee base.

The other issue for consideration is updating the data on a regular basis. For each client there could be a different provider for the retirement plan - meaning different data downloads and formats for each employee group - some have fields ordered in different ways, etc. To run the calculations, I need accurate account balance and deferral information for each employee. What would be the best format for storing the data so this could be updated on a regular basis (ideally by someone other than me). I'm assuming I would have to do this manually, employee by employee. As the number of employees is relatively small right now, this isn't an issue, but I would need to consider that the number could be in excess of 100,000 unique employees within a few years, making data updates very time consuming.

I apologize if this is a little disjointed - just need some advice on where to start housing the data so I can make use of powerpivot for the analysis.


Edit: One thing I forgot - I use a sharepoint server, so whatever I use to store the data, I would prefer to keep there (revision history, etc.). I know the "apps" in access have less functionality than a regular database, but that might be easier to use for others in my organization.
Last edited:

Some videos you may like

Excel Facts

Show numbers in thousands?
Use a custom number format of #,##0,K. Each comma after the final 0 will divide the displayed number by another thousand

Matt Allington

MrExcel MVP
Dec 18, 2014
Yep, there's a lot in this question :).

There are no right or wrong answers and it depends on what you need. Based on your description of the problem, here are my thoughts.

Power Pivot or Power BI is the right reporting tool. There is no amount of data that is too big, and it seems your needs are modest. So don't make a decision on how much to load based on size - it can handle it.

Power Pivot and Power BI are not data storage tools, they are reporting tools. So you need somewhere else to store and enter data. SharePoint is a good option for small data and when you need distributed solutions. A database is better for large data but distributed sharing can be a challenge. Given you have lots of different clients with different systems, I am thinking store the data in Excel format any way you can get it. The closer it is to what is easy for the client the better. That way it is easy for them to update (just give you a new extract).

I think there is value in having all data in one place at some stage in the process. Imagine the bench marking benefit of being able to say to Client A how they compare with "like clients" etc. This implies you would need a standardised end format for the data. The good news is you can use Power Query to manage the transformation from a non standard starting point to a common end point.

Based on what I understand, this is what I would do
1 gather data from each client in one or more Excel files as needed. Make it simple for them so it is easy for them to refresh
2 store these excel files in a SharePoint library and use the version control and online editing features to manage data entry etc
3 build a transformation tool for each client using power query in Power BI Desktop to get the data into the require data format
4 build out the reporting in Power BI and distribute via the service. Apply Row Level Security if necessary
5 later copy this desktop file and repurpose for the next client. All that is needed is to change the power queries to deliver the same starting data shape as before. The starting point is irrelevant other than you need access to the same data.
6 even later, if you want to benchmark, you can combine the queries from multiple clients into a single workbook and bring the data together for cross industry reporting.
Last edited:


New Member
Oct 27, 2016
Thanks Matt -

Also have to say that you blew my mind by adding comparison tools for the report... with enough data on a large scale that would be extremely helpful and valuable to the process.

I think I have a good place to start now and can start gathering the data I need through sharepoint from each client. Most of what I need I can get from the recordkeeper data file, the census data would be something they would input.

Along those lines - could I split the data needed? For instance, have the client fill out the census information (name, DOB, salary) and then match that with recordkeeper data (account balance, deferral amount)?

I'm guessing I need to learn to use PowerQuery next...


New Member
Oct 27, 2016
Perhaps not - I just registered for the updates... By the time it comes around, I should have a solid foundation of PowerPivot and how I want to structure the data to make it worthwhile.

As the defacto tech person in the firm (defacto because I am the youngest by far - I beat everyone by at least 15 years) I have a lot of time to work through all these things. Short of someone with the skills already coming in and taking this project up with me, there is no way I will be done in the next few months. Luckily, since we are starting from the bottom here and I have pulled a lot of input from other industry people into the creation of this, I don't think it will be a problem taking my time to ensure this is put together correctly and the process can be repeated easily many times over. Also, luckily for me, this is a way of looking at data that the industry has largely neglected over the last few years - there are existing tools out there, but all of them were developed by a mutual fund company with the express purpose of pushing more sales to said mutual fund company.

Watch MrExcel Video

Forum statistics

Latest member

This Week's Hot Topics

  • Timer in VBA - Stop, Start, Pause and Reset
    [CODE=vba][/CODE] Option Explicit Dim CmdStop As Boolean Dim Paused As Boolean Dim Start Dim TimerValue As Date Dim pausedTime As Date Sub...
  • how to updates multiple rows in muliselect listbox
    Hello everyone. I need help with below code. code is only chaning 1st row in mulitiselect list box. i know issue with code...
  • Delete Row from Table
    I am trying to delete a row from a table using VBA using a named range to find what I need to delete. My Range is finding the right cell. In the...
  • Assigning to a variable
    I have a for each block where I want to assign the value in column 5 of the found row to the variable Serv. [CODE=vba] For Each ws In...
  • Way to verify information
    Hi All, I don't know what to call this formula, and therefore can't search. I have a spreadsheet with information I want to reference...
  • Active Cell Address – Inactive Sheet
    How to use VBA to get the cell address of the active cell in an inactive worksheet and then place that cell address in a location on the current...