Forecasting in Power Query or Powerpivot?

SOQLee

Board Regular
Joined
Mar 18, 2015
Messages
52
I would like to forecast the values for each X variable for each Group. There is 5 years worth of data for Groups A-D, Group E has 2 years worth of data. Regardless, I would like to forecast values for August 31, 2018. Can you do this in Powerquery or Powerpivot?

I have been using the Forecast Sheet function in Excel 2016 for a quick time series forecast, however in the example below, I have multiple groups and multiple X variables.

Any advice appreciated. Thank you.


GroupCycleX1X2X3X4X5
A2013-08-3131.0040.5032.203.3021.90
A2014-08-3125.0040.9032.407.3018.50
A2015-08-3118.0052.4050.0017.3026.10
A2016-08-3120.0044.9042.3018.0022.30
A2017-08-3120.0033.9030.606.0020.60
B2013-08-317.002.002.000.002.00
B2014-08-312.006.007.002.003.00
B2015-08-313.005.006.000.006.00
B2016-08-315.005.005.001.004.00
B2017-08-313.000.000.000.000.00
C2013-08-3110.0029.0020.007.009.00
C2014-08-3110.0027.0013.003.0010.00
C2015-08-3110.0020.0014.004.0010.00
C2016-08-3110.0025.0020.0010.0010.00
C2017-08-3110.0020.0013.004.009.00
D2013-08-3128.00123.0047.0019.0028.00
D2014-08-3128.00109.0055.0029.0020.00
D2015-08-3130.00116.0051.0022.0029.00
D2016-08-3130.00111.0042.0020.0022.00
D2017-08-3127.00116.0042.0026.0014.00
E2016-08-3116.0029.2022.603.0015.60
E2017-08-3120.0045.7031.607.3018.30

<tbody>
</tbody>
 

bkjohn2016

New Member
Joined
Sep 9, 2016
Messages
38
Interesting...I recently refined my depreciation technique and have been wanting to apply it to forecasting. You're going to need several things. First, you'll have to take your data out of a tabular format. Second, Power BI doesn't have the equivalency of the financial functions available in Excel so you're going to have to build an old school regression model with all the columns and calculations. Finally, these BI models are great for calculating activity that has already occurred but forecasting can be a bit difficult. The key is creating a disconnected table and using it in your forecasting calculations. The output table for Group A - X1 looks like this:

Row Labels TOTAL X TOTAL Y TOTAL X-SQUARED TOTAL Y-SQUARED TOTAL XY b a N RegressionFormula
8/31/2012 211,237.00 114.00 8,925,548,475.00 2,710.00 4,806,343.00 (0.0073893) 334.98 5.00 30.89
8/31/2013 211,237.00 114.00 8,925,548,475.00 2,710.00 4,806,343.00 (0.0073893) 334.98 5.00 28.20
8/31/2014 211,237.00 114.00 8,925,548,475.00 2,710.00 4,806,343.00 (0.0073893) 334.98 5.00 25.50
8/31/2015 211,237.00 114.00 8,925,548,475.00 2,710.00 4,806,343.00 (0.0073893) 334.98 5.00 22.80
8/31/2016 211,237.00 114.00 8,925,548,475.00 2,710.00 4,806,343.00 (0.0073893) 334.98 5.00 20.10
8/31/2017 211,237.00 114.00 8,925,548,475.00 2,710.00 4,806,343.00 (0.0073893) 334.98 5.00 17.40
8/31/2018 211,237.00 114.00 8,925,548,475.00 2,710.00 4,806,343.00 (0.0073893) 334.98 5.00 14.70
8/31/2019 211,237.00 114.00 8,925,548,475.00 2,710.00 4,806,343.00 (0.0073893) 334.98 5.00 12.01


Obviously, the only column you're really interested in is the regression column. Please be aware that this is a single variable linear regression model. But, adapting the technique to accommodate for multiple variables shouldn't be too difficult, it should just be an extension of what's already there. I'll put the query code and power pivot formulas together for you and you can paste it into the appropriate editors.
 

bkjohn2016

New Member
Joined
Sep 9, 2016
Messages
38
Convert the dataset you posted to a table and name it RegressionDataSet.

Create a blank query and name it RegressionSourceData; then copy and paste this code into the query editor:

Rich (BB code):
// Query Name = RegressionSourceData
let
Source = Excel.CurrentWorkbook(){[Name="RegressionDataSet"]}[Content],
CreateTable=
    Table.TransformColumnTypes(
        Table.ReorderColumns(
            Table.AddColumn(
                Table.AddColumn(
                    Table.AddColumn(
                        Table.AddColumn(
                            Table.RenameColumns(
                                Table.UnpivotOtherColumns(
                                Source, {"Group", "Cycle"}, "Attribute", "Value"),
                            {{"Cycle", "Cycle (x)"}, {"Value", "Value (y)"}}),
                        "x-Squared", each Number.Power(Number.From([#"Cycle (x)"]),2)),
                    "y-Squared", each Number.Power([#"Value (y)"],2)),
                "xy", each Number.From([#"Cycle (x)"])*[#"Value (y)"]),
            "Group-Attribute", each [Group]&"-"&[Attribute]),
        {"Attribute",
        "Group-Attribute",
        "Cycle (x)",
        "Value (y)",
        "x-Squared",
        "y-Squared",
        "xy"}),
    {{"Group", type text},
    {"Attribute", type text},
    {"Group-Attribute", type text},
    {"Cycle (x)", type date},
    {"Value (y)", type number},
    {"x-Squared", type number},
    {"y-Squared", type number},
    {"xy", type number}})
in
    CreateTable
In excel create these items:
ForecastDate
=MIN(RegressionDataSet[Cycle])
=EOMONTH(MAX(RegressionDataSet[Cycle]),24)

<tbody>
</tbody>

It should look like this:

ForecastDate
8/31/2013
8/31/2019

<tbody>
</tbody>

<tbody>
</tbody>

Convert the list to a table and name it ForecastCalendar

Create a blank query and name it ForecastCalendar; then copy and paste this code into the query editor:

Rich (BB code):
// Query Name = ForecastCalendar
let
    Source = Excel.CurrentWorkbook(){[Name="ForecastCalendar"]}[Content],
    Thresholds = Table.TransformColumnTypes(Source,{{"ForecastDate", Int64.Type}}),
    
CreateCalendar = 
    Table.TransformColumnTypes(
        Table.AddColumn(
            Table.AddColumn(
                Table.RenameColumns(
                    Table.TransformColumnTypes(
                        Table.FromList(
                            List.Numbers(
                            Thresholds[ForecastDate]{0},Thresholds[ForecastDate]{1}-Thresholds[ForecastDate]{0}),
                        Splitter.SplitByNothing(), null, null, ExtraValues.Error),
                    {{"Column1", type date}}),
                {{"Column1", "ForecastDate"}}),
            "ForecastMonth", each Date.EndOfMonth([ForecastDate])),
        "ForecastYear", each Date.From("8/"&"31/"&Number.ToText(Date.Year([ForecastDate])))),
    {{"ForecastDate", type date}, {"ForecastMonth", type date}, {"ForecastYear", type date}})
in
    CreateCalendar
Close and load the queries. Open the power pivot manager and create these measures:

Rich (BB code):
TOTAL X:=SUM(RegressionSourceData[Cycle (x)])

TOTAL Y:=SUM(RegressionSourceData[Value (y)])

TOTAL X-SQUARED:=SUM(RegressionSourceData[x-Squared])

TOTAL Y-SQUARED:=SUM(RegressionSourceData[y-Squared])

TOTAL XY:=SUM(RegressionSourceData[xy])

N:=COUNTA(RegressionSourceData[Group-Attribute])

b:=(([N]*[TOTAL XY])-([TOTAL X]*[TOTAL Y]))
/
(([N]*[TOTAL X-SQUARED])-([TOTAL X]^2))

a:=([TOTAL Y]/[N])-(*([TOTAL X]/[N]))

Forecast (X):=AVERAGE(ForecastCalendar[ForecastYear])

Forecast:=[a]+(*[Forecast (X)])


Create a pivot table placing the [Forecast] measure in the values area, the ForecastYear field into rows, and the Group-Attribute field into columns. Isolate A-X1 at 8/31/2018 and make sure it equals 14.70427.


Forecast
A-X1
8/31/2013
28.1972
8/31/2014
25.5001
8/31/2015
22.8030
8/31/2016
20.0985
8/31/2017
17.4014
8/31/2018
14.7043
8/31/2019
12.0072

<tbody>
</tbody>

<tbody>
</tbody>


Excellent exercise! Good luck in your forecasting endeavors.
 

macfuller

Active Member
Joined
Apr 30, 2014
Messages
257
bkjohn2016, forgive me for following along but this is incredibly useful. I have two questions.

1) On the Total X measure I'm getting a System.FormatException error. (Using Office 365 2016). I've never seen that before. The error itself shows up in the measure in the data model table rather than #ERROR . It says "The string '14638-08-23T00:00:00' is not a valid AllXsd value. at System.Xml.Schema.XsdDateTime..ctor(" and so on. As far as I can tell all the dates from the original dataset are valid when I copied to a table following your instructions.

2) For your queries I haven't seen the nesting of the AddColumn structure in most postings here at MrExcel. I'm curious as to whether this makes the query run faster on large data sets. Usually each .AddColumn is a separate line in the query and I've often wondered if that means PowerQuery iterates through the table each time to create each additional column. If your structure means that every column is constructed such that only one iteration of the rowset is necessary then the time savings could be huge.
 

bkjohn2016

New Member
Joined
Sep 9, 2016
Messages
38
Hi Macfuller,

I'm glad you enjoyed the model, I found it entertaining.

While building the model I did make note that I left the "Cycle (x)" field in a Date format but disregarded because I did not receive any errors and the field wasn't going to be used in any visible output. I am working on a dual-boot system with 16GB of RAM and windows 7 professional 64-bit OS and windows 10 professional 64-bit OS both have office 2016 64-bit and the model worked well in both environments. My understanding is that Office 2016 is a bit more up to date than 365...on several occasions I've created models on my personal PC and taken the build to work to receive warning message indicating the model was created on a newer version of office and may not work the same. It is a bit strange because I built the same calculations in excel using the "Forecast" function and got to the same outcome...but excel is not power pivot. At any rate, my suggestion is to go into Power Query and change this code:

{"Cycle (x)", type date},

to

{"Cycle (x)", type number},

You may need to do the same thing on the calendar table as well.

Regarding your second question with respect to query speed, I really cannot speak to that as my background is in finance and accounting. I have only recently begun to dabble with the coding. And to be honest the reason my query is structured like that was to gain familiarity with query structuring as well as to clean that big mess generated in the "Applied Steps" section of the UI while using the UI.

This is how PQ defines the AddColumn function:

Table.AddColumn = Adds a column named newColumnName to the table table. The values for the column are computed using the specified function columnGenerator with each row taken as an input.

I wish I could be of more help but like I said, I'm still learning.
 

macfuller

Active Member
Joined
Apr 30, 2014
Messages
257
bkjohn -

I wish I learned at the rate you do! Changing the type of cycle x to number eliminated the error and gave me your A-X1 results.

clean that big mess generated in the "Applied Steps" section of the UI
:LOL: Amen to that!
 

SOQLee

Board Regular
Joined
Mar 18, 2015
Messages
52
Hi bkjohn2016, wow...that's amazing :eek: I'm going to try to recreate your results in Excel 2016. I've also noted macfuller's input. I'm constantly learning; this model will required a bit more intensive studying for me. I'm looking forward to making this model work.
 

Forum statistics

Threads
1,086,220
Messages
5,388,542
Members
402,119
Latest member
matts17

Some videos you may like

This Week's Hot Topics

Top