Adding trendlines to PowerPivot
It started during a presentation of Cathy Dumas at the latest SQLBits where she compared the different charting capabilities of Excel, Power View, PerformancePoint and Reporting Service. One of those capabilities were trendlines who are missing in V1 of Power View. She suggested to add a trendline as a measure, but didn’t succeed, so she raised a challenge. And here I am taking this challenge to add a trendline to a PowerPivot model to be used in Power View.
The theory behind a linear trendline is minimizing the least-square of the given set. The Wikipedia page gives a lot of theory and I must admit that the Dutch version has besides the theory also a simple solution of solving the equation.
Given a trendline with the following equation:
Then a and b can be solved by using the following equations:
For the following dataset:
The following base values can be calculated and are input for a and b:
So a and b are now:
a = –0.8
b = 7
And the trendline is now y = –0.8x + 7. With this formula it is possible to calculate a trendline value for each x of the dataset, which gives a nice trendline as a extra line chart.
Plot this trendline with the original data and the build-in trendline in Excel, the two line are overlapping each other:
To convert this theory to a real life scenario I use the AdventueWorks database with the FactResellerSales fact table, the DimProduct tables and the DimDate table. With this dataset I ran into some challenges. First challenge is the series – I use a combination of Years and Quarters, like (20051, 20052, …) – which is not linear and continues. Also the data is not on the lowest grain of those series values.
The first step will be to convert the used series to a usable series for minimizing the least-square. Next step is adding all the needed measures to the FactResellerSales table to calculate the correct a and b.
Converting the Series
I created a nice long date series by adding a calculated column to DimDate YearQuarter
=[CalendarYear] & [CalendarQuarter]
Next I added this calculated column to the FactResellerSales table by using a calculated column RelYearQuarter
Problem is that this series is not nicely formed as need for a correct calculation, so a conversion is needed. The current version of PowerPivot has the function RANKX which can be used to rank our series. The default order of RANKX is ascending, but a descending order is needed. Also we need a dense ties to get a continues series. This will give as a calculation for the column RankYQ
=RANKX(ALL(FactResellerSales), [RelYearQuarter], [RelYearQuarter], 1, Dense)
Calulation the Base Values
Most of these values needs to be calculated against the full selected set. Therefor most of the calculation have ALLSELECTED(FactResellerSales) included to make sure that the measure is calculated against the selected dataset creating the PivotTable.
n : NrSeries
To calculate the number of series value a maximum of the newly RankYQ column should be enough. Problem is that RankYQ is calculated again the complete set of the data, ignoring all PivotTable filters. So when the first series value is not selected in the PivotTable, the RankYQ will remain the same and is not recalculated. One solution is to convert the column to a measure, but this will cause a huge performance impact if a maximum is calculated. The best way is to correct the value of the maximum of RankYQ by subtracting the minimum of RankYQ value and adding 1 for the right offset. In this case the total number of series – NrSeries – will give the correct value for a continues selection of the series:
=MAXX(ALLSELECTED(FactResellerSales), [RankYQ])-MINX(ALLSELECTED(FactResellerSales), [RankYQ])+1
E(x) : SumSeries
To calculate the total sum of the series values the best way is to revert back to basic mathematics knowledge. What we need for our sum of the series is for a given n:
Which can be calculated for that same n as:
So the measure SumSeries can be calculated as follows
E(x2) : SumSeries2
With the same mathematical knowledge we can solve the sum of our series values squared:
The measure SumSeries2 can be calculated as follows
E(y) : SumAmount
This is the most easiest calculation we need as it is a standard way to calculate the total of the selected values, here named SumAmount
E(x*y) : SumRankSales
To calculate this measure the same trick is needed as with the **NrSeries **measure: The selection of the series can start not as the first value, so it is not possible to simply multiply the **RankYQ **value with the Sales Amount. For this value also the minimal selected RankYQ value of the series needs to be subtracted of the current RankYQ value and corrected with 1. So the SumRankSales measure is calculated as follows
=SUMX(ALLSELECTED(FactResellerSales), ([RankYQ]-MINX(ALLSELECTED(FactResellerSales), [RankYQ])+1)*[SalesAmount])
a and b
With all the base values in place a and b can be calculated as follows. In the calculation of a there is a check for division by zero (0).
=IF((([NrSeries]*[SumSeries2])-([SumSeries]*[SumSeries]))=0, BLANK(), (([NrSeries]*[SumRankSales])-([SumSeries]*[SumAmount]))/(([NrSeries]*[SumSeries2])-([SumSeries]*[SumSeries])))
With a and b now correctly calculated we can calculate a measure – **SalesAmount Trend **– that gives the trendline usable for Power View. A IF statement is included to make sure that the trendline is only calculated when there is a original value of SalesAmount available. And the current series value is calculated using the current RankYQ value minus the minimal selected RankYQ value (and a correction of 1 to complete the calculation).
=IF(ISBLANK([Sum of SalesAmount]), BLANK(), [a]*(MAX([RankYQ])-MINX(ALLSELECTED(FactResellerSales), [RankYQ])+1)+[b])
This trend value can now be used to validate with Excel if it shows the correct trendline.
Of course there are some limitation to this approach, which aren’t there if Power View supports trendlines natively:
- The value of series needs to be a continues series, so no gaps. Is this case the trendline will be off if 2006 is deselected and 2005 and 2007 are selected.
- The trendline is only valid for the given series and values.
- Only a linear trendline is calculated at this moment. For Exponential and Logarithmic trendlines the calculation is almost the same, but maybe I blog about this some later time.
- In Power View it is not possible to combine a bar chart with a line chart, so if you need to show your data as bars this trendline will also show as a bar chart.
With all the intermediate measure in the model, some hiding is needed before it can be really used in Power View. But most of that is needed anyway, because of the different nature of PowerPivot and Power View.
There is probably a better and faster way to determine the total distinct series values. If anyone has a good suggestion: please let me know.
Unfortunately I don’t have a working Power View installation, so I cannot show the real result. But you can try it yourself by download the sample Excel sheet from here/[link broken].