/ Analysis

Analysis on tagged customers

It is been a while since my last blog and of course I got the standard excuses. The two main are: busy with my project/customer and I intended to blog about the upcoming SQL Azure Reporting release, but as you know: it is not released to the beta tester yet.

Therefor I revived an old PoC I created two years ago, but was lost during a laptop change: Analysis on tagged customers. Tagging is very Web 2.0 (and should exist in a cloud solution) and how about tagging the customers by the business users. The advantage of tagging over (default) grouping is that it is possible to assign multiple tags to the same customer.

And now how can we use that advantage in our analysis environment. First we start with the data model.

The model is very simple: one Customers table with associated Sales table. And with an N:N relation between the Tags and Customers.

If we now create a cube in Visual Studio by using the wizard we can create the base of the tagging cube. We can use the default settings to create the cube. I got only the Sales Count measure deselected. SSAS creates by default a Many-to-Many relation between the Tags dimension and Sales measure group.

This relation already creates a part of the desired behavior: it is now possible to filter the value based on the customers that have a link to one of the selected tags. In our case when we select the tag International and Large we see that Cust B and Cust C are filtered**. **

But now we want the make sure that only the customers are shown that are tagged with both International and Large, in our case Cust C.

This is a little complicated, but we need the look for every customer that have as many tags as there are selected. The count of tags that are linked to a customer are already available in the cube: Tag ID Count (I removed the ID from the measure name). And the amount of tags selected we can easily determine by taking the distinct count of the tags on the total level of the customers. So we first add the distinct count based on the TagID of the TagCloud table to the cube and I renamed it to Tag Distinct Count.

Also we need an attribute to separate the default Or operator and the calculated And operator. First we add a column to the Tags table to host the Operator attribute and default behavior Or.

After we add this column as an attribute to the Tags dimension we can start creating and adding the And operator to that attribute. In the Calculate tab of the cube we add the following statement:

CREATE [Tags].[Operator].[All].[And];

This will create the new value to the attribute. And for the calculation we add the following statement.

([Tags].[Operator].[All].[And]) =
Aggregate(Filter(Descendants([Customers].[Customer Name].CurrentMember, , LEAVES)
, ([Tags].[Operator].[All], [Customers].[Customer Name].CurrentMember, [Measures].[Tag Count]) =
([Tags].[Operator].[All], [Customers].[Customer Name].[All], [Measures].[Tag Distinct Count])),
([Tags].[Operator].[All], [Measures].CurrentMember));

The last statement needs some explanation. The first line makes sure that the value is set to the following expression. Next we need to aggregate the current value ([Measures].CurrentMember) of the unfiltered Operator attribute ([Tags].[Operator].[All]). But we still need to filter the customers and aggregate only that customers which have the same amount of tags linked as selected is the Tags attribute. But as we are working in a calculated member we always need to refer to the [Tags].[Operator].[All] level because that is the level that the cube has values.

And if we now open the Excel 2010 we can add the Operator attribute and use that in our analysis. And luckily Excel 2010 has some major improvements it handling calculated measures. Before Exel 2010 it was not possible to deselect the calculated value, but that is now finally available. And with the use of the new slicers we can create a nice and easy to use analysis environment.



One thing I need to say: the solution is created using a small dataset. When using it in a larger scale it is possible that either performance is dropping (due to the distinct count eg) or the calculation becomes complex to make is calculate for more customer attributes.

Jan Pieter Posthuma

Jan Pieter Posthuma

I am Jan Pieter Posthuma and I am a Microsoft Data Consultant working for DataScenarios. With the current changes I found it time to share my thoughts and ideas with the community.

Read More