DAX – Finding the largest value in a grouping

This will be very technical, and just a “resting place” for some code that took a while to make work. If it doesn’t make sense, then I apologize to all the non-code folks who occasionally stop by.

While working on a new PowerPivot report for the Boy Scouts, I needed to find out the number of Roundtables a District had entered into the Commissioner Tools in order to determine the percent of Roundtables a unit had attended. This was my solution.

Formula for calculating the maximum number of RT which have been recorded in CT by a District:
RT:=CALCULATE(MAX([Num_RT_Attended]),FILTER(RT_Merge,[District Name]=EARLIER([District Name])))
Where ...
[Num_RT_Attended] - must be a column in the data model
[District Name] - must be a column in the data model
RT_Merge = the name of the data model
[Num_RT_Attended]:=if([Aug]>0,1,0)+if([Sep]>0,1,0)+if([Oct]>0,1,0)+if([Nov]>0,1,0)+if([Dec]>0,1,0)+if([Jan]>0,1,0)+if([Feb]>0,1,0)+if([Mar]>0,1,0)+if([Apr]>0,1,0)+if([May]>0,1,0)+if([Jun]>0,1,0)+if([Jul]>0,1,0)

Final output:

[Num_RT_Attended] ~ (# RT Attended)
[PCT] ~ (% RT Attended):=Divide([Num_RT_Attended],[RT])

Fun stuff, eh!?! Of course, you can take the same code and apply it to many situations where you are trying to find the maximum number by a specific filter in the CALCULATE() function of DAX. One thing to remember, the MAX() function must be working on a column of data, it cannot work on a DAX Measure.

Say, wouldn’t it be nice if … ?

I really like the challenges associate with trying to “out think” a computer system. Power Query’s M language is very weak when it comes to error management (although DAX does a much better job), so I was put to a real test over this last weekend when I was asked if it was possible to change the calendar around which a report was produced. Not the standard calendar year, Jan – Dec, mantra of most reports, but instead a “Program Year” calendar, Aug – Jul. Now, at the face value this does not sound too bad, since the data comes out in calendar year format, it just means you need two reports to create the output and a Nested Join, in order to get the data from both sides of the report. But, what happens when the second report doesn’t exist yet? [And what about info that starts later in the “program year,” or drops off the list? More on that later.]

Thus was the challenge given to me by the Roundtable Team after they completed their “Effective Roundtables” class at the Philmont Training Center last week;-we like the Roundtable Analysis report, wouldn’t it be nice if (WIBNI) it was done using a program year calendar! This request came in at 2:23 p.m. on Friday, June 16th.

Phase 1

So I took a two-phased approach. Phase 1: I built a rather standard system that would use the current year’s and the previous year’s Unit Health reports. The first version was sent out at 1:18 p.m. on Saturday, June 17th. Not bad, after 22 hours 55 minutes had passed on the clock, it took me approximately 6 hours of development to knock out the report from scratch, including adding the update to use the “automatic load” option that many users like. (They put their data files in the same directory as the PowerPivot tool, and when you open the Excel spreadsheet it automatically loads the data for you.) Now on to the far more challenging Phase 2.

[At this point, I must admit, I disabled the “automatic load” since I was opening and closing the spreadsheet about 8 times an hour (since I like to save my work, and clear memory) … so I put in a VBA (Visual Basic for Applications, Microsoft’s “macro” language for Office) routine to handle running the update, once I pressed “Ctrl+Shift+U”. I have left this in during the testing phase.]

Phase 2

While the primary issue to address in Phase 2 is “what do we do when the 2nd data file is missing, or doesn’t exist,” there was a secondary issue which I had neglected to consider from Phase 1, “what happens when data starts showing up in March that was not there previously?” In Scouting, we need to remember that new units may start at any time, so now they show up in the data part way through the year, or, the do no recharter, so they drop from the data part way through the year, leaving us with null values in the data. Power Query’s M language does not consider null to be the same as zero, so we need to also consider some transformation steps (ETL – extract, transformation, load) in order to get all the data in order, and allow mathematical formulas to work.

Getting the system to work with only one data file took some “extra” work. First, a substitute data table needed to be created in memory to fill the void left by the missing file. Then there was the issue with getting the Power Query M language try statement to work.

So, in order for the file merge to work, it was necessary that there be a data table to merge with. That required that an “alternative” be created using the Power Query M language #table() function. This allowed me to create a table that would match the fields of the final output, and place some “temporary data” there so that it could be quickly identified and removed at the end of the merge process.

Next came the biggest issue, the Power Query try statement. Unlike other Power Query functions this one is not as well documented, and even looking at examples on-line from other developers was not a lot of help. Part of the process of importing a file the normal first step, after the file import is defined, is to promote the column headers using the Table.PromoteHeaders() function. After that is it simply a matter of defining data types for each field, removing blank rows, filtering out bad rows, and removing any unneeded columns (in order to save memory). Well, after much experimentation I found that the best way to use the try statement was to apply it to the Promote Headers statement, then to test for the error. If the error exists, then you use the alternate table, if not, you continue with the next step of the normal input process, taking care of the data typing for each column (referred to as “Changed Type” if you use the automatic step generation from Power Query).

Okay, we have overcome the limits of having only one file, so what happens to all the matches that don’t have data? They show up as null. Now, we have seen this ugly bug before, so we need to apply some transformations in order to wipe out the nulls. Fortunately Power Query’s M language has such a function (Table.ReplaceValue, JoinKind.FullOuter), so it was applied to every month’s attendance numbers. Instead of being null, they would be zero, and could be used in other numeric calculations.

How do you count the number of units who attended Roundtable from a column full of zeros? One of the issues to be overcome by DAX (the PowerPivot language) was the need to count the number of units that attended each month. Well, if you are using the DAX Calculate function, and you tell it to count all the rows where the number is greater than zero, but they are all zeros, it causes the Calculate() function to return a blank value. Much like in M with null, DAX does not consider blank to the be same as zero, which tends to throw off multiple calculations. So, in order to overcome this limitation, it became necessary to use the DAX If() statement in conjunction with the DAX Isblank() function to correct the error, and return the number zero when Calculate() would prefer to return blank. Another mathematical function saved for our final PowerPivot tables.

Testing

Yes, testing. While that may come as a surprise to some, considering all this work is being done by “volunteers” and not a paid IT team, I have collected a small team of “Data Dawgs” from various parts of the country who help me to validate that things are working properly. Now in addition to that team, I add the original sequesters (by customers) of this WIBNI, to see if we are meeting their needs. I know there are IT folks who would just as soon not ask their customers for their opinions, but in this case, this is where the rubber meets the road, and if it doesn’t work for them, then it simply won’t be used. The testing version of the Phase 2 work was sent out on Monday, June 19th, at 12:06 p.m. 46 hours 48 minutes after Phase 1, and 69 hours 43 minutes after the original request. Approximately 35 hours (at the keyboard) went into the development of both phases.

Documentation

I must say, I am no “saint” when it comes to documenting the code I am developing, but I have decided it is very important that anyone who takes over this role, sometime on down the line, have the understanding of what was done. So I have been taking the time to ensure that I am thoroughly documenting my code.

  • PP Tools, as previously mentioned – I used the “Calculated Columns,” “Measures,” “Memory,” and “Fields Never Used” tools to get a good understanding of the basis of the PowerPivot work.
  • Contexture’s VBA code for documenting pivot tables. By adding this VBA code to my “ThisWorksheet” area, I was able to document each PowerPivot table by simply clicking on any cell in the PowerPivot table and running this macro from the Macros menu.
    Sub ListPivotFields()
    Dim lRow As Long
    Dim wsList As Worksheet
    Dim pt As PivotTable
    Dim pf As PivotField
    Dim pi As PivotItem
    Dim strList As String
    Dim strLoc As String
    On Error GoTo errHandler
    Set pt = ActiveSheet.PivotTables(1)
    strList = "Pivot_Fields_List"
    
    Application.DisplayAlerts = False
    On Error Resume Next
    Sheets(strList).Delete
    On Error GoTo errHandler
    
    Set wsList = Sheets.Add
    lRow = 2
    
    With wsList
    .Name = strList
    .Cells(1, 1).Value = "Caption"
    .Cells(1, 2).Value = "Source Name"
    .Cells(1, 3).Value = "Location"
    .Cells(1, 4).Value = "Position"
    .Cells(1, 5).Value = "Sample Item"
    .Cells(1, 6).Value = "Formula"
    .Cells(1, 7).Value = "Description"
    .Rows(1).Font.Bold = True
    
    For Each pf In pt.PivotFields
    If pf.Caption <> "Values" Then
    .Cells(lRow, 1).Value = pf.Caption
    .Cells(lRow, 2).Value = pf.SourceName
    Select Case pf.Orientation
    Case xlHidden
    strLoc = "Hidden"
    Case xlRowField
    strLoc = "Row"
    Case xlColumnField
    strLoc = "Column"
    Case xlPageField
    strLoc = "Page"
    Case xlDataField
    strLoc = "Data"
    End Select
    .Cells(lRow, 3).Value = strLoc
    .Cells(lRow, 4).Value = pf.Position
    
    On Error Resume Next
    If pf.PivotItems.Count > 0 Then
    .Cells(lRow, 5).Value = pf.PivotItems(1).Value
    End If
    On Error GoTo errHandler
    
    'print the formula for calculated fields
    If pf.IsCalculated = True Then
    .Cells(lRow, 6).Value = _
    Right(pf.Formula, Len(pf.Formula) - 1)
    End If
    
    lRow = lRow + 1
    End If
    Next pf
    End With
    
    exitHandler:
    Application.DisplayAlerts = True
    Exit Sub
    errHandler:
    MsgBox "Could not create list"
    Resume exitHandler
    
    End Sub

If you’d like to see the documented file, it is available for download from OneDrive.

So, that’s my latest “adventure,” hope you found some enlightenment from the explanation of how it was accomplished.

PowerPivot Tools: The Next Generation

(With apologies to Star Trek …)

I wanted to advance some of the basic tools which were first created more than a year back (starting with the Unit Contact Analysis tool), which used more basic data collection techniques, and were limited to processing only one file of data at a time. In order to do that, however, I needed to ensure that I had thorough documentation of the first-generation designs.

So, to start that process, I enlisted the very handy (and free) “PP Utilities” which really helped in three major areas:

  1. It provided me a list of the DAX Measures in my PowerPivot table
  2. It provided me a list of the calculated columns that I had added to my PowerPivot table
  3. It provided me the memory usage of the model I had designed – a huge bonus.

So, I was able to use this on my first-generation PowerPivot tool to get all the details, then I went through at added “documentation” into the Notes field which it provided on the tab ¬†they created, and start documenting the interconnections between the various calculated columns and the DAX measures. That made re-creating them much easier.

So, then, I went and added in my Power Query M scripts to handle the process of looking for an merging together multiple data files. Fortunately, this was already written as part of my “Merge Tools” so that was not an issue.

Now came the interesting part – breaking the old data connections, and getting the new Power Query scripts as the proper way to add data to the PowerPivot data model. I found myself needing to “re-create” some of the PowerPivot tables, so I walked through that process, tab by tab. Then I needed to ensure that I completed the proper conditional formatting which I had done in the first-gen model. Conditional formatting in Excel can be both powerful and painful, all at the same time, and remember how to apply conditional formatting in a pivot table so that it expands and contracts with the changes in the data, just makes life all that much more interesting. Total time involved – 6 hours. The new beta tool can process, complete the analysis, and display the data for seven scout councils in about 19 seconds.

So, the latest beta release is out for review by the team, once I hear back, it will be finalized. Then I’ll need to update the overall documentation, the sample reports, and put those DAX Measures and Calculated Columns to good use and save them with the Power Query M scripts, so that all this new work is documented.

Then, once all that is done, we can look at releasing the first of The Next Generation PowerPivot tables.

So, which PowerPivot Tools do I use?

(At the recent National Annual Meeting in Orlando, I was asked this question about 50 times, so I thought maybe it would help if I just posted my answer online for anyone who was not there.)

As you might guess, even though I lead the development of the Commissioner Tools¬†PowerPivot tables, (With some great feedback and assistance from a number of commissioners around the country to help fine tune those tools. Trust me, what we have was always made better by the feedback we received.) as a good “Data Dog,” I also use these tools (or, “eat my own dog food,” if you prefer) in order to provide Commissioner Tools analysis for my Council, my Area, and also to the other Councils in my Area who may not have someone who can complete the data analysis for them. It was important in the design of these tools to ensure we had something for everyone, so we made sure to include statistical, visual, and descriptive analytics, since no one method works best for everyone.

When I am working on my Council (or any other Council I may be assisting), I employ the following tools:

I provide these as both Excel spreadsheets, and as PDF documents, for those who do not have Excel. Remember, the DC Visual Analysis report is really an interactive spreadsheet, and while you can print this report (or make a PDF), it is much better used as a spreadsheet.

Now, this being June, and since we re-register our District & Council volunteers on June 30, I also included the Commissioner Tools “Commissioners without Contacts 2017” report in the information I provide to all the District Commissioners. This can be used to assist them in their deliberations on which commissioners should continue in their roles in the coming year.

So, how long does creating all those reports take? Well, it takes me at most 5 minutes to download the 4 needed reports from Commissioner Tools:

  • Assigned Units
  • District Contact Stats 2017
  • Priority Needs Units
  • Unit Health 2017

Then for the actual loading of data into the PowerPivot tools, about 3 minutes.

So, in less than 10 minutes I have completed the initial analysis of the Commissioner Tools data from the previous month. Then I spend probably another 5-10 minutes reviewing the results, and writing up the email to send the reports out to the Council Commissioner’s Cabinet and the District Commissioners.

So, 15-20 minutes total from start to finish.

Now sure, I have probably done this 100 times or more. So, I am certain I have optimized my workflow. But even if this takes you 30 minutes the first time, you’ll get better the more you do it. The reports will become more familiar, you’ll do some prep so you have your folders for your data and the tools all ready to go their job. But you too can get this process down to a point where it comes naturally.

At the Area-level, I really limit myself to only two reports (as they really are the best for a consolidated view):

  • Unit Contact Analysis
    (to get the complete overview of how each Council is doing)
  • Units Needing Attention
    (so my Area Commissioner can work with the Council Commissioners, when I do this for the Area, I add another “slicer” on the Summary page for the Council Name)

Of course, since I am now wanting to create these two reports in a consolidated fashion for a number of Councils, I also use the “Merge Tools” to help me prepare the data:

  • Merge UH (for Unit Health reports)
  • Merge DCS (for District Contact Stats reports)
  • Merge AU (for Assigned Units reports)
  • Merge PN (for Priority Needs Units reports)

Now yes, this takes longer since I need to download the 4 reports I need from each Council in my Area, then I need to merge them together (which takes seconds), before I can feed them into the PowerPivot tools. But I normally spend about an hour total on my Area (including creating the reports for each Council).

Remember, if you get stuck, don’t call Member Care. Instead, send an email to: commissioner.support@scouting.org and one of us who volunteer to answer those emails will get back to you as quickly as we can.