I really like the challenges associate with trying to “out think” a computer system. Power Query’s M language is very weak when it comes to error management (although DAX does a much better job), so I was put to a real test over this last weekend when I was asked if it was possible to change the calendar around which a report was produced. Not the standard calendar year, Jan – Dec, mantra of most reports, but instead a “Program Year” calendar, Aug – Jul. Now, at the face value this does not sound too bad, since the data comes out in calendar year format, it just means you need two reports to create the output and a Nested Join, in order to get the data from both sides of the report. But, what happens when the second report doesn’t exist yet? [And what about info that starts later in the “program year,” or drops off the list? More on that later.]
Thus was the challenge given to me by the Roundtable Team after they completed their “Effective Roundtables” class at the Philmont Training Center last week;-we like the Roundtable Analysis report, wouldn’t it be nice if (WIBNI) it was done using a program year calendar! This request came in at 2:23 p.m. on Friday.
So I took a two-phased approach. Phase 1: I built a rather standard system that would use the current year’s and the previous year’s Unit Health reports. The first version was sent out at 1:18 p.m. on Saturday. Not bad, after 22 hours 55 minutes had passed on the clock, it took me approximately 6 hours of development to knock out the report from scratch, including adding the update to use the “automatic load” option that many users like. (They put their data files in the same directory as the PowerPivot tool, and when you open the Excel spreadsheet it automatically loads the data for you.) Now on to the far more challenging Phase 2.
[At this point, I must admit, I disabled the “automatic load” since I was opening and closing the spreadsheet about 8 times an hour (since I like to save my work, and clear memory) … so I put in a VBA (Visual Basic for Applications, Microsoft’s “macro” language for Office) routine to handle running the update, once I pressed “Ctrl+Shift+U”. I have left this in during the testing phase.]
While the primary issue to address in Phase 2 is “what do we do when the 2nd data file is missing, or doesn’t exist,” there was a secondary issue which I had neglected to consider from Phase 1, “what happens when data starts showing up in March that was not there previously?” In Scouting, we need to remember that new units may start at any time, so now they show up in the data part way through the year, or, the do no recharter, so they drop from the data part way through the year, leaving us with null values in the data. Power Query’s M language does not consider null to be the same as zero, so we need to also consider some transformation steps (ETL – extract, transformation, load) in order to get all the data in order, and allow mathematical formulas to work.
Getting the system to work with only one data file took some “extra” work. First, a substitute data table needed to be created in memory to fill the void left by the missing file. Then there was the issue with getting the Power Query M language try statement to work.
So, in order for the file merge to work, it was necessary that there be a data table to merge with. That required that an “alternative” be created using the Power Query M language #table() function. This allowed me to create a table that would match the fields of the final output, and place some “temporary data” there so that it could be quickly identified and removed at the end of the merge process.
Next came the biggest issue, the Power Query try statement. Unlike other Power Query functions this one is not as well documented, and even looking at examples on-line from other developers was not a lot of help. Part of the process of importing a file the normal first step, after the file import is defined, is to promote the column headers using the Table.PromoteHeaders() function. After that is it simply a matter of defining data types for each field, removing blank rows, filtering out bad rows, and removing any unneeded columns (in order to save memory). Well, after much experimentation I found that the best way to use the try statement was to apply it to the Promote Headers statement, then to test for the error. If the error exists, then you use the alternate table, if not, you continue with the next step of the normal input process, taking care of the data typing for each column (referred to as “Changed Type” if you use the automatic step generation from Power Query.
Okay, we have overcome the limits of having only one file, so what happens to all the matches that don’t have data? They show up as null. Now, we have seen this ugly bug before, so we need to apply some transformations in order to wipe out the nulls. Fortunately Power Query’s M language has such a function (Table.ReplaceValue, JoinKind.FullOuter), so it was applied to every month’s attendance numbers. Instead of being null, they would be zero, and could be used in other numeric calculations.
How do you count the number of units who attended Roundtable from a column full of zeros? One of the issues to be overcome by DAX (the PowerPivot language) was the need to count the number of units that attended each month. Well, if you are using the DAX Calculate function, and you tell it to count all the rows where the number is greater than zero, but they are all zeros, it causes the Calculate() function to return a blank value. Much like in M with null, DAX does not consider blank to the be same as zero, which tends to throw off multiple calculations. So, in order to overcome this limitation, it became necessary to use the DAX If() statement in conjunction with the DAX Isblank() function to correct the error, and return the number zero when Calculate() would prefer to return blank. Another mathematical function saved for our final PowerPivot tables.
Yes, testing. While that may come as a surprise to some, considering all this work is being done by “volunteers” and not a paid IT team, I have collected a small team of “Data Dawgs” from various parts of the country who help me to validate that things are working properly. Now in addition to that team, I add the original sequesters (by customers) of this WIBNI, to see if we are meeting their needs. I know there are IT folks who would just as soon not ask their customers for their opinions, but in this case, this is where the rubber meets the road, and if it doesn’t work for them, then it simply won’t be used. The testing version of the Phase 2 work was sent out on Monday at 12:06 p.m. 46 hours 48 minutes after Phase 1, and 69 hours 43 minutes after the original request. Approximately 35 hours went into the development of both phases.
I must say, I am no “saint” when it comes to documenting the code I am developing, but I have decided it is very important that anyone who takes over this role, sometime on down the line, have the understanding of what was done. So I have been taking the time to ensure that I am thoroughly documenting my code.
- PP Tools, as previously mentioned – I used the “Calculated Columns,” “Measures,” “Memory,” and “Fields Never Used” tools to get a good understanding of the basis of the PowerPivot work.
- Contexture’s VBA code for documenting pivot tables. By adding this VBA code to my “ThisWorksheet” area, I was able to document each PowerPivot table by simply clicking on any cell in the PowerPivot table and running this macro from the Macros menu.
Sub ListPivotFields() Dim lRow As Long Dim wsList As Worksheet Dim pt As PivotTable Dim pf As PivotField Dim pi As PivotItem Dim strList As String Dim strLoc As String On Error GoTo errHandler Set pt = ActiveSheet.PivotTables(1) strList = "Pivot_Fields_List" Application.DisplayAlerts = False On Error Resume Next Sheets(strList).Delete On Error GoTo errHandler Set wsList = Sheets.Add lRow = 2 With wsList .Name = strList .Cells(1, 1).Value = "Caption" .Cells(1, 2).Value = "Source Name" .Cells(1, 3).Value = "Location" .Cells(1, 4).Value = "Position" .Cells(1, 5).Value = "Sample Item" .Cells(1, 6).Value = "Formula" .Cells(1, 7).Value = "Description" .Rows(1).Font.Bold = True For Each pf In pt.PivotFields If pf.Caption <> "Values" Then .Cells(lRow, 1).Value = pf.Caption .Cells(lRow, 2).Value = pf.SourceName Select Case pf.Orientation Case xlHidden strLoc = "Hidden" Case xlRowField strLoc = "Row" Case xlColumnField strLoc = "Column" Case xlPageField strLoc = "Page" Case xlDataField strLoc = "Data" End Select .Cells(lRow, 3).Value = strLoc .Cells(lRow, 4).Value = pf.Position On Error Resume Next If pf.PivotItems.Count > 0 Then .Cells(lRow, 5).Value = pf.PivotItems(1).Value End If On Error GoTo errHandler 'print the formula for calculated fields If pf.IsCalculated = True Then .Cells(lRow, 6).Value = _ Right(pf.Formula, Len(pf.Formula) - 1) End If lRow = lRow + 1 End If Next pf End With exitHandler: Application.DisplayAlerts = True Exit Sub errHandler: MsgBox "Could not create list" Resume exitHandler End Sub
If you’d like to see the documented file, it is available for download from OneDrive.
So, that’s my latest “adventure,” hope you found some enlightenment from the explanation of how it was accomplished.