SUMMARIZE

Description

The SUMMARIZES function summarizes its input table, grouping by one or more columns.

Usage

SUMMARIZE (<Table Expression>, <groupBy_columnName>[, <groupBy_columnName>]…[, <name>, <expression>]…)

The summarize function evaluates to a new table. The new table will include any columns that are grouped upon, as well as the new expressions.

Examples

Avoid double count activity reports

If you have a programme with the goal of promoting certain nutritional practices for mothers of children five and under, and have partners reporting aggregated output indicators to you such as:

  • Number of mothers reached through door-to-door outreach (INPERSON)
  • Number of estimated mothers reached through radio broadcasts (RADIO)

You may want to estimate your overall reach by combining these two indicators. However, if you have one partner conducting door-to-door outreach and another partner running radio broadcasts in the same geographic area, you don't want to double count those indicators.

Partner District INPERSON RADIO
A North 500
A South 600 1,000
B North 2,000
B East 5,000
C West 300

In this case, if you added up all of these indicators, you would risk double counting the outputs in the Nothern district where 500 mothers were reached in person, but 2,000 were reached via the radio. It's likely that many mothers received both an in-person visit and heard the message on the radio.

One solution to this is to first add up the results at the district level, and take either the in-person numbers, or the radio numbers, which ever is greater.

We'll first use the SUMMARIZE function to compute a new table with SUMS of the two fields by district.

SUMMARIZE([Partner reports], Province, "TOTAL_INPERSON", SUM(INPERSON), "TOTAL_RADIO", SUM(RADIO))

This gives us a new table that looks like:

District TOTAL_INPERSON TOTAL_RADIO
North 500 2,000
South 600 1,000
East 5,000
West 300

Now to find the total reach, we can use the SUMX function to calculate the MAX of each row, and then sum the maxes:

SUMMARIZE([Partner reports], Province, "TOTAL_INPERSON", SUM(INPERSON), "TOTAL_RADIO", SUM(RADIO)) |>
  SUMX(MAX(TOTAL_INPERSON, TOTAL_RADIO))

This gives us a final result of 2,000 + 1,000 + 5,000 + 300 or 8,300.

Tracking changes in health status over time

Suppose you are tracking the number of sick family members across households over time. You have a form “Household Health Check-in” that contains records for the number of sick family members on a given date of reporting as follows:

Family Number of sick family members Date of reporting
Ahunna 3 2023-01-15
Bilal 1 2023-02-10
Ahunna 4 2023-01-20
Bilal 2 2023-01-15

You can define a calculated table that shows the earliest and latest reported number of sick family members for each family using the following formula:

SUMMARIZE([Household Health Check-in], [Family], 
    "Baseline", FIRST([Number of sick family members], [Date of reporting]),
    "Final",    LAST ([Number of sick family members], [Date of reporting])) |>
ADDCOLUMNS(
    "Change", Final - Baseline,  
    "Pct change", (Final - Baseline) / Baseline)   

The result would be a table:

Family Baseline Final Change Pct Change
Ahunna 3 4 1 33%
Bilal 2 1 -1 -33%
Next item
SUMX