Blank values

Missing, blank, or not-applicable data is important consideration for any data management system. This article describes in detail how ActivityInfo addresses "missing-ness" and how it compares to other systems and programming languages.

Introduction

ActivityInfo, like many data management systems, works exclusively with "typed" values. All fields and formula expressions in ActivityInfo have a type that determines the range of allowable values. A quantity field can never have the value "Bob", and a date field will never have the value 42. The same is true for a vector in R, or a column in a relational database column, or a SPSS variable.

However, there may be many reasons that a value might be absent:

  • A field might not be relevant for every record or row. For example, it might be the case that only refugees have a UNHCR registration number.
  • A field might not be required. Not all respondents to a questionnaire may have a mobile phone number.
  • A field might be added after the start of data collection, and we have no information on previously-collected subjects.
  • A survey respondent may refuse to answer specific questions, or may not know.
  • There might be an error calculating a value, for example, if you divide a number by zero, or provide an invalid regular expression to REGEXMATCH

For this reasons, each ActivityInfo formula type includes a special "blank" value. A numeric expression, for example, can be equal to any real number, or it can be blank. A date expression, can be equal to any date between 1000 CE and 9999 CE, or it can be blank.

Calculations involving blank values

In ActivityInfo, formulas, you can test for blank values using the ISBLANK. But blank values can also find their way into calculations. For example, you might have a form with two quantity fields for the count of COUNT_WOMEN and COUNT_MEN, as well as a calculated field TOTAL field with the formula COUNT_WOMEN + COUNT_MEN.

What happens if some records contain blank values for WOMEN, MEN or both? This is a design choice: in Excel, blank cells are treated as zeros in the context of arithmetic. In SQL, however, NULL plus any other number is NULL, and in R, NA plus any number is still in NA.

ActivityInfo follows Excel's convention in most case, which we judged to be more forgiving for non-technical users. So for the example above, the results might look like this:

Health Center WOMEN MEN WOMEN + MEN ISBLANK(WOMEN + MEN)
A 300 150 450 FALSE
B 160 160 FALSE
C 75 75 FALSE
D TRUE
E 220 90 310 FALSE

As you can see in the table above, the TOTAL calculated field is only blank when both WOMEN and MEN are blank.

Empty text strings

In some programming languages, there is a difference between an empty text string ("") and NULL or missing value. For example in SQL, "" IS NULL is false, and in R, is.na("") is also false.

This distinction can often trip up users without a programming background, and so ActivityInfo avoids this distinction: an empty text string "" is identical to a blank text value:

ISBLANK("") == TRUE

During data collection, any input to a text field will be stripped of leading and trailing spaces. That means if you type only spaces into a text field, the field's value will be blank:

Screenshot of data entry form with calculation
Screenshot of data entry form with calculation

Try it yourself

Leading and trailing space is not stripped from values in formulas, which means that text with only whitespace is not considered blank:

ISBLANK("  ") == FALSE