Thursday September 22, 2022

Introduction to R and ActivityInfo

  • Host
    Alexander Bertram
About the webinar

This Webinar is a one-hour session that is part of the 2022 ActivityInfo Training Webinar Series. These Webinars are ideal for ActivityInfo users who wish to master the platform's features for their daily Monitoring and Evaluation data collection or information management tasks.

During this session we work with the ActivityInfo API and the R programming language.

In summary, we discuss:

  • Short introduction to the R language and the connection to ActivityInfo
  • Retrieving data from ActivityInfo with R
  • Updating ActivityInfo with R

View the presentation slides of the Webinar.

View the source code used in the Webinar.

Is this Webinar for me?

  • Are you responsible for information management in your organization?
  • Do you have an admin role in the platform or wish to expand your knowledge to enrich your capabilities?
  • Are you leading report design for data collected in ActivityInfo from partner organizations or colleagues?

Then, watch our Webinar!

About the Trainer

Mr. Alexander Bertram, Technical Director of BeDataDriven and founder of ActivityInfo, is a graduate of the American University's School of International Service. He started his career in international assistance fifteen years ago, working with IOM in Kunduz, Afghanistan, and later worked as an Information Management Officer with UNICEF in DR Congo. At UNICEF, frustrated with the time required to build data collection systems for each new programme, he worked on the team that developed ActivityInfo, a simplified platform for M&E data collection. In 2010, he left UNICEF to start BeDataDriven and develop ActivityInfo full time. Since then, he has worked with organizations in more than 50 countries to deploy ActivityInfo for monitoring and evaluation.

Transcript

00:00:00 Introduction

Welcome to an introduction to R and ActivityInfo. That's quite a broad topic. So today, we're going to introduce you to R. If you've never heard of R before, or you don't know what it is, we'll give you a small taste of what it is, what it can do, and why you should use it. We'll then talk about how to connect R to ActivityInfo, going step by step through some of the mechanics.

For those of you who know R very well, this might be useful if you want to know how to use ActivityInfo with R. I'll try to reserve time at the end for questions so we can go back and look at some other things in detail and also leave you with some additional resources. Just a couple of housekeeping notes: this webinar is being recorded; the recording will be posted to our website shortly after the webinar finishes, and we will make all of the slides and links available. If you do have questions, it would be super helpful if you put them in the Q&A box in Zoom, rather than the chat.

00:02:01 What is R?

R is an open-source programming language and environment for statistical computing. It is a language for describing, manipulating, and cleaning data. It was designed by statisticians to be interactive and easy to use with data, which makes it very pleasant to work with. It has its origins in the S language, which was developed at Bell Labs starting in the 1970s as something easier than Fortran for writing statistical code.

Today, it's become the de facto language for statistics and data analysis worldwide. Much of its strength comes from user-contributed packages, which are bundles of code that you can use and reuse. The repository called CRAN (Comprehensive R Archive Network) has over 18,000 packages today. They cover everything from forecasting, statistics, machine learning, fisheries statistics, and agricultural statistics to geospatial analysis. Whatever you can imagine, there's a package out there for it.

RStudio launched about a decade ago. R is just a language, and RStudio is a user-friendly integrated development environment for the R language. This product has become so successful and widely used that many people confuse it with R. It's not R itself, but it's a very nice way to use the R language. Finally, the Tidyverse is a set of packages that has also become very popular and widely used, becoming a standard in data analysis. I'm going to be using a lot of the Tidyverse packages in today's presentation.

00:06:20 Getting set up

You're going to need to install two separate programs. The first is R itself, and the second is RStudio. Once you've got RStudio up and running, you will see four panes. The upper left-hand area is where you can write scripts. The console, in the lower left, is where you can type things in and get an immediate result. In the upper right, you have the environment, which shows the variables you've defined, and the bottom right shows the list of files in your project.

A lot of the useful stuff in R comes in packages. You'll want to run commands to install the activityinfo R package, which allows you to easily retrieve and update data from ActivityInfo, and the tidyverse, which is the series of packages for data analysis.

00:10:02 Connecting to ActivityInfo

You need some way for ActivityInfo to know that this R script is allowed to access your data. We're going to use a personal API token for this. In ActivityInfo, go to the profile menu, then profile settings. On the left-hand side, you'll see a section for API tokens, where you can add a new token. You can set the token to be either read-only or read-write. If you want to update the data you have in ActivityInfo, you should choose read-write. As a rule, you should consider creating one token per laptop and application.

Once generated, copy the token. In RStudio, use library(activityinfo) to load the package. Then use the activityInfoLogin() function. It will ask for your email address and password. For the password, paste the API token you copied; do not enter your account password. The token is saved on your laptop so that you don't have to do this step again.

Once connected, you can type getDatabases() in the console. It retrieves a list of databases that the authenticated user owns or that have been shared with them. It prints out the database ID and the label of the database.

00:15:07 Retrieving data

An R script is composed of statements and expressions. You can run a statement by hitting Ctrl+Enter. In R, you can work with numbers and strings, and assign these to variables using the arrow operator (<-). One thing that might be a surprise if you come from other languages is that the dot can be part of the variable name. Everything in R is an array or a vector. Even a single number is an array of length one. R has powerful tools for working with these arrays.
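The basics described above can be tried directly in the console. This is a minimal base R sketch; the variable names and values are made up for illustration.

```r
# Assign a number and a string with the arrow operator
age <- 42
trainer.name <- "Alex"   # the dot is simply part of the variable name

# A single number is already a vector of length one
length(age)

# Operations apply to every element of a vector at once
ages <- c(12, 35, 42)
ages_next_year <- ages + 1

# Logical indexing keeps only the elements that match a condition
adults <- ages[ages >= 18]
```

Placing the cursor on any of these lines in RStudio and pressing Ctrl+Enter runs that statement in the console.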

Most data comes in tables, which R calls data frames. A data frame is like a set of named arrays, one per column. You can bring in data from ActivityInfo in R format easily. In ActivityInfo, click on 'Export from the API' and select 'Query using R'. Copy the code to the clipboard and paste it into your R session. This creates a query of exactly the columns you have in the table view. When you run it, it pulls that data from ActivityInfo into a data frame. You can then start to manipulate or plot the data in R, or write it out to a CSV file using write_csv.
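To give a feel for what you can do once the data is in a data frame, here is a small sketch that stands in for a query result. The data frame below is invented sample data, not real ActivityInfo output, and it uses base R's write.csv so that it runs without extra packages; write_csv from the Tidyverse plays the same role.

```r
# Made-up sample data standing in for the result of a query
beneficiaries <- data.frame(
  name = c("Amina", "Boris", "Chen"),
  age = c(34, 27, 45),
  sector = c("Health", "Education", "Health"),
  stringsAsFactors = FALSE
)

# Each column is a named vector; keep only the Health rows
health <- beneficiaries[beneficiaries$sector == "Health", ]

# Summarise one column
mean_age <- mean(health$age)

# Write the filtered table to a CSV file in a temporary directory
out_file <- file.path(tempdir(), "health.csv")
write.csv(health, out_file, row.names = FALSE)
```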

00:23:36 Filtering and selecting columns

If you start to work with very large tables, you might want to filter the data down before you bring it into R. You can add a filter to the queryTable function to include only the records that you want. This requires writing a formula in ActivityInfo's formula language as a string. For example, you might want to filter by a specific sub-sector name.

If you want to combine a variable that you have in your R session with the filter string, you can use the sprintf function to format the string. This allows you to loop over variables, such as sectors, and pull out one sector at a time. You can also control the columns that you have present by modifying the select arguments in the query, which is very useful for large datasets.
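Building the filter strings can be sketched as follows. The field code `sector` and the sector names are hypothetical, and the exact formula you need depends on your own form; the code only builds the strings and does not contact ActivityInfo.

```r
# Hypothetical sector names to loop over
sectors <- c("Health", "Education", "Protection")

# sprintf is vectorised: it returns one filter formula per sector.
# "sector" is an assumed field code; replace it with your own.
filters <- sprintf('sector == "%s"', sectors)

# Each string would then be passed as the filter to queryTable,
# one sector at a time. Here we only print what would be sent.
for (f in filters) {
  message("Would query with filter: ", f)
}
```

Single quotes around the format string avoid having to escape the double quotes that the formula itself needs.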

00:29:44 Updating ActivityInfo with R

Now we will look at bringing data into ActivityInfo using three functions: addRecord, updateRecord, and importTable. The first two functions allow you to manipulate one record at a time, while importTable allows you to bring in a whole data frame.

Let's create a simple form in ActivityInfo with a name, age, and a select field for legal status. When designing forms for the API, I recommend using codes to make it easier to reference fields. To add data, you need the form ID, which can be found in the URL.

00:32:33 Adding and updating records

We use addRecord with the form ID. If it is not a subform, set parentRecordId to empty. You pass in a list of field values (e.g., name = 'Alex', age = 42). Running this adds the record to ActivityInfo.

To update a record, you need the internal record ID that ActivityInfo uses to track record identity. You can find this in the UI by selecting columns and dragging the record ID onto the screen. Use the updateRecord function with the form ID, the specific record ID, and the list of fields you want to update. You only have to pass in the fields that are being updated.
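As a rough sketch, the two calls can be wrapped like this. The wrapper names are hypothetical, the argument order follows the description above, and representing an empty parent with NA_character_ is an assumption; check the package documentation before relying on it. Defining the wrappers does not contact ActivityInfo, but calling them does.

```r
# Field values for a new record, keyed by field code
new_person <- list(name = "Alex", age = 42)

# For an update, only the fields that change are listed
changes <- list(age = 43)

# Hypothetical wrapper: add a record to a top-level form (not a subform).
# Assumes an empty parent is expressed as NA_character_.
add_person <- function(form_id, fields) {
  activityinfo::addRecord(form_id, parentRecordId = NA_character_, fields)
}

# Hypothetical wrapper: update one existing record by its record ID
update_person <- function(form_id, record_id, fields) {
  activityinfo::updateRecord(form_id, record_id, fields)
}
```

A call would then look like add_person("your form ID", new_person), followed later by update_person("your form ID", "the record ID", changes).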

00:36:06 Handling reference fields

When you have a reference field, ActivityInfo stores an ID, not a name. If you want to update a reference field via the API, you must use the record ID of the referenced item. You cannot just use the name (e.g., "Province A").

To handle this, you can build a lookup table. Use queryTable to get the IDs and names of the reference form (e.g., the province list). Create a named list in R where the names are the province names and the values are the IDs. This allows you to look up the ID by name in your script before sending the update to ActivityInfo.
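The lookup step can be sketched in base R. The data frame below stands in for the result of querying the province form; its column names and the record IDs are invented for illustration.

```r
# Made-up stand-in for a query of the province reference form
provinces <- data.frame(
  id = c("c1a2b3", "d4e5f6"),
  name = c("Province A", "Province B"),
  stringsAsFactors = FALSE
)

# Named list: names are province names, values are record IDs
province_ids <- as.list(setNames(provinces$id, provinces$name))

# Look up an ID by name, failing loudly on an unknown province
lookup_province <- function(name) {
  id <- province_ids[[name]]
  if (is.null(id)) {
    stop("Unknown province: ", name)
  }
  id
}

province_a_id <- lookup_province("Province A")
```

Stopping on an unknown name is a design choice: it surfaces spelling mismatches in the source data before anything is sent to ActivityInfo.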

00:42:50 Handling geographic points

If you have a geographic point field, it requires special handling. You need to provide the value as a named list containing latitude and longitude. For example: location = list(latitude = 52, longitude = 4). ActivityInfo treats the geographic point as a single field, not two separate fields.
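A small helper can make this less error-prone. The function geo_point below is hypothetical and not part of the activityinfo package; it only builds the named list described above and checks that the coordinates are in range.

```r
# Hypothetical helper: build the value for a geographic point field
geo_point <- function(latitude, longitude) {
  stopifnot(latitude >= -90, latitude <= 90)
  stopifnot(longitude >= -180, longitude <= 180)
  list(latitude = latitude, longitude = longitude)
}

# The point is one field value among the others, not two separate fields
fields <- list(
  name = "Alex",
  location = geo_point(52, 4)
)
```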

00:44:40 Importing tables

The importTable function helps automate importing data in bulk, similar to the ActivityInfo importer but via the API. This is useful if you have data cleaning to do first or are migrating data.

First, load the data into R (e.g., using readxl for Excel files). You may need to clean the data to match your ActivityInfo schema. This often involves renaming columns to match the field codes in ActivityInfo using dplyr::rename. You also need to ensure that values for select fields match exactly; if the source data has "Direct or Indirect" but the form only allows "Direct" or "Indirect", you must recode or filter that data.
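The cleaning steps might look like the sketch below. It uses base R instead of dplyr::rename so that it runs without extra packages, and the column names, field codes, and values are invented sample data.

```r
# Made-up source data, as it might arrive from a spreadsheet
raw <- data.frame(
  "Full name" = c("Amina", "Boris", "Chen"),
  "Type" = c("Direct", "Direct or Indirect", "Indirect"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Rename columns to match the (assumed) field codes of the form
names(raw)[names(raw) == "Full name"] <- "name"
names(raw)[names(raw) == "Type"] <- "beneficiary_type"

# The select field only allows these two options
allowed <- c("Direct", "Indirect")

# Keep rows that match an allowed option; set the rest aside for review
clean <- raw[raw$beneficiary_type %in% allowed, ]
rejected <- raw[!raw$beneficiary_type %in% allowed, ]
```

Keeping the rejected rows in their own data frame, instead of silently dropping them, makes it easier to decide later whether to recode them.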

For reference fields in an import, you again need to replace names with record IDs using a lookup table approach. Finally, when using importTable, you can optionally provide your own record IDs if you want to preserve IDs from an external system or avoid duplicates when re-running scripts.

01:02:18 Q&A

Question: Is there a way to improve the time required for import?

Answer: If you are uploading a large number of records, avoid using addRecord and updateRecord individually because of the connection overhead. Use importTable. If you have more than 200 records, it uploads a file to the server and runs in the background, which is much faster. We are also working on performance improvements.

Question: Do you plan to develop functionality to create new forms directly from R?

Answer: Yes, the addForm function allows you to add a form directly to ActivityInfo or update the schema from R. You need to construct the schema as a list of elements (ID, label, type, etc.) and pass it to the function. You can use getFormSchema on an existing form to see the structure required.

We have scheduled additional sessions on R, including managing large codebases and a workshop for R developers. Thank you for joining us.
