Tuesday December 17, 2024

ActivityInfo R package updates

  • Host
    Alexander Bertram
  • Panelist
    Nicolas Dickinson
About the webinar

Are you working with R and data analysis in ActivityInfo or do you wish to learn more about the ActivityInfo R package?

With versions 4.37/4.38, the package includes various new functions focused on enhanced user management and role support. During this one hour session with Mr. Nicolas Dickinson from WASHNote, we take a look at new functions of the ActivityInfo R package.

In summary, we discuss:

  • Grant-based role support
  • Integration with the tidyverse
  • Bulk importing data
  • Adding and manipulating form schemas
  • Querying large datasets with filters
  • Documentation overview

To get started with the ActivityInfo R package, please take a look at the webinar "Introduction to R and the ActivityInfo R package".

We'll continue to improve the package based on your feedback, so make sure to visit the GitHub repo and "star" and "watch" the repository for updates.

View the presentation slides of the Webinar.

Is this Webinar for me?

  • Are you responsible for information management in your organization and do you use the ActivityInfo R package?
  • Do you have an admin role in the platform or wish to expand your knowledge to enrich your capabilities?
  • Are you leading data analysis for data collected in ActivityInfo from partner organizations or colleagues?

Then, watch our webinar!

About the Presenters

Mr. Nicolas Dickinson works primarily on the monitoring of water, sanitation and hygiene (WASH) services and works to strengthen collaboration between the public, non-profit, and private sector communities (national governments, UN agencies, service providers, Wikipedia, etc.) to improve access to high-quality and relevant information on services and the factors influencing their sustainability. In addition to working on the ActivityInfo R package, he maintains an R package, jmpwashdata, that brings together 300+ Excel sheets published by the Joint Monitoring Programme of UNICEF and WHO to improve access to data on WASH services.

Mr. Alexander Bertram, Executive Director of BeDataDriven and founder of ActivityInfo, is a graduate of the American University's School of International Service and started his career in international assistance fifteen years ago working with IOM in Kunduz, Afghanistan and later worked as an Information Management officer with UNICEF in DR Congo. With UNICEF, frustrated with the time required to build data collection systems for each new programme, he worked on the team that developed ActivityInfo, a simplified platform for M&E data collection. In 2010, he left UNICEF to start BeDataDriven and develop ActivityInfo full time. Since then, he has worked with organizations in more than 120 countries to deploy ActivityInfo for monitoring & evaluation.

Transcript

00:00:01 Introduction and agenda

Nicolas Dickinson: I wanted to start right away in RStudio. RStudio is the software that you see in front of you, and generally, this is the software that people use to write R code. There is a really great introduction video that was also in the description of this webinar where Alex gives an excellent introduction covering everything from authentication to simple procedures. What we are going to do today is jump a little bit quicker into what you can do with the ActivityInfo R package. There have been some exciting changes in the last year. I have been working with Alex and the team once again on the package after some updates we did in 2023.

I will talk briefly about what changed in the last year. For those of you who already know the R package, you will hear very quickly what happened differently. I will go into a little introduction of the R package, but I won't go very deep, so make sure you hit that link in the description of this webinar. Then we will go into some form manipulation and data download, and then we will talk about grant-based roles. In fact, it won't exactly be in that order because we will be mixing these things as we build a database from scratch.

This will be useful for you to see just what you can do with the R API. Certainly, when you want something user-friendly, you should use the ActivityInfo web-based interface, which works really well online and offline. But when you need to do something like bring in data from an external system or do some data analysis, this R package can be really powerful. Specifically, if you want to automate it and have things that are repeatable and run on a regular basis, the R package with all the new additions can be super useful.

What changed? First of all, we have added grant-based roles support. If you haven't heard about that, there are a few really great webinars that cover how grant-based roles work. You will see a little bit of that today as we build some grant-based roles. There are some new tutorials and vignettes on how to work with grants and grant-based roles, as well as advanced user management, such as adding and deleting users.

The getRecords function, which is used to download data from ActivityInfo, has become more robust. It can handle column duplications and deeply nested references, including references that loop back on themselves or go very deep. There are also new billing account functions to manage databases and get more information about them, as well as improved credential management. If you use both the live ActivityInfo server on the web and a self-hosted version, you can now store your credentials separately for each.

Coming up next, I have already started some work that is under review for being able to use ActivityInfo formulas to filter and mutate before we even get the data from the server. That is really exciting because we can use all of the formulas that ActivityInfo has available on the web interface. We are also working on column auto-completion and expansion so that you can reach into reference fields, for example, going to a subform to get a child's name.

00:05:34 Setup and libraries

We are using the tidyverse. The two main packages here that we are using are activityinfo and tidyverse. Within tidyverse, we are using dplyr, tidyr, and purrr. I have put them out here explicitly, but in principle, if you just run library(tidyverse), that will also load all these additional features that we are working with.

To install the package, you first need the remotes package. It is easy to install from the console in RStudio, which downloads remotes from the CRAN repository; we can then use remotes to install the activityinfo package. If you have already installed activityinfo, you may not have the vignettes, because you need to tell the installer to build them. If you want the full documentation, go ahead and install from GitHub. You can also tell it to reinstall even if you have the latest version, just so that it builds the vignettes. Once installed, you can use browseVignettes() to open the browser and view the different tutorials available, such as adding and manipulating databases, advanced user management, and analyzing and visualizing ActivityInfo data.
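The installation steps described here could look roughly like the following in the RStudio console. This is a sketch rather than the exact code shown in the session: the GitHub repository path `bedatadriven/activityinfo-R` is an assumption that does not appear in the transcript, so check it against the repository linked from the webinar page.

```r
# Install the helper package 'remotes' from CRAN (one-time setup).
install.packages("remotes")

# Install activityinfo from GitHub and build its vignettes.
# ASSUMPTION: the repository path below is not stated in the transcript.
# force = TRUE reinstalls even if the latest version is already present,
# which is one way to get the vignettes built after the fact.
remotes::install_github(
  "bedatadriven/activityinfo-R",
  build_vignettes = TRUE,
  force = TRUE
)

# Open the list of tutorials shipped with the package in the browser.
browseVignettes(package = "activityinfo")
```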

You can also log in with a token. If you haven't ever used the API or ActivityInfo in R, you may need to go to ActivityInfo.org and get a token under Account Settings > API tokens. You can add a new token there, and you will get a long string that allows you to access the database.

To use ActivityInfo, we need to start the library and use the tidyverse. One useful feature since 2023 is the ability to turn on and off messages for debugging. If you want to know more about what is going on behind the scenes and what requests are going to the server, you can turn this on.

00:09:58 Creating a database

We are going to go ahead and create a new database using the addDatabase function.

Alexander Bertram: I want to mention that if you do not have permission to add a database yourself—if you open ActivityInfo and don't see the "Add database" button—you can start a free trial. Even if you are an existing ActivityInfo user, you can create a free trial account. That will be separate from any databases you have been invited to, but it will give you 30 days to play around and create databases.

Nicolas Dickinson: I am going to run this bit of code. addDatabase creates a new database and gives it a label. We are using the sprintf function to insert the current date and time into the label string. If all is well, I should be able to go to ActivityInfo and see the new database with the date and time in its name.
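A minimal sketch of the label-building step follows. The label text and the date format are illustrative choices, not the ones used in the session. The addDatabase call is left commented out because it needs an authenticated session, and passing the label as its first argument is an assumption to verify with `?addDatabase`.

```r
# Build a database label that embeds the current date and time,
# so each run of the script creates a distinguishable database.
dbLabel <- sprintf(
  "Webinar demo database %s",
  format(Sys.time(), "%Y-%m-%d %H:%M:%S")
)
print(dbLabel)

# ASSUMPTION: with the activityinfo package loaded and a valid token,
# the database would then be created and its ID kept for later steps:
# library(activityinfo)
# newDB <- addDatabase(dbLabel)
# databaseId <- newDB$databaseId
```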

We can do things with our database object, newDB. We can pull the database ID out of it and assign it to another variable. That is going to be really handy because with the database ID, we can do all kinds of things, like creating a form.

00:12:03 Working with forms and schemas

There are a lot of ways of creating forms. I am going to show you two different ways. The slightly more classical way is to create a formSchema object and give it a label. The database ID needs to go in, and you have elements, which is a list of different fields. Just like on the web interface, you have a whole bunch of different fields. The vignette has links to the documentation for each type of field. If you want to know more about what you can do with these, you can use the question mark and the name of the function in R to open the description.

This is a simple form with a text field ("What is your name?"), two single select fields ("What is your sex?", "Are you pregnant?"), and a relevance rule. In this case, there is a reference code on the field. These codes allow us to easily write relevance rules and formulas, and they are also useful for roles. Note that the formula language used here is ActivityInfo's formula language, even though you are writing code in R.

Another way of getting this form schema is to first create a formSchema with just the label and database ID. Then, we can use the addFormField function and the pipe operator (|>) to pass the object to the next function. The handy thing about this is you can just keep adding them. We will add an anonymous feedback form and save it to our optionalForm variable, and our survey form to the surveyForm variable. I will jump back to our database and refresh it to show that the anonymous feedback form and the new survey are there.
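The two approaches could be sketched as below. The field labels follow the transcript, but the codes, option labels, relevance rule, and the feedback field are illustrative, and the argument names (`elements`, `code`, `options`, `relevanceRule`, `required`) are assumptions to check with `?formSchema`, `?textFieldSchema`, and `?singleSelectFieldSchema`. The database ID is a placeholder, and the upload calls are commented out because they need an authenticated session.

```r
library(activityinfo)

# Placeholder: replace with the ID of a database you are allowed to edit.
databaseId <- "your-database-id"

# Way 1: pass all fields at once through `elements`.
surveyForm <- formSchema(
  databaseId = databaseId,
  label = "New survey",
  elements = list(
    textFieldSchema(
      label = "What is your name?",
      code = "NAME",
      required = TRUE
    ),
    singleSelectFieldSchema(
      label = "What is your sex?",
      code = "SEX",
      options = list("Female", "Male", "Prefer not to answer"),
      required = TRUE
    ),
    singleSelectFieldSchema(
      label = "Are you pregnant?",
      code = "PREGNANT",
      options = list("Yes", "No"),
      # Written in ActivityInfo's formula language, not R; it refers to
      # the code of the field above.
      relevanceRule = "SEX != 'Male'"
    )
  )
)

# Way 2: start with an empty schema and pipe in fields one at a time.
optionalForm <- formSchema(
  databaseId = databaseId,
  label = "Anonymous feedback"
) |>
  addFormField(
    textFieldSchema(label = "Your feedback", code = "FEEDBACK")
  )

# ASSUMPTION: uploading the schemas requires a login and a real database ID.
# addForm(surveyForm)
# addForm(optionalForm)
```

Chaining addFormField calls keeps each field in its own step, which makes it easier to add fields conditionally or in a loop.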

00:16:30 Database tree and role management

A thing that you will find useful for a lot of different functions is to get the database tree. The database tree contains all the metadata, such as who owns the database. We can also use the database tree to get all the roles. This jumps into the grant-based roles.

When we get the roles, they have different columns like the ID of the role and the label. The ID of the role is something you can set in R. If you add a new role in the interface, it will assign a universal ID starting with "c". If you are managing many databases, choosing your own role IDs can be useful for aligning roles across different databases.

We have a few different objects packed into lists. For example, for data entry, there is one grant to access the database. We may want to expand different kinds of permissions. At the level of the role, we have administrative permissions like managing users, roles, and automations. These are operations at the level of the database, not under a grant.

We may also want to list our role grants. A grant is associated with a resource, which can be a database, folder, or form. We can unpack the resource IDs for those grants. In this case, the resource ID is the database ID. We haven't applied this role to a specific form yet. We can expand our operations to see permissions like view, discover, edit, add, delete, and export records. For read-only, we can only view and discover. Discovery allows the user to see the forms in the listing. Sometimes it is useful to take away discovery permission so data entry users don't see reference folders they don't need to access.

You can copy and paste this code to repeat this on your own databases and get a quick overview in a table, which you can save as a CSV file. We can also retrieve a single role using the database tree. If we filter only for roles that have the ID "read-only", we can pull that out and look at its structure, including ID, label, permissions, parameters, and grants.
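The flattening step can be tried without a server connection. The sketch below uses a hand-made stand-in for the roles list; the role IDs, field names, and operation names are hypothetical and only mimic the nested shape described above. With the package, the input would come from the database tree instead.

```r
library(dplyr)
library(tidyr)

# HYPOTHETICAL stand-in for the roles found in a database tree:
# each role carries a list of grants, and each grant a set of operations.
roles <- tibble(
  id = c("readonly", "dataentry"),
  label = c("Read only", "Data entry"),
  grants = list(
    list(list(resourceId = "db1", operations = c("VIEW", "DISCOVER"))),
    list(list(resourceId = "db1",
              operations = c("VIEW", "DISCOVER", "ADD_RECORD", "EDIT_RECORD")))
  )
)

# One row per role, grant, and operation.
roleGrants <- roles |>
  unnest_longer(grants) |>      # one row per grant
  unnest_wider(grants) |>       # grant fields become columns
  unnest_longer(operations)     # one row per permitted operation

print(roleGrants)

# Save the overview as a CSV file (a temporary file in this sketch).
csvPath <- tempfile(fileext = ".csv")
write.csv(roleGrants, csvPath, row.names = FALSE)

# Pull out a single role by its ID.
readOnly <- roleGrants |> filter(id == "readonly")
```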

00:23:57 Bulk user management

Let's look at a use case where we use R to do operations that would be tedious in the web interface, such as adding a bunch of new users. I am going to add my own email address 10 times to the database. We can inspect our users object and see a table with name and email. You could load this from a CSV file, Excel file, Active Directory, or Microsoft Azure. If you have a way to translate groups and membership in your directory to ActivityInfo databases, you can automate access permissions.
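A users table of this kind could be built as follows. The addresses use the reserved example.org domain with plus-addressing, which is an assumption about how one address can be reused ten times. The invitation loop is commented out because it needs an authenticated session, and the addDatabaseUser argument names and the role ID shown there are assumptions to check with `?addDatabaseUser`.

```r
library(tibble)

# Ten illustrative users; in practice this table might be read from
# a CSV file, an Excel file, or a directory export.
newUsers <- tibble(
  name = sprintf("Test user %d", 1:10),
  email = sprintf("someone+%d@example.org", 1:10)
)
print(newUsers)

# ASSUMPTION: each row would then be invited with a default read-only role.
# for (i in seq_len(nrow(newUsers))) {
#   addDatabaseUser(
#     databaseId = databaseId,
#     email = newUsers$email[i],
#     name = newUsers$name[i],
#     locale = "en",
#     roleId = "readonly",
#     roleResources = list(databaseId)
#   )
# }
```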

In this example, we have a default role where everyone gets read-only access. I will go to the database settings and user management to double-check that this worked. Indeed, there are invites pending for all these users with the read-only role. We can also inspect our roles and get the database users back as a table, showing invite date, delivery status, and whether the invite was accepted.

I decided to unnest the roles so we don't just have a role object but get the role ID. At the moment, there are no role parameters or role resources because these are optional resources. The vignettes contain examples for creating roles with different levels of permissions, adding optional form access, and creating partner and reporting forms.

00:28:37 Fetching and filtering records

We are using the tidyverse and pipe operators with the getRecords() function. getRecords() returns a special object that looks like a table but is actually a reference object containing metadata like fields and record counts. It is not yet data on your computer; it is a preview.

We can use collect() to get the actual data. If we get this table and only want fields that end with "name", we can use select() with ends_with(). This uses dplyr verbs similar to SQL statements. We can also use filter() and arrange() (sorting).
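These verbs can be tried on a small local table first. The data below is made up for illustration; with the package, the same verbs would be applied to the data obtained through getRecords() and collect().

```r
library(dplyr)

# Made-up records standing in for a collected ActivityInfo form.
records <- tibble(
  first_name = c("Ana", "Ben", "Chloe"),
  last_name = c("Silva", "Okoro", "Martin"),
  age = c(34, 27, 41)
)

namesOnly <- records |>
  filter(age > 30) |>           # keep rows matching a condition
  arrange(last_name) |>         # sort by a column
  select(ends_with("name"))     # keep only columns ending in "name"

print(namesOnly)
```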

We can filter before downloading, which is handy if you are working with hundreds of thousands of records. This saves bandwidth and server load. getRecords() returns columns as you see them in the interface, plus an ID column and lastEditTime. You can set a style, such as minimalColumnStyle to remove underscore columns, or allColumnStyle to expand all reference columns. Expanding everything can be costly, but it allows you to see nested data like location.state_region.

You can combine select() to only get fields with an underscore and "location" in the name to reduce the number of fields while keeping reference records. If you additionally filter() on a specific location name, you get fewer records. This demonstrates how to filter large datasets before downloading.

There are some limitations regarding the order of operations. We suggest arranging first, then selecting a number of records (head/tail), and finishing with collect(). The function is lenient and will warn you if you use a different order, but it will try to reduce side effects.
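The suggested order can be sketched on a local table, where dplyr's collect() simply returns the data frame. The data is made up; with the package, the chain would start from the object returned by getRecords(), so that sorting and truncation are set up before anything is downloaded.

```r
library(dplyr)

# Made-up records standing in for a remote form.
records <- tibble(
  id = 1:6,
  score = c(12, 47, 5, 33, 47, 21)
)

topThree <- records |>
  arrange(desc(score), id) |>   # 1. sort first
  head(3) |>                    # 2. then limit the number of records
  collect()                     # 3. finish by collecting the data

print(topThree)
```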

00:38:53 Documentation and resources

You can find more information in RStudio by typing browseVignettes(package = "activityinfo"). There is also a user interface in the "Packages" tab where you can click on activityinfo to see the vignettes. These include tutorials on inviting users, assigning roles, using optional resources, and generating CSVs of users and roles.

Alexander Bertram: We have the same resources available on the website. If you don't want to download R first, you can go to the ActivityInfo documentation, look under the Integration section, and find the same tutorials there.

00:44:00 Billing account functions and closing

Nicolas Dickinson: Regarding billing account functions, I usually start with getDatabases to get a list of all databases with their IDs and billing account IDs. We can then use getBillingAccount with a billing account ID. A billing account represents your organization. If you are working in an organization using ActivityInfo across many countries, billing accounts can be useful for pulling in data about your organization, usage, and data governance. It shows information like the number of databases and users. getBillingAccountDatabases will give you all the databases across your whole organization if you have permission at the organization level.

Alexander Bertram: R has always been near and dear to our hearts. We are interested in hearing from you about how you are using ActivityInfo. Is the API your preferred method? Are you using Python, Power Automate, or other languages? We want to know how to best support users doing technical integration work. Let us know what you are looking for in the coming year.

Nicolas Dickinson: You can star the GitHub repository. If you want to know what is coming up, there are development branches where you can explore the latest features, including expanding into reference columns and autocomplete.

Alexander Bertram: I put a link in the chat to the Quarto document for the text, which will give you a chance to review the code. I encourage you to take a look at the intro webinar which takes you step-by-step through setting up R. Thanks everyone for joining.
