Thursday April 13, 2023

ActivityInfo R package updates

  • Host
    Alexander Bertram
  • Panelist
    Ryo Nakagawara
  • Panelist
    Nicolas Dickinson
About this webinar

Are you working with R and data analysis in ActivityInfo or do you wish to learn more about the ActivityInfo R package?

With the support of ACDI/VOCA, we've undertaken a series of major improvements to ActivityInfo's R package. As of version 4.33, the package includes various new functions. During this one-hour session, together with Mr. Ryo Nakagawara from ACDI/VOCA and Mr. Nicolas Dickinson from WASHNote, we take a look at the new functions of the ActivityInfo R package.

In summary, we discuss:

  • Integration with the tidyverse
  • Bulk importing data
  • Adding and manipulating form schemas
  • Querying large datasets with filters
  • Documentation overview

To get started with the ActivityInfo R package, please take a look at the webinar "Introduction to R and the ActivityInfo R package".

We'll continue to make improvements to the package based on your feedback, so make sure to visit the GitHub repo and "star" and "watch" the repository for updates.

Is this webinar for me?

  • Are you responsible for information management in your organization and do you use the ActivityInfo R package?
  • Do you have an admin role in the platform or wish to expand your knowledge to enrich your capabilities?
  • Are you leading data analysis for data collected in ActivityInfo from partner organizations or colleagues?

Then, watch our webinar!

About the Presenters

Mr. Alexander Bertram, Executive Director of BeDataDriven and founder of ActivityInfo, is a graduate of American University's School of International Service. He started his career in international assistance fifteen years ago, working with IOM in Kunduz, Afghanistan, and later worked as an Information Management Officer with UNICEF in DR Congo. With UNICEF, frustrated with the time required to build data collection systems for each new programme, he worked on the team that developed ActivityInfo, a simplified platform for M&E data collection. In 2010, he left UNICEF to start BeDataDriven and develop ActivityInfo full time. Since then, he has worked with organizations in more than 50 countries to deploy ActivityInfo for monitoring & evaluation.

Mr. Ryo Nakagawara is an R developer and data engineer/scientist with experience in international development and soccer analytics, currently residing in Japan. Ryo's strengths lie in building data pipelines by creating and maintaining R packages, scripts, reproducible reports, dashboards, and more. Ryo also has experience managing large codebases and projects in open-source and enterprise environments on GitHub. Outside of work, Ryo regularly contributes to both fun and serious open-source projects and is an editor of the "R Weekly" newsletter. You can find Ryo on GitHub or LinkedIn.

Mr. Nick Dickinson works on the monitoring of primarily water, sanitation and hygiene (WASH) services and works to strengthen collaboration between the public, non-profit, and private sector communities (national governments, UN agencies, service providers, Wikipedia, etc.) to improve access to high-quality and relevant information on services and the factors influencing their sustainability. In addition to working on the ActivityInfo R package, he maintains an R package, jmpwashdata, that brings together 300+ Excel sheets published by the Joint Monitoring Programme of UNICEF and WHO to improve access to data on WASH services.

Transcript

00:00:00 Introduction and motivation

Alex: Hello and welcome. I see some familiar names in the chat and the participant list. I'm really excited to present the result of a couple of months of work with Ryo and Nick on some updates to the ActivityInfo R package. We see more organizations using R with ActivityInfo, and we wanted to make sure that the package was easy to use and flexible enough for those growing needs. We were very fortunate to have support from ACDI/VOCA, one of our users of ActivityInfo, who contributed financially to this project. I would like to ask Ryo to say a few words about the motivation for that and ACDI/VOCA's support.

Ryo: Thanks, Alex. At ACDI/VOCA, we have a large number of R users, both participants and staff. We use the package for ongoing database admin tasks such as permission setup, querying the audit log, keeping track of users, and creating visualizations based on database metadata to track the progress of our forms.

The first priority for us in getting this project going was to fix a number of bugs that we had encountered through our extensive use of the package over the past two years. This project was about coming together with Alex and Nick to decide which issues to work on from a list of potential improvements. We spent around six months working together, and we are excited to share this new version of the ActivityInfo R package. The improvements are not just for ACDI/VOCA, but for the broader ActivityInfo user community.

00:03:34 Documentation overview

Alex: I want to start by giving an overview of the new documentation, as that was a big part of this project. If you go to our website under Support and then R package reference, you will see that it is fully searchable. We now have vignettes online, such as an introduction to ActivityInfo and examples around adding and manipulating database forms. Each of the functions is documented here with examples on how to use them.

There is a tutorial available that I recommend you look at if you haven't used the R package or want to see what is possible. It takes you through all the functionality available in the package, from installing and authenticating to working with forms, records, and users. I also encourage you to visit the GitHub repo to find our latest releases and release notes. Please "watch" or "star" the repository to get notifications when we release new updates.

00:07:02 Getting records with the tidyverse

Nick: I want to show you a couple of functions that help us get records in an easier way and allow us to manipulate forms. To start with getting records, we have a new function called getRecords which is user-friendly and tidyverse compatible. This function is effectively a replacement for what we have been doing with queryTable.

In the example shown, I am loading the dplyr package and using the pipe operator to chain functions. I use the form ID to get records and then select columns. The final step is always to use the collect() function to download the data frame. Unlike queryTable, where you had to provide column names yourself, getRecords automatically applies the column names as you see them in the web user interface.
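As a rough sketch of the pattern described above, assuming the activityinfo and dplyr packages are installed, R 4.1 or later for the native pipe, and an already authenticated session. The function name, form ID, and column labels are hypothetical placeholders, not taken from the webinar.

```r
# Sketch only: the form ID and the column labels are hypothetical placeholders.
# Namespaced calls (pkg::fn) are used so that defining the function does not
# need the packages to be attached; calling it needs an authenticated session.
fetch_selected_records <- function(form_id) {
  activityinfo::getRecords(form_id) |>                 # lazy reference, nothing downloaded yet
    dplyr::select(`Partner name`, `Beneficiaries`) |>  # hypothetical column labels
    dplyr::collect()                                   # download the result as a data frame
}

# Usage (not run here):
# records <- fetch_selected_records("your-form-id")
```

The same chain works with `library(dplyr)` and the `%>%` pipe, which is closer to what is shown on screen in the webinar.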

Once collected, you can use R functions as usual to filter, arrange, or slice the records. It is also possible to select columns, arrange data, and filter data before you download it. This is useful for very large datasets. However, there are some limitations; this only works with a few functions like select, filter, arrange, slice, and slice_tail. Currently, you can only sort on a single column before collecting.
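To illustrate filtering and sorting before download, here is a minimal sketch under the same assumptions (activityinfo and dplyr installed, R 4.1 or later, authenticated session). The column labels and the threshold are hypothetical, and exactly which filter expressions the package can translate should be checked against the package reference.

```r
# Sketch only: column labels and the threshold are hypothetical.
# filter() and arrange() are applied to the lazy table, so the work is
# requested from the server and only matching rows are downloaded.
fetch_filtered_records <- function(form_id) {
  activityinfo::getRecords(form_id) |>
    dplyr::filter(`Beneficiaries` > 100) |>  # filter before download
    dplyr::arrange(`Report date`) |>         # a single sort column, per the limitation above
    dplyr::collect()                         # download only the filtered, sorted rows
}

# Usage (not run here):
# big_sites <- fetch_filtered_records("your-form-id")
```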

You can also change which columns you get using the style argument. The default is the user interface columns plus ID columns, but you can ask for only the user interface columns or define a different style, such as including columns from reference tables by using prettyColumnStyle with allReferenceFields set to TRUE.
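A minimal sketch of passing a column style, assuming getRecords accepts the style through an argument named `style` as described here. The wrapper function name is hypothetical.

```r
# Sketch only: requests the "pretty" column style and asks for all fields of
# referenced forms to be included as extra columns.
fetch_with_reference_fields <- function(form_id) {
  activityinfo::getRecords(
    form_id,
    style = activityinfo::prettyColumnStyle(allReferenceFields = TRUE)
  ) |>
    dplyr::collect()
}

# Usage (not run here):
# wide_records <- fetch_with_reference_fields("your-form-id")
```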

00:16:50 Q&A: Naming conventions and database compliance

Alex: We have a question from Amadou about the use of snake_case versus camelCase. ActivityInfo uses camelCase, while the tidyverse community often uses snake_case. We decided not to break existing code, which is why we have a mix. If you are using an ActivityInfo function, it will be in camelCase, and if you are using a tidyverse function, it will be in snake_case.

Amadou: I am really excited about this package. Regarding the naming, I understand the legacy constraints. I am happy to see the progress and the unified approach. My question is whether you plan to make it fully DBI-compliant like other DBMSs?

Nick: It is inspired by dbplyr. We use the web API for this instead of SQL syntax, so we can't do it exactly like dbplyr, but we stick to the conventions and define lazy data frames using the same patterns and verbs.

Alex: Someday we hope to add a SQL interpreter in front of the API. In the meantime, this is a big step because filtering, sorting, and limiting are operations you want the server to do so you don't have to pull all the data down to the client.

00:21:54 Manipulating forms

Nick: I want to give a quick example of how you can create a form from scratch. We can chain different functions together. You create a form schema, give it a database ID and a label, and then add various fields using addFormField. There are many field schemas available, such as text fields and multi-select fields. You can provide options as a vector, which is a nice new addition.

After creating the object, you can manipulate the elements, such as sorting them to change the order of questions. You can also delete fields using deleteFormField by providing the code, label, or ID. Finally, you use addForm with your form schema to upload it.
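The steps above might look roughly like the following. The labels, codes, and option values are invented for illustration, and the argument names (`databaseId`, `label`, `code`, `options`, `required`) are my understanding of the package and should be checked against the function reference.

```r
# Sketch only: labels, codes, and options are hypothetical examples.
build_survey_form <- function(database_id) {
  activityinfo::formSchema(databaseId = database_id, label = "Household survey") |>
    activityinfo::addFormField(
      activityinfo::textFieldSchema(
        label = "Respondent name", code = "NAME", required = TRUE
      )
    ) |>
    activityinfo::addFormField(
      activityinfo::multipleSelectFieldSchema(
        label = "Water sources used", code = "SOURCES",
        options = c("Piped", "Borehole", "Surface water")  # options given as a vector
      )
    )
}

# Usage (not run here, requires an authenticated session):
# schema <- build_survey_form("your-database-id")
# schema <- activityinfo::deleteFormField(schema, code = "SOURCES")  # drop a field by code
# activityinfo::addForm(schema)                                      # upload the form
```

Nothing is sent to the server while the schema is being built; in this sketch only the final addForm call uploads it.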

Another pattern useful for exploration is extracting a schema from existing forms. You can get an existing form using getRecords, select specific fields, and then extract a schema from those fields to create a copy. This can be useful if you want to create a new form based on parts of an existing one.

00:27:20 API messaging and data frame outputs

Ryo: One of the newer features we worked on is API messaging. By default, these messages are turned off. You can change this using the options() function. If you set activityinfo.verbose.requests to TRUE, it will show the request message. If you turn on activityinfo.verbose.tasks, it will show both the request and the task status. This is helpful for logging and debugging.
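For example, the two options named above can be switched on around a block of API calls and off again afterwards. This uses only base R's options(); the option names are the ones mentioned in the webinar.

```r
# Turn on request and task messages while debugging.
options(activityinfo.verbose.requests = TRUE)
options(activityinfo.verbose.tasks = TRUE)

# ... run ActivityInfo API calls here ...

# Turn the messages off again once done.
options(
  activityinfo.verbose.requests = FALSE,
  activityinfo.verbose.tasks = FALSE
)
```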

Regarding data structures, R users prefer data frames, whereas ActivityInfo's backend uses JSON (nested lists). Now, functions like getDatabases, getDatabaseUsers, getRecordHistory, and getDatabaseResources can return a data frame instead of a list. This makes it much easier to handle the data in R. If you prefer the list format for legacy scripts, you can set the asDataFrame argument to false.
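A small sketch of switching between the two output formats, assuming getDatabaseUsers takes the database ID as its first argument. The wrapper name is hypothetical.

```r
# Sketch only: returns a data frame by default, or the nested list
# when as_data_frame = FALSE (useful for legacy scripts).
list_database_users <- function(database_id, as_data_frame = TRUE) {
  activityinfo::getDatabaseUsers(database_id, asDataFrame = as_data_frame)
}

# Usage (not run here):
# users_df   <- list_database_users("your-database-id")
# users_list <- list_database_users("your-database-id", as_data_frame = FALSE)
```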

00:33:55 Creating forms from existing datasets

Ryo: We can now create form schemas directly from data frames in R. For example, if you have a dataset like the Palmer Penguins with several columns, you can use createFormSchemaFromData. This function automatically generates the form schema based on the data, including labels and required columns.

If you set upload = TRUE, it will automatically push the form to the database. This is extremely useful for large datasets with many columns, saving you from having to manually create each field in the user interface. You can pass any kind of data into this function, such as data from Excel or CSV files. Once the form is created, you can use importRecords to push the actual data into the new form.
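As a rough illustration, the sketch below builds a small local data frame and wraps the call that would turn it into a form. The data, the wrapper name, and the form label are invented; the `databaseId` and `label` argument names are my understanding of the package, and the shape of the returned object should be checked in the package reference before using it further.

```r
# A small local data frame standing in for data read from Excel or CSV.
site_visits <- data.frame(
  site = c("North well", "South well", "East pump"),
  visit_date = as.Date(c("2023-01-10", "2023-02-14", "2023-03-03")),
  households_served = c(120, 85, 40),
  stringsAsFactors = FALSE
)

# Sketch only: derives a form schema from the data frame's columns and,
# because upload = TRUE, pushes the new form to the database.
create_form_from_data <- function(df, database_id, label) {
  activityinfo::createFormSchemaFromData(
    df,
    databaseId = database_id,
    label = label,
    upload = TRUE
  )
}

# Usage (not run here, requires an authenticated session):
# create_form_from_data(site_visits, "your-database-id", "Site visits")
```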

00:41:22 Migrating field data

Ryo: Another feature is migrating field data to a newly transformed field. For example, if you have a quantity field that you decide should be a text field, you can download the schema, add the new text field, and update the form schema.

To move the data, you can use migrateFieldData. This allows you to transfer data from the old field to the new one. You can specify a transformation function to ensure the data is converted correctly, such as changing numeric values to characters. This simplifies the process significantly compared to the previous method of downloading and re-importing data.
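The transformation function is ordinary R and can be tested locally before any data is moved. The sketch below shows only a hypothetical converter for the quantity-to-text case; the exact arguments of migrateFieldData are not spelled out in the webinar, so consult the package reference before calling it.

```r
# Hypothetical converter for a quantity-to-text migration:
# numeric values become character strings, missing values stay missing.
quantity_to_text <- function(x) {
  as.character(x)
}

# Try it on a few sample values before using it in a migration.
quantity_to_text(c(12, 3.5, NA))
```

Checking the converter on sample values first makes it easier to spot surprises, such as how decimals or missing values are rendered, before the field data is rewritten.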

00:46:11 Importing records

Alex: The importRecords function is a tool for pushing data into an existing form. It is useful if you have a form fed from different sources or a regular flow of data. When working with the API, I recommend assigning a field code to each of your fields.

In R, you can create a data frame and use importRecords with the form ID. You can optionally pass in the record ID to update data. If you have a key field (natural key), the package will handle updates automatically by matching on the key fields to find the record ID.
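A minimal sketch of this workflow, assuming the target form has fields with the codes SITE_CODE, REPORT_MONTH, and HOUSEHOLDS and that columns are matched to fields by those codes. The codes, values, and wrapper name are invented for illustration.

```r
# Local data frame whose column names match the (hypothetical) field codes.
monthly_report <- data.frame(
  SITE_CODE = c("S001", "S002"),
  REPORT_MONTH = c("2023-03", "2023-03"),
  HOUSEHOLDS = c(120, 85),
  stringsAsFactors = FALSE
)

# Sketch only: pushes the rows into an existing form. If the form defines
# key fields, existing records are matched on them, as described above.
push_monthly_report <- function(form_id, df) {
  activityinfo::importRecords(form_id, data = df)
}

# Usage (not run here, requires an authenticated session):
# push_monthly_report("your-form-id", monthly_report)
```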

For reference fields, such as a reference to provinces, the API expects the ID of the record, not the name. If you are importing data with reference fields, you will need to match the names to their IDs before importing.
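The name-to-ID matching can be done in base R before importing. In the sketch below, the province lookup table is hard-coded with made-up record IDs; in practice it would be downloaded from the provinces form. All names and values are hypothetical.

```r
# Lookup table of provinces: record ID and name (made-up IDs).
provinces <- data.frame(
  id = c("c1a2b3", "c4d5e6"),
  name = c("Kunduz", "Balkh"),
  stringsAsFactors = FALSE
)

# Incoming data refers to provinces by name.
incoming <- data.frame(
  PROVINCE = c("Balkh", "Kunduz", "Balkh"),
  CASES = c(4, 7, 2),
  stringsAsFactors = FALSE
)

# Replace each province name with the ID of the matching province record.
incoming$PROVINCE_ID <- provinces$id[match(incoming$PROVINCE, provinces$name)]

# Fail early if any name could not be matched to a record ID.
stopifnot(!anyNA(incoming$PROVINCE_ID))
```

The final check catches misspelled or unknown province names locally, before an import is attempted.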

00:53:30 Conclusion and Q&A

Alex: We are really keen to have feedback from everyone using R. Please try out the latest version from GitHub and let us know what you think by opening issues.

Nick: Regarding a question about whether the new functions help build forms in exactly the same way as the UI: mostly yes. The web UI does some error checking and validation that the R package might not do as strictly yet. Also, features like auto-complete for formulas aren't there yet.

Alex: The next release of the ActivityInfo server includes more robust validation of forms to prevent invalid states. Everything you can do in the UI can be done through the API.

Question: Do you plan to release it to CRAN?

Ryo: We automate checks, so in principle, we should be able to submit. We have to be careful with CRAN's testing policies regarding API hits, but it is definitely doable and would make sense for broader exposure.

Alex: Thanks everyone for joining. Please give it a try and let us know your feedback.
