Thursday February 3, 2022

Bridging the gap between databases and mobile data collection in M&E

Host

Alexander Bertram

About the webinar

The ActivityInfo M&E Webinar series is back and we start the year with a session on relational databases and mobile data collection tools used in Monitoring and Evaluation.

This Webinar is a one-hour session ideal for Monitoring and Evaluation professionals who are interested in exploring main concepts around relational databases and the link between a database and data collection apps.

Some of the points we cover are:

What a relational database is, and why it is important for complete and accurate tracking of results
Excel compared to a relational database
Differences between XLSForm (‘typical mobile data collection tool’) and the relational data model
Explain how ActivityInfo’s integrated mobile data collection tool can bridge the gap more efficiently and improve data quality

You can also download the presentation.

Is this Webinar for me?

Are you an M&E practitioner responsible for designing tools for your organization or your programmes?
Do you wish to explore the possibilities of relational databases but don't know where to start?
Are you interested in constantly improving your systems and helping your team make the right decisions based on real-time developments?
Are you considering alternatives to spreadsheets, paper-based data collection or multiple parallel systems working together?

Then, watch our Webinar!

About the speakers

Mr. Alexander Bertram, Technical Director of BeDataDriven and founder of ActivityInfo, is a graduate of the American University's School of International Service and started his career in international assistance fifteen years ago working with IOM in Kunduz, Afghanistan and later worked as an Information Management officer with UNICEF in DR Congo. With UNICEF, frustrated with the time required to build data collection systems for each new programme, he worked on the team that developed ActivityInfo, a simplified platform for M&E data collection. In 2010, he left UNICEF to start BeDataDriven and develop ActivityInfo full time. Since then, he has worked with organizations in more than 50 countries to deploy ActivityInfo for monitoring & evaluation.

Mr. Jeric Kison earned his Bachelor's Degree from York University in Canada and his MBA from the University of Oxford in the United Kingdom. He has worked with NGOs and governments across four continents on strategy and evaluation for nine years. Before joining ActivityInfo he worked as a Monitoring & Evaluation Officer at Pilipinas Shell Foundation, Inc., where he led a project to develop an organizational M&E System which included the roll-out of ActivityInfo as the organization’s new information management system. He now joins us as our Implementation Specialist, bringing together his experience on the ground and passion for data to help our customers achieve success with ActivityInfo.

Transcript

00:00:00 Introduction

Thank you so much. Jeric, can you hear me? Okay, we are actually in the same room; it's on the wrong camera, but we're actually sitting in the same room. It's really exciting. We're back in the office and really excited to start the year with another webinar with all of you from basically everywhere. I'm excited about the topic that we're going to be discussing today. This is very close to my heart.

We're going to start off with some theory. This might not be new for everybody, but it's worth starting from first principles on what exactly is the relational model. Then, we will look at why that model is so important for monitoring and evaluation. After that, we will turn from theory to practice and look at bridging that gap with mobile data collection models and specifically then with ActivityInfo.

00:01:15 What is a relational database?

Let's start with what is a relational database and what is a relational model. To start with the dictionary definition, we can go back to 1971. It was a paper by E.F. Codd who already at that time described this model as a collection of tables, rows, and columns, and relationships between those tables. He writes in his articles why this is such an important way of organizing data. Looking at this field of relational data management, there are a number of key concepts that we want to introduce today: tables, columns, rows, keys, and finally, normal form.

Let's start with tables and columns. A relational database is a collection of tables with rows and columns. If we think about a program where we're supporting a group of beneficiaries with training—maybe in an agricultural context—we might be doing some trainings on seeds, other trainings on international standards at different times with different people in different provinces. We want to be able to organize all of this information together. If we did this in a relational database, we might split this up into three different tables. For example, a table with beneficiaries. Here you see we've listed our program has three beneficiaries. These beneficiaries are in two different provinces, in North Kivu and South Kivu.

Then we record the attendance in a separate table. You can see we have one row here for a seeds training on the 16th of January that was attended by beneficiary one. We had beneficiary two, that's Faye, and then another training in February with Faye and beneficiary number three, Jeric. We've organized this into tables and columns. One thing that's important to mention with columns in a relational model is that they all have a specific type. The kinds of types that you have available depend on the database system. The classical database system might have a small set of types like text, numbers, boolean (yes or no, true or false), dates, and maybe binary for PDFs or images. This is a difference between spreadsheets. In Excel, you can have rows and columns, but you can put whatever you want in there. That is the power of Excel, but also the weakness. A database is much more strict and rigid.

The next concept I want to introduce is the primary key. In this database, every one of my tables has a primary key, and we just use numbers to identify them uniquely. A primary key gives a row its identity, gives it a name, and a way to tell it apart from other rows. If we look at beneficiaries, I have given a number here to tell each of the beneficiaries apart. You might say you could also use the name because all three of these beneficiaries have different names. In this case, the name could be a primary key too. But of course, once you get to a certain scale, you're going to have people with the same names. Even if you use first and last names and birth dates, at a certain point, you might end up with people with the same name and birth date. Sometimes you need this extra ID to set that apart, and in the relational model, we call that the primary key.

To build these relationships between these tables, we need foreign keys. You can see here that for example, this training here involves beneficiary one, and that matches the primary key in my beneficiary table. So this is the foreign key, and then you have the primary key. Foreign keys can be repeated—here we have two beneficiaries in the training table—but the primary keys are never repeated; they're always unique. Talking about keys, you can further distinguish between two types of primary keys: natural keys and surrogate keys. A natural key is something that has meaning outside of the database, like a name, a national ID, or a phone number. A surrogate key is something that has no meaning outside of the database; it's just used for bookkeeping to keep this relationship correct. Using surrogate keys is often very useful because if you have to change something, like a misspelled name, you only have to change that in one place.

The last part of this theory that I want to introduce is something called the normal form. To understand what the normal form is, we'll start out from the opposite. You can always recognize data that's not in a normal form because you can see repetition. For example, if I added the name of the beneficiary into the training table, I introduce the chance for error. If I'm labeled as "Alexander" in one place and "Alex" in another, the data has diverged. We can start to normalize this table by removing the duplicates. We get rid of the name in the training table because we can always look up the name of the beneficiary by using the foreign key matching that to the primary key.

Another thing that you can do to bring a database into normal form is to split tables up. I can split this training attendance table into two tables, so now I have just a clean list of sessions. Here I have two training sessions, seeds and standards, with the date, and we use the keys here to link this. If I need to update the date, I do that in one place and that is linked to all of the remaining data. The biggest differences here compared to Excel are that there's much more structure in a relational model, and the database enforces these data rules. Secondly, databases are often designed to accommodate a larger volume of data than Excel.

00:09:45 Why use a relational model for M&E?

Thanks so much for that, Alex. So let's talk through the principles of relational databases. I'd like to share some of the benefits of why we should apply these principles to the work that we do in monitoring and evaluation.

First off, a relational database helps us to build complete pictures of the beneficiaries that we're supporting. To illustrate, let's take an agricultural assistance program supporting farmers. You might have a list containing biographical and demographic information about each farmer. You might then have another list that captures information about all the different interventions, such as input assistance, capacity building, or financial assistance. We also schedule monitoring visits to check out the yields of these farmers. This represents a different data set. If these are structured in a relational database, we would be able to identify precisely which farmers participated in which interventions thanks to relationships established in our database. We'd be able to attribute monitoring data back to individual farmers and understand the progression in their productivity.

Secondly, a relational database helps us organize the results that our team has achieved. Consider an M&E officer looking at results across different sectors of a multi-sectoral initiative, such as WASH, education, and health. This initiative might be implemented across a range of provinces with discrete projects on the ground. Without a relational database, we might struggle to figure out which data points to combine to report on performance. With a relational database, you'll be able to do these aggregate reports because we can establish relationships between data collected at the project level, attribute that to activity in a province, and link that back to specific indicators.

The third point is the flexibility of the relational model. To illustrate this, let's take a look at a refugee program. You might have a central unit of analysis centered on refugees. We can connect different attributes back to a particular refugee, such as intake assessments, monthly visits, participation in educational programs, and psychosocial surveys. All of these can represent different attributes that we connect centrally back to a single point of analysis.

In a different example, like a rapid response mechanism, you might have IDP sites related to different sets of information. On one hand, IDP sites link to a particular alert, and on the other hand, we relate them to geographic information to understand the concentration or dispersion of the sites. We can also link alert data to assessments and specific interventions. Taking both examples, we can start to see the flexibility that relational databases offer for us with regards to M&E.

00:15:30 Bridging the gap: Data collection and databases

We're going to turn now from the theory into the practice. We've been talking about this great relational model and how useful it is for analysis, but how do we actually get the data into such a system? We have to think about the flows of this data from the field into the database.

When E.F. Codd wrote his paper back in 1971, there were no actual relational databases yet. Today, 50 years later, we have many different open-source and commercial databases available, from Postgres and MySQL to Microsoft Access, SQL Server, Oracle, and of course, ActivityInfo. But many of these databases are really just an engine; they are a tool for building larger systems. You really need to provide your own way of getting data into the system and out.

Some options are to develop your own software application in PHP or Python that pushes data into the database. This requires software development and a certain set of skills. Another option that has revolutionized M&E is mobile data collection tools, for example, the Open Data Kit (ODK). This makes it easier to develop a data collection form, deploy it to mobile phones, and collect data. However, you still have to get that into the database, which requires exporting, transforming, and loading the data. Finally, you can connect tools like Power BI and Tableau to the database.

We often see the movement of data into and out of the system as a real stumbling block. Staff is spending a lot of time transforming and recoding data. This starts with the difference between the two models. A lot of mobile data collection tools descend from a standard called XForms, which is a document model based on XML. It didn't really take off for web usage, but it spawned tools like ODK and Kobo. At its heart, it is a document model, not a relational model.

If we think about a scenario where you have a rights holder or a refugee and you want to collect data on a monthly home visit, in a relational database, we would design our table for a monthly home visit with a foreign key linking to the rights holder table. But a document model like XForms doesn't have this way of relating two forms together. You can ask people to enter the ID or their names, but this is error-prone because we don't have the database on the phone to verify the ID. Another option is to pull the list out of your database and incorporate it into the form as a choice list (CSV file). This works better, but every time you add somebody to your rights holder table, you have to send out a new copy of the form.

There are other disconnects. For example, XForms has great question types like multiple select fields. This is convenient for data collection, but if we want to put this into a classical relational database like SQL Server, you have to transform this into multiple columns with yes or no values.

I wanted to highlight a real-world case where they were able to bridge this gap using a series of tools. This is a project from Timor-Leste, a study of small-scale fisheries. They sent researchers out with Android tablets using Kobo to collect data. They pulled that data from the database through the API, pushed it into MySQL, and then connected this to a dashboard built with R and Shiny. This gave them a real-time view of the level of effort and catches. It is an impressive system, but it requires technical expertise to set up and maintain.

00:22:45 The ActivityInfo approach

Turning to our own work, I want to share how we approached bridging this gap to bring down the cost of such a system without having to develop custom software. We took the best parts of the document model—like multiple selections, attachments, signature capture, photos, and geographic capture—and we added some key new field types into ActivityInfo, specifically the subform and the reference field.

I'm going to focus mostly on the reference field today because this brings back the power of a relational model. When you're building a form in ActivityInfo, it's like building a table, but you can use all of these field types and add reference types that establish links between forms. It allows user-friendly selection of related records, but behind the scenes, we're storing that unique primary key as a foreign key and keeping the integrity of those relationships.

In ActivityInfo, we can add two forms: a beneficiary form and a province form. To the beneficiary form, we add a reference field pointing to the province form. Once you've developed your form, that serves as the data model, but you also get the user interface directly on the mobile version and on the web. If you're on a mobile phone, the whole database syncs to your phone. When you choose that beneficiary or province, you can select from that list right at the spot of collection. The device makes sure that the beneficiary actually exists, and the relationship is preserved.

When you're dealing with collection and thinking about offline, it's moving from a model where you can do offline collection to offline rectification. With ActivityInfo, you synchronize to your mobile device and have a complete copy of the data you are allowed to see. This means you can review the data, make updates, add new records, and synchronize automatically when a connection becomes available.

00:28:15 Q&A session

Question: What skills should we look for in the M&E staff to run such a system and what might the cost be to apply such a system?

Answer: The skills you need depend on your choice of technology. If you take the case study from the fisheries example, you really need a software developer to integrate all of these tools, set up the connection to the MySQL database, and develop the dashboards. The advantage is total flexibility, but the downside is time and cost. With ActivityInfo, you need a different profile. As long as you're familiar with Excel, you can set up the database and move more quickly. There is a limit to the flexibility compared to custom code, but the costs are considerably lower. You have to weigh the trade-off of whether you want ultimate flexibility or a lower cost.

Question: Do we also say primary keys are a form of coding? It seems like you're coding the beneficiaries.

Answer: Identifying beneficiaries is always a challenge. One approach is to allow the database to assign a number to every beneficiary. In ActivityInfo, you could use a serial number field. As you register people, you can put that number on a membership card. If someone forgets their card, you can search for their name. In some cases, you might have codes assigned externally, like a national ID number. Especially if you're dealing with case management or protection, you would want to avoid using the person's name everywhere for confidentiality, in which case it's useful to assign a serial number.

Question: If there is no unique ID between two forms, can we bridge between the two different forms without any unique ID?

Answer: You need some way of knowing which record you're talking about. In our example, we used a surrogate key, but in some cases, you might have a natural key, like the province name. The downside is that if the name is misspelled, your link is broken. You don't always need a surrogate key or an auto-generated key, but you do need something that is unique, or perhaps multiple unique columns like first name and last name combined.

Question: What are some free databases for data entry and management?

Answer: A number of databases are free and open source, like Postgres, MySQL, and SQLite. You can download them and run them, but generally, everything still has a cost. They have to be set up, you need a server to run them on, and you need someone with the skills to configure them. It's important to weigh the costs of licensing versus the cost of staff time and development.

Question: What data collection app facility can you recommend that applies the relational model or can actually accommodate case management?

Answer: I'm very happy to recommend our own tool, ActivityInfo. You can find tutorials on case management with ActivityInfo on our website. We have templates to get started. You can synchronize the whole database to your phone so that when your caseworker goes to the field, you bring all of that information with you.

Sign up for our newsletter

Sign up for our newsletter and get notified about new resources on M&E and other interesting articles and ActivityInfo news.