Data modelling for humanitarian and development information management systems
Host: Eliza Avgeropoulou
Panelist: Victoria Manya
About this webinar
A good data model that is tailored to your project requirements is an essential part of designing and implementing an effective information management system.
During this webinar, we explain the importance of creating an effective data model when it comes to designing databases and walk you through some practical steps for how you can create a data model for your own project’s database.
In summary, we explore:
- What is data modelling?
- Why create a data model?
- The data modelling process:
  - Identifying data entities
  - Creating entity relationship diagrams
- Data modelling best practices:
  - Considering the role of end user experience
  - Tips for aligning user experience with database functionality
  - Creating data models that facilitate analysis
  - The most common data models in humanitarian and development contexts
View the presentation slides of the Webinar.
Is this Webinar for me?
- Do you wish to understand the basics of data modelling so you can design your own databases?
- Are you looking for information and inspiration for building an information system for your organization but don't know where to start?
- Are you an ActivityInfo database administrator or is this a role you would like to take on?
Then, watch our webinar!
About the Speakers
Eliza Avgeropoulou earned her BSc from Athens University of Economics and Business, and her MSc degree in Economic Development and Growth from Lund University and Carlos III University, Madrid. She brings eight years of experience in M&E in international NGOs, including CARE, Innovations for Poverty Action and Catholic Relief Services (CRS). For the past five years, she has led the MEAL system design for various multi-stakeholder projects focusing on education, livelihoods, protection and cash. She believes that evidence-based decision making is the core of high quality program implementation. She now joins us as our M&E Implementation Specialist, bringing together her experience on the ground and passion for data-driven decision making to help our customers achieve success with ActivityInfo.
Victoria Manya has a diverse background and extensive expertise in data-driven impact, project evaluation, and organizational learning. She holds a Master's degree in local development strategies from Erasmus University in the Netherlands and is currently pursuing a Ph.D. at the African Studies Center at Leiden University. With over ten years of experience, Victoria has collaborated with NGOs, law firms, SaaS companies, tech-enabled startups, higher institutions, and governments across three continents, specializing in research, policy, strategy, knowledge valorization, evaluation, customer education, and learning for development. Her previous roles as a knowledge valorization manager at the INCLUDE platform and as an Organizational Learning Advisor at Sthrive B.V. involved delivering high-quality M&E reports and trainings, ensuring practical knowledge management, and moderating learning platforms. Today, as a Customer Education Specialist at ActivityInfo, Victoria leverages her experience and understanding of data leverage to assist customers in successfully deploying ActivityInfo.
Transcript
00:00:00
Introduction and agenda
Thank you, Faye, for the nice introduction. I will move into describing what we're going to see today in more detail. In the first part, we will explain the process of designing a data model. We will go through what is a data model, why do we need a data model, why a data model is significant for humanitarian development interventions, and what are the components of the process when we actually design our data model.
Then I will pass it over to Victoria, and Victoria will get more in depth on the data modeling best practices. Here we will consider how the data models can facilitate analysis. We will see common examples and common data models for different use cases of humanitarian and development interventions, and best practices regarding the role of the users, and how actually we can align the user experience with the database functionality.
00:01:05
What is a data model?
So the first component: what is a data model? Imagine that the data model functions as a map. It is a map of our data structures and of the relationships between them, all in a visual format. It frequently provides a method by which we can identify how our data is stored, how it is organized, and how we can retrieve it at the end of the day. So think of the data model as your map, maybe your Google Maps, for navigating this data world.
Why is a data model so significant and important for humanitarian and development action? First of all, we need to keep in mind that humanitarian and development actions are interconnected with social and natural phenomena. Here we have a high level of complexity, a complexity which goes hand in hand with the social or natural causes that these interventions try to address and solve.
The complexity can be narrowed down into three different components. First, in many cases, especially in a world where we try to address social issues through different interventions, we have a complexity that is called emergent. It arises from the interactions between the individual components. Imagine the different kinds of services that we offer to our beneficiaries: it is the combination of, and interaction between, those services that leads to a result.
Second, it is not linear. This means that we need to acknowledge that we offer these services to real humans in the real world. So we operate in a real world, and the specific context and the specific time can actually impact the results. Small changes can have disproportionate effects. Third, it is adaptive; everything changes. The individuals, our beneficiaries, change. The context in which we operate keeps changing. So our programs need to change, and we need to adapt our behavior.
00:03:51
Why do we need a data model?
So what are the reasons we say that we cannot live without a data model for these kinds of programs? First of all, data organization and integrity. In a data model, we can structure data in a specific way, making it easier to store and manipulate the information. We also have the ability to define different rules and constraints. This leads to a higher level of accuracy, reliability, consistency, and efficiency of data.
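As a concrete illustration, the kind of rules and constraints a data model can declare (required fields, restricted choices, plausible ranges) can be sketched as a small validation routine. The field names and thresholds below are hypothetical, chosen only to echo the beneficiary example used later in this webinar:

```python
def validate_record(record: dict) -> list:
    """Return a list of constraint violations for one data-entry record.

    The rules here are illustrative: a required name, a restricted
    choice list for sex, and a plausibility range for family size.
    """
    errors = []
    if not record.get("name"):
        errors.append("name is required")
    if record.get("sex") not in {"F", "M"}:
        errors.append("sex must be one of the defined options")
    if not (0 < record.get("family_size", 0) < 50):
        errors.append("family_size out of plausible range")
    return errors

# A valid record passes with no violations.
print(validate_record({"name": "A. Example", "sex": "F", "family_size": 4}))  # []
```

Enforcing such rules at entry time, rather than cleaning data afterwards, is what gives the model its accuracy and consistency guarantees.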
Second is security, which has become more and more important in recent years. Within the data model, we always define who needs to access the data and why. This frequently goes hand in hand with donor requirements, and it leads to increased data protection, scalability, and documentation.
Third, the data model can serve as a blueprint for how the data can be expanded. Given that we have a very clear representation of the data, it is easier to maintain the data model and, as a consequence, the database, and to perform adaptations or improvements. Regarding integration, we need to keep in mind that organizations may have multiple data sources. For example, if we operate a specific project and need to perform data entry in the donor's database, then without a clear understanding of the data we gather internally we cannot identify how to integrate with that external system.
Finally, the data model creates a common language for communication between different stakeholders: information management staff, monitoring and evaluation staff working in the field, and people working in programs. At the end of the day, all five components lead us to simplifying complex systems. This is our ultimate objective: to be able to use this data and information to efficiently monitor and understand the evolution of the situation, evaluate the extent to which we have achieved our higher objectives, be accountable towards all stakeholders, learn as part of the process, and coordinate.
00:06:49
The restaurant analogy
If we would like to find an analogy, a data model is a lot like a restaurant. In a restaurant, we need to identify the dishes; these are the entities. In the information management world, those dishes have specific ingredients, which are the attributes. We also need to create some menus, combinations that make sense; we cannot combine the dessert with the main course all at once. These menus define the relationships between the different dishes.
As part of the process, we want to be efficient, so we want to reduce data duplication. This is the art of organizing our kitchen. We don't want to have too many ingredients, because if we don't use them for our different dishes, those ingredients will be a waste of resources. Similar logic applies to database design. We want to have just the right amount of detailed information that we need.
Of course, it's important to test. We will never give a customer a dish that we haven't tasted ourselves. The same goes with the data model. We create a data model and we need to test that everything works properly. We have documentation, which is the restaurant cookbook. We want to make sure that if a chef is not there and someone else substitutes, the restaurant can go on and deliver the same quality of dishes. And of course, the important component is to evolve and adapt, just as restaurant menus need to adapt to seasonal changes. We want to evolve and adapt to all the changes that take place in the field, the programs, and as a consequence, the data model. Imagine that the data model is like a living creature.
00:09:07
The data modelling process
Step one is understanding the requirements. Imagine now that we want to open the restaurant. We need to understand who the potential customers are, what their needs are, and how much they would pay for specific dishes. Similarly, for a data model corresponding to a development intervention, we need a clear understanding of the purpose and objectives of our intervention, namely the Theory of Change. We need a clear understanding of the data requirements, namely our MEAL plan: what the indicators are, how we calculate them, what the data source is, and how we are going to use the information. We need a clear understanding of the requirements of the different stakeholders, namely the data flow: who collects the information and how often, who accesses the information, and who analyzes it. It is a best practice to use a participatory approach involving MEAL staff and program staff, because they have the experience of the field.
Step two is identifying our entities. Remember the dishes of the restaurant. In our case, an entity in information management terms is a discrete data object, and it serves as the basic building block of our database. In practice, the different data collection forms can be considered as entities. For example, let's say that we have an intervention to provide protection services to a vulnerable population. We would have "Beneficiaries" as one data collection form (entity) and "GBV follow up form" as a separate data collection form (separate entity).
Step three is identifying the attributes. Attributes are the characteristics that describe our entities. In practice, an attribute is a field, the actual questions that we will have inside the data collection form. For example, when we have beneficiaries, we want to know the name, the date of birth, the age, the sex, the family size. When we want to follow up on the GBV cases, we need the date of the follow up, who performs the follow up, and what actions we have identified.
Step four is defining relationships. This is how entities are associated with one another. In practice, it means relating the records of our first table to the records of the second table. If we are monitoring the beneficiaries' GBV follow up, a beneficiary in a GBV case may have many different legal actions or health actions, meaning they are going to need multiple follow ups. This means that the relationship between our first table and the second table would be a one-to-many relationship. One record (one beneficiary) will have many different GBV follow ups. We may also have one-to-one relationships, many-to-one relationships, and many-to-many relationships.
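Steps two to four can be sketched as relational tables. This is an illustrative schema using SQLite, not ActivityInfo's internal storage; the table and field names follow the beneficiary example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Entity 1: Beneficiaries, with the attributes listed in step three.
conn.execute("""
    CREATE TABLE beneficiaries (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        date_of_birth TEXT,
        sex TEXT,
        family_size INTEGER
    )
""")

# Entity 2: GBV follow ups. The foreign key encodes the one-to-many
# relationship: one beneficiary record can have many follow-up records.
conn.execute("""
    CREATE TABLE gbv_follow_ups (
        id INTEGER PRIMARY KEY,
        beneficiary_id INTEGER NOT NULL REFERENCES beneficiaries(id),
        follow_up_date TEXT,
        performed_by TEXT,
        actions TEXT
    )
""")

# One beneficiary, two follow ups (illustrative data).
conn.execute("INSERT INTO beneficiaries VALUES (1, 'A. Example', '1990-01-01', 'F', 4)")
conn.execute("INSERT INTO gbv_follow_ups VALUES (1, 1, '2023-05-01', 'Case worker', 'Legal referral')")
conn.execute("INSERT INTO gbv_follow_ups VALUES (2, 1, '2023-06-01', 'Case worker', 'Health referral')")

count = conn.execute(
    "SELECT COUNT(*) FROM gbv_follow_ups WHERE beneficiary_id = 1"
).fetchone()[0]
print(count)  # 2 follow ups linked to the single beneficiary record
```

The same structure maps directly onto two data collection forms connected by a reference field.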
Step five is reducing data duplication (Normalization). This is the process of organizing those entities and attributes to make the database more efficient. We want to eliminate redundancy and improve data integrity. In practice, it means that our data collection forms and their relationships should follow three main rules.
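A minimal sketch of what normalization buys us, using the same example: in the denormalized version every follow up repeats the beneficiary's details, while the normalized version stores them once and references them by id (field names are illustrative):

```python
# Denormalized: each follow up repeats name and family size -> redundancy,
# and an update must touch every row that repeats the value.
denormalized = [
    {"name": "A. Example", "family_size": 4, "follow_up_date": "2023-05-01"},
    {"name": "A. Example", "family_size": 4, "follow_up_date": "2023-06-01"},
]

# Normalized: one beneficiaries table and one follow-ups table linked by id.
beneficiaries = {1: {"name": "A. Example", "family_size": 4}}
follow_ups = [
    {"beneficiary_id": 1, "follow_up_date": "2023-05-01"},
    {"beneficiary_id": 1, "follow_up_date": "2023-06-01"},
]

# Updating the family size now touches a single record instead of every row,
# so the two follow ups can never disagree about the beneficiary's details.
beneficiaries[1]["family_size"] = 5
print(beneficiaries[1]["family_size"])  # 5
```

This is exactly the redundancy elimination and integrity improvement the step describes.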
Step six is to visualize, test, document, and evolve. We create a visual representation of our data model, as it is an easy way to communicate with our teams. We need to seek validation from the people who are going to use it, specifically field staff. We need to consider the reports and the analysis part of the process. We must create proper documentation, which works as a common vocabulary across different teams. Finally, we must consider program changes; any adaptations should be reflected, because no one will ever use an outdated database.
00:20:08
Key messages
Some key messages to keep in mind:
00:21:12
History of database models
Thank you, Eliza, for that very clear explanation. Before we go into some of the best practices, we thought it was important to understand the context in which relational database models have become so important. Historically, we've seen major categories of database types.
First, we have the hierarchical database, which used to be popular. Think of it as a way to organize your data much like a family tree or an organizational chart. You have a primary parent record, and under it, there can be one or more child records. It's akin to how your computer file system organizes your files and folders.
Next is the network database. This is quite similar to the hierarchical database but with a slight twist. Instead of just one parent having multiple children, in a network database, records can be connected to many others, creating a many-to-many relationship. However, these two database structures had their limitations. They were fixed and rigid, making it challenging to change how the database was organized or manipulate data.
Then came the relational database in the 1970s. As Eliza explained, clearly defined entities have specific roles in securely holding, organizing, retrieving, and providing access to your data. You think of these entities as the essential building blocks. Within a relational database environment, there are precise and well-documented actions that applications can perform to manipulate the data, such as adding, modifying, and extracting information. There are also integrity rules that ensure data remains consistent, accurate, and reliable throughout its lifecycle.
00:25:25
Case management database model
In the field of humanitarian and development practice, data structuring has taken various forms. These models are particularly evident in platforms like ActivityInfo and play a crucial role in case management programs, tracking assessments, and crisis intervention. One of the notable models we encounter is the case management database model. ActivityInfo provides a template that mirrors this specific model.
The model centers around individual cases or beneficiaries and encompasses fields for personal details, assistance provided, location, and case status. The rationale behind this data model is straightforward: it assists organizations in effectively managing and tracking the assistance provided to individuals affected by disasters or conflicts.
The entities here are the "Position codes" form, the "Supervisors" form, and the "Protection cases" form. The position code and supervisor forms are entities that we call reference forms in ActivityInfo. They are created to capture a standard list that will be used in multiple places across the database. A best practice is to create a folder that houses these reference forms in one central cluster. The other entity is the "Protection cases" form, around which the database is built.
00:29:18
Relationships and attributes in practice
Let's look at relationships. First, let's define our attributes. Attributes are characteristics that describe the entity. Imagine you have different pieces of a puzzle, and this puzzle is the body parts of an elephant. These different pieces are the attributes—the questions asking "what is supposed to be here?" When you mention "it's the trunk," those are the responses, which are records to your attributes.
In the reference data, we have attributes like "Full name of the code" and "Code" for position codes. For supervisors, we have an attribute labeled "Name." Between the records in the position code form and the supervisor form, we have a one-to-one relationship. We also see relationships between the position code form and the protection cases form, where we have one-to-many relationships. One record, like a position code, is associated with multiple records in the protection cases entity.
Between the protection cases form and other data, we have a relationship with fields called subforms. You might ask, "How do I collect data regarding attributes like confidential biodata, specific child protection cases, etc.?" You might be tempted to create a million parent forms or fields, which is confusing. Relational databases help you put all those things together. You create subforms, which are special fields embedded in your main form. This allows you to collect data that we call sub-records on an ongoing basis while storing them.
00:34:38
Fields and data types
A pivotal feature of relational databases is that they are flexible and can handle various types of data. This database employs a diverse range of field types to collect the data required to manage cases. Field types help improve your data consistency and relevance.
00:37:56
Summary of efficiency
What makes the database efficient? The database employs a system of tables and subforms to effectively organize and store data related to the cases. This approach is carefully designed to minimize data redundancy and ensure we enhance our data integrity through normalization. In terms of data relationships, the database utilizes reference forms to establish connections between forms. This creates referential integrity and ensures that primary and foreign keys are used to link and reference data accurately.
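The referential integrity described above can be sketched with primary and foreign keys in SQLite. The form names echo the case-management example, but the schema is purely illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce foreign keys in SQLite

# Reference form: the primary key is the standard code list.
conn.execute("CREATE TABLE position_codes (code TEXT PRIMARY KEY, full_name TEXT)")

# Main form: the foreign key must point at an existing position code.
conn.execute("""
    CREATE TABLE protection_cases (
        id INTEGER PRIMARY KEY,
        position_code TEXT NOT NULL REFERENCES position_codes(code)
    )
""")
conn.execute("INSERT INTO position_codes VALUES ('PC-01', 'Field officer')")
conn.execute("INSERT INTO protection_cases VALUES (1, 'PC-01')")  # valid reference

# A case pointing at a code that does not exist is rejected outright:
# this is referential integrity at work.
try:
    conn.execute("INSERT INTO protection_cases VALUES (2, 'PC-99')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True: the database refuses the orphan record
```

Reference forms in ActivityInfo play the same role as the primary-key table here: records elsewhere can only point at entries that actually exist in the standard list.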
A common feature in relational databases is Role Based Access Control. It helps you manage permissions. We have a series of webinars on this. When it comes to maintaining data accuracy, the database employs integrity constraints, including relevance and validation rules.
00:40:52
Change management
Certain barriers, like data security concerns, data confidentiality, and reluctance to leave comfort zones, can slow the digital transformation of your M&E workflows. To address this, we have enablers like having internal advocates as champions, ensuring ease of use, data security assurances, and open communication. ActivityInfo has support structures available, including synchronous onboarding and self-managed learning processes.
In implementing change management, we should remain grounded in the fact that change is a gradual process and its pace is attached to the value and the strategy. The earlier team members see the value of the database you have built—answering questions like "Does it save time?" or "Does it make my work more efficient?"—the earlier they adopt it.
00:43:16
Q&A Session
Do I need different forms if I work with different partners in the field?
To effectively manage data collection with different partners in ActivityInfo, you can create a reference form with a key "Partner" field that holds records for all partners. Then you can connect this reference form to the parent form using a reference field. This approach allows you to associate each data entry in the parent form with a specific partner. This streamlines your data collection and supports access restriction to data per reporting partner.
If I collect secondary information for my indicators, do I need a different data collection form for each indicator?
Typically, you don't need a separate data collection form for each indicator in an indicator tracking data model. Instead, you design a parent form that includes the indicators. This form can capture all the data necessary. The second step is to create a subform within your parent form. The subform will include records of the actual values reported to support each of the indicators in your parent form.
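The parent-form and subform idea in this answer can be sketched as two linked record sets; the indicator names, targets, and values below are hypothetical:

```python
# Parent form: one record per indicator (illustrative data).
indicators = [
    {"id": "IND-1", "name": "Children enrolled", "target": 500},
]

# Subform: reported values stored as sub-records linked to their indicator.
reported_values = [
    {"indicator_id": "IND-1", "date": "2023-05-01", "value": 120},
    {"indicator_id": "IND-1", "date": "2023-06-01", "value": 160},
]

# Progress against the target is computed across the sub-records,
# so no new form is needed when a new reporting period arrives.
total = sum(v["value"] for v in reported_values if v["indicator_id"] == "IND-1")
print(total)  # 280
```

Each new reporting round simply adds sub-records; the form structure stays unchanged.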
Can I embed any type of calculations within ActivityInfo, for example, to identify vulnerability scores?
Absolutely. Within ActivityInfo, you can perform calculations by utilizing built-in functionalities. To calculate scores for your needs assessment, you can create calculated fields in your data collection form. This field can be configured to perform various mathematical operations based on the data you've collected. The system will automatically compute and store these scores.
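As an illustration of the idea, a calculated vulnerability score combines several collected fields into one value. The fields, weights, and thresholds below are hypothetical; this is not a standard scoring formula and not ActivityInfo's calculation syntax:

```python
def vulnerability_score(family_size: int, dependents: int, has_income: bool) -> int:
    """Combine collected fields into a single score, as a calculated field would."""
    score = 0
    score += 2 if family_size > 5 else 0  # large households weigh more
    score += dependents                   # one point per dependent
    score += 0 if has_income else 3       # no income source adds risk
    return score

print(vulnerability_score(family_size=6, dependents=2, has_income=False))  # 7
```

In a form-based tool, the same logic lives in a calculated field, so the score updates automatically whenever the underlying answers change.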
Can these relationships be established in Excel or Power BI?
Any type of relationship can be reflected in ActivityInfo because it is a relational database model. Similarly, Excel (via Power Query) and Power BI allow you to create these kinds of relationships (one-to-one, one-to-many, etc.) to analyze and use the information. Power Query acts as a common denominator. If you visualize it, Power Query and ActivityInfo both use tables with lines defining that a field from table one has a specific relationship with a field from table two.
Are there rules for ensuring data protection while working on the data model?
Absolutely. In ActivityInfo, we have functionalities tailored for this. For instance, if you have a case for gender-based violence and you do not want certain people to have access to sensitive records, you can prevent access using permissions. Even if two people are supervisors on the same database, one may not have access to the sensitive records the other is handling. We restrict access using formulas and rules.
Does giving partners access impact data privacy?
No, as long as you have the right measures in place. First, there is the software component: secure login methods (password or Single Sign-On via the organization's domain) guarantee that only designated users have access. Second, there is the device component: you should have policies on password management and use encrypted professional devices rather than personal ones. Third, you must think through the roles to ensure you haven't given access to persons that shouldn't have it. Finally, use tools like the audit log to monitor who has done what in the database.
Can you blur full names or national IDs?
You cannot "blur" the information visually in the sense of making it fuzzy, but you can hide specific fields from table exports so people cannot access that specific field. You can also restrict access to specific data and table previews filtered by parameters. So while you cannot blur, there are other ways to guarantee data privacy.
How do you marry the flexibility of the system with the rigidity of fixed, established indicators (e.g., donor indicators)?
This is common with donor-mandated indicators. The first question is how and where those indicators fit within your internal MEAL system. You must define what the indicators mean and identify who is going to collect them. This is the basis for balancing the rigidity of those indicators with the final outcome of a database. The next component is testing. If something doesn't make sense for the end user (field staff), you will not get the results you want. You may need to make minor twists, like adding an extra explanation or option that the field staff needs, to ensure the data is collected correctly.