Fundamentals of data modelling in humanitarian and development contexts
HostVictoria Manya
PanelistFay Candiliari
About this session
About this session
A good data model that is tailored to your project requirements is an essential part of designing and implementing an effective information management system.
During this webinar, we explain the importance of creating an effective data model when it comes to designing databases and walk you through some practical steps for creating a data model for your own project’s database.
In summary, we explore:
- What is data modelling?
- Why create a data model?
- Fundamental principles and frameworks of data modelling
- Importance of data models in organizing and analyzing information
The data modelling process:
- Identifying data entities
- Creating entity relationship diagrams
Data modelling best practices:
- Considering the role of end user experience
- Tips for aligning user experience with database functionality
- Creating data models that facilitate analysis
- Most common data models in humanitarian and development contexts
View the presentation slides of the Webinar.
Is this Webinar series for me?
- Do you wish to understand the basics of data modelling so you can design your own databases?
- Are you looking for information and inspiration for building an information system for your organization but don't know where to start?
- Are you an ActivityInfo database administrator or is this a role you would like to take on?
Then, watch our webinar!
Other parts of this series
Other parts of this series
The Data Modelling webinar series is a sequence of two live sessions designed for professionals interested in mastering data modelling concepts and techniques. These webinars form a comprehensive course that will provide a deep understanding of data modelling, including creating and utilizing data models within ActivityInfo using Miro, and adopting best practices for effective data modelling.
The series is addressed to entry to intermediate level professionals, and it is highly recommended to attend or watch the recordings of both sessions in consecutive order to fully benefit from the course.
About the Trainer
About the Trainer
Victoria Manya has a diverse background and extensive expertise in data-driven impact, project evaluation, and organizational learning. She holds a Master's degree in local development strategies from Erasmus University in the Netherlands and is currently pursuing a Ph.D. at the African Studies Center at Leiden University. With over ten years of experience, Victoria has collaborated with NGOs, law firms, SaaS companies, tech-enabled startups, higher institutions, and governments across three continents, specializing in research, policy, strategy, knowledge valorization, evaluation, customer education, and learning for development. Her previous roles as a knowledge valorization manager at the INCLUDE platform and as an Organizational Learning Advisor at Sthrive B.V. involved delivering high- quality M&E reports, trainings, ensuring practical knowledge management, and moderating learning platforms, respectively. Today, as a Customer Education Specialist at ActivityInfo, Victoria leverages her experience and understanding of data leverage to assist customers in successfully deploying ActivityInfo.
Transcript
Transcript
00:00:03
Introduction
Welcome everyone to this webinar on the fundamentals of data modelling in humanitarian and development contexts. The webinar is a series consisting of two sessions designed to provide a comprehensive understanding, or at least a starter pack, for data modelling in this sector. The first session today will cover the foundational concepts and principles, while the second session will focus on practical applications and real-world experiences.
00:01:54
What is data modelling?
Let's begin from the basics. A data model, as the name suggests, is a visual representation of a conceptual framework that organizes and defines data elements and shows how they interact with each other. It serves as a blueprint for the structure of your data within a system, illustrating the various entities, attributes, and relationships that make up your data environment.
By mapping out your data structure and relationships in a visual manner, a data model provides a clear method for how data is stored, organized, and retrieved. This visual mapping helps in understanding complex interconnections between different data elements, making it easier to design databases that are efficient and effective. Additionally, it ensures consistency and clarity in data management, facilitates communication among stakeholders, and guides the development of robust data systems. In essence, a data model is a crucial tool providing a structured approach to organizing data to support its optimal use.
00:03:31
Types of data models
There are several types of data models, each with its unique structure and use cases. While there are many models, including the network model and hierarchical database, ActivityInfo focuses on the relational database type. The relational model is perhaps the most widely used model today because it organizes data into tables or relations, where each table consists of rows and columns.
The power of the relational model lies in its use of keys to establish relationships between tables. This model supports complex queries and transactions, making it ideal for a wide range of applications from simple databases to complex systems. Its widespread adoption is due to its flexibility, scalability, and the robustness of the Structured Query Language (SQL) for managing and querying data. Organizing data into interrelated tables using keys allows you to undertake complex queries to manage a wide range of data efficiently.
00:06:00
Importance in humanitarian and development contexts
The complexity of social issues makes it imperative to have a clear data model, especially in humanitarian and development work. Data models play a crucial role in organizing and managing vast amounts of information. The intricate nature of social issues—involving multiple stakeholders, diverse data sources, and complex relationships—demands a structured approach to data management. Developing data modelling skills is essential for effective information management in this context.
Beyond the general need to create a data model, it is important to emphasize its multifaceted benefits. A well-structured data model enhances your ability to store, retrieve, and manipulate data efficiently. By defining rules and constraints, you ensure data integrity and consistency. For example, defining a relationship between tables simplifies data collection by focusing on essential data for each entity. Beyond operational efficiencies, the data model addresses broader organizational needs by delineating access permissions, aligning with policy requirements, and enhancing data security. It also supports scalability, data integration, and communication amongst your team, allowing for a centralized system rather than scattered files.
00:09:10
Fundamental principles
There are several fundamental principles to keep in mind. The entity-relationship principle is central to data modelling, defining how entities interact and are associated. For example, in a humanitarian context, beneficiaries receive relief items, and donors fund distribution centers. Entities represent a real-world object or concept, such as beneficiaries, relief items, or donors. Attributes are the characteristics of these entities, providing detailed information like a beneficiary's ID, name, or age.
00:10:44
Normalization
Normalization is a data modelling process that organizes a database into smaller, related tables to minimize redundancy and dependency. It maintains data integrity and helps optimize storage. The primary purpose is to reduce redundancy and ensure data integrity. The process involves several steps known as Normal Forms:
00:12:50
Data integrity, scalability, and clarity
Data integrity ensures that data remains accurate, consistent, and reliable over time. It is enforced through using primary keys (like a unique beneficiary ID) and foreign keys (like a location ID linking to a location table). Validation rules further support data integrity by ensuring that data entered meets predefined criteria, such as restricting the age of beneficiaries to a reasonable range.
Scalability ensures the system can handle increasing volumes of data without significant redesign. This is essential in humanitarian contexts where the number of beneficiaries may grow rapidly. Techniques such as partitioning tables help maintain performance. Flexibility allows you to modify or accommodate changing requirements without disrupting existing data, such as adding a new attribute like "health status" without affecting existing records.
Finally, simplicity and clarity are vital. Designing user-friendly data models is crucial for both technical and non-technical staff. The model should be intuitive, with clear, straightforward relationships between tables. Comprehensive documentation, including detailed explanations of entities, attributes, and a data dictionary, supports clarity and ease of use.
00:16:52
The data modelling process
Designing a data model can be compared to creating a house blueprint. Understanding the requirements is a crucial first step. Just as an architect meets with homeowners to understand their preferences, a data modeler meets with stakeholders to gather data requirements. This involves identifying key entities, relationships, and end goals to align the model with objectives. Stakeholders must articulate their theory of change and intervention goals. Detailed data requirements are then defined, including indicators, calculation methods, data sources, and formats.
When designing a data model, you need to ask four critical questions:
00:20:15
Identifying entities
Entities represent the primary components or objects in your database, much like rooms in a house blueprint. In a house, rooms like the kitchen or bedroom are essential elements designed for specific functions. Similarly, in a data model, entities like beneficiaries, relief items, donors, and distribution centers are fundamental components. Each entity represents a discrete data object serving as a basic building block.
Just as a kitchen is distinct from a living room based on its function, entities must be distinct. Identifying your focal entity is crucial. Just as a central living room might serve as a hub, a beneficiary entity often serves as the centerpiece around which other entities are organized in humanitarian data models.
00:23:41
Defining attributes
Attributes provide detailed information about each entity, making them unique and functional. In a house, details like room dimensions and window locations define the room. Similarly, attributes like name, date of birth, age, and gender define a beneficiary entity. These attributes are crucial to accurately represent and differentiate between each entity. You must ensure you keep to the requirements that set one entity apart from another.
00:26:07
Establishing relationships
Relationships link entities together. In a house, hallways and stairs connect rooms to ensure smooth movement. In a data model, relationships link entities like beneficiaries, relief items, and donors. For example, beneficiaries receive relief items, and donors fund distribution centers. These relationships are crucial for seamless data integration across various tables and reports. Well-defined relationships facilitate comprehensive data analysis and reporting, enabling the effective combination of data from different sources.
For example, if the kitchen is the focal entity in a house, all other rooms connect to it. Similarly, relationships connect a focal entity, like a beneficiary table, to other related tables like personal details, services received, or contact information. A unique identifier in the beneficiary table links to records in the services received table, enabling easy tracking and analysis.
00:29:18
Cardinality and keys
Relationships are defined using cardinality: one-to-one, one-to-many, and many-to-many.
Referential integrity constraints are used to define these relationships. For example, a foreign key in a refugee health records table links to the refugee table. This ensures every health record corresponds to a valid refugee. If a refugee record is deleted, the database can be set to cascade delete the related health records or restrict deletion to maintain integrity.
00:34:11
Practical normalization examples
Let's look at how normalization rules play out in practice.
First Normal Form (Atomic Values): Consider follow-up dates for gender-based violence cases. In a non-normalized table, dates might be stored in a single cell as a list. This violates the rule of atomic values. To normalize, these dates are separated into individual records across two related tables: a refugee table and a follow-up session table. The session table records each date as a separate entity linked by a beneficiary ID. This avoids redundancy and simplifies querying.
Second Normal Form (Functional Dependence): Normalization ensures all values in a table are functionally dependent on the whole primary key. If a table has a composite primary key (e.g., Beneficiary ID + Session ID), but includes details that only relate to the beneficiary and not the session, this creates inconsistency. To normalize, ensure every attribute depends on the entire primary key. For example, session details should depend on the specific session ID and beneficiary ID combination.
Third Normal Form (Transitive Dependencies): We must eliminate transitive dependencies. If a "Social Worker ID" in a table depends on the "Beneficiary Name" rather than the primary key (Beneficiary ID), this is a transitive dependency. To resolve this, restructure the tables so that all non-key attributes depend only on the primary key. You might create a separate Social Worker table linking social workers directly to beneficiaries, ensuring clarity and integrity.
00:41:21
Constraints and documentation
Constraints are like building codes and regulations. You must ensure you follow validation rules and relevance rules to ensure data integrity and accuracy. Use unique fields and field properties where necessary.
Documentation is also vital. You should document all your steps to institutionalize your data model. This ensures that if you are not available, others can understand the model through the documentation, including technical terms and a data dictionary.
00:43:02
Simple data model example
To illustrate, we can look at a simple data model for a training project. We define entities: Beneficiaries, Training Courses, and Training Sessions.
We apply normalization to organize this data into separate related tables connected by lines representing relationships. This ensures no confusion and clear connections between what is related to what.
00:45:42
Questions and answers
Is there a software for creating the data model?
There are several tools you can use to create a data model. In the next session, we will be using a specific tool to create a data model practically, so I can show you then.
What does referential integrity constraints mean?
Referential integrity constraints ensure that relationships between your tables remain consistent. It prevents "orphaned records." For instance, if you have a beneficiary record (Table A) and a service received record (Table B) that references the beneficiary's name, deleting the name in Table A creates an orphaned record in Table B. You wouldn't know who received the service. Referential integrity prevents this deletion or manages it to maintain consistency.
What is a foreign key?
A foreign key is a field in one table that uniquely identifies a row of another table. It establishes a link between data in two tables. For example, in an orders table, a "Customer ID" field would be a foreign key referencing the "Customer ID" in the customers table.
What are the best practices for selecting attributes when designing a data model for individual entities?
Selecting the right attributes is crucial. Best practices include:
Is there a difference between indicators and entities?
Yes, there is a difference. An entity is a real-world object or concept (e.g., Beneficiary, Donor, Location) that can be distinctly identified and has attributes. An indicator is a specific measure or metric used to assess performance or impact (e.g., "Number of beneficiaries who received healthcare"). Indicators are used to interpret and analyze data, often appearing in reports and dashboards, whereas entities are the structural blocks of the database.
Is it a must for an entity to have different components/elements for fulfilling its criteria?
It is essential for an entity to have different components, if by components you mean attributes. These attributes define the properties of the entity and allow it to be uniquely identified and described. For example, a bathroom is identified by its specific elements (shower, toilet). If you show someone a room with a toilet and call it a kitchen, it creates confusion. Attributes provide the descriptive power necessary to facilitate relationships and functional requirements.
Sign up for our newsletter
Sign up for our newsletter and get notified about new resources on M&E and other interesting articles and ActivityInfo news.