Data modelling in practice
HostVictoria Manya
PanelistFay Candiliari
About this session
About this session
In this session, we revisit the foundational concepts of data modelling and provide practical insights into creating and using data models within ActivityInfo, leveraging the Miro platform.
We will also discuss best practices to ensure the effectiveness and reliability of your data models by considering a checklist for identifying datasets/data flows and understanding objectives of your data model.
In summary, we explore:
Creating and using data models in ActivityInfo using Miro:
- Checklist for identifying datasets/data flows and understanding objectives of your data model.
- Step-by-step guide on developing data models within ActivityInfo
- Hands-on demonstration using Miro for visualization and management
Best practices for effective data modelling:
- Strategies for optimizing data models for accuracy and efficiency
- Tips to ensure your data models meet organizational needs
View the presentation slides of the Webinar.
Is this Webinar series for me?
- Are you working in complex information systems or environments where data modeling plays a critical role?
- Are you looking for guidance on best practices for data modelling?
- Do you wish to ask questions about these topics?
Then, watch our webinar!
Other parts of this series
Other parts of this series
The Data Modelling webinar series is a sequence of two live sessions designed for professionals interested in mastering data modelling concepts and techniques. These webinars form a comprehensive course that will provide a deep understanding of data modelling, including creating and utilizing data models within ActivityInfo using Miro, and adopting best practices for effective data modelling.
The series is addressed to entry to intermediate level professionals, and it is highly recommended to attend or watch the recordings of both sessions in consecutive order to fully benefit from the course.
About the Trainer
About the Trainer
Victoria Manya has a diverse background and extensive expertise in data-driven impact, project evaluation, and organizational learning. She holds a Master's degree in local development strategies from Erasmus University in the Netherlands and is currently pursuing a Ph.D. at the African Studies Center at Leiden University. With over ten years of experience, Victoria has collaborated with NGOs, law firms, SaaS companies, tech-enabled startups, higher institutions, and governments across three continents, specializing in research, policy, strategy, knowledge valorization, evaluation, customer education, and learning for development. Her previous roles as a knowledge valorization manager at the INCLUDE platform and as an Organizational Learning Advisor at Sthrive B.V. involved delivering high- quality M&E reports, trainings, ensuring practical knowledge management, and moderating learning platforms, respectively. Today, as a Customer Education Specialist at ActivityInfo, Victoria leverages her experience and understanding of data leverage to assist customers in successfully deploying ActivityInfo.
Transcript
Transcript
00:00:03
Introduction and recap
Thank you very much, Faye. And hello everyone. Welcome once again to the session on data modeling in practice. The webinar series, like Faye mentioned, consists of two sessions designed to provide you with a comprehensive understanding of data modeling in humanitarian and development contexts. The first session, for those who were present, covered foundational concepts and principles. Today our second session will focus on more practical applications. If you missed the first session, please make sure you watch it to gain a more robust understanding of the topic and our journey as a group so far.
We will start with a brief review or overview of the foundational concepts and principles that we discussed in the first session, ensuring that everyone has a refresher. Next, we will delve into how to create and utilize data models for ActivityInfo databases, incorporating software that supports diagramming for enhanced visual collaboration and design. We will then explore best practices and end the session with questions.
In our last session, we talked a bit about these concepts. An entity is a discrete data object and the basic building block of your database, such as a person, place, or thing. We basically said an entity is like a noun. An attribute is a characteristic that describes an entity in some way. For example, the attribute of a person entity might include name, age, address, etc. We also talked a bit about relationships, which define how entities relate to each other. For instance, a beneficiary entity might have a relationship with a service entity indicating that the beneficiaries enjoy services within the project.
We also discussed cardinality, which specifies how many instances of one entity relate to instances of another entity. Common cardinalities included one-to-one, one-to-many, and many-to-many relationships. We also discussed keys, noting that a key is an attribute or a combination of attributes used to uniquely identify an entity, like a student ID. Finally, we spoke about normalization, which is the process of organizing data in a database to reduce redundancy and improve data integrity. This involves dividing large tables into smaller related tables and defining the relationships between them.
00:07:35
Database concepts in ActivityInfo
Before we get into the case study, let's establish that we are building our database in ActivityInfo. There are a few concepts you need to understand regarding how data modeling translates into ActivityInfo. ActivityInfo databases are structured to leverage relational database principles where data is organized into related tables. This design enhances data integrity and minimizes redundancy. Each table typically represents a different entity such as beneficiary, project, or distribution, and the relationships between these tables are established through foreign keys.
ActivityInfo is structured in an intuitive hierarchy. A database in ActivityInfo is a structured collection of data organized into tables, forming the foundation for your data entry, storage, and retrieval. It is essentially your entire house. Next is the folder; folders are used to organize forms and datasets into logical groups, helping you navigate and manage different sets of data.
Then we have forms. Forms in ActivityInfo are digital tables used to collect and manage data. Each form corresponds to a specific entity, for example, your beneficiaries form or your aid distribution form. The forms include various fields where users can input information. Fields are the individual entry points within a form, representing specific attributes of the entity, such as name or date of birth. Finally, records are individual entries within a form. Each record contains data for all the fields in a form, representing a single instance of the entity, like one beneficiary or one aid distribution event.
00:12:10
Reference forms and subforms
When we talk about relationships in ActivityInfo, there are certain kinds of forms that help you take advantage of the relational database structure: the reference form and the subform. The purpose of the reference form is to store and manage a standardized list of information that can be reused across multiple other forms. For example, if you have a list of locations such as cities, countries, or units, you can use reference forms to capture them. When designing a form that requires a user to select a location, instead of manually entering it each time, the form can reference the standardized list. This enhances data consistency and reduces errors.
The other type is the subform. Subforms are used to capture reoccurring or repetitive information that relates to a specific record in a parent form. For example, you could have periodic reports for a project or regular updates for a specific beneficiary case. If you are managing a list of vulnerable groups and every month you have to update information about their level of vulnerability, you would use a subform because the data reoccurs and is dependent on the parent form. This allows you to collect detailed, organized, and repetitive information that is clearly associated with the main record.
To summarize the key differences: use a reference form when you have a standardized list where data is static and reusable across various forms, like geographical locations or departments. Use subforms for repetitive entries related to a specific parent record where data is dynamic, such as weekly or monthly status updates for a specific project or beneficiary.
00:18:05
Case study: Reach Out Initiative
Let's move to our case study set in Maiduguri, Nigeria. Let's assume we all work for an organization called Reach Out Initiative. Our organization has launched an HIV/AIDS program aimed at providing comprehensive care and support to patients in the region. The program goal is to improve patient outcomes by increasing viral load suppression and retention rates. They want to achieve efficient data management through a centralized repository, accurate data collection, and enhanced reporting and accountability.
We will be using Miro to design and visualize our data for the case study, though other platforms like Lucidchart or Jamboard can also be used. Remember the four questions we need to ask before designing: Why, What, Who, and How.
First, the "Why": We need to define our objectives, which include monitoring program performance, reporting impacts to donors, improving patient care, and enhancing data accuracy. Key indicators might include the number of patients enrolled, retention rates, viral load suppression, and mortality rates.
Second, the "What": We identify what data will be captured. This includes patient demographics, clinical data (viral load, treatment adherence), program engagement, mortality rates, and geographical location.
Third, the "Who": We identify roles. This includes data entry by field staff, M&E staff, medical consultants, and project managers. We must define data owners; for example, patient data might be managed by clinic managers, while clinical data is managed by healthcare providers.
Fourth, the "How": We identify how users will utilize the data to adjust. For instance, users might need data to adjust treatment protocols if a specific line of treatment isn't working, or for donor decision-making and program coordination.
00:24:20
Designing the data model
Let's identify the entities we need. Based on the chat responses, critical pieces of data include patient data, viral load suppression, patient info (age, gender), treatment rate, location, and staff data. Age and gender are attributes of the patient entity. Location data is considered reference data because it is unchanging and can be reused. Similarly, "Units" (departments) within the organization are also reference data.
We also need to track treatment adherence, follow-ups, and consultations. These are critical for the program's success.
00:30:35
Data flow and relationships
Let's look at the data flow. The process begins with the admission phase where a data encoder enters initial admission details. This biodata moves to the Patient Form, which captures the attributes describing the patient entity. Following admission, the patient undergoes a first consultation. Another data encoder records follow-up details, and this information flows to the system. This consultation form cannot exist without the patient registry form, making it a subform.
There might be subsequent follow-up sessions where lab tests are ordered or drugs are prescribed. These are also captured in subforms. Finally, there is a check or update phase involving a consultant. If lab results come back or the patient's status changes (e.g., viral load changes), the consultant looks up the patient record and makes adjustments to medication or treatment protocols. The entire process is interconnected to ensure patient data flows seamlessly from admission through consultations and updates.
00:37:35
The database structure
Now let's look at our data model diagram. The boxes represent tables (forms), and the lines indicate relationships and cardinality. We start with the hierarchy of our entities. The largest is the Unit. Our program begins with a Unit, each Unit has a Project, each Project is implemented in a Location, and each Location focuses on the Patient.
Our first entity is "Units," a reference table containing Unit ID and Name. Next is "Projects." We have a dedicated table for projects to manage project-specific details independently, reducing confusion. Then we have "Location," which is also a reference table storing information like City and Country. Separating the location table helps with data integrity by avoiding duplication and ensuring consistency across forms.
Our focal entity is the "Patients" form. This table includes the patient code, unit ID (foreign key), weight, height, ARV line, and status. Linked to the Patient form are the subforms for "Consultations" and "Medication Adherence." These subforms allow us to track the level of adherence, missed doses, and treatment effectiveness over time. This configuration enables robust reporting, such as calculating mortality rates or cumulative treatment adherence rates.
00:44:10
Building the database in ActivityInfo
We will now translate this model into ActivityInfo. We set up our core entities: Units, Projects, and Locations. Locations are set up as a reference form. Then we have the Patients form, which includes the subforms for follow-ups. This structure replicates exactly what we had in our data model.
The configuration of forms in this manner creates the parent-child connection between entities, particularly between the patient entity and consultations. The defined cardinality helps maintain data integrity. For instance, the one-to-many relationships between units and projects, or locations and patients, allow for detailed capturing without duplicating information.
When putting together your database, remember to align with organizational goals, engage stakeholders early, design for flexibility and scalability, provide regular training, and perform regular audits to detect and correct errors.
00:48:20
Q&A and reporting
Regarding reports, ActivityInfo allows you to create notebooks and dashboards. For example, we can create a report showing monthly patient enrollment, number of patients by location (e.g., Baga vs. Chibok), patient mortality per location, and adherence to treatment protocols. You can use bar charts, line charts, or pivot tables to visualize this data. Because it is a relational database, you can drag and drop fields to design your reports according to your needs.
Regarding integration, ActivityInfo can be integrated with other software such as Power BI. For qualitative data, while ActivityInfo is largely used for quantitative data, there are provisions for string data and text responses, making it possible to conduct surveys with text responses.
Thank you all for listening and for your participation. We will share the presentation and the recording.
Sign up for our newsletter
Sign up for our newsletter and get notified about new resources on M&E and other interesting articles and ActivityInfo news.