Thursday May 19, 2022

Database design principles and designing a new database in ActivityInfo

  • Host
    Jeric Kison
About this webinar

About this webinar

Designing a new database that organizes your data in an effective way can be a daunting task. But if you follow a few basic principles and approach your design in a systematic way, a well-designed database should be within reach.

This Webinar is a one-hour session ideal for M&E and Information Management practitioners who are new to database design and are looking for guidance on how to design their database effectively. We will discuss principles of database design in general and we will show how these can be applied using the ActivityInfo platform.

View the presentation slides of the webinar

Some of the key points we cover are:

  • Principles of database design
  • Steps to creating an effective database design
  • Common database structures in ActivityInfo

Is this Webinar for me?

  • Are you an M&E or Information Management practitioner needing to establish a new database for your organization but not sure where to start?
  • Do you want to gain a deeper understanding of what makes a good database design?
  • Do you want to see examples of effective database design that you can adopt for your own organization?

Then, watch our Webinar!

About the Trainer

About the Trainer

Mr. Jeric Kison earned his Bachelor's Degree from York University in Canada and his MBA from the University of Oxford in the United Kingdom. He has worked with NGOs and governments across four continents on strategy and evaluation for nine years. Before joining ActivityInfo he worked as a Monitoring & Evaluation Officer at Pilipinas Shell Foundation, Inc., where he led a project to develop an organizational M&E System which included the roll-out of ActivityInfo as the organization’s new information management system. He now joins us as our Implementation Specialist, bringing together his experience on the ground and passion for data to help our customers achieve success with ActivityInfo.

Transcript

Transcript

00:00:00 Introduction

All right, thanks so much, Jane. Welcome everyone to today's webinar. I'm really excited to talk to you all today about designing new databases and present to you how you might approach designing a new database using ActivityInfo.

So, what's on the agenda for today? We've started off with some introductions and housekeeping. From here, we'll begin with discussing some principles of good database design to set the foundation for how we should think about this. Then, we'll go into some practical steps on how we'll apply those principles to actually design an effective database. Then, we'll take a look at how we can apply these on the ActivityInfo platform specifically. We'll go right into the platform and demonstrate some databases, and we'll spend some time at the end for Q&A.

00:01:15 Why think about database design?

Let's dive right in. In this first section, we'll talk about how to design an effective database. The first question that we might ask ourselves is this: why think about database design to begin with? Why should we even bother spending time and effort thinking about the design of our database? Perhaps you've already got an Excel spreadsheet where you've got all of the information that you might need about your project or your program. You might think, "I've already organized my data into this nice table with all the rows and columns. I've got what I need, right? I should be ready to go."

Maybe you could go with this, but I would encourage you to pause and think because there might be a few reasons why you might want to invest time and effort upfront to really think about the design. First, thinking about database design helps us to maximize the efficiency of our data workflows. Increasing efficiency leads to a minimization in the duplication of our efforts, preventing us from having to repeatedly enter data many times over. It helps us to maximize the value of each data point that we store in our database.

Secondly, thinking about database design will help us to minimize the storage and minimize the overall size of the database. This is important on a practical level because, depending on which platform you use to host your database, it could mean significant reductions in the cost of maintaining it. Furthermore, it can help improve the performance of your database. Keeping your database to a minimal size speeds up the queries needed to generate the actual bits of information you need.

Finally, thinking about database design helps to minimize the need for future restructuring. This can be a real pain if you realize later on that the structure you developed in the beginning isn't working. In some cases, you might need to spend a lot of time moving things around, so thinking about your design thoughtfully upfront will save you time and effort later on.

00:03:35 What a good database looks like

So, what does a good database look like? A good database essentially has two key characteristics. On the one hand, a good database is one where you have minimal data redundancy, where you aren't entering the same bits of data over and over again. On the other hand, a good database allows for high data integrity. This means that your data are connected to each other and relate to each other in such a way that enables you to get a holistic picture of your overall project or program.

00:04:15 Poll results

I actually wanted to take a pause and do a poll to see where we are all coming from. Looking at the results regarding your proficiency in designing databases, I can see that most of you are either a beginner or at an intermediate level. You've certainly come to the right place. Hopefully, I'll be able to share with you some practical advice on how you can get started. Even if you already have some experience, hopefully, you can pick up some new things.

regarding how long you've been using ActivityInfo, I can see the majority actually haven't used it yet. This will be an excellent opportunity for you to learn more about ActivityInfo, which is a great platform for easily creating databases. For those who have been using it for some time, hopefully, you pick up some new things about the tools that you didn't know before to improve upon your existing databases.

00:05:45 Steps to designing an effective database

Let's move on to the practical steps on how you can approach designing an effective database. I think that we don't need to overcomplicate things; you can achieve an effective database in just five easy steps.

Step one: Identify entities. Step two: Identify attributes. Step three: Identify relationships. Step four: Assign keys. Step five: Normalize.

Let's go through each of these steps in further detail. Starting with step one: identify entities. An entity is a discrete data object, which is essentially the basic building block of your database. Most commonly, entities could represent people, objects, or events. For example, if we're working in a program that does training sessions and capacity building, we might have beneficiaries, training courses, and training sessions as our entities.

Step two is to identify attributes. An attribute is a characteristic that describes your entity. For beneficiaries, you might describe them by their name, date of birth, sex, or age. Training courses might be described by the name of the course, the instructor, and the location. Training sessions might be described by the date, the participants, and the number of participants. At this stage, it is useful to think about the data types for each attribute (text, dates, quantities, or defined lists).

Step three is identifying relationships. A relationship is essentially how entities relate to each other naturally. For example, each training course can be conducted over multiple sessions. Conversely, each training session might cover only one specific training course. A single beneficiary might attend multiple training sessions, and a single training session can be attended by multiple beneficiaries.

This brings us to the concept of cardinality: how many on one side of the relationship relate to how many on the other side. There are four types: one-to-one, one-to-many, many-to-one, and many-to-many. Once you've reflected on these relationships, you should visualize them to get a better handle on how the data links together. You can draw arrows between entities and use notations to represent cardinality.

Step four is assigning keys. A key is an attribute or a combination of attributes that you use to uniquely identify that specific entity. This ensures we reduce duplication and ensure high data integrity. For a beneficiary, you might use their name (or a unique ID). For training courses, the course name. For training sessions, perhaps the date. Choose the attributes that make the most sense for your data.

Step five is database normalization. This is the process of organizing your data to eliminate redundant data and improve integrity. Let's look at an example table with columns for training name, trainer, institute, location, and participants.

First Normal Form: Each attribute should have only one value. If you have multiple participants listed in a single cell, you should expand the table to create additional rows so each participant has their own cell.

Second Normal Form: All other values in our table should be functionally dependent on the whole primary key. If our key is the training name, but we have multiple rows for the same training because of different participants, we should break this into two separate tables. One table for training courses (unique rows per course) and a second table for participants (linking back to the course).

Third Normal Form: There should be no transitive functional dependencies. Attributes shouldn't depend on each other; they should only depend on the key. For example, "Training Location" depends more on the "Training Institute" than the specific "Training Course." We can break this down further into three tables: Training Courses, Participants, and Training Institutes (where location is associated with the institute).

00:14:50 Applying principles in ActivityInfo

In this next part, I'd like to introduce you to how you can apply these principles in ActivityInfo. ActivityInfo is an end-to-end solution for managing your data. You can use it to manage data (database design), collect data, and analyze data.

It is useful to think about the hierarchy of data within ActivityInfo. It starts at the very top level with the Database, which is your overall container. Within your database, you might organize your data into several Folders. Within folders, you can create multiple Forms to collect and store data. Within a form, you have Records, which are essentially the entities. Within each record, you have different Fields that describe your record (the attributes).

There are two more functionalities to understand: Reference Forms and Subforms. Reference forms can contain information used for context (like a list of locations) that can be linked to another form. Subforms are a way to connect a unique kind of relationship, specifically a parent and child relationship (one-to-many), within a single form structure.

00:17:30 Rules of thumb and recommendations

Here are some basic rules of thumb when configuring your database in ActivityInfo:

I also have some general recommendations:

00:20:45 Demonstration

I wanted to demonstrate very quickly how we can apply those steps using the training program example in ActivityInfo. I've used the "Trainings Monitoring Database" template.

In this database, you can see a "Reference" folder containing forms for "Training Institution" and "Training Courses." Under Training Institution, we have the list of institutes and their cities. Under Trainings, we have the course names and trainers.

Back in the main folder, we have a form for "Training Sessions." Here, I can see specific sessions, which course they cover, and when they happened. I've used the subform feature to capture who actually attended that particular session. If I click through the participants list, I can see the list of participants for that specific session.

00:22:15 Q&A

Question: Why do we not combine the columns for training location and training institutions?

Answer: This relates to the Third Normal Form and transitive functional dependency. We want to keep the structure as simple as possible. The "Training Location" (e.g., a city) is functionally dependent on the "Training Institute," not necessarily the "Training Course." An income generation course could be held at different institutes in different cities. By separating them, we ensure that location is associated with the institute, and the institute is associated with the course.

Question: Can you repeat the part about cardinality?

Answer: Cardinality helps us understand how entities relate to each other. For example, a training course might relate to multiple training sessions (one-to-many), or a single session might cover a single course (one-to-one). Understanding this helps you design your forms. If you have a one-to-many relationship, you might create a reference form for the "one" side (courses) and link it to the form for the "many" side (sessions).

Question: What is the difference between ActivityInfo and Microsoft Access?

Answer: ActivityInfo is an end-to-end solution that allows you to not only organize data but also collect and analyze it in real-time. It is a web-based platform that is very user-friendly. It doesn't require a lot of training or previous database experience to get started, unlike software options like Access.

00:26:30 Conclusion

I think it was a very useful webinar. We're going to share the recording soon with the emails you used to register. Don't forget to join our demo which takes place every last Tuesday of every month. Thanks everyone for your participation today. I hope you are now more confident in designing a database using ActivityInfo. Have a great afternoon. Bye!

Sign up for our newsletter

Sign up for our newsletter and get notified about new resources on M&E and other interesting articles and ActivityInfo news.

Which topics are you interested in?
Please check at least one of the following to continue.