Learning Regular Expressions for validation rules and quality data
About the webinar
About the webinar
This Webinar is a one-hour session ideal for Monitoring and Evaluation professionals and Information Management Officers who are interested in learning more about Regular Expressions (RegEx).
In this session, we’ll take a look at Regular Expressions and how they can be used when working with data. We will also explore the use of Regular Expressions when writing Formulas in applications such as the ActivityInfo platform and present what we can achieve using them.
Some of the key points we will cover are:
- What Regular Expressions do and when to use them
- Using Regular Expressions in validation rules for data collection
- Practical examples using ActivityInfo
You can also download the presentation and test your Regular Expressions with our Regular expression tester.
Is this Webinar for me?
- Are you an M&E or IMO practitioner responsible for designing forms and databases for the programmes in your organization?
- Do you wish to get a better understanding of Regular Expressions and how they can improve data quality?
- Do you want to start using Regular Expressions in your forms to create more advanced structures?
Then, watch our Webinar!
About the Speaker
About the Speaker
Mr. Alexander Bertram, Technical Director of BeDataDriven and founder of ActivityInfo, is a graduate of the American University's School of International Service and started his career in international assistance fifteen years ago working with IOM in Kunduz, Afghanistan and later worked as an Information Management officer with UNICEF in DR Congo. With UNICEF, frustrated with the time required to build data collection systems for each new programme, he worked on the team that developed ActivityInfo, a simplified platform for M&E data collection. In 2010, he left UNICEF to start BeDataDriven and develop ActivityInfo full time. Since then, he has worked with organizations in more than 50 countries to deploy ActivityInfo for monitoring & evaluation.
Transcript
Transcript
00:00:00
Introduction
Hello everyone and welcome to another webinar. Today's webinar is "Learning Regular Expressions for validation rules and quality data." My name is Faida Liyari and I'm working with the ActivityInfo team. Among others, I work on producing various guides and documentation for the platform. Together with Mr. Alex Bertram, the technical director of BeDataDriven and founder of ActivityInfo, we will be hosting this webinar, broadcasting from The Hague, the Netherlands.
We're excited to see people joining us from all over the world. Before we start, I would like to share some housekeeping rules. Your microphone is muted, and you should all be able to see the shared screen. The webinar is being recorded, and you will receive the recording after the session. The recording will also be available on our website in a few days, where you can also find recordings of previous webinars. During the webinar, you can send your questions via the Q&A section found in the toolbar at the bottom of the screen.
I would like to introduce Mr. Alex Bertram, who will be presenting today. Mr. Bertram is a graduate of the American University's School of International Service and started his career in international assistance 15 years ago working with IOM in Afghanistan and later as an Information Management Officer with UNICEF in DR Congo. Frustrated with the time required to build data collection systems for each new programme, he worked on the team that developed ActivityInfo. In 2010, he started BeDataDriven to develop ActivityInfo full-time. Welcome, Alex.
00:01:56
What is a regular expression?
Thanks, everybody. Welcome to this webinar. It is a very technical webinar today, but I thought this was an interesting tool for people who are doing data collection and are interested in data quality. Today, we are going to look at what a Regular Expression (RegEx) is. We will go through some basic syntax and some extended syntax. I will give a couple of examples while working through these, but I really want to get your participation. If there are things you would like to see regarding how to apply Regular Expressions in your work, please put them in the chat.
A regular expression is a sequence of characters that specifies a pattern to text. This is useful in the context of M&E anytime you are collecting something as text but it needs to have a certain format. Think of phone numbers, emails, web addresses, or registration numbers and national IDs. Regular expressions are a tool to help you check that what is being entered on a mobile device or online form fits a certain pattern. If you have a good regular expression, you can avoid having to redo data collection and ensure you collect data right the first time.
Let's look at a simple regular expression. If I want to define a search pattern for an email address, it is hard for a computer to understand "some letters, then an at sign, then more letters." Regular expressions specify the pattern. There are two different types of characters involved. First, we have literal characters. For example, an 'a' and a 'b' mean exactly that letter; the text must start with an 'a' followed by a 'b'.
However, literal characters aren't enough for things like phone numbers where the specific numbers change. This is why regular expressions feature metacharacters. These are special characters that refer to broader patterns. For instance, a period (or dot) matches any character. If we have the pattern ab.d, it matches abcd, abxd, or ab9d. It combines literal characters with a metacharacter that accepts anything in that position.
00:07:00
Escaping characters
Since metacharacters like the period have special meanings, you might wonder what happens if you actually want to match a period, like in a website address. To do this, we use "escaping." You use a backslash (\) to tell the computer that you are not talking about a metacharacter, but a literal character. So, \. tells the computer to look for an actual dot.
00:08:09
Basic syntax and character classes
We can also match a range of characters using square brackets []. This is called a character class. For example, [0123456789] matches any single digit. A shorter way to write this is using a range: [0-9]. This works for letters as well; [A-Z] matches any uppercase letter. You can combine these ranges within the brackets, such as [0-9A-Za-z], which matches any alphanumeric character. Finally, you can invert the range by putting a caret (^) at the start. [^abc] means match any character that is not a lowercase a, b, or c.
Let's look at some illustrations using a testing tool. If I create a character class [0-9], entering '1' matches, but 'a' does not. If I want to match a class name like "6A", I can use a literal number followed by a character class: 6[A-Z]. This ensures the code looks right—it must be a 6 followed by an uppercase letter. If I enter "6b" (lowercase), it won't match if the range is only uppercase.
00:13:30
Quantifiers
Often, you have repeated characters. For a 10-digit phone number, you don't want to write [0-9] ten times. We use quantifiers to specify how many times a pattern should be matched. A quantifier applies to the previous character or group.
Let's look at an example. In the US, Social Security numbers have a specific format: three digits, a dash, two digits, a dash, and four digits. The pattern would look like [0-9]{3}-[0-9]{2}-[0-9]{4}. The dashes here are literals. If we wanted to make the dashes optional, we could add a question mark after them: [0-9]{3}-?[0-9]{2}-?[0-9]{4}.
00:20:13
Syntax shortcuts
There are shortcuts for common character classes to save time:
Let's apply this to Dutch mobile numbers, which start with 06 followed by 8 digits. The pattern would be 06\d{8}. This ensures the user enters exactly the right amount of numbers starting with the correct prefix.
00:23:10
Email address validation
Email addresses are complex, but we can use patterns to reduce errors. A simple pattern might be .+@.+\..+. This means:
This will match alex@bedatadriven.com or complex emails like mail.bedatadriven.co.ae. While you can find extremely long RegEx patterns online for emails, a simple one like this usually suffices to weed out major data entry errors.
00:26:40
Groups and logic
You can apply quantifiers to a whole group of patterns using parentheses (). For example, (06)+ would match "06", "0606", etc. Within groups, you can use the pipe symbol | to represent "OR".
Let's look at a real-world example from the DR Congo. Different mobile providers have different prefixes. For instance, Airtel might use 081 and Vodacom might use 082. We can group these: 08(1|2)\d{7}. This means the number must start with 08, followed by either a 1 or a 2, followed by the rest of the digits. This allows you to validate numbers across multiple providers with specific prefixes.
00:32:14
Using RegEx in ActivityInfo
In ActivityInfo, you can apply these rules in the Form Settings. It is best to treat phone numbers as text fields, not quantity fields. In the validation rules section, you can use the REGEXMATCH function. This function takes the field code and the regular expression pattern.
For an email field, you would enter REGEXMATCH(email, ".+@.+\..+"). For a Dutch phone number, you might use REGEXMATCH(phone, "06\d{8}"). When a user enters data, the system checks against this pattern. If the input doesn't match (e.g., entering "Alex" without the @ symbol), the system will show a validation error and prevent saving until it is corrected.
00:35:35
Q&A: Validation and calculated fields
Q: How do you remember all these rules? A: These expressions have been around for a long time, but they can be tricky. We can provide a "cheat sheet" with the most common patterns and quantifiers to help you.
Q: Can we create calculated fields using Regular Expressions? A: Yes. For example, you can determine if a phone number is a mobile or landline based on the prefix. You can use an IF statement combined with REGEXMATCH. Formula: IF(REGEXMATCH(phone, "^06"), "Mobile", "Landline") This checks if the number starts with 06 (using the ^ assertion for the start of the string) and labels it accordingly.
Q: Can we extract information, like the domain from an email? A: Yes, using REGEXEXTRACT. This function works similarly to Google Sheets. You provide the text and the pattern you want to extract. For example, to get the domain, you might use a pattern that looks for text following the @ symbol.
00:44:22
Q&A: Word boundaries and assertions
Q: What if I want to match the end of a word or the beginning? A: You can use assertions.
For example, if you want to search for the word "safe" in a narrative field but don't want to match "fail-safe" or "safety", you would use \b. The pattern \bsafe\b ensures that "safe" is matched only when it stands alone as a word.
00:47:45
Q&A: Other validation methods
Q: Are regular expressions the only way to set rules? A: No. ActivityInfo supports many validation types. For numeric fields, you can use simple comparisons like count > 0 or count < 20. For dates, you can compare fields, such as ensuring an End Date is greater than a Start Date. Regular expressions are specifically powerful for matching text patterns.
00:50:00
Q&A: Creating unique IDs
Q: Can we use RegEx to create a unique ID from multiple fields (e.g., Partner Name + Date)? A: You don't necessarily need RegEx for this. You can use the CONCATENATE function to combine fields into a string (e.g., CONCAT(PartnerName, "-", Month)). Furthermore, ActivityInfo allows you to set "Key Fields." If you make both "Partner Name" and "Month" key fields, the system will automatically prevent duplicate records for that specific combination, acting as a unique ID constraint.
00:56:15
Conclusion
Thank you so much for all your questions and for listening. We will share the slides and a cheat sheet after the call. Hopefully, that will be useful for you. Have a great afternoon.
Sign up for our newsletter
Sign up for our newsletter and get notified about new resources on M&E and other interesting articles and ActivityInfo news.