# Validating phone numbers with regular expressions

Regular expressions are powerful tools for describing patterns in text, including phone numbers. You can use this function to validate whether a collected phone number is well-formed.

In this tutorial, we'll look at how to build up a regular expression step-by-step to validate collected phone numbers.

## Understanding the domain

Before writing a regular expression, it's important to understand the expected format of the text. In this example, we'll look at validating mobile phone numbers from Democratic Republic of the Congo (DRC).

In the DRC, the numbering scheme for mobile phones is as follows:

• Vodacom: 81X XXX XXX or 82X XXX XXX
• Supercell: 80X XXX XXX
• Congo-Chine Telecom: 84X XXX XXX
• Celltell: 97X XXX XXX, 98X XXX XXX. or 99X XXX XXX

Where "X" is any digit, 0-9.

We can use the patterns above to write a validation rule that tests for a valid mobile number.

## A simple regular expression

Let's start by matching the Vodacom numbers. Each Vodacom mobile number starts with "81" or "82" and another digit, and then two groups of three digits:

``````REGEXMATCH(PHONE_NUMBER, "8[12][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9]")
``````

Here we use the character class notation to specific a range of characters:

• `[12]` matches either "1" or "2"
• `[0-9]` matches any character between "0" and "9"

We can shorten this expression by using repeating character classes:

``````REGEXMATCH(PHONE_NUMBER, "8[12][0-9] [0-9]{3} [0-9]{3}")
``````

The `{3}` indicates that the character before must be matched exactly three times.

## Matching any of the mobile providers

The big difference between the different providers is the first group of digits:

Provider Pattern of first group
Vodacom `8[12][0-9]`
Supercell `80[0-9]`
Congo-Chine `84[0-9]`
Celltell `9[789][0-9]`

We can match any of the providers above by grouping them together using parentheses, and using the pipe operator (|) to specify alternatives:

``````REGEXMATCH(PHONE_NUMBER, "((8[12][0-9])|(80[0-9])|(84[0-9])|(9[789][0-9])) [0-9]{3} [0-9]{3}")
``````

Actually, the only difference between the first three providers is the second digit, so we could further simply to:

``````REGEXMATCH(PHONE_NUMBER, "((8[0124][0-9])|(9[789][0-9])) [0-9]{3} [0-9]{3}")
``````

However, this will match a phone number wherever it occurs in the text. So "801 343 933" will match, but so will "TEL: 801 343 933". We'd prefer that only the phone number is entered into the text field, so we can use `^` symbol to require matching at the beginning of the field, and `\$` to require matching at the end:

``````REGEXMATCH(PHONE_NUMBER, "^((8[0124][0-9])|(9[789][0-9])) [0-9]{3} [0-9]{3}\$")
``````

## Calculating the name of the mobile providers

We can also use the regular expression to calculate the name of the mobile provider from the phone number using the `IF` function together with the `REGEXMATCH` function:

``````IF(REGEXMATCH(PHONE_NUMBER, "^8[12][0-9]"), "Vodacom",
IF(REGEXMATCH(PHONE_NUMBER, "^80[0-9]"), "Supercell",
IF(REGEXMATCH(PHONE_NUMBER, "^84[0-9]"), "Congo-Chine",
IF(REGEXMATCH(PHONE_NUMBER, "^9[789][0-9]"), "Celltell", "Other"))))
``````

## Building the form

Now that we have our two formulas, we can add a data collection form in ActivityInfo.

3. Enter the form name "Registration"

### Add the phone number field

1. Add a text field with the name "Phone Number" and the code `PHONE_NUMBER`.
2. Check "Set Validation Rules"
3. Paste the validation formula above into the validation rules box, or open the formula editor

### Add the mobile provider field

1. Add a calculated field called "Mobile Provider"
2. In the formula editor, paste the IF-statement from above.

## Testing the form

When you open the form in data entry, typing a phone number that doesn't match the format for DRC should result in a validation error:

When the input does match, the field is validated, and the calculated field shows the correct mobile provider:

You can try the form yourself.