Ars Informatica
June 23, 2017

E-mail address validation using PHP preg_match

November 13, 2006

This little one-liner doesn't require clarification for those who eat, drink, and sleep PHP. It's here for PHP newcomers.

Your web page has a user-submittable form, and one of the required fields is an e-mail address (named "sender" in this example). How do you know that something was entered, and that it is a properly formatted e-mail address? Here's how:

if (!preg_match("/^[A-Z0-9._%-]+@[A-Z0-9][A-Z0-9.-]{0,61}[A-Z0-9]\.[A-Z]{2,6}$/i",
    $_POST['sender'])) echo '<b>Please enter a valid e-mail address</b>';

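A single expression handles all of the checking required to ensure an e-mail string is properly formatted. Parsing the string by hand, character by character, would take both you and the computer much longer. Note that this checks only the format of the address, not whether the mailbox actually exists.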
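For anything beyond a quick test, it may be worth wrapping the check in a small function and guarding against the field being missing altogether. The following is only a sketch: the function name is_valid_email_format() is made up for this example, and the pattern is the one from the one-liner above.

```php
<?php
// Hypothetical helper (the name is an assumption, not part of the original
// one-liner): returns true when $address matches the article's pattern.
function is_valid_email_format($address)
{
    $pattern = '/^[A-Z0-9._%-]+@[A-Z0-9][A-Z0-9.-]{0,61}[A-Z0-9]\.[A-Z]{2,6}$/i';
    return is_string($address) && preg_match($pattern, $address) === 1;
}

// isset() avoids a notice when the form was submitted without the field.
$sender = isset($_POST['sender']) ? $_POST['sender'] : '';

if (!is_valid_email_format($sender)) {
    echo '<b>Please enter a valid e-mail address</b>';
}
```

Keeping the pattern in one function means the rule only has to be changed in one place if you later decide to loosen or tighten it.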

The preg_match('pattern', 'subject') function searches a subject string for a match to a given pattern. The pattern is written as a 'regular expression', or regex. Most characters stand for themselves; character combinations, alternative combinations, characters or combinations to be excluded, special characters, etc. can all be defined.
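To make the function's behaviour concrete, here is a small illustration with throwaway patterns and subjects (the strings are invented for the example). preg_match() returns 1 when the pattern matches, 0 when it does not, and false if an error occurs; an optional third argument receives the text that matched.

```php
<?php
// Ordinary characters stand for themselves.
$found   = preg_match('/cat/', 'concatenate');  // 1: "cat" occurs in the subject
$missing = preg_match('/dog/', 'concatenate');  // 0: no match

// The period is a special character: it matches any single character.
// The optional third argument is filled with the matched text.
preg_match('/c.t/', 'concatenate', $matches);
echo $matches[0]; // prints "cat"
```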

As in the example above, quotes enclose the expression. Delimiters, most commonly the forward slash /, mark the start and end of the actual search pattern; pattern modifiers after the second delimiter further refine how matches are made. Above, the i indicates that the search is case-insensitive.
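A quick way to see what the delimiters and the i modifier do is to run the same pattern with and without the modifier. The subject string below is invented for the example; the # delimiter is simply one of the alternatives PCRE accepts in place of the forward slash.

```php
<?php
$subject = 'User@Example.COM';

// Without the i modifier, [a-z] matches lower-case letters only.
$case_sensitive = preg_match('/^[a-z]+@/', $subject);    // 0

// With the i modifier after the closing delimiter, case is ignored.
$case_insensitive = preg_match('/^[a-z]+@/i', $subject); // 1

// Same pattern with # as the delimiter instead of /.
$hash_delimited = preg_match('#^[a-z]+@#i', $subject);   // 1
```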

The caret ^ at the start of the pattern string denotes that the match must start with the first character of the subject string, a so-called 'anchored' pattern. The dollar $ character at the end of the pattern string requires that the end of the pattern must match the end of the subject. In other words, for the 'sender' e-mail address to be accepted as valid, it must exactly match the entire regex pattern.
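The effect of the anchors is easiest to see by comparing an anchored and an unanchored version of a simplified pattern. The pattern and subjects below are made up for illustration. One detail worth knowing: by default, $ in PCRE also matches just before a single trailing newline; the D modifier restricts it to the very end of the subject.

```php
<?php
$sentence = 'write to bob@example.com today';

// Unanchored: finds the address anywhere inside the subject.
$unanchored = preg_match('/[a-z]+@[a-z]+\.[a-z]+/', $sentence);       // 1

// Anchored with ^ and $: the whole subject must fit the pattern.
$anchored = preg_match('/^[a-z]+@[a-z]+\.[a-z]+$/', $sentence);       // 0
$whole    = preg_match('/^[a-z]+@[a-z]+\.[a-z]+$/', 'bob@example.com'); // 1

// By default $ tolerates one trailing newline; the D modifier does not.
$with_newline = preg_match('/^[a-z]+@[a-z]+\.[a-z]+$/', "bob@example.com\n");  // 1
$strict_end   = preg_match('/^[a-z]+@[a-z]+\.[a-z]+$/D', "bob@example.com\n"); // 0
```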

Characters within square brackets, e.g. [A-Z0-9._%-], form a 'class' that a character in the subject must match. That is, the first character in the subject must be one of the characters A to Z, 0 to 9, or ., _, % and -. The + immediately after the character class indicates that one or more consecutive characters must match this class, i.e. the typical username. And the i pattern modifier, as noted above, makes the search case-insensitive: both lower-case and upper-case characters will match.
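To try the username class on its own, it can be anchored at both ends and tested against a few sample strings. The samples are invented for the example. Note that this particular class does not include the plus sign, so addresses such as john+news@example.com would be rejected by the pattern as written.

```php
<?php
// The username part of the pattern, tested in isolation.
$username_class = '/^[A-Z0-9._%-]+$/i';

$typical  = preg_match($username_class, 'john.doe_99'); // 1: every character is in the class
$empty    = preg_match($username_class, '');            // 0: + requires at least one character
$space    = preg_match($username_class, 'john doe');    // 0: a space is not in the class
$plus_tag = preg_match($username_class, 'john+news');   // 0: + is not in the class either
```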

The first character class, then, matches the username. Next comes the @ character, which matches the one in the e-mail address.

After this comes the domain name. This should start with either a letter or a number, i.e. [A-Z0-9]. After the initial character, we also allow periods and hyphens within the domain name, for another 0 to 61 characters, i.e. [A-Z0-9.-]{0,61}. The minimum and maximum number of permitted matches are enclosed in curly braces. The final [A-Z0-9] then requires that the part before the suffix end with a letter or number, not a period or hyphen.
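The domain portion can likewise be tested in isolation. This sketch anchors just that part of the pattern and feeds it invented sample strings. It shows that the first and last characters must be a letter or number, and that the three pieces together allow between 2 and 63 characters (1 + 0 to 61 + 1).

```php
<?php
// The domain part of the pattern (everything before the final period).
$domain_part = '/^[A-Z0-9][A-Z0-9.-]{0,61}[A-Z0-9]$/i';

$plain      = preg_match($domain_part, 'example');           // 1
$dotted     = preg_match($domain_part, 'mail.example-site'); // 1: periods and hyphens allowed inside
$lead_dash  = preg_match($domain_part, '-example');          // 0: must start with a letter or number
$trail_dash = preg_match($domain_part, 'example-');          // 0: must end with a letter or number
$one_char   = preg_match($domain_part, 'x');                 // 0: needs at least two characters
$max_len    = preg_match($domain_part, str_repeat('a', 63)); // 1: 1 + 61 + 1 characters
$too_long   = preg_match($domain_part, str_repeat('a', 64)); // 0: one character over the limit
```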

The domain name ends in a 2- to 6-letter suffix, preceded by a period. Because periods outside character classes have a separate meaning (matches any character except 'newline'), the period is 'escaped' by a preceding backslash, i.e. \.[A-Z]{2,6}
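The difference between an escaped and an unescaped period shows up clearly in a tiny test. The strings are invented for the example. The last three lines exercise the suffix part of the pattern; note that a suffix longer than six letters is rejected by the pattern as written.

```php
<?php
// An unescaped period matches any character (except newline).
$any_char = preg_match('/^a.c$/', 'abc');  // 1

// An escaped period matches only a literal period.
$escaped  = preg_match('/^a\.c$/', 'abc'); // 0
$literal  = preg_match('/^a\.c$/', 'a.c'); // 1

// The suffix part of the e-mail pattern: a period, then 2 to 6 letters.
$suffix = '/\.[A-Z]{2,6}$/i';
$com        = preg_match($suffix, 'example.com');        // 1
$one_letter = preg_match($suffix, 'example.c');          // 0: fewer than 2 letters
$long_tld   = preg_match($suffix, 'example.technology'); // 0: more than 6 letters
```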

Finally, if (!preg_match()) is short-hand for if (NOT preg_match) i.e. if no match is found, the error message is printed to screen.

preg_match is a Perl-compatible regular expression function. PHP also supports POSIX Extended regular expression functions, which are somewhat easier to learn, though not much: the underlying principles are the hardest part, and those are essentially the same for both types of regex functions. The POSIX regex functions do not support non-greedy matching, assertions, or conditional subpatterns; they are not binary-safe; and they lack some other features.

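If you anticipate the need to really use regular expressions, bite the bullet and learn Perl-Compatible Regular Expressions, or PCRE. If you don't: the POSIX equivalent to a case-insensitive preg_match() is the eregi() function. Be aware, though, that the POSIX functions were deprecated in PHP 5.3 and removed in PHP 7.0, so on current versions of PHP only the PCRE functions are available.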

I've just scratched the surface of the very powerful searching and matching that regular expressions are capable of. They really shine when searching databases, for instance when you need to search for variations on several names, e.g. plurals, abbreviations, or word combinations. Creating the regex string may be more complicated (at least at first), but it can be far faster than running a complex hand-crafted SQL query through a database multiple times.
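As a rough illustration of matching name variations, the sketch below filters a plain PHP array rather than a real database; the list of names and the pattern are invented for the example. PHP's preg_grep() returns the array entries that match a pattern, keeping their original keys.

```php
<?php
// Invented sample data standing in for rows fetched from a database.
$names = array('Smith', 'Smyth', 'Smithe', 'Smithson', 'Jones');

// One pattern covers three spellings: i or y, with an optional final e.
$hits = preg_grep('/^Sm[iy]the?$/i', $names);

// preg_grep() preserves keys, so re-index before printing.
print_r(array_values($hits)); // Smith, Smyth, Smithe
```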

A very good and comprehensive tutorial, suitable for the rawest newbie, may be found at http://www.regularexpressions.info/tutorial.html. For what it's worth, I still refer to it at times.

The PHP manual is also very useful, especially the sections on Regular Expressions Functions, Pattern Syntax, and preg_match. The user-contributed Notes at the bottom of each page provide many practical examples. Finally, MySQL has its own internal regular expression capability. Find the manual at http://dev.mysql.com/doc/index.html; look under regex.

Finally, Ars Informatica offers a simple, modifiable contact form you can drop into your PHP pages, one that incorporates this e-mail validation, adds handling of the POSTed information, error-checking, and basic protection against e-mail injection attacks.
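For a flavour of what basic injection protection can look like, here is a minimal sketch. It is not the code from the Ars Informatica contact form; the function name has_header_injection() is made up for this example. The idea is simply to reject any value destined for a mail header if it contains a carriage return or line feed, since those are what allow an attacker to append extra headers.

```php
<?php
// Hypothetical helper (an assumption for illustration, not the contact
// form's actual code): true if $value contains a CR or LF character.
function has_header_injection($value)
{
    return preg_match('/[\r\n]/', $value) === 1;
}

$clean   = has_header_injection('bob@example.com');                         // false
$tainted = has_header_injection("bob@example.com\r\nBcc: spam@example.org"); // true
```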

Copyright © 2017 Ars Informatica. All Rights Reserved.