Many email address validators will actually throw up errors when faced with a valid, but unusual, email address. Many, for example, assume that an email address with a domain name extension of more than three letters is invalid. However, new TLDs such as ".info", ".name" and ".aero" are perfectly valid but longer than three characters. Many email address validators fail to take into account that you do not necessarily need a domain name in an email address - an IP address is fine.
The first step to creating a PHP script for validating email addresses is to work out
exactly what is and is not valid. RFC 2822, that specifies what is and is not allowed in an email address, states that the form of an email address must be of the form "local-part @ domain".
The "local-part" of an email address must be between 1 and 64 characters in length and may be made up in any one of three ways. It can be made up of a selection of characters (and only these characters) from the following selection (though the period can not be the first of these):
- A to Z
- 0 to 9
- !
- #
- $
- %
- &
- '
- *
- +
- -
- /
- =
- ?
- ^
- _
- `
- {
- |
- }
- ~
- .
Or, it can be made up of a quoted string containing any characters except "\". Older email addresses may be made up differently, and may contain a combination of the above. The following are all valid as the first part of an email address:
- dave
- +1~1+
- {_dave_}
- "[[ dave ]]"
- dave."dave" (Note that this is considered an obsolete form of address - new addresses created should not be of this form, but it is still considered valid.)
The following, though similar, are all invalid:
- -- dave -- (spaces are invalid unless enclosed in quotation marks)
- [dave] (square brackets are invalid, unless contained within quotation marks)
- .dave (the local part of a domain name cannot start with a period)
The "domain" portion of the email address can also be made up in different ways. The most common form is a domain name, which is made up of a number of "labels", each separated by a period and between 1 and 63 characters in length. Labels may contain letters, digits and hyphens, however must not begin or end with a hyphen (officially, a label must begin with a letter, not a digit, however many domain names have been registered beginning with digits so for the purposes of validation we will assume that digits are allowed at the start of domain names). A domain name, technically, need be only one label. However in practice domain names are made up of at least two labels, so for the purposes of validation we will check for two. A domain name may not be over 255 characters in total. A domain portion of an email address may also be an IP address, which can in turn be enclosed in square brackets.
In order to check that email addresses conform to these guidelines, we'll need to use regular expressions. First, we need to match the three possible forms of the local part of an email address, using the two patterns below (we'll add in escape characters later, when we put the function together):
^[A-Za-z0-9!#$%&'*+-/=?^_`{|}~][A-Za-z0-9!#$%&'*+-/=?^_`{|}~\.]{0,63}$
^"[^(\|")]{0,62}"$
We can use the two patterns we've defined here to check for obsolete local parts of email addresses too, saving ourselves from needing a third pattern.
Next, we need to check the domain portion of the email address. It can either be an IP address or a domain name, so we can use the two patterns here to validate it:
^\[?[0-9\.]+\]?$
^[A-Za-z0-9][A-Za-z0-9-]*[A-Za-z0-9](.[A-Za-z0-9][A-Za-z0-9-]*[A-Za-z0-9])+$
The above pattern will match any valid domain name, but will also match an IP address, so we only need the above to check the "domain" portion of the email.
Putting it all together gives us the following function. Call it like any normal function, and you will get back a value of "true" if the string entered is a valid email address, or "false" if the input was an invalid email address.
function check_email_address($email) {
// First, we check that there's one @ symbol, and that the lengths are right
if (!ereg("^[^@]{1,64}@[^@]{1,255}$", $email)) {
// Email invalid because wrong number of characters in one section, or wrong number of @ symbols.
return false;
}
// Split it into sections to make life easier
$email_array = explode("@", $email);
$local_array = explode(".", $email_array[0]);
for ($i = 0; $i < sizeof($local_array); $i++) {
if (!ereg("^(([A-Za-z0-9!#$%&'*+/=?^_`{|}~-][A-Za-z0-9!#$%&'*+/=?^_`{|}~\.-]{0,63})|(\"[^(\\|\")]{0,62}\"))$", $local_array[$i])) {
return false;
}
}
if (!ereg("^\[?[0-9\.]+\]?$", $email_array[1])) { // Check if domain is IP. If not, it should be valid domain name
$domain_array = explode(".", $email_array[1]);
if (sizeof($domain_array) < 2) {
return false; // Not enough parts to domain
}
for ($i = 0; $i < sizeof($domain_array); $i++) {
if (!ereg("^(([A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9])|([A-Za-z0-9]+))$", $domain_array[$i])) {
return false;
}
}
}
return true;
}
Using the function above is relatively simple, as you can see:
if (check_email_address($email)) {
echo $email . ' is a valid email address.';
} else {
echo $email . ' is not a valid email address.';
}
You can now validate email addresses entered into your site against the specifications that define email addresses (more or less - domain names that start with a number are supposed to be invalid, but do exist).
Finally, please do remember that because an email
looks valid does not mean it is in use. Using a script for validating email addresses is a good start to email address validation, but though it can tell you an email address is technically valid it cannot tell you if it is in use. You might benefit from checking in more depth, for example seeing if a domain name is registered. Even better, fire off an email to the address given by a user and get them to click a link to confirm it is real - the only way to be 100% sure.