eregi

(PHP 4, PHP 5)

eregi — Case-insensitive regular expression match

Description

int eregi ( string $pattern , string $string [, array &$regs ] )

This function is identical to ereg() except that it ignores case distinction when matching alphabetic characters.

Example #1 eregi() example


<?php
$string = 'XYZ';
if (eregi('z', $string)) {
    echo "'$string' contains a 'z' or 'Z'!";
}
?>

See also ereg(), ereg_replace(), eregi_replace(), stripos() and stristr().

User Contributed Notes

info at o08 dot com 15-Apr-2010 06:12


Because eregi() is deprecated as of PHP 5.3, you can replace it with stristr() if you only need a simple search.



For editors with a regular-expression search-and-replace function:

eregi\(([^,]*),([^)]*)\)

stristr(\2,\1)
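
A minimal sketch of the two usual replacements, assuming PHP 7 or later, where the ereg family no longer exists: stripos() for a plain case-insensitive substring test, and preg_match() with the i modifier when a real pattern is needed. The helper names contains_ci() and matches_ci() are made up for this illustration.

```php
<?php
// Case-insensitive substring test; no regular expression involved.
function contains_ci(string $haystack, string $needle): bool
{
    // stripos() returns an offset (possibly 0) or false, so compare strictly.
    return stripos($haystack, $needle) !== false;
}

// Case-insensitive pattern test using PCRE.
function matches_ci(string $pattern, string $subject): bool
{
    // Assumes the pattern itself contains no '~', which is used as delimiter.
    return preg_match('~' . $pattern . '~i', $subject) === 1;
}

var_dump(contains_ci('XYZ', 'z'));    // bool(true)
var_dump(matches_ci('^x.z$', 'XYZ')); // bool(true)
```

The strict comparison matters: stristr() returns the matched remainder of the string, which can itself be a falsy value such as "0".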

julia_kean25 at yahoo dot com 10-Feb-2009 11:15


I use this in my app_config.php file to sanitize each request:



<?php

// app_config.php



    /**

    *    SANITIZE REQUEST

    */



    function sanitize_request($methods, $array)

    {

        // methods: trim ; addslashes ; stripslashes ; etc...

        // array : $_GET ; $_POST ; etc...



        foreach ($methods as $function) {

            $array = array_map($function, $array);

        }

        return $array;

    }



    if ( ! get_magic_quotes_gpc() )

    {

        $methods = array('trim', 'addslashes');

        $_GET = sanitize_request($methods, $_GET);

        $_POST = sanitize_request($methods, $_POST);

        $_COOKIE = sanitize_request($methods, $_COOKIE);

        $_REQUEST = sanitize_request($methods, $_REQUEST);

    }

?>



It currently only trims and adds slashes to the request, but it would be nice to be able to add the strip_tags() function too.
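
Adding strip_tags() needs no change to the function, since it is just one more callable name in the list. A sketch under these assumptions: the input is a flat array of strings (nested arrays would need a recursive variant), and the sample array stands in for $_GET or $_POST. Note that get_magic_quotes_gpc() no longer exists in PHP 8, and that addslashes() is not a substitute for prepared statements.

```php
<?php
// Same idea as above: apply each named function to every element in turn.
function sanitize_request(array $methods, array $array): array
{
    foreach ($methods as $function) {
        $array = array_map($function, $array);
    }
    return $array;
}

// Hypothetical input standing in for $_GET or $_POST.
$input  = array('name' => "  <b>O'Reilly</b> ");
$result = sanitize_request(array('trim', 'strip_tags', 'addslashes'), $input);

echo $result['name'], "\n"; // O\'Reilly
```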

jrgpmaster at gmail dot com 13-Sep-2008 10:04


Here is a simple way of checking whether the visitor of your page is a search engine or a normal person. It does this by checking whether the user agent in $_SERVER['HTTP_USER_AGENT'] contains one of the keywords that search engines' user agents usually contain.



<?php



// Check whether the user agent belongs to a bot of some sort.
function is_bot()
{
    $bots = array('google', 'yahoo', 'msn');
    // Takes the list above and builds (google)|(yahoo)|(msn).
    $regex = '(' . implode(')|(', $bots) . ')';
    // Uses the generated regex to see whether any keyword occurs in the user agent.
    return eregi($regex, $_SERVER['HTTP_USER_AGENT']);
}



?>
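
For PHP versions without eregi(), the same check can be sketched with preg_match(). This is an adaptation, not the poster's code: the function name is_bot_ua() is made up, the user agent is passed in as a parameter so the function can be tested without $_SERVER, and preg_quote() is added so that keywords are matched literally.

```php
<?php
// Returns true when the user agent contains one of the keywords.
function is_bot_ua(string $userAgent, array $bots = array('google', 'yahoo', 'msn')): bool
{
    // Quote each keyword so it is matched literally, then join with '|'.
    $quoted = array_map(function ($bot) {
        return preg_quote($bot, '/');
    }, $bots);
    $regex = '/(' . implode('|', $quoted) . ')/i';

    return preg_match($regex, $userAgent) === 1;
}

var_dump(is_bot_ua('Mozilla/5.0 (compatible; Googlebot/2.1)')); // bool(true)
var_dump(is_bot_ua('Mozilla/5.0 (X11; Linux x86_64) Firefox')); // bool(false)
```

User agent strings are supplied by the client, so a check like this is only a hint and can be spoofed.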

sumit270 at gmail dot com 12-Jun-2008 06:45


Prevent XSS attacks



<?php

// Prevent any possible XSS attacks via $_GET.

foreach ($_GET as $check_url) {

    if ((eregi("<[^>]*script*\"?[^>]*>", $check_url)) || (eregi("<[^>]*object*\"?[^>]*>", $check_url)) ||

        (eregi("<[^>]*iframe*\"?[^>]*>", $check_url)) || (eregi("<[^>]*applet*\"?[^>]*>", $check_url)) ||

        (eregi("<[^>]*meta*\"?[^>]*>", $check_url)) || (eregi("<[^>]*style*\"?[^>]*>", $check_url)) ||

        (eregi("<[^>]*form*\"?[^>]*>", $check_url)) || (eregi("\([^>]*\"?[^)]*\)", $check_url)) ||

        (eregi("\"", $check_url))) {

    die ();

    }

}

unset($check_url);

?>

Jeff Morris 30-May-2008 08:06


Email Address RegEx -- The Final Frontier?



Inspired by bobocop's stalwart effort. Cheers for that matey!



Contrary to most folks' expectation, a quoted @ character is permitted in the

local part of an email address. So strictly speaking bobocop's test result for

'@exam@exam.com' is ...inconclusive?



The RFC prohibits control characters in the address. So it's no coincidence

that most header-related exploits try to inject control characters into the

fields sent to the server. If we're validating client-side, we need to ensure

user input is restricted to the printable code set. And in the spirit of not

trusting anything inbound, we need to filter again server-side. It's handy

to have the same regex working at both ends.



My variant of bobocop's regex is listed below. Note the mask for the local

part matches any printable character *excluding the dot*. The dot is reserved

as a label separator. Bobocop's regex enforces that role while ensuring the

local part does not start or end with a dot.



Outside of the 7-bit ASCII and dot rules, the RFC says 'anything goes' in the

local part. Them's the breaks folks.



All we need to realise is that our endeavours are limited, and the nearest

we'll get to validating an email address is finding an MX record in DNS.

Whatever, don't go probing mail servers with test emails, you might get more

than you bargained for. That's sp@mmer territory, that is.

If you want to positively vet a mail server, consider running a check against

sbl-xbl.spamhaus.org. Search for the checkdnsrr function page on this site and

read the comments for good info.



Anyhoo, here's the modded regex builder:



//the variables

$local            = '[\x20-\x2D\x2F-\x7E]';

$alnum            = 'a-z0-9';

$domain            = "([$alnum]([-$alnum]*[$alnum]+)?)";



//the array

$arr            = array();

$arr['start']    = '^';

$arr['local']    = "$local+(\.$local+)*";

$arr['at']        = '@';

$arr['domain']    = "($domain{1,63}\.)+";

$arr['tld']        = "[$alnum]{2,6}";

$arr['end']        = '$';



//the regex

$regex            = implode('',$arr);



/**



$regex evaluates to:



^[\x20-\x2D\x2F-\x7E]+(\.[\x20-\x2D\x2F-\x7E]+)*@

(([a-z0-9]([-a-z0-9]*[a-z0-9]+)?){1,63}\.)+[a-z0-9]{2,6}$



(regex split into 2 lines due to line length limits)



Add virgules front and back for the javascript equivalent.



I'm running this in an AJAX app right now and it does what it says on the tin.



If you're uncomfortable with the character length limits on domain and tld

names, change them to taste.



**/

ted devito 03-May-2008 01:29


RE: validate a url

--------------------

based on "ian at hyperborea dot co dot uk" below...

original date: 10-Nov-2004 03:15



I added a test for http(s?) and ftp as well as a trailing slash on urls that don't specify a page. 



now it allows...

http://test.com/

https://www.test.com



$domain = "(http(s?):\/\/|ftp:\/\/)*([[:alpha:]][-[:alnum:]]*[[:alnum:]])

    (\.[[:alpha:]][-[:alnum:]]*[[:alpha:]])+";

$dir = "(/[[:alpha:]][-[:alnum:]]*[[:alnum:]])*";

$trailingslash  = "(\/?)";

$page = "(/[[:alpha:]][-[:alnum:]]*\.[[:alpha:]]{3,5})?";

$getstring = "(\?([[:alnum:]][-_%[:alnum:]]*=[-_%[:alnum:]]+)

    (&([[:alnum:]][-_%[:alnum:]]*=[-_%[:alnum:]]+))*)?";

$pattern = "^".$domain.$dir.$trailingslash.$page.$getstring."$";

mbfreight atthe gmail place 03-Jan-2008 11:53


keran at kiwi-interactive dot com wrote (5ish years ago) 07-Mar-2003 08:21



$feedback = "Error: $email isn't a valid mail address!";

return $feedback;

-- and -- 

$feedback = "Error: $domain isn't a valid domain!";

return $feedback;



I've been crushed with patching up XSS and anytime you get user input, it's best to just not show it back to them if possible. The auditor loves throwing these at me: >"><script>alert(123)</script><"  in the url, in forms, everywhere. Some looking around and you can find and build an amazing testing string. 



I have found that using htmlentities($user_input) isn't enough, either. There are a few tricks that can help like

    // from http://us3.php.net/manual/en/function.strip-tags.php
    while ($input != strip_tags($input)) {
        $input = strip_tags($input);
    }



In my case, I start by testing for <[tag]> as well as keyword(), and then do some replacing with preg_replace().
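
One commonly used baseline, sketched here as an assumption rather than as the poster's method, is to escape at output time with htmlspecialchars() and ENT_QUOTES so that both quote styles are encoded. The helper name h() is made up. This only covers HTML text and quoted attribute values; URLs, inline scripts and style contexts need their own handling.

```php
<?php
// Escape a value for use in HTML text or inside a quoted attribute.
function h(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// The auditor's test string from the note above.
$payload = '>"><script>alert(123)</script><"';
echo h($payload), "\n";
// &gt;&quot;&gt;&lt;script&gt;alert(123)&lt;/script&gt;&lt;&quot;
```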

shwetank dot sharma86 at gmail dot com 07-Dec-2007 02:13


Here is some notation for building the expression passed to eregi(exp, string).

First, the syntax:

   [ ]  brackets define a set of characters

   e.g. [a-z], [0-9]

   { }  braces define a repetition range

   e.g. {1,3}

If you want an expression that accepts a number of at most three digits:

        "^[0-9]{1,3}$"

And for a number of exactly three digits:

        "^[0-9]{3}$"

If you have any problems, mail me.
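
A short sketch of how the two patterns behave, using preg_match() (which needs delimiters around the pattern) since eregi() is unavailable in current PHP. The sample inputs are made up.

```php
<?php
// One to three digits, and exactly three digits, as PCRE patterns.
$upToThree    = '/^[0-9]{1,3}$/';
$exactlyThree = '/^[0-9]{3}$/';

foreach (array('7', '42', '123', '1234', '12a') as $candidate) {
    printf(
        "%-5s up-to-three: %s  exactly-three: %s\n",
        $candidate,
        preg_match($upToThree, $candidate) ? 'yes' : 'no',
        preg_match($exactlyThree, $candidate) ? 'yes' : 'no'
    );
}
```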

c00lways at gmail dot com 11-Mar-2007 03:10


hodsfords:



I love your expression, and I've come up with a solution that does not need a fixed repetition count such as {1,3} for the domain.

It accepts any number of domain labels, but requires at least one suffix such as .com / .xxx.

$exp = "^[a-z0-9]+[a-z0-9\?\.\+-_]*" .
"@[a-z0-9_-]+(\.[a-z0-9_-]+)*\.[a-z]+$";

m at tthew dot org dot uk 07-Nov-2006 11:47


This example checks for a valid IP address or CIDR notation address range. (Thanks Walo for just the start I needed.)



The reg exp is too long to post in the code. It is:

^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}(/[0-9]{1,2}){0,1}$

(substitute for EXPR)



<?php



function checkValidIp($cidr) {



    // Checks for a valid IP address or optionally a cidr notation range

    // e.g. 1.2.3.4 or 1.2.3.0/24



   if(!eregi("EXPR", $cidr)) {

       $return = FALSE;

   } else {

       $return = TRUE;

   }

    

    if ( $return == TRUE ) {



        $parts = explode("/", $cidr);

        $ip = $parts[0];

        $netmask = $parts[1];

        $octets = explode(".", $ip);



        foreach ( $octets AS $octet ) {

            if ( $octet > 255 ) {

                $return = FALSE;

            }

        }



        if ( ( $netmask != "" ) && ( $netmask > 32 ) ) {

            $return = FALSE;

        }



    }



    return $return;



}



?>
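
On current PHP the range check on the octets can be delegated to filter_var(). The following is a sketch of that alternative, not the poster's function: the name is_valid_ipv4_or_cidr() is made up, and the built-in validator may differ from the regex above in edge cases such as leading zeros.

```php
<?php
// Accepts "a.b.c.d" or "a.b.c.d/nn" where nn is 0..32.
function is_valid_ipv4_or_cidr(string $cidr): bool
{
    // Split off an optional prefix length; the limit of 2 keeps any extra
    // slashes inside $parts[1], where the digit check below rejects them.
    $parts = explode('/', $cidr, 2);

    // filter_var() rejects out-of-range octets such as 256.
    if (filter_var($parts[0], FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return false;
    }
    if (!isset($parts[1])) {
        return true; // plain address, no prefix length
    }
    // The prefix must be one or two digits and at most 32.
    return preg_match('/^[0-9]{1,2}$/D', $parts[1]) === 1 && (int) $parts[1] <= 32;
}

var_dump(is_valid_ipv4_or_cidr('1.2.3.4'));    // bool(true)
var_dump(is_valid_ipv4_or_cidr('1.2.3.0/24')); // bool(true)
var_dump(is_valid_ipv4_or_cidr('1.2.3.256'));  // bool(false)
var_dump(is_valid_ipv4_or_cidr('1.2.3.0/33')); // bool(false)
```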

benjohnson{-at-}maine{-dot-}rr{-dot-}com 11-Sep-2006 11:33


It's probably worth noting that eregi() (and most likely, the related variations) appears to have a 255-character limit with respect to the length of the input it will attempt to parse.



If you try to do something like



if (!eregi("^[a-zA-Z0-9]{0,256}$", $text)) { ...



eregi() will return FALSE, irrespective of whether or not the input matches the pattern.

markus dot sipila at no dot spam dot iki dot fi dot invalid 02-Aug-2006 11:29


One more comment about email validation and usability of validators.



The fact that RFC 2822 allows a broader set of characters in email addresses than is typically used makes things quite challenging usability-wise.

A very common usability problem with email validators is that they do not accept all valid addresses (such as foo{bar}.baz!@example.com). An almost equally common problem is that the validator only checks that the syntax is valid and passes addresses like foo#@example.com without any warning. Even though foo#@example.com is syntactically valid, it might just as well be a typo of foo@example.com.



I resolved this usability challenge by doing the validation in two phases. In the first phase the address is validated so that it can't include exotic characters like { or |. Most addresses pass this validation. 



If they don't, they are validated with the other validator that allows all RFC-compliant addresses. In this case the validator shows a message that the address is syntactically valid but it recommends to double check it for typos.



An example (with the regexps themselves omitted):

<?php

if (eregi($normal, $email)) {

  echo("The address $email is valid and looks normal.");

}



else if (eregi($validButRare, $email)) {

  echo("The address $email looks a bit strange but it is syntactically valid. You might want to check it for typos.");

}



else {

  echo("The address $email is not valid.");

}

?>



The full article with the regexps and demo can be found at http://www.iki.fi/markus.sipila/pub/emailvalidator.php
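
The two-phase idea can be sketched as follows. Both checks are stand-ins chosen for this illustration and are not the patterns from the linked article: NORMAL_EMAIL is a deliberately conservative pattern, and PHP's built-in FILTER_VALIDATE_EMAIL plays the role of the permissive second validator. The function name classify_email() is also made up.

```php
<?php
// Phase 1: a deliberately conservative pattern (an assumption for this sketch).
const NORMAL_EMAIL = '/^[a-z0-9._+-]+@[a-z0-9-]+(\.[a-z0-9-]+)+$/iD';

// Returns 'normal', 'unusual' or 'invalid'.
function classify_email(string $email): string
{
    if (preg_match(NORMAL_EMAIL, $email) === 1) {
        return 'normal';
    }
    // Phase 2: the built-in validator stands in for the permissive check.
    if (filter_var($email, FILTER_VALIDATE_EMAIL) !== false) {
        return 'unusual';
    }
    return 'invalid';
}

echo classify_email('foo@example.com'), "\n";  // normal
echo classify_email('foo#@example.com'), "\n"; // unusual
echo classify_email('not an address'), "\n";   // invalid
```

An 'unusual' result is the case where the user would be asked to double-check the address for typos rather than being rejected.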

tim at rocketry dot org 30-Apr-2006 06:32


The easiest way I've found to validate a properly formed email address is this:



if(!eregi("^[[:alnum:]][a-z0-9_.-]*@[a-z0-9.-]+\.[a-z]{2,4}$", $_POST['EmailAddress'])) {

     echo "<p>Not a valid email address</p>\n";

}



It basically just wants to see some alphanumeric characters + an @ sign + a . + 2 to 4 alpha characters. So far it has done what I need for quite a while now.. Hope that helps someone. :)



Tim

info at stenschke dot com 03-Feb-2006 01:39


I needed a function to find hyperlinks whose link text is a URL exceeding a given maximum length.

Here is my function, which finds overly long hyperlinks and inserts <br />s into the link text where needed:



    function breakTooLongLinks($text,$maxLen) {

        // find hyperlinks that contain too many chars & insert <br>s where necessary

        $pattern= '[>]www[.].*/*(.doc|.pdf|.htm|.html|.shtml|.php|.asp)(</a>|</A>)';

        $match= eregi($pattern, $text, $regs);

        if ($match) {

            foreach ($regs as $link) {

                if (strlen($link)>$maxLen) {

                    $linkParts= explode('/',$link);

                    $linkRepl= array(); $replI=0; $curLinkPart='';

                    foreach($linkParts as $linkPart) {

                        $curLinkPart.= $linkPart.'/';

                        if (strlen($curLinkPart)>$maxLen) { 

                            $linkRepl[]= $curLinkPart;

                            $curLinkPart=''; 

                        }

                    }

                    $linkRepl= implode('<br />',$linkRepl);                    

                    $text= str_replace($link, $linkRepl, $text);

                }

            }

        }

        return $text;

    }

opedroso at swoptimizer dot com 07-Jan-2006 02:13


If you are trying to match accented characters (e.g. you want to match "Café Olé"), you can use the following regex:



"[a-zÀ-ÿ ]+"



This takes advantage of the fact that in iso-8859-1 ("Latin 1"), "À" is the first accented char and "ÿ" the last one.



By the way, the regex above will match this string as a single match:



"aeiou ????? ????? ????? ?? ? ????? ?"



Beware that your local code page may include a few more char codes in the range "[À-ÿ]", so it may match more than you expect.

Jason Smart knarlin at yahoo dot com dot au 15-Oct-2005 07:28


There seems to be a lot of confusion about escaping special characters within a bracket expression enclosed in []



The reference is the regex man page at http://www.tin.org/bin/man.cgi?section=7&topic=regex , Linked from the "Regular Expression Functions (POSIX Extended)" page in the PHP manual at http://php.planetmirror.com/manual/en/ref.regex.php



The simple rule is there should be no backslash `\' in a bracket expression unless you are matching it literally.

ALL special characters (the few exceptions are listed below), lose their special significance in a bracket expression.

This includes .[$()|*+?{\ all of which have special meaning outside of a bracket expression, inside they will match LITERALLY!



The backslash character can never escape anything in a bracket expression including itself! It is always interpreted literally.



Here are the exceptions



^ at the start of the bracket expression will match on all but the set of characters within the bracket expression



- (hyphen) Indicates range, so [a-z] will match a single character in the range a-z (case insensitive for eregi).  

  To match a literal `-' it must be at the start or end of the bracket expression or the end of a range, or enclosed in [. .] to be the start of a range.

  Examples:

  [0-9-]

  [-0-9]

  Both the above will match a single character from the set comprising the digits and hyphen `-'

  

  [!--] is the same as [!"#$%&'()*+,-]  

  i.e. it matches a single character from the range of characters from `!' to hyphen `-'

  

  [[.-.]-9] is the same as [-./0123456789] or [./0123456789-]

  and will match a single character from the range of characters from hyphen `-' to digit 9



] The closing bracket indicates the end of a bracket expression but not if it immediately follows the opening bracket `[' or `[^' in that case the closing bracket is matched literally.

  Examples:

  [][0-9] is the same as []0-9[]

  and matches a single character from the set of digits and a literal closing bracket `]' and opening bracket `['

  

  [^][0-9] is the same as [^]0-9[]

  and matches a single character from the group of all characters except the set of digits and literal `]' or `['



Look at the regex man page for more on the above and the following special cases



[.characters.]

  A collating element

[=characters=]

  An equivalence class

[:alnum:]

  A character class

  

That's it for special cases! ALL other characters match literally within a bracket expression and can never be escaped with `\'
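
The rules above describe POSIX bracket expressions as used by ereg() and eregi(). When porting a pattern to preg_match(), the position-based rules for `-' and `]' still apply, but one rule changes: in PCRE the backslash does act as an escape character inside a character class. A small sketch to show both points; the sample strings are made up.

```php
<?php
// A hyphen placed last in the class is literal in both POSIX and PCRE.
var_dump(preg_match('/^[0-9-]+$/', '555-1234')); // int(1)

// A closing bracket placed first in the class is literal.
var_dump(preg_match('/^[][0-9]+$/', '[42]'));    // int(1)

// Difference from POSIX: in PCRE the backslash escapes inside a class,
// so [\.] matches only a dot, not a backslash.
var_dump(preg_match('/^[\.]$/', '\\'));          // int(0)
var_dump(preg_match('/^[\.]$/', '.'));           // int(1)
```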



Here is a list of some of the postings here that have mistakenly tried to use special characters within a bracket expression.

28-Jan-2002 12:56, 09-Feb-2002 06:28, 24-Sep-2002 01:21, 08-Mar-2003 02:21, 11-Apr-2003 03:22 (uses `|' as if it was `or' it is not, it is a literal `|'), 27-Jun-2003 04:03, 06-Sep-2003 07:17, 20-Apr-2004 04:55, 24-Nov-2004 09:12, 25-Nov-2004 02:11, 06-Jan-2005 08:54, 24-Feb-2005 06:05



It seems people are copying each other's mistakes, so this needs to be cleared up. Most of the mistakes attempt to escape special characters within a bracket expression with a backslash. That never works! A simple fix is to remove any backslashes within the bracket expression that are not meant to be matched literally.



An attempt to use | as 'or' inside a bracket expression does not work either; put the alternatives in parentheses instead.



I used a simple script attached to a form to test these results. Here's the script.

<?php

  if(isset($_POST['pattern']) && isset($_POST['teststring'])) {

    $pattern = stripslashes($_POST['pattern']);

    $teststring = stripslashes($_POST['teststring']);

    $match = ereg($pattern, $teststring, $regs);

    if(!$match) $match = 0;

    echo('<p>The regular expression is: '.$pattern.'</p>');

    echo('<p>The string for testing is: '.$teststring.'</p>');

    echo('<p>The number of matching characters is: '.$match.'</p>');

  }

?>

Just use a form with input fields named "pattern" and "teststring"



Hope this helps and clears up some misconceptions

Jason Smart

Monash University Student

manuel at cvam dot com dot ar 28-Jul-2005 12:47


Please note that even though the email pattern submitted by bobocop at bobocop dot cz below is RFC compliant, actual domain names are not!



They do start with numbers (as in 1800flowers.com, etc), so the second line, domain, should be:

<?php

$domain = '([a-z0-9]([-a-z0-9]*[a-z0-9]+)?)';

?>

(everything else just like bobocop's comment from 02-May-2005 11:09)

somebody at somewhere dot com 10-May-2005 06:35


Say you want to validate a string of arbitrary length (at least one character, though) and make sure it doesn't contain anything BUT these characters: [:space:]a-zA-Z0-9_.-



function valid_name($name)

{

    

   // return FALSE if it contains characters
   // which AREN'T on the specified list

   if(ereg('[^[:space:]a-zA-Z0-9_.-]{1,}', $name))

   {

      return false;

   } 

   else 

   {

      return true;

   }



}
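
Note that the function above returns true for an empty string, because an empty string contains no disallowed character; that conflicts with the stated "at least one character" requirement. A sketch of a positive check with preg_match() that covers both conditions. The name valid_name_strict() is made up, and \s is used as an approximate stand-in for [:space:].

```php
<?php
// True only when $name is non-empty and made solely of the allowed characters.
function valid_name_strict(string $name): bool
{
    // \s covers whitespace; the hyphen is placed last so it is literal.
    return preg_match('/^[\sa-zA-Z0-9_.-]+$/D', $name) === 1;
}

var_dump(valid_name_strict('John Q. Public-Smith')); // bool(true)
var_dump(valid_name_strict('rm -rf /'));             // bool(false)
var_dump(valid_name_strict(''));                     // bool(false)
```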

bobocop at bobocop dot cz 02-May-2005 05:09


<?php

// Completely updated to match RFC 2822 and RFC 1035

// http://www.faqs.org/rfcs/rfc2822.html

// http://www.faqs.org/rfcs/rfc1035.html



// Example results:



$email[] = 'foo@example.com';                      // matched

$email[] = 'foo.bar@example.co.uk';                // matched

$email[] = 'foo_bar@example.com';                  // matched

$email[] = '_foo_bar@example.com';                 // matched

$email[] = 'foo@example.example';                  // matched

$email[] = '%#a+f.*&654_-._@ee.xx';                // matched

$email[] = 'foo@abc-123.xx';                       // matched

$email[] = 'a@a.a.a.a.aa';                         // matched

$email[] = 'a@a9.aa';                              // matched

$email[] = 'a!b#c$d%e^f&g*h\'i+j-k{l|m}n_/@op.qr'; //matched



$email[] = '';                                     //separator



$email[] = 'foo@-example.com';                     // not matched

$email[] = 'foo@example-.com';                     // not matched

$email[] = '%#af.*&@a%#b.xx';                      // not matched

$email[] = 'a@a.99.00.a.aa';                       // not matched

$email[] = '_-._@-.--';                            // not matched

$email[] = 'any..thing@bla.bla';                   // not matched

$email[] = '@.';                                   // not matched

$email[] = '@.com';                                // not matched

$email[] = '@exam@exam.com';                       // not matched

$email[] = ' @ .com';                              // not matched

$email[] = '.bar@example.com';                     // not matched

$email[] = 'foo.@example.com';                     // not matched

$email[] = 'foo@example.x';                        // not matched



$atom = '[-a-z0-9!#$%&\'*+/=?^_`{|}~]';    // allowed characters for part before "at" character

$domain = '([a-z]([-a-z0-9]*[a-z0-9]+)?)'; // allowed characters for part after "at" character



$regex = '^' . $atom . '+' .         // One or more atom characters.

'(\.' . $atom . '+)*'.               // Followed by zero or more dot separated sets of one or more atom characters.

'@'.                                 // Followed by an "at" character.

'(' . $domain . '{1,63}\.)+'.        // Followed by one or max 63 domain characters (dot separated).

$domain . '{2,63}'.                  // Must be followed by one set consisting a period of two

'$';                                 // or max 63 domain characters.



foreach ($email as $example) {

    if (strlen($example) == 0):

        echo '&nbsp;<br>';

    else:

      if (eregi($regex, $example)):

       echo $example . ' matched<br>';

      else:

       echo '<strong>'. $example . ' not matched</strong><br>';

      endif;

    endif;

}

?>

Sergio Santana: ssantana at tlaloc dot imta dot mx 02-Mar-2005 01:32


Sometimes we need to check the syntax of

floating point numbers. Here is some PHP code that

performs this task.



// To call this program:

//   php floats.php

//--------------------------

// Example of the use of regular expressions for checking 

// the syntax of floating point numbers.

<?php

// This opens standard in ready for interactive input..

$STDIN = fopen("php://stdin","r");



for(;;) { // FOREVER-LOOP

  echo "Float>";

  $s = trim(fgets($STDIN,256));

  if ($s) {

    $le = 

      eregi(

       "-?(([0-9]+e-?[0-9]+)|((([0-9]+\.[0-9]*)|(\.[0-9]+))" .

       "(e-?[0-9]+)?))",

         $s, $regs);

    if ($le==strlen($s))

      echo "Yes it's float\n";

    else if ($le > 0)

      echo "$le chars of the string correspond to a float\n";

    else

      echo "No, it isn't float\n";

  }

  else

    break;

}



fclose($STDIN);



?>
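
The script above relies on ereg-style return values (the length of the match). With preg_match() the usual approach is to anchor the pattern so that only whole-string matches count. A sketch using the same pattern with anchors added; the function name looks_like_float() is made up.

```php
<?php
// Anchored version of the pattern above: the whole string must be a float.
function looks_like_float(string $s): bool
{
    $pattern = '/^-?(([0-9]+e-?[0-9]+)|((([0-9]+\.[0-9]*)|(\.[0-9]+))(e-?[0-9]+)?))$/iD';
    return preg_match($pattern, $s) === 1;
}

var_dump(looks_like_float('3.14'));  // bool(true)
var_dump(looks_like_float('-2e10')); // bool(true)
var_dump(looks_like_float('.5e-3')); // bool(true)
var_dump(looks_like_float('42'));    // bool(false): no dot and no exponent
var_dump(looks_like_float('1.2.3')); // bool(false)
```

As in the original pattern, a plain integer such as 42 is not treated as a float, and a leading plus sign is not accepted.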

krum at estnet dot bg 23-Feb-2005 09:05


I was searching for a fast way to check all fields of a form, so I can

include the $_POST information directly to the sql query. So I wrote

another script in my class in order to check the submitted data. So here it

is:

<?php

function check_post () {

 foreach ($_POST as $key => $val) {

  $$key = $val;

 }

 $data = Array();

 $data["^[a-z0-9!@#$%^&*]"] = Array ("first_name", "last_name",

 "nick", "pass", "description", "user", "message", "event", "name", "title",

 "text");

 $data['^[-!#$%&\'*+\\./0-9=?A-Z^_`a-z{|}~]+'

.'@'.'[-!#$%&\'*+\\/0-9=?A-Z^_`a-z{|}~]+\.'

.'[-!#$%&\'*+\\./0-9=?A-Z^_`a-z{|}~]+$'] = Array("mail");

 $data["^[0-9]{1,4}"] = Array("day", "month", "owner", "type", "forum_id", "topic_id");

 foreach ($data as $key => $val) {

  foreach ($val as $k => $c) {

   if ( isset($$c) ) {

    if ( !eregi($key, $$c) ) {

     echo "You have used a disallowed character in the " . $c . " field\n";

     exit;

    }

   }

  }

 }

}

?>

I hope it will help someone ^_^

sven.heinicke.org 03-Feb-2005 08:49


The  gethostbyname($regs[2]) == $regs[2] is a bad idea.  Some domain names have a MX record but not an IP number.  The first example I found is jhu.edu domain:



sven@chef:~$ host jhu.edu

jhu.edu A record currently not present

sven@chef:~$ host -t mx jhu.edu

jhu.edu                 MX      10 smtp.johnshopkins.edu

cristian81 AT katamail DOT com 24-Nov-2004 12:12


The same "checkipaddress" function posted by "lueck at gerwan dot de", with the addition of optional wildcard ("jolly") character support. Returns TRUE or FALSE.



Results:



check_ip_address('125.32.15.0') is TRUE

check_ip_address('125.32.15.*') is FALSE

check_ip_address('125.32.15.*', '*') is TRUE

check_ip_address('*.*.*.*', '*') is TRUE

check_ip_address('125.-.15.2', '-') is TRUE



<?php

function check_ip_address($checkip, $jolly_char='') {

    if ($jolly_char=='.')        // dot isn't allowed as jolly char

        $jolly_char = '';

    

    if ($jolly_char!='') {

        $checkip = str_replace($jolly_char, '*', $checkip);        // replace the jolly char with an asterisk

        $my_reg_expr = "^[0-9\*]{1,3}\.[0-9\*]{1,3}\.[0-9\*]{1,3}\.[0-9\*]{1,3}$";

        $jolly_char = '*';

    }

    else

        $my_reg_expr = "^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$";

    

    if (eregi($my_reg_expr, $checkip)) {

        for ($i = 1; $i <= 3; $i++) {

            if (!(substr($checkip, 0, strpos($checkip, ".")) >= "0" && substr($checkip, 0, strpos($checkip, ".")) <= "255")) {

                if ($jolly_char!='') {            // if exists, check for the jolly char

                    if (substr($checkip, 0, strpos($checkip, "."))!=$jolly_char)

                        return false;

                }

                else

                    return false;

            }

            

            $checkip = substr($checkip, strpos($checkip, ".") + 1);

        }

        

        if (!($checkip >= "0" && $checkip <= "255")) {        // class D

            if ($jolly_char!='') {            // if exists, check for the jolly char

                if ($checkip!=$jolly_char)

                    return false;

            }

            else

                return false;

        }

    }

    else

        return false;

    

    return true;

}

?>

ian at hyperborea dot co dot uk 10-Nov-2004 12:15


I was looking for something to validate a URL, and as usual turned to 'php.net' as it provides an invaluable source of information. I couldn't find exactly what I wanted so came up with something of my own which I hope will be of use to others. 



<?php

function validateURL($URL) {

    $domain = "([[:alpha:]][-[:alnum:]]*[[:alnum:]])

(\.[[:alpha:]][-[:alnum:]]*[[:alpha:]])+";

    $dir = "(/[[:alpha:]][-[:alnum:]]*[[:alnum:]])*";

    $page = "(/[[:alpha:]][-[:alnum:]]*\.[[:alpha:]]{3,5})?";

    $getstring = "(\?([[:alnum:]][-_%[:alnum:]]*=[-_%[:alnum:]]+)

(&([[:alnum:]][-_%[:alnum:]]*=[-_%[:alnum:]]+))*)?";

    $pattern = "^".$domain.$dir.$page.$getstring."$";

    return eregi($pattern, $URL);

}

?>

Be sure to enter each pattern on a single line.



The function accepts a URL and validates the domain, directory path, page and any GET parameters. You can see I have split the patterns for validating each section for easier reading and maintaining.



Domain Name:

This can be made up of many parts, each separated by a '.' (dot). Each part must start with an alpha char, followed by any alphanumeric chars or '-'. With the exception of the first part, each part must end in an alpha char.



e.g. www.google.co.uk and www9.g-88gle.co.uk are valid, but 9ww.google.3o.uk and www.google4.co.uk3 are not.



Directory Path:

Each directory must start with a '/' followed by an alpha char., then any alpha numeric char., or '-', and ending in an alphanumeric char.



e.g. /path1/to-page is valid, but /path1-/2-page is not.



Page:

This is like the directory path but it must end with a '.' (dot) followed by 3-5 alpha chars.



e.g. /phpinfo.php and /page.shtml are valid, but /phpinfo.php4 and /pageshtml are not.



Get String:

This follows the page and begins with a '?'. It is then followed by key/value pairs, each separated by an '&'.



e.g. ?key1=value1&key2=value2



Each key or value must start and end with an alphanumeric char. I have allowed for a '%' to be specified for URL-encoded chars, and '_' (underscore) is also allowed.



Well, that's it; I hope it is of some use. It is NOT perfect: for instance, in the 'getstring' I would like to ensure that encoded characters (i.e. %20) consist of a % symbol followed by two characters with values '0-9A-F', but couldn't get anything I tried to work.

I would welcome any suggestions or improvements, there are bound to be some.

 

Note: The patterns are based on an earlier post by 'fusillo at NOSPAM dot netgang dot it' for matching domain names. Thanks to him.
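
The percent-encoding rule mentioned above (a % followed by two hex digits) can be expressed by treating each value as a sequence of units, where a unit is either one plain character or one %XX escape. A sketch with preg_match(); the variable names $unit, $pair and $getQuery are made up, and lower-case hex digits are accepted as an assumption.

```php
<?php
// A unit is either one plain character or a %XX escape with two hex digits.
$unit     = '(?:[-_[:alnum:]]|%[0-9A-Fa-f]{2})';
// A pair is key=value; the key starts with an alphanumeric character.
$pair     = '[[:alnum:]]' . $unit . '*=' . $unit . '+';
// The query string is '?' followed by pairs separated by '&'.
$getQuery = '/^\?' . $pair . '(?:&' . $pair . ')*$/D';

var_dump(preg_match($getQuery, '?key1=value1&key2=a%20b')); // int(1)
var_dump(preg_match($getQuery, '?key1=a%2'));               // int(0)
var_dump(preg_match($getQuery, '?key1=a%ZZb'));             // int(0)
```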

gabriel at gabrielharrison co uk 01-Nov-2004 01:19


There is a bug in the regex that means it doesn't work with a domain suffix that has two parts, e.g. .co.uk.

I'm not good enough with regexes to sort this out, sorry.



There can also be a problem if the domain doesn't have a domain entry without the www. subdomain. To solve this modify the code with:



elseif( gethostbyname($regs[2]) == $regs[2] || 

gethostbyname('www' . $regs[2]) == 'www' . $regs[2])

chadkeffer at mail dot com 01-Jul-2004 01:42


There seems to be some confusion about just what is allowed in an email address and thus how to build your regular expression.  RFC 2822 replaced RFC 822 over 3 years ago, so make sure you're referencing the correct RFC.  Also, don't assume all servers comply with the RFC.

gullevek at gullevek dot org 23-Jun-2004 10:58


for the comment from 'joseph dot ragsdale at xconduit dot com'



The regex has one bug, though. It eliminates <user>@jp.t.ne.jp addresses, which are completely valid, because it requires at least 2 chars in each domain part.



This regex fixes it:



^[_a-z0-9-]+(\.[_a-z0-9-]+)*

@[a-z0-9-]+(\.[a-z0-9-]{1,})*\.([a-z]{2,}){1}$



[put the regex on one line]

kidproto at hotmail dot com 20-Apr-2004 08:55


"^[a-zA-Z0-9_\-]+(:?\.svg|\.wmf|\.zip|\.tar\.gz|\.tgz)$"



That is the proper file check for a well formatted beginning and ending filename for those select files. The one in this list is incorrect for at least php 4.3.0 which is what I'm using.



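The (:?) is meant to keep the parentheses from storing the matched text, but it does not do that: as written, it matches an optional colon before the extension. POSIX extended regular expressions have no non-capturing groups; the (?:...) form exists only in PCRE (the preg_ functions).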

fusillo at NOSPAM dot netgang dot it 03-Jan-2004 11:24


This is a simple function that uses eregi() function to validate a domain name (according to the RFC 1034).



function check_host($host) {

if (

   eregi("^[[:alpha:]]+([-[:digit:][:alpha:]]*

          [[:digit:][:alpha:]])*(\.[[:alpha:]]+

          ([-[:digit:][:alpha:]]*[[:digit:][:alpha:]])*)

          *$",$host)

  ) {

       return 1;

    } else return 0;

};



NOTE: This expression must be written on a single line to work

I had to break it onto separate lines to post it on the site.



Returns 1 if $host is an RFC compliant domain name else returns 0.



This function is part of the code of the blackboxed Linux distribution, coming to your screen in late January.



-fusillo-

alexm(_A_T_)factory7.com 11-Dec-2003 09:30


A little interesting and hopefully on topic point I found while browsing the internet at: http://www.clemburg.com/x1713.html#AEN1717



According to RFC 822 (see http://www.rfc-editor.org/), email addresses

are case-sensitive in the part before the "@" sign, except for the

special address "POSTMASTER", which is not case-sensitive. The part

after the "@" sign is a host name, and is not case-sensitive.



Note that many organizations will implement an aliasing facility that

recognizes alternative forms for an email address (e.g., providing

"John.Doe@my.organization.com", "john.doe@my.organization.com",

"doe@my.organization.com", "Doe@my.organization.com" etc.). However,

this is not required by the standard as specified in RFC 822.



so you may want to keep that in mind....

i.e. split the address and check the username/alias part with ereg() and the domain part with eregi()
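
A related sketch: if addresses are stored or compared, only the domain part can safely be lower-cased, while the local part is left as typed. The function name normalize_email_case() is made up for this illustration.

```php
<?php
// Lower-case only the domain; leave the local part exactly as typed.
function normalize_email_case(string $email): string
{
    // strrpos() finds the last '@', since a quoted local part may contain one.
    $at = strrpos($email, '@');
    if ($at === false) {
        return $email; // no '@' at all; return unchanged
    }
    return substr($email, 0, $at) . '@' . strtolower(substr($email, $at + 1));
}

echo normalize_email_case('John.Doe@My.Organization.COM'), "\n";
// John.Doe@my.organization.com
```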

joseph dot ragsdale at xconduit dot com 06-Nov-2003 11:31


I'm posting this as a - hopefully - helpful example to those who are learning regular expressions. Apologies to the documentation team and moderators for posting yet another email address validation example, but it's hard to argue against it as most everyone will see regular expressions as being a solution to this problem.



<?php

// Example results:



$email[] = 'foo@example.com';       // matched

$email[] = 'foo.bar@example.co.uk'; // matched

$email[] = 'foo_bar@example.com';   // matched

$email[] = '_foo_bar@example.com';  // matched

$email[] = 'foo@example.example';   // matched

$email[] = '_-._@-.--';             // matched



$email[] = '@.';               // not matched

$email[] = '@.com';            // not matched

$email[] = ' @ .com';          // not matched

$email[] = '.bar@example.com'; // not matched

$email[] = 'foo.@example.com'; // not matched

$email[] = 'foo@example.x.y';  // not matched



$regex =

  '^'.

  '[_a-z0-9-]+'.        /* One or more underscore, alphanumeric,
                           or hyphen characters. */
  '(\.[_a-z0-9-]+)*'.   /* Followed by zero or more sets consisting
                           of a period and one or more underscore,
                           alphanumeric, or hyphen characters. */
  '@'.                  /* Followed by an "at" character. */
  '[a-z0-9-]+'.         /* Followed by one or more alphanumeric
                           or hyphen characters. */
  '(\.[a-z0-9-]{2,})+'. /* Followed by one or more sets consisting
                           of a period and two or more alphanumeric
                           or hyphen characters. */

  '$';



foreach ($email as $example) {

  if (eregi($regex, $example)) {

    echo $example . ' matched<br>';

  } else {

    echo $example . ' not matched<br>';

  }

}

?>

banerian at u dot washington dot edu 17-Sep-2003 03:55


It should be noted that in the function validateEmail, the final verification relies on the mail server responding to the test $email with a reply code of 250. Nowadays, many if not most servers will reply 250/Ok for any user@their.domain regardless of whether the user ID actually exists. Mail servers tend to accept the mail (if it passes other checks) and, if the user really does not exist, simply bounce the mail back to the sender. Thus the validateEmail function can give false positives, an unfortunate side effect of having to deal with spam.

Hodfords 06-Sep-2003 11:17


Most of the ereg patterns for emails that I have seen do not
correctly reject addresses like:



joe@something.co..uk



And emails such as 

joe@something.co.uk+1234



This pattern, which we put together, seems to work. It is a single expression; it is split into two concatenated strings here only because php.net rejected the submission for having too long a line, even after wordwrap():

$ereg_string =
    "^[\'+\\./0-9A-Z^_\`a-z{|}~\-]+@" .
    "[a-zA-Z0-9_\-]+(\.[a-zA-Z0-9_\-]+){1,3}$";



Adjust the number "3" right at the end to whatever number

you want. 



A long domain name such as "@finance.uk.yahoo.co.uk" which

has 5 parts will require the number to be higher.

tomi at vacilando dot net 26-Jun-2003 08:03


Keran's solution is a good one but I found it does not work for a small number of domains -- because of a bug in gethostbyname() that causes some valid domains not to resolve properly. The workaround is to test such domains with "www." added to the beginning. This script does all that and works perfectly for me. Enjoy!



// Function to validate emails.
function validate_email($email_raw)
{
    // Remove any newlines and spaces, then lower-case the address.
    $email = eregi_replace("\n", "", $email_raw);
    $email = eregi_replace(" +", "", $email);
    $email = strtolower($email);

    // Check the overall shape: user@domain, where the domain is one or
    // more labels separated by "." or "-".
    if (!eregi("^[a-z0-9]+([_\\.-][a-z0-9]+)*" . "@([a-z0-9]+([\.-][a-z0-9]+)*)$", $email)) {
        // okay not a good email
        return 'Error: "' . $email . '" is not a valid e-mail address!';
    }

    // Split the email at the @ and check the domain part.
    $item = explode("@", $email);
    $domain = $item[1];

    // gethostbyname() returns its argument unchanged when the lookup
    // fails. Some valid domains only resolve with "www." prepended,
    // so the domain is rejected only if both lookups fail.
    if (gethostbyname($domain) == $domain
        && gethostbyname("www." . $domain) == "www." . $domain) {
        return 'Error: "' . $domain . '" is most probably not a valid domain!';
    }

    return "valid";
}

steve at brainwashstudios dot com 10-Apr-2003 07:22


why not match some filenames?



if (!ereg("^[a-zA-Z0-9]+\\.(gif|jpg|png)$", $match)) {
 echo "Invalid Filename";
} else {
 if (file_exists($match)) {
  echo "blah";
 }
}



I suppose that would work off the top of my head but I have yet to test it, so you may want to check it first.

keran at kiwi-interactive dot com 08-Mar-2003 05:21


I couldn't get any of the email validation items above to actually work (maybe I'm thick) :)



So I adapted a couple and came up with this, and it seems to work



// function to validate email
function validate_email($email_raw) {

    // remove any newlines and spaces from the email
    $email_nr = eregi_replace("\n", "", $email_raw);
    $email = eregi_replace(" +", "", $email_nr);

    // do the eregi to look for bad characters
    if( !eregi("^[a-z0-9]+([_\\.-][a-z0-9]+)*".
               "@([a-z0-9]+([\.-][a-z0-9]+)*)$",$email) ){
        // okay not a good email
        $feedback = "Error: $email isn't a valid mail address!";
        return $feedback;
    } else {
        // okay now check the domain
        // split the email at the @ and check what's left
        $item = explode("@", $email);
        $domain = $item[1];
        // gethostbyname() returns the name unchanged when the lookup fails
        if( gethostbyname($domain) == $domain) {
            $feedback = "Error: $domain isn't a valid domain!";
            return $feedback;
        } else {
            $feedback = "valid";
            return $feedback;
        }
    }

}

slavo at polovnictvo-mn dot sk 20-Feb-2003 02:30


This is what I found (by Jon S. Stevens jon@clearink.com with

Copyright 1998 Jon S. Stevens, Clear Ink)



function validateEmail ($email){

   global $SERVER_NAME;

   $return = array(false,  "" );

   list ($user, $domain) = split( "@", $email, 2);

   $arr = explode( ".", $domain);

   $count = count ($arr);

   $tld = $arr[$count - 2] .  "." . $arr[$count - 1];

   if(checkdnsrr($tld,  "MX")) {

      if(getmxrr($tld, $mxhosts, $weight)) {

         for($i = 0; $i < count($mxhosts); $i++){

            $fp = fsockopen($mxhosts[$i], 25);

            if ($fp){

               $s = 0;

               $c = 0;

               $out =  "";

               set_socket_blocking($fp, false);

               do {

                  $out = fgets($fp, 2500);

                  if(ereg( "^220", $out)){

                     $s = 0;

                     $out =  "";

                     $c++;

                  }

                  else if(($c > 0) && ($out ==  "")){

                     break;

                  }

                  else {

                     $s++;

                  }

                  if($s == 9999) {

                     break;

                  }

               } while($out ==  "");

               set_socket_blocking($fp, true);

               fputs($fp,  "HELO $SERVER_NAME\n");

               $output = fgets ($fp, 2000);

               fputs($fp,  "MAIL FROM: <info@" . $tld .  ">\n" );

               $output = fgets($fp, 2000);

               fputs($fp,  "RCPT TO: <$email>\n");

               $output = fgets($fp, 2000);

               if(ereg(  "^250", $output )) {

                  $return[0] = true;

               }

               else {

                  $return[0] = false;

                  $return[1] = $output;

               }

               fputs ($fp,  "QUIT\n");

               fclose($fp);

               if($return[0] == true){

                  break;

               }

            }

         }

      }

   }

   return $return;

}



----

hope it helps you...

webmaster at textedit dot co dot uk 19-Jan-2003 08:03


I notice a lot of queries involving regular expressions and is_float() regarding currency validation.



If you have a version of php that uses the std C ctype.h functions then it will ALWAYS work with the following code.



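function check_price($price)
{
    // Split on the decimal point; a price must have exactly two parts.
    $data = split('[.]', $price);

    if ( count($data) != 2 )
        return false;

    // Both parts must be all digits, and the whole part must not
    // start with a zero.
    if ( ctype_digit($data[0]) && ctype_digit($data[1]) && $data[0][0] != '0')
        return true;
    else
        return false;
}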
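
Where split() is not available (it belongs to the same family as ereg() and is gone in PHP 7), the same rules can be sketched as a single preg_match() call. The name check_price_pcre is made up for this example, and the pattern simply mirrors the checks above: exactly one dot, digits on both sides, no leading zero.

```php
<?php
// Hypothetical PCRE counterpart of check_price(); the name is illustrative.
function check_price_pcre($price)
{
    // Whole part: digits, not starting with 0. Then a dot and at least
    // one more digit. D keeps "$" from matching before a final newline.
    return preg_match('/^[1-9][0-9]*\.[0-9]+$/D', $price) === 1;
}

var_dump(check_price_pcre('12.50'));  // bool(true)
var_dump(check_price_pcre('012.50')); // bool(false)
var_dump(check_price_pcre('12'));     // bool(false)
```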



Peter Lorimer



http://www.textedit.co.uk

webmaster@textedit.co.uk

X-Istence.com 18-Jan-2003 03:48


To check email, I use the following code:

if (!eregi("^([a-z0-9_]|\\-|\\.)+@(([a-z0-9_]|\\-)+\\.)+[a-z]{2,4}$", $email)) {
    echo "Invalid Email Address";
}
else {
    echo "Valid Email Address";
}

respectthepinguin at yahoo dot com 04-Jan-2003 08:30


Validating an email can be pretty tough... however, this:

^([[:alnum:]]|_|\.|-)+@([[:alnum:]]|\.|-)+(\.)([a-z]{2,4})$

makes the task much easier. (:)

BKDotCom at hotmail dot com 30-Sep-2002 08:56


Regarding the use of gethostbyname() to validate email addresses:

Large flaw:

It doesn't work if the host has NS and MX records but no A record.
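
As a rough sketch of a check that copes with that case, checkdnsrr() can be asked for an MX record first and for an A record only as a fallback. The function name domain_may_receive_mail and its injectable $lookup parameter are assumptions made for this example (the parameter exists only so the logic can be tried without network access).

```php
<?php
// Hypothetical helper; name and $lookup parameter are illustrative only.
// $lookup must accept (host, record type) and return a boolean, which is
// how checkdnsrr() itself is called.
function domain_may_receive_mail($domain, $lookup = 'checkdnsrr')
{
    // Prefer an MX record; fall back to an A record, which mail
    // delivery uses when no MX record exists.
    return (bool) call_user_func($lookup, $domain, 'MX')
        || (bool) call_user_func($lookup, $domain, 'A');
}

// Stub resolver standing in for DNS: this "domain" has an MX record only.
$onlyMx = function ($host, $type) {
    return $type === 'MX';
};
var_dump(domain_may_receive_mail('example.test', $onlyMx)); // bool(true)
```

Called with a single argument, it performs real DNS lookups through checkdnsrr().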

henrik jensen<hj at this_netwerk dot dk> 27-Jul-2002 07:05


I was looking for a regex to check whether a file is an image file; this seems to work.

Note: it does not check for illegal filesystem names; it only looks at the filename extension.



if (eregi ("(.)+\\.(jp(e){0,1}g$|gif$|png$)",$filename)){

    // This is an imagefile

}
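
For PHP versions without eregi(), a close equivalent can be sketched with preg_match() and the i modifier. The function name has_image_extension is made up for this example; like the expression above, it requires at least one character before the dot and looks only at the extension.

```php
<?php
// Illustrative PCRE version of the extension check; the name is hypothetical.
function has_image_extension($filename)
{
    // jpe?g covers both "jpg" and "jpeg"; i = case-insensitive,
    // D = "$" matches only at the very end of the string.
    return preg_match('/.+\.(?:jpe?g|gif|png)$/iD', $filename) === 1;
}

var_dump(has_image_extension('photo.JPG')); // bool(true)
var_dump(has_image_extension('notes.txt')); // bool(false)
```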



[ remove this_ from emailaddress ]

re at lloc dot de 06-Mar-2002 04:36


An additional note to one of the expressions above: I use

"^[a-z0-9]+([_.-][a-z0-9]+)*@([a-z0-9]+([.-][a-z0-9]+)*)+\\.[a-z]{2,4}$"

tuxx at tuxx-home dot at 27-Feb-2002 12:11


A small note on one of the expressions above:

Inside a character class ([...]) one does not need to escape the full stop, therefore



[-\\._] 



would become



[-._]
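
A quick way to convince yourself, shown here with preg_match() since the ereg functions are not available on newer PHP versions (the test strings are arbitrary examples):

```php
<?php
// Inside a character class the dot is literal, so "[-._]" matches only
// a hyphen, a full stop, or an underscore.
var_dump(preg_match('/^[-._]$/', '.')); // int(1): the literal dot matches
var_dump(preg_match('/^[-._]$/', 'a')); // int(0): the dot is no wildcard here
var_dump(preg_match('/^.$/', 'a'));     // int(1): outside a class it is one
```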

steve at fish2find dot co dot uk 26-Feb-2002 04:40


Hope this helps with some validation problems. Simple text validation:

// $firstname = "somename"; // valid
// $firstname = "somenam3"; // not valid

// Validate a field entry after trimming surrounding white space.
$firstname = trim($firstname);

// Allow a field length of 0 to 12 characters.
$len = "0,12";
$field = $firstname;

// call function
if (is_valid($field, $len)) {
    // The entry is valid, so keep it and flag the record as valid.
    $valid_firstname = $field;
    $valid_record = "TRUE";
    echo "$field is a valid name";
} else {
    // Flag the record as invalid.
    $valid_record = "FALSE";
    echo " $field is not a valid name";
}

// process to see if valid record
if ($valid_record == "FALSE") {
    echo " & is not a valid record";
} else { // assume record is valid
    echo " & is a valid record";
}

// validate field entry function:
// returns TRUE if $field consists only of letters and its length is
// within the "min,max" bounds given in $len.
function is_valid($field, $len) {
    if (eregi("^[[:alpha:]]{{$len}}$", $field)) return TRUE;
    else return FALSE;
}

ruben dot no dot spam at artek dot no dot spam dot es 08-Feb-2002 09:28


Well, this can be improved a little. According to the previous expression, these email addresses would be accepted as correct:

  user@domain.e

  user@domain.123

  user@domain-ltd



I suggest this regexp:



if( !eregi( "^" .

            "[a-z0-9]+([_\\.-][a-z0-9]+)*" .    //user

            "@" .

            "([a-z0-9]+([\.-][a-z0-9]+)*)+" .   //domain

            "\\.[a-z]{2,}" .                    //sld, tld 

            "$", $email, $regs)

   )

...

mmc at nospam dot dk 27-Jan-2002 03:56


I'm not sure about "\." being the same as "." above.



Anyway, the mentioned regex would not recognize .museum-names, and generally isn't future safe. 

Also, it doesn't verify that usernames and hostnames cannot start with "-._".



I would recommend a more general (=future safe) expression and then instead check the hostname. An example:



if( !eregi("^[a-z0-9]+([_\\.-][a-z0-9]+)*"
    ."@([a-z0-9]+([\.-][a-z0-9]+)*)$",
    $mail, $regs) )

{

    echo "Error: '$mail' isn't a valid mail address!\n";

}

elseif( gethostbyname($regs[2]) == $regs[2] )

{

    echo "Error: Can't find the host '$regs[2]'!<br>\n";

}



Note: I had to split the regex for it to fit this note.

Also note: The reason I'm using gethostbyname() and not getmxrr() or such is that getmxrr() doesn't work on Win2000/XP.