PCRE - Perl Compatible Regular Expressions
PCRE (Perl Compatible Regular Expressions) is powerful but can be confusing at first. A regular expression matches a string against a pattern so you can extract specific data, and it also offers a replacement mechanism that finds a section of a string and replaces it. Used properly, regular expressions are fairly easy to understand and apply. Because of their flexibility, however, they are more computationally expensive than the simple search-and-replace functions, so use them only where a simpler function will not do the job.
A PCRE pattern is a string that defines matching rules. Regular expressions come into play when we don't know the exact string but do know something about its structure.
I will keep this introduction short and simple, so read the PHP Manual for complete coverage of the ins and outs and the additional functions.
About Regular Expressions
PCRE: Delimiters
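A delimiter is the character that starts and ends the regular expression. It can be any character that is not alphanumeric, a backslash, or whitespace, and any occurrence of it inside the pattern must be escaped with a backslash. The most common delimiter is /, but I find that | appears far less often in patterns, so less escaping is needed. The trade-off is that | can then no longer be used as the alternation ("or") operator inside the pattern.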
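To make the escaping point concrete, here is a minimal sketch that writes the same pattern with three different delimiters. The sample string and variable names are made up for illustration.

```php
<?php
// Sample subject, chosen for illustration only.
$path = 'usr/local';

// With / as the delimiter, the slash inside the pattern must be escaped.
$withSlash = preg_match('/usr\/local/', $path);

// With # or | as the delimiter, the slash can be written as-is.
$withHash = preg_match('#usr/local#', $path);
$withPipe = preg_match('|usr/local|', $path);

var_dump($withSlash, $withHash, $withPipe); // int(1) three times
```

All three calls test for the same text; only the amount of escaping differs.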
PCRE: Metacharacters
Metacharacters are characters with a special meaning in the pattern: instead of matching themselves, they describe what may appear in the string we apply the regular expression to.
The term “metacharacter” is a bit of a misnomer, as a metacharacter can actually be composed of more than one character. However, each one stands for a single character (or, for ^ and $, a position) in the matched string. Here are the most common ones:
. Match any character
^ Match the start of the string
$ Match the end of the string
\s Match any whitespace character
\d Match any digit
\w Match any “word” character
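The list above can be tried out with a short script. This is only a sketch: the patterns and sample strings are invented for illustration, and each pattern is anchored with ^ and $ so that the whole string has to fit.

```php
<?php
// Each entry pairs a pattern with a sample subject (both illustrative).
$samples = [
    ['/^a.c$/',    'abc'], // . stands for any one character
    ['/^\d\d$/',   '42'],  // two digits
    ['/^\w\w\w$/', 'a_1'], // three "word" characters
    ['/^a\sb$/',   'a b'], // one whitespace character between a and b
    ['/^\d$/',     'x'],   // a letter is not a digit, so this one fails
];

$results = [];
foreach ($samples as [$regex, $subject]) {
    $results[] = preg_match($regex, $subject);
}

var_dump($results); // 1, 1, 1, 1, 0
```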
Regular Expression: /ab[cd]e/
The above PCRE uses / as its delimiters. It contains the literal characters ab and e, which must match exactly, and in the middle the character class [cd], which means the character at that position can be either c or d. This regular expression matches any string that contains abce or abde. Matching is case-sensitive, so ABCE does not match.
Regular Expression: /ab[c-e\d]/
As above, the delimiters are /. Instead of listing c and d, this example uses a range: the last character can be c to e (c, d or e). The class also contains the metacharacter \d, which allows the last character to be any digit (0 to 9). This regular expression matches any string that contains abc, abd, abe, or ab0, ab1, ab2 and so on.
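A quick way to check the two character-class patterns is to run them against a few strings. The test strings below are illustrative; note the uppercase one, which fails because matching is case-sensitive by default.

```php
<?php
// Pattern / subject pairs; the subjects are made up for this sketch.
$checks = [
    ['/ab[cd]e/',    'abce'], // c is in the class
    ['/ab[cd]e/',    'abde'], // d is in the class
    ['/ab[cd]e/',    'abee'], // e is not in [cd]
    ['/ab[cd]e/',    'ABCE'], // uppercase does not match
    ['/ab[c-e\d]/',  'abe'],  // e is inside the range c-e
    ['/ab[c-e\d]/',  'ab7'],  // 7 is a digit
    ['/ab[c-e\d]/',  'abf'],  // f is outside the range and not a digit
];

$results = [];
foreach ($checks as [$regex, $subject]) {
    $results[] = preg_match($regex, $subject);
}

var_dump($results); // 1, 1, 0, 0, 1, 1, 0
```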
PCRE: Quantifiers
A quantifier does what it sounds like: it specifies how many times a particular character or metacharacter can appear in a matched string:
* The character can appear zero or more times
+ The character can appear one or more times
? The character can appear zero or one times
{n,m} The character can appear at least n times, and no more than m. The maximum can be omitted ({n,}) to set a minimum with no maximum. Omitting the minimum ({,m}) is not recognized as a quantifier by older PCRE versions, so write {0,m} for a maximum without a minimum.
Regular Expression: /ab?c/
This matches a string containing ac or abc.
Regular Expression: /ab{1,3}c/
This matches a string containing abc, abbc or abbbc.
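The quantifiers can be compared side by side with a small script. This is a sketch with invented test strings; the two patterns from the text are included along with * and + variants.

```php
<?php
// Pattern / subject pairs; the subjects are illustrative.
$checks = [
    ['/ab?c/',     'ac'],     // zero b's is allowed
    ['/ab?c/',     'abc'],    // one b is allowed
    ['/ab?c/',     'abbc'],   // two b's is too many for ?
    ['/ab{1,3}c/', 'ac'],     // at least one b is required
    ['/ab{1,3}c/', 'abbbc'],  // three b's is the upper limit
    ['/ab{1,3}c/', 'abbbbc'], // four b's exceeds the limit
    ['/ab*c/',     'ac'],     // * accepts zero b's
    ['/ab+c/',     'ac'],     // + needs at least one b
];

$results = [];
foreach ($checks as [$regex, $subject]) {
    $results[] = preg_match($regex, $subject);
}

var_dump($results); // 1, 1, 0, 0, 1, 0, 1, 0
```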
Regular Expressions In Practice:
Now that you hopefully understand the basics of regular expressions, let's get into using them:
Matching and Extracting Strings:
PHP provides the preg_match() function to match a regular expression against a given string. It returns 1 if a match is found, 0 if not, and false if an error occurs. An optional third parameter receives the matched text and any captured subpatterns.
$name = "Davey Shafik";

// Simple match: the whole name may contain only letters and whitespace
$regex = '/^[a-zA-Z\s]+$/';
if (preg_match($regex, $name)) {
    // Valid name
}

// Match with subpatterns and capture.
// Uses / as delimiters: ^ anchors the match at the start of the string,
// (\w+) captures a word, \s matches one whitespace character,
// and (\w+) captures a second word.
$regex = '/^(\w+)\s(\w+)/';
$matches = array();
if (preg_match($regex, $name, $matches)) {
    // $matches holds the full match followed by each captured subpattern
    var_dump($matches);
}

// var_dump() will display:
array(3) {
  [0]=>
  string(12) "Davey Shafik"
  [1]=>
  string(5) "Davey"
  [2]=>
  string(6) "Shafik"
}
As you can see, the first element of the array contains the entire matched string, while the second element (index 1) contains the first captured subpattern, and the third element contains the second matched subpattern.
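Because preg_match() can return 1, 0 or false, it can be worth telling "no match" apart from "error". The sketch below wraps the capture pattern in a helper; splitName() is a hypothetical name introduced here, not a PHP function, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper for illustration; not part of PHP.
// Returns [first, last] on a match, or null when the name does not fit.
function splitName(string $name): ?array
{
    $matches = [];
    $result = preg_match('/^(\w+)\s(\w+)/', $name, $matches);

    if ($result === false) {
        // The pattern itself failed to run.
        throw new RuntimeException('Regex error code: ' . preg_last_error());
    }
    if ($result === 0) {
        return null; // no match
    }

    // Index 0 is the whole match; 1 and 2 are the captured words.
    return [$matches[1], $matches[2]];
}

$full = splitName('Davey Shafik'); // ['Davey', 'Shafik']
$single = splitName('Davey');      // null: there is no second word
var_dump($full, $single);
```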
Performing Multiple Matches
The preg_match_all() function allows you to perform multiple matches on a given string based on a single regular expression. For example:
$string = "a1bb b2cc c2dd";
$regex = "#([abc])\d#";
$matches = array();
if (preg_match_all ($regex, $string, $matches)) {
var_dump ($matches);
}
This script outputs the following:
array(2) {
  [0]=>
  array(3) {
    [0]=>
    string(2) "a1"
    [1]=>
    string(2) "b2"
    [2]=>
    string(2) "c2"
  }
  [1]=>
  array(3) {
    [0]=>
    string(1) "a"
    [1]=>
    string(1) "b"
    [2]=>
    string(1) "c"
  }
}
As you can see, all the whole-pattern matches are stored in the first sub-array of the result, while the first captured subpattern of every match is stored in the corresponding slot of the second sub-array.
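If you would rather have one sub-array per match, preg_match_all() accepts the PREG_SET_ORDER flag as a fourth argument. This flag is not used in the example above; the sketch below reuses the same string and pattern to show the alternative layout.

```php
<?php
$string = "a1bb b2cc c2dd";
$regex = '#([abc])\d#';

// With PREG_SET_ORDER, each element of $sets describes one match:
// index 0 is the whole match, index 1 is the first captured subpattern.
$sets = [];
$count = preg_match_all($regex, $string, $sets, PREG_SET_ORDER);

foreach ($sets as $set) {
    echo $set[0], ' starts with ', $set[1], "\n";
}
```

The return value, stored in $count here, is the number of full matches found.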
Using PCRE to Replace Strings
Use str_replace() where you can, as it is more efficient. With preg_replace(), however, you can replace text that matches a pattern you specify. In the example below, we use this technique to replace the entire matched pattern with a string that is composed using the first captured subpattern ($1).
$body = "[b]Make Me Bold![/b]";
$regex = "@\[b\](.*?)\[/b\]@i";
$replacement = '<b>$1</b>';
$body = preg_replace($regex, $replacement, $body);
There we have it: a small BBCode parser. $body is the submitted content. The regex says, in effect: wherever [b] and [/b] appear, capture the text between them, and the replacement wraps that captured text in real HTML tags.
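The ? after .* in the pattern matters once the text contains more than one pair of tags. The following sketch compares the lazy form used above with a greedy variant; the sample text is invented for illustration.

```php
<?php
// Sample input with two separate bold sections (illustrative).
$body = '[b]one[/b] and [b]two[/b]';

// Lazy (.*?): stops at the first closing tag, so each pair is handled on its own.
$lazy = preg_replace('@\[b\](.*?)\[/b\]@i', '<b>$1</b>', $body);

// Greedy (.*): runs on to the last closing tag and swallows the tags in between.
$greedy = preg_replace('@\[b\](.*)\[/b\]@i', '<b>$1</b>', $body);

echo $lazy, "\n";   // <b>one</b> and <b>two</b>
echo $greedy, "\n"; // <b>one[/b] and [b]two</b>
```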
Arrays can also be used for bulk replacement: preg_replace() accepts an array of subjects, an array of patterns and an array of replacements, as this example shows:
$subjects['body'] = "[b]Make Me Bold![/b]";
$subjects['subject'] = "[i]Make Me Italics![/i]";
$regex[] = "@\[b\](.*?)\[/b\]@i";
$regex[] = "@\[i\](.*?)\[/i\]@i";
$replacements[] = '<b>$1</b>';
$replacements[] = '<i>$1</i>';
$results = preg_replace($regex, $replacements, $subjects);
This code replaces the BBCode in every subject with HTML ready for output. $results is an array with the same keys as $subjects.
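Since the result is meant for output, one possible refinement is to escape any HTML in the submitted text before adding our own tags. The sketch below is an assumption about how that could be done, not part of the original example: bbcodeToHtml() is a hypothetical helper name, and the second sample string is invented to show the escaping.

```php
<?php
// Hypothetical helper for illustration; not part of PHP or the original example.
function bbcodeToHtml(array $subjects): array
{
    $regex = ['@\[b\](.*?)\[/b\]@i', '@\[i\](.*?)\[/i\]@i'];
    $replacements = ['<b>$1</b>', '<i>$1</i>'];

    // Assumption: the input is untrusted, so escape HTML first.
    // The square brackets of the BBCode tags are left untouched by this step.
    $escaped = array_map('htmlspecialchars', $subjects);

    // Every pattern is applied to every subject; the array keys are kept.
    return preg_replace($regex, $replacements, $escaped);
}

$results = bbcodeToHtml([
    'body'    => '[b]Make Me Bold![/b]',
    'subject' => '[i]Make <Me> Italics![/i]',
]);

var_dump($results);
```

Escaping before replacing means the only real tags in the output are the ones the replacement strings add.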











