Dec 04, 2008

PCRE - Perl Compatible Regular Expressions

PCRE (Perl Compatible Regular Expressions) is very powerful and can be confusing. A regular expression lets you match a string against a pattern to extract the specific data you want, and PHP also provides a replacement mechanism that finds a specific section of a string and replaces it. Used properly, regular expressions are relatively simple to understand and fairly easy to use. Given their complexity, however, they are much more computationally intensive than the simple search-and-replace functions, so only use them where a simpler function will not do the job.

A PCRE is a string that sets out matching rules. Regular expressions come into play when we don’t know the exact string we are looking for, but we do know something about its shape (for example, that it starts with a digit).

I will try to keep this introduction to regular expressions short and simple, so read the PHP Manual for the full details, all the ins and outs, and the additional functions.

About Regular Expressions

PCRE: Delimiters

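The delimiter is the character that starts and ends the regular expression. If it appears inside the pattern itself it has to be escaped with a backslash, so it is best to choose a character the pattern does not otherwise use. The most common delimiter is /, but I find that | is far less used, so less escaping is needed (bear in mind, though, that | is also the regex alternation operator). Characters such as # and @ are popular alternatives too.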
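To make the escaping point concrete, here is a small sketch (the path is a made-up example) that matches the same prefix with two different delimiters. Both patterns behave identically; the # version simply avoids escaping every slash.

```php
<?php
// Match a path that begins with /usr/local/ (the path is a made-up example).
// With / as the delimiter, every / inside the pattern must be escaped.
$slashDelimited = '/^\/usr\/local\//';
// With # as the delimiter, the slashes need no escaping.
$hashDelimited = '#^/usr/local/#';

$path = '/usr/local/bin/php';
$a = preg_match($slashDelimited, $path);
$b = preg_match($hashDelimited, $path);
echo $a, ' ', $b, "\n"; // 1 1
```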

PCRE: Metacharacters

Metacharacters are characters that stand for a class of data (or a position) in the string we are going to apply the regular expression to.
The term “metacharacter” is a bit of a misnomer, as a metacharacter can actually be written with more than one character (\d, for example). However, each of these metacharacters matches a single character in the subject string, except for ^ and $, which match a position rather than a character. Here are the most common ones:
. Match any character (except a newline, by default)
^ Match the start of the string
$ Match the end of the string
\s Match any whitespace character
\d Match any digit
\w Match any “word” character (a letter, digit or underscore)
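A quick sketch of these metacharacters in action with preg_match(), which returns 1 for a match and 0 otherwise. The test strings are arbitrary examples.

```php
<?php
// ^ and $ anchor the match to the start and end of the string,
// so this pattern only accepts exactly three digits.
$threeDigits = '/^\d\d\d$/';
var_dump(preg_match($threeDigits, '123'));  // int(1)
var_dump(preg_match($threeDigits, '1234')); // int(0): one digit too many
var_dump(preg_match($threeDigits, '12a'));  // int(0): "a" is not a digit

// \w+ is a run of word characters, \s one whitespace character,
// and . any single character.
$wordSpaceAny = '/^\w+\s.$/';
var_dump(preg_match($wordSpaceAny, 'hello !')); // int(1)
```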

Regular Expression: /ab[cd]e/
The above PCRE uses / as its delimiters. Its contents are the literal characters a, b and e, and in the middle is the character class [cd], which means that the character in that position can be either c or d. This regular expression will match any string containing abce or abde. Matching is case-sensitive, so ABCE would only match if the i (case-insensitive) modifier were added: /ab[cd]e/i.

Regular Expression: /ab[c-e\d]/
Similar to the above, the delimiters are /, but instead of listing c and d this example uses a range: c-e means any of c, d or e. The class also contains the metacharacter \d, which allows the last character to be any digit (0 to 9). This regular expression will match any string containing abc, abd or abe, or ab0, ab1, ab2 and so on.
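The two patterns above can be checked with preg_match(). This sketch uses arbitrary test strings to show that an unanchored pattern can match anywhere in the subject, and that matching is case-sensitive unless the i modifier is appended.

```php
<?php
// A character class matches exactly one character from its set.
$classCD      = '/ab[cd]e/';
$rangeOrDigit = '/ab[c-e\d]/';

$results = array(
    preg_match($classCD, 'abce'),       // 1
    preg_match($classCD, 'xxabdexx'),   // 1: the pattern may match anywhere
    preg_match($classCD, 'ABCE'),       // 0: matching is case-sensitive
    preg_match($classCD . 'i', 'ABCE'), // 1: /ab[cd]e/i ignores case
    preg_match($rangeOrDigit, 'ab7'),   // 1: \d inside the class
    preg_match($rangeOrDigit, 'abf'),   // 0: f is outside c-e
);
print_r($results);
```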

PCRE: Quantifiers

A quantifier does what it sounds like: it sets how many times the preceding character or metacharacter can appear in a matched string:
* The character can appear zero or more times
+ The character can appear one or more times
? The character can appear zero or one times
{n,m} The character can appear at least n times and no more than m times. The maximum can be omitted ({n,}) to set a minimum with no upper limit, and {n} means exactly n times. To set only a maximum, write {0,m}.

Regular Expression: /ab?c/
This will match the strings ac and abc.

Regular Expression: /ab{1,3}c/
This will match the strings abc, abbc and abbbc.
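To see the quantifiers side by side, this sketch runs both patterns over a few test strings. Unlike the examples above, the patterns are anchored with ^ and $ (an assumption made for clarity) so that the whole string has to match.

```php
<?php
// Anchored versions of the two example patterns.
$optionalB  = '/^ab?c$/';     // b may appear zero or one time
$oneToThree = '/^ab{1,3}c$/'; // b must appear one to three times

$table = array();
foreach (array('ac', 'abc', 'abbc', 'abbbc', 'abbbbc') as $s) {
    $table[$s] = array(preg_match($optionalB, $s), preg_match($oneToThree, $s));
    printf("%-7s ab?c: %d   ab{1,3}c: %d\n", $s, $table[$s][0], $table[$s][1]);
}
```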

Regular Expressions In Practice:

Now that you hopefully understand the basics of regular expressions, let’s get into using them:

Matching and Extracting Strings:

PHP's preg_match() function matches a regular expression against a given string. It returns 1 if a match is found, 0 if not (or false if an error occurs), and an optional third parameter receives the data captured by the match.

$name = "Davey Shafik";

// Simple match: the whole string must consist of letters and whitespace
$regex = "/^[a-zA-Z\s]+$/";
if (preg_match($regex, $name)) {
	// TRUE: valid name
}

// Match with subpatterns and capture
// Uses / as delimiters: from the start of the string (^) capture a run of word
// characters (\w+), then one whitespace character (\s), then another run of word characters (\w+)
$regex = '/^(\w+)\s(\w+)/';
$matches = array();
if (preg_match($regex, $name, $matches)) { // $matches receives the full match and every captured subpattern
	var_dump($matches);
}

// var_dump() will display:
array(3) {
[0]=>
string(12) "Davey Shafik"
[1]=>
string(5) "Davey"
[2]=>
string(6) "Shafik"
}

As you can see, the first element of the array (index 0) contains the entire matched string, the second element (index 1) contains the first captured subpattern, and the third element (index 2) contains the second captured subpattern.
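For completeness, a small sketch of what happens when the subject does not match: preg_match() returns 0 and $matches is left as an empty array, so checking the return value before reading $matches is worthwhile. The one-word subject is a made-up example.

```php
<?php
$regex = '/^(\w+)\s(\w+)/';
$matches = array();
// "Davey" has no whitespace followed by a second word, so there is no match.
$found = preg_match($regex, 'Davey', $matches);
var_dump($found);   // int(0)
var_dump($matches); // array(0) {}
```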

Performing Multiple Matches
The preg_match_all() function allows you to perform multiple matches on a given string based on a single regular expression. For example:

$string = "a1bb b2cc c2dd";
$regex = "#([abc])\d#";
$matches = array();
 
if (preg_match_all ($regex, $string, $matches)) {
	var_dump ($matches);
}

This script outputs the following:

array(2) {
	[0]=>
	array(3) {
		[0]=>
		string(2) "a1"
		[1]=>
		string(2) "b2"
		[2]=>
		string(2) "c2"
	}
	[1]=>
	array(3) {
		[0]=>
		string(1) "a"
		[1]=>
		string(1) "b"
		[2]=>
		string(1) "c"
	}
}

As you can see, all the whole-pattern matches are stored in the first sub-array of the result, while the first captured subpattern of every match is stored in the corresponding slot of the second sub-array.
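If you would rather have each match grouped together with its own subpatterns, preg_match_all() accepts the PREG_SET_ORDER flag as a fourth argument. A sketch using the same string and pattern as above:

```php
<?php
$string = "a1bb b2cc c2dd";
$regex = "#([abc])\d#";
$matches = array();

// PREG_SET_ORDER: $matches[0] holds the first match and its subpatterns,
// $matches[1] the second, and so on.
$count = preg_match_all($regex, $string, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    echo $match[0], ' -> ', $match[1], "\n";
}
// a1 -> a
// b2 -> b
// c2 -> c
```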

Using PCRE to Replace Strings

Use str_replace() wherever you can, as it is more efficient. However, with preg_replace() you can replace text that matches a pattern you specify. In the example below, we use this technique to replace the entire matched pattern with a string composed from the first captured subpattern ($1).

$body = "[b]Make Me Bold![/b]";
$regex = "@\[b\](.*?)\[/b\]@i";
$replacement = '<b>$1</b>'; // single quotes keep $1 as a literal back-reference
$body = preg_replace($regex, $replacement, $body);

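There we have it: a simple BBCode parser. $body is the submitted content, and the regex says: wherever [b] and [/b] are used (in either case, thanks to the i modifier), capture the text between them. preg_replace() then replaces the whole match with real HTML, putting the captured text back in via $1. The ? in .*? makes the match lazy, so it stops at the first [/b] rather than the last.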
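The ? after .* matters once a body contains more than one tag pair. This sketch (with a made-up body string) compares the lazy pattern used above with a greedy version:

```php
<?php
$body = "[b]one[/b] and [b]two[/b]";

// Lazy: .*? stops at the first [/b] it reaches, so each pair is handled separately.
$lazy = preg_replace('@\[b\](.*?)\[/b\]@i', '<b>$1</b>', $body);
// Greedy: .* runs on to the last [/b], swallowing everything in between.
$greedy = preg_replace('@\[b\](.*)\[/b\]@i', '<b>$1</b>', $body);

echo $lazy, "\n";   // <b>one</b> and <b>two</b>
echo $greedy, "\n"; // <b>one[/b] and [b]two</b>
```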

We can also pass arrays of subjects, patterns and replacements to do mass replacement, as shown in this full array example:

$subjects['body'] = "[b]Make Me Bold![/b]";
$subjects['subject'] = "[i]Make Me Italics![/i]";
$regex[] = '@\[b\](.*?)\[/b\]@i';
$regex[] = '@\[i\](.*?)\[/i\]@i';
$replacements[] = '<b>$1</b>';
$replacements[] = '<i>$1</i>';
$results = preg_replace($regex, $replacements, $subjects);

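This code replaces the BBCode in both subjects with HTML, ready for output. Each pattern is paired with the replacement at the same position, and every pattern is applied to every subject.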
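preg_replace() also accepts an optional fifth argument that receives the total number of replacements made. A sketch reusing the subjects, patterns and replacements from the example above (written here with array() literals for brevity):

```php
<?php
$subjects = array(
    'body'    => "[b]Make Me Bold![/b]",
    'subject' => "[i]Make Me Italics![/i]",
);
$regex = array('@\[b\](.*?)\[/b\]@i', '@\[i\](.*?)\[/i\]@i');
$replacements = array('<b>$1</b>', '<i>$1</i>');

// -1 means "no limit"; $count receives the total number of replacements.
$results = preg_replace($regex, $replacements, $subjects, -1, $count);
print_r($results); // the keys 'body' and 'subject' are preserved
echo $count, "\n"; // 2
```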

Written by Adam in: 07. Strings and Patterns
Dec 04, 2008

Strings and Patterns: Formatting

Formatting

There are different methods of formatting strings in PHP: some handle particular kinds of data (e.g. currency) and others are more general. Currency, for example, is formatted differently in many countries (where the commas go, for instance), and if your website is hosted in Europe it is likely that the sysadmin has set the locale information to a European one. Therefore, if US formatting is required, use:

setlocale(LC_MONETARY, 'en_US');

There are many more setlocale() categories you can change, such as LC_MONETARY (money), LC_NUMERIC (numbers), LC_TIME (time) and LC_MESSAGES (error messages); see setlocale on php.net. To save us setting each category separately, we can use the umbrella category LC_ALL. In this example we set them all to United Kingdom (Great Britain) format:

setlocale(LC_ALL, 'en_GB');

Note that setlocale() affects the entire process in which it is executed, rather than just the individual script, so on a multithreaded server other scripts running in the same process are affected too.
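Because locale names vary between operating systems, it is worth checking what setlocale() returns: the new locale name on success, or false if none of the requested locales exist. It also accepts several candidate names and uses the first one that works. The names below are common on Linux systems but are an assumption about your server.

```php
<?php
// Try several spellings of the British English locale, in order.
$result = setlocale(LC_ALL, 'en_GB.UTF-8', 'en_GB', 'en_GB.utf8');
if ($result === false) {
    echo "None of the requested locales are available\n";
} else {
    echo "Locale set to $result\n";
}
```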

Formatting: Numbers

Number formatting outputs numbers with their digits grouped into thousands and with a chosen decimal point. This is done with number_format(), which is not affected by the locale (mentioned above). The function can take 1, 2 or 4 arguments (PHP 8.0 also accepts 3):
If only one argument is given, the default formatting is used: the number is rounded to the nearest integer, and a comma is used to separate thousands.
If two arguments are given, the number is rounded to the given number of decimal places, and a period and a comma are used to separate decimals and thousands.
If four arguments are given, the number is rounded to the given number of decimal places, and the third and fourth arguments are used as the decimal point and thousands separators (before PHP 5.4, only the first character of each was used).

$number = 1000000.119; // The number
$numberOfDecimals = 2; // How many digits after the decimal point
$decimalPointSeparator = "."; // Character that separates the decimals
$thousandsSeparator = ","; // Character that separates the thousands
echo number_format($number, $numberOfDecimals, $decimalPointSeparator, $thousandsSeparator); // Shows 1,000,000.12
//-------------------------------------------
echo number_format("100000.698"); // Shows 100,001
echo number_format("100000.698", 3, ",", " "); // Shows 100 000,698
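A short sketch of the locale-independence point: even after switching to a German locale (which may or may not be installed on a given system, an assumption here), number_format() sticks to its own separators unless they are passed explicitly. The number is an arbitrary example.

```php
<?php
$n = 1234567.891;
// Try to activate a German locale; number_format() ignores it either way.
setlocale(LC_ALL, 'de_DE.UTF-8', 'de_DE');
$default = number_format($n, 2);           // "1,234,567.89"
$german  = number_format($n, 2, ',', '.'); // "1.234.567,89"
echo $default, "\n", $german, "\n";
```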

Formatting: Currency

Currency values are formatted with money_format(). When using it, we must specify the formatting rules we want by passing the function a specially crafted string that consists of a percent symbol (%) followed by a set of flags that determine the minimum width of the resulting output, its integer and decimal precision, and a conversion character that determines whether the currency value is formatted using the locale’s national (n) or international (i) rules. Be aware that money_format() is only available on systems that provide the C strfmon() function (so not on Windows), and that it was deprecated in PHP 7.4 and removed in PHP 8.0.

setlocale(LC_MONETARY, "en_US");
echo money_format('%.2n', "100000.698"); // Outputs $100,000.70
//-----
echo money_format('%.2i', "100000.698"); // Outputs USD 100,000.70
//-------------------
setlocale(LC_MONETARY, "ja_JP.UTF-8");
echo money_format('%.2n', "100000.698"); // Outputs ¥100,000.70
//-----
echo money_format('%.2i', "100000.698"); // Outputs JPY 100,000.70

As you can see, formatting is useful, and you will notice that both currencies rounded the number. Remember that setlocale() changes more than just the currency symbol: the locale also sets the default number of decimal places. In this example we don’t specify how many decimals we want:

setlocale(LC_MONETARY, "en_US");
echo money_format('%i', "100000.698"); // Outputs USD 100,000.70 (as expected)
setlocale(LC_MONETARY, "ja_JP");
echo money_format('%i', "100000.698"); // Outputs JPY 100,001 (the yen has no minor unit, so the locale's default is zero decimal places)
 
Formatting: General
Where numbers use number_format() and money_format(), general strings can be formatted with printf() and its related functions (sprintf() and fprintf()):
printf() prints the formatted string to the script’s output
sprintf() returns the formatted string, so it can be stored in a variable
fprintf() writes the formatted string to a file (an open stream resource)
 
The formatting string usually contains a combination of literal text, which is copied directly into the function’s output, and specifiers that determine how the input should be formatted. The specifiers are used to format each input parameter in the order in which they are passed to the function: the first specifier formats the first data parameter, the second specifier formats the second parameter, and so on.
 
Every specifier starts with a percent symbol (to output a literal percent sign, escape it as %%). The percent symbol can be followed by these optional modifiers, and the specifier always ends with a type specifier:
Sign specifier (+): forces a sign (+ or -) to be shown on numbers; by default only negative numbers get one
Padding specifier: the character used to make up the required output length if the input is not long enough on its own (a space by default; use 0, or a single quote followed by any character)
Alignment specifier: - left-aligns the output (it is right-aligned by default)
Numeric width specifier: the minimum length of the output
Precision specifier: a period followed by a number, which sets how many decimal digits are displayed for floating-point numbers
 
Common Type Specifiers:
b Output an integer as a Binary number.
c Output the character which has the input integer as its ASCII value.
d Output a signed decimal number
e Output a number using scientific notation (e.g., 3.8e+9)
u Output an unsigned decimal number
f Output a locale aware float number
F Output a non-locale aware float number
o Output a number using its Octal representation
s Output a string
x Output a number as hexadecimal with lowercase letters
X Output a number as hexadecimal with uppercase letters
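A sketch of several of these type specifiers, using sprintf() so that the results can be inspected instead of printed. The input values are arbitrary.

```php
<?php
$n = 255;
$out = array(
    'b' => sprintf('%b', $n),  // binary: "11111111"
    'o' => sprintf('%o', $n),  // octal: "377"
    'x' => sprintf('%x', $n),  // lowercase hex: "ff"
    'X' => sprintf('%X', $n),  // uppercase hex: "FF"
    'c' => sprintf('%c', 65),  // character with ASCII code 65: "A"
    'F' => sprintf('%F', 1.5), // non-locale-aware float, default precision 6: "1.500000"
    's' => sprintf('%s', $n),  // as a string: "255"
);
print_r($out);
```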
 
$n = 123;
$f = 123.45;
$s = "A string";

printf("%d", $n); // prints 123
printf("%d", $f); // prints 123 (the fractional part is discarded)

// Prints "The string is A string"
printf("The string is %s", $s);

// Example with precision
printf("%3.3f", $f); // prints 123.450

// Complex formatting
function showError($msg, $line, $file) {
	return sprintf("An error occurred in %s on line %d: %s", $file, $line, $msg);
}
echo showError("Invalid deconfibulator", __LINE__, __FILE__);
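The sign, padding, alignment and width specifiers described earlier can be combined with the types above. A few sketches using sprintf() with arbitrary values:

```php
<?php
$zeroPadded = sprintf('%05d', 42);       // "00042": pad with 0 to width 5
$customPad  = sprintf("%'*8s", 'abc');   // "*****abc": ' introduces a custom padding character
$leftAlign  = sprintf('%-8s|', 'abc');   // "abc     |": - left-aligns within width 8
$signed     = sprintf('%+d %+d', 5, -5); // "+5 -5": + forces a sign on positive numbers too
echo $zeroPadded, "\n", $customPad, "\n", $leftAlign, "\n", $signed, "\n";
```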

Parsing Formatted Input

The sscanf() function is not used for formatting data; instead, it is used to parse formatted input.

The sscanf() family of functions works in a similar way to printf(), except that, instead of writing formatted output, they read input and split it up according to the format string:

$data = '123 456 789';
$format = '%d %d %d';
var_dump (sscanf ($data, $format));

When this code is executed, the function interprets its input according to the rules specified in the format string and returns an array that contains the parsed data:

array(3) {
[0]=>
int(123)
[1]=>
int(456)
[2]=>
int(789)
}

Note that the data must match the format passed to sscanf() exactly, or the function will fail to retrieve all the values. For this reason, sscanf() is normally only useful in situations where the input follows a well-defined format (that is, it is not provided by the user!).
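sscanf() can also assign the parsed values directly to variables passed after the format; in that case it returns the number of values it assigned instead of an array. A sketch with a made-up date string:

```php
<?php
// With extra (by-reference) arguments, sscanf() fills them in and
// returns how many values it assigned.
$count = sscanf('2008-12-04', '%d-%d-%d', $year, $month, $day);
echo "$count values: $year / $month / $day\n"; // 3 values: 2008 / 12 / 4
```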

Written by Adam in: 07. Strings and Patterns
Dec 02, 2008

Strings and Patterns: Replacing

Sometimes you need to replace a section of a string with a different value. The functions that do this are:
str_replace()
str_ireplace() (case-insensitive)
substr_replace()

echo str_replace("World", "Reader", "Hello World"); //Outputs Hello Reader
echo str_ireplace("world", "Reader", "Hello World"); // Outputs Hello Reader (the needle's case does not need to match)

As you can see, these functions take three parameters: the needle, the replacement string and the haystack. PHP finds the needle in the haystack and substitutes every instance with the replacement. An optional fourth parameter can be added: a variable that receives a count of how many replacements were made:

$a = 0; // Initialize
str_replace('a', 'b', 'a1a1a1', $a);
echo $a; // outputs 3

You can also pass arrays to these functions to perform mass replacements:
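The code for this step appears to have been lost from the post. Based on the description that follows, it was most likely close to this reconstruction:

```php
<?php
// First example: both arguments are arrays, so elements are paired by index.
$first = str_replace(array('Hello', 'World'), array('Bonjour', 'Monde'), 'Hello World');
// Second example: only the needle is an array, so every needle gets the same replacement.
$second = str_replace(array('Hello', 'World'), 'Bye', 'Hello World');
echo $first, "\n";  // Bonjour Monde
echo $second, "\n"; // Bye Bye
```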

In the first example, the replacements are made based on array indexes: the first element of the search array is replaced by the first element of the replacement array, and so on, so the output is “Bonjour Monde”. In the second example, only the needle argument is an array; in this case, both search terms are replaced by the same string, resulting in “Bye Bye”.

If you do not know the needle, but do know the character position of what you want to replace, you can use substr_replace(). Unlike the previous functions, its first parameter is the haystack, the second is the replacement, the third is the start position and the optional fourth is the number of characters to replace. Here is the function in a couple of examples:

echo substr_replace("Hello World", "Reader", 6); // Outputs Hello Reader: index 6 is the "W" at the start of World
echo substr_replace("Canned tomatoes are good", "potatoes", 7, 8); // Outputs Canned potatoes are good

As you can see, the first example has no fourth parameter, so everything from the start position to the end of the string was replaced. The second example starts at index 7 (the t at the start of tomatoes, since counting begins at 0) and replaces 8 characters, the length of the word tomatoes.
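Two more substr_replace() sketches with made-up strings: a length of 0 inserts text without removing anything, and a negative start position counts back from the end of the string.

```php
<?php
// A length of 0 inserts the replacement without removing anything.
$inserted = substr_replace('Hello World', 'Big ', 6, 0);
// A negative start counts back from the end: -3 is the "t" of "txt".
$renamed = substr_replace('report.txt', 'csv', -3);
echo $inserted, "\n"; // Hello Big World
echo $renamed, "\n";  // report.csv
```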

Combining substr_replace() with strpos() can prove to be a powerful tool.

$user = "davey@php.net";
$name = substr_replace($user, "", strpos($user, "@"));
echo "Hello " . $name;

This will output “Hello davey”
By using strpos() to locate the first occurrence of the @ symbol, we can replace the rest of the e-mail address with an empty string, leaving us with just the username, which we output in a greeting.
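One caveat with the strpos() approach: if the string contains no @, strpos() returns false, which substr_replace() treats as position 0, so the whole string would be replaced and the name would come out empty. A sketch of a guard, using a hypothetical helper named usernameFromEmail():

```php
<?php
// Hypothetical helper: the part before '@', or the whole string if there is no '@'.
function usernameFromEmail($email)
{
    $at = strpos($email, '@');
    if ($at === false) {
        return $email; // no '@': keep the string as it is
    }
    return substr_replace($email, '', $at); // drop everything from '@' onwards
}

echo "Hello " . usernameFromEmail('davey@php.net'), "\n"; // Hello davey
```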

Dec 02, 2008

Strings and Patterns: Search

PHP Search

PHP has multiple ways of searching strings, from simple (and faster) to complex (and slower). Two terms come up again and again:
needle: what we want to find in a string.
haystack: the string itself.

The first two functions are strpos() and strstr(). The former searches for the needle in the haystack: if it finds it, it returns the position of the needle (the index of its first character), and if it is not found it returns false.

strpos() can take up to three parameters:
strpos($haystack, $needle, $startposition);
The first is the main string we want to search in, the second is what we want to find, and the third is the offset at which to start searching. In the example below we omit the third parameter, so the search starts at index 0:

$haystack = "abcdefg";
$needle = 'abc';
if (strpos($haystack, $needle) !== false) {
	echo 'Found';
}

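Remember that in strings the first character has an index of 0, the second 1, and so on. Therefore, if the needle is found at the very start of the haystack, strpos() returns 0, which you must treat as found! That is why the example above compares with !== false: a loose comparison (!= false) would treat 0 as not found.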
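The position-0 pitfall, and the optional offset parameter, can be seen in this small sketch with arbitrary strings:

```php
<?php
$haystack = 'abcdefg';
$pos = strpos($haystack, 'abc'); // 0: found at the very start

$loose  = ($pos != false);  // false: 0 loosely equals false, so the match is missed
$strict = ($pos !== false); // true: 0 is an integer, not false
var_dump($pos, $loose, $strict);

// The third parameter sets the offset: the 'abc' at index 0 is skipped.
$second = strpos('abcabc', 'abc', 1);
var_dump($second); // int(3)
```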

Remember strstr(), which was just mentioned? It works much like strpos() in that it searches for a needle in a haystack. However, where strpos() returns a position or false, strstr() returns the portion of the haystack from the first occurrence of the needle to the end (or false if the needle is not found), as shown in the example below:

$haystack = '123456';
$needle = '34';
echo strstr ($haystack, $needle); // outputs 3456

strstr() is slower than strpos(), and unlike strpos() it cannot be given an offset at which to start searching.
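Since PHP 5.3, strstr() also takes an optional third argument; passing true returns the part of the haystack before the needle instead. A sketch with a made-up address:

```php
<?php
$email = 'davey@php.net';
$fromAt = strstr($email, '@');       // "@php.net": from the needle to the end
$before = strstr($email, '@', true); // "davey": everything before the needle (PHP 5.3+)
$none   = strstr($email, '#');       // false: needle not found
var_dump($fromAt, $before, $none);
```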

PHP Comparing

Comparisons are (as you know) the most common operation performed on strings. One problem is that PHP may convert strings to numbers when comparing them, which can give surprising results. Consider the following:

$string = '123aa';
if ($string == 123) {
// The string equals 123
}

You and I would expect this to be false, but PHP first transparently converts the contents of $string to the integer 123, which makes the comparison true. (This applies to PHP 5 and 7; since PHP 8.0, comparing a non-numeric string with a number compares them as strings, so this particular example is false.) The best way to avoid the problem is to use the identity operator ===, which compares both the value and the data type. In addition to comparison operators, you can also use the specialized functions strcmp() and strcasecmp() to match strings. These are identical, except that the former is case-sensitive while the latter is not. In both cases, a result of zero indicates that the two strings passed to the function are equal:

$str = "Hello World";
if (strcmp($str, "hello world") === 0) {
// We won’t get here, because of case sensitivity
}
if (strcasecmp($str, "hello world") === 0) {
// We will get here, because strcasecmp()
// is case-insensitive
}
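A few more comparisons to illustrate these points. Note that two numeric strings are still compared as numbers with ==, which is another reason to prefer === when you mean "exactly the same string". The values are arbitrary examples.

```php
<?php
$numericLoose  = ('1e3' == '1000');  // true: both are numeric strings, so they are compared as numbers
$numericStrict = ('1e3' === '1000'); // false: the strings themselves differ
$caseLoose     = ('abc' == 'ABC');   // false: == on two non-numeric strings is case-sensitive
$caseFolded    = (strcasecmp('abc', 'ABC') === 0); // true
$ordered       = (strcmp('apple', 'banana') < 0);  // true: negative when the first string sorts first
var_dump($numericLoose, $numericStrict, $caseLoose, $caseFolded, $ordered);
```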

Written by Adam in: 07. Strings and Patterns
Dec 02, 2008

Strings and Patterns: Extracting

PHP has a function, substr(), for extracting a substring from a larger string. It can take up to three parameters:
substr($mainstring, $startpoint, $length);
$mainstring is the string we want to extract from, $startpoint is the index at which to start (remember the first character is 0, the second is 1), and $length is the number of characters to return. If the length is omitted, everything up to the end of the string is returned.

The starting index can be specified as either a positive integer (meaning the index of a character in the string, counting from the beginning) or a negative integer (meaning the index of a character counting from the end). Here are a few simple examples:

$x = '1234567';
echo substr ($x, 0, 3); // outputs 123
echo substr ($x, 1, 1); // outputs 2
echo substr ($x, -2); // outputs 67
echo substr ($x, 1); // outputs 234567
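The length argument can also be negative, in which case that many characters are left off the end of the string. A sketch using the same $x as above:

```php
<?php
$x = '1234567';
$a = substr($x, -3, 2); // start 3 from the end, take 2: "56"
$b = substr($x, 0, -2); // negative length drops 2 characters from the end: "12345"
$c = substr($x, 2, -2); // from index 2, stopping 2 before the end: "345"
echo $a, ' ', $b, ' ', $c, "\n"; // 56 12345 345
```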

Written by Adam in: 07. Strings and Patterns
Nov 28, 2008

Strings and Patterns: Matching

The PHP 5 Zend Certification exam section Strings and Patterns requires knowledge of Matching. It is presumed that matching refers to the use of strspn().

You can use strspn() to match a string's characters against a set of allowed (whitelisted) characters. The function does not return true or false; instead, it returns the length of the initial segment of the string that consists only of whitelisted characters, like so:

$string = '133445abcdef';
$mask = '12345';
echo strspn ($string, $mask); // Outputs 6

There are optional third and fourth parameters that set the start position and the length of the portion to examine, like so:

$string = '1abc234';
$mask = 'abc';
echo strspn ($string, $mask, 1, 4);

This will output 3, because we start at index 1 (skipping the leading 1) and examine up to 4 characters (abc2). Only the first three of those are in the mask, so the output is 3.
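Because strspn() returns the length of the matching initial segment, comparing it with strlen() is a simple way to check that a string contains only whitelisted characters. A sketch with a hypothetical helper, isAllDigits(), plus the complementary function strcspn():

```php
<?php
// Hypothetical helper: true if $s is non-empty and consists only of digits.
function isAllDigits($s)
{
    return $s !== '' && strspn($s, '0123456789') === strlen($s);
}

var_dump(isAllDigits('2008')); // bool(true)
var_dump(isAllDigits('20o8')); // bool(false): the letter o stops the run at 2

// strcspn() is the complement: it counts leading characters NOT in the mask.
$lettersFirst = strcspn('abc123', '0123456789');
var_dump($lettersFirst); // int(3)
```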

Written by Adam in: 07. Strings and Patterns
Nov 28, 2008

Strings and Patterns: Quoting

The PHP 5 Zend Certification has a section under Strings and Patterns called Quoting. The Zend Study Guide does not contain the word Quoting, and neither does the PHP manual.

Is it referring to quotation marks, grave accents (backticks) and apostrophes? What do you guys think?

Single quotes (apostrophes) are used for simple strings, where all characters are taken literally (the only escape sequences are \' and \\).

echo 'simple string';
echo 'not so simple '.$variable;

Double quotes allow more complex strings: they interpret escape sequences (such as \n for a newline, or hexadecimal escapes like \x2a, which produces an asterisk) and let you embed variables within the string, including the use of braces {$variable} within double quotes:

$variable = "variable";
echo "simple string using double quotes <br>";
echo "double quote using $variable <br>";
echo "double quote using multiple {$variable}s <br>";
echo "double quote with \" double quotes \" within it <br>";
echo "double quote with a hex escape \x2a <br>";
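A small sketch contrasting the two quoting styles on the same content, using var_dump() so the difference is visible:

```php
<?php
$variable = "variable";
$single = 'value: $variable\n'; // taken literally: no interpolation, and \n stays as two characters
$double = "value: $variable\n"; // "value: variable" followed by a real newline
$hex    = "\x2a";               // "*": 0x2A is the asterisk
var_dump($single, $double, $hex);
```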

Written by Adam in: 07. Strings and Patterns
