
Custom Tags Parsing Using Regular Expressions

In the last post, we created a simple custom tag parsing script using PHP string functions. In this post, we continue our discussion of custom tag parsing, but this time using Regular Expressions. We will see how regular expressions can be used to parse strings, and also where to use them and where not to. Before continuing, I expect that you have a working knowledge of Regular Expressions; if not, please first go through an introductory tutorial on them.

Let us first re-create the previous custom tag parsing script using regular expressions:

<form name="form1" method="get" action="">
  <p>
    <!-- textarea should display the previously written text (escaped to prevent HTML injection) -->
    <textarea name="content" cols="35" rows="12" id="content"><?php if (isset($_GET['content'])) echo htmlspecialchars($_GET['content'], ENT_QUOTES, 'UTF-8'); ?></textarea>
  </p>
  <p>
    <input name="parse" type="submit" id="parse" value="Parse">
  </p>
</form>
<?php

if (isset($_GET['parse']))
{
    // escape any HTML the user typed, so only our own tags end up in the output
    $content = htmlspecialchars($_GET['content'] ?? '', ENT_QUOTES, 'UTF-8');

    // convert newlines in the text to HTML "<br />"
    // required to keep formatting (newlines)
    $content = nl2br($content);

    // preg_replace() replaces every match of the pattern with the replacement;
    // '$1' in the replacement is the text captured by the parentheses '()'
    // (eregi_replace() was deprecated in PHP 5.3 and removed in PHP 7.0)
    // '#' is the pattern delimiter, 'i' makes the match case-insensitive
    // (like eregi_replace) and 's' lets '.' match newlines as well;
    // the lazy '(.+?)' matches two tagged words on one line separately
    $content = preg_replace('#\.b\.(.+?)\./b\.#is', '<strong>$1</strong>', $content);
    $content = preg_replace('#\.i\.(.+?)\./i\.#is', '<i>$1</i>', $content);

    // now the variable $content contains HTML formatted text
    // display it
    echo '<hr />';
    echo $content;
}
?>
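The choice between a greedy `(.+)` and a lazy `(.+?)` capture matters for these tags: with the greedy form, a line containing two bold spans is matched only once, from the first `.b.` to the last `./b.`, and the text in between is swallowed. The short sketch below compares the two using `preg_replace` (the PCRE function that replaces the removed `ereg` family); the helper name `bold_tags` and its `$lazy` switch are hypothetical, for illustration only.

```php
<?php
// Hypothetical helper: convert .b.text./b. into <strong>text</strong>.
// $lazy selects between the greedy (.+) and the lazy (.+?) capture group.
function bold_tags(string $text, bool $lazy): string
{
    $group = $lazy ? '(.+?)' : '(.+)';
    // '#' is the delimiter, so the '/' in './b.' needs no escaping;
    // 'i' = case-insensitive (like eregi_replace), 's' = '.' also matches newlines
    $pattern = '#\.b\.' . $group . '\./b\.#is';
    return preg_replace($pattern, '<strong>$1</strong>', $text);
}

$input = '.b.one./b. and .b.two./b.';

echo bold_tags($input, false), "\n";
// <strong>one./b. and .b.two</strong>
echo bold_tags($input, true), "\n";
// <strong>one</strong> and <strong>two</strong>
```

The greedy version backtracks only as far as the last `./b.` on the line, which is why both words and the text between them end up inside a single `<strong>` element.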

But should we use regular expressions here? The answer is no. First, regular expressions run slower, and second, they add a fair bit of complexity where the same thing could have been done easily using plain string functions.
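For comparison with the regex version, here is a minimal sketch of the same `.b.`/`.i.` conversion using only `str_ireplace` (the case-insensitive form of `str_replace`). Unlike the regex version, it swaps each marker independently and does not check that an opening tag has a matching closing tag, so unbalanced input produces unbalanced HTML. The function name `parse_tags_plain` is hypothetical.

```php
<?php
// Hypothetical helper: plain string replacement of the custom tags.
// Each marker is swapped for its HTML counterpart; pairing is not checked.
function parse_tags_plain(string $text): string
{
    $search  = ['./b.', '.b.', './i.', '.i.'];
    $replace = ['</strong>', '<strong>', '</i>', '<i>'];
    // str_ireplace() applies the pairs in order over the whole string
    return str_ireplace($search, $replace, $text);
}

echo parse_tags_plain('.b.Hello./b. .i.World./i.'), "\n";
// <strong>Hello</strong> <i>World</i>
```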

The reason I started this post with something that contradicts its theme is that people tend to avoid regular expressions, thinking the same thing can always be done otherwise (I just gave them one more excuse!). That may be the case sometimes, but in many other cases, where complex string manipulation is required with efficiency, there is but one choice: regular expressions. The next example illustrates this.

For this example we will parse ‘*’ (asterisk) and ‘_’ (underscore) for bolding and italicizing text (as in Google Talk / IM applications). The following text:

Hello *World*. Hello _World_.

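will be parsed into the following HTML:

Hello <strong>World</strong>. Hello <i>World</i>.

which is displayed with the first "World" in bold and the second in italics.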

Note that for both formats the start and end markers are the same character, so a plain search-and-replace cannot tell an opening marker from a closing one. Now let us see how this can be implemented using regular expressions.

<form name="form1" method="get" action="">
  <p>
    <!-- textarea should display the previously written text (escaped to prevent HTML injection) -->
    <textarea name="content" cols="35" rows="12" id="content"><?php if (isset($_GET['content'])) echo htmlspecialchars($_GET['content'], ENT_QUOTES, 'UTF-8'); ?></textarea>
  </p>
  <p>
    <input name="parse" type="submit" id="parse" value="Parse">
  </p>
</form>
<?php

if (isset($_GET['parse']))
{
    // escape any HTML the user typed, so only our own tags end up in the output
    $content = htmlspecialchars($_GET['content'] ?? '', ENT_QUOTES, 'UTF-8');

    // convert newlines in the text to HTML "<br />"
    // required to keep formatting (newlines)
    $content = nl2br($content);

    // match anything between the markers but not the marker itself,
    // otherwise '*hello* world *hello*'
    // would print 'hello* world *hello' in bold
    // and not 'hello' (in bold) world 'hello' (again in bold)
    $content = preg_replace('/\*([^*]+)\*/', '<strong>$1</strong>', $content);
    $content = preg_replace('/_([^_]+)_/', '<i>$1</i>', $content);

    // now the variable $content contains HTML formatted text
    // display it
    echo '<hr />';
    echo $content;
}
?>
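To try the two patterns outside the form, the replacements can be wrapped in a small function. This is a sketch under a few assumptions: the name `parse_im_markup` is hypothetical, it uses `preg_replace` (since `eregi_replace` no longer exists in PHP 7 and later), it describes the text between markers with `[^*]+` and `[^_]+`, and it leaves out the `nl2br` step. It escapes the input with `htmlspecialchars` first, so user-supplied HTML is not echoed back as markup.

```php
<?php
// Hypothetical helper: *text* -> bold, _text_ -> italics (IM-style markup).
function parse_im_markup(string $text): string
{
    // escape user input before adding our own HTML tags
    $text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    // [^*]+ stops at the next asterisk, so '*a* b *c*' becomes two bold spans
    $text = preg_replace('/\*([^*]+)\*/', '<strong>$1</strong>', $text);
    // same idea for underscores
    $text = preg_replace('/_([^_]+)_/', '<i>$1</i>', $text);
    return $text;
}

echo parse_im_markup('Hello *World*. Hello _World_.'), "\n";
// Hello <strong>World</strong>. Hello <i>World</i>.
echo parse_im_markup('*hello* world *hello*'), "\n";
// <strong>hello</strong> world <strong>hello</strong>
```

Escaping before the replacements is safe here because `htmlspecialchars` never produces `*` or `_`, so it cannot create or break a marker.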

Implementing this with string functions alone would take quite a few more lines of code, but I leave that to you.
