Regular Expression

  • A regular expression is used for string pattern matching.
  • A pattern is a sequence of characters to be searched for in a character string.
  • Patterns are normally enclosed in slash characters:

/def/

  • We have already seen a simple example of pattern matching in the library function split.

@array = split(/ /, $line);

  • Here the pattern / / matches a single space, which splits a line into words.
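Because / / matches exactly one space, two spaces in a row produce an empty field between them. A minimal sketch of that behaviour (the sample line is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical input line; note the double space before "three".
my $line  = "one two  three";
my @words = split(/ /, $line);      # split at every single space

print scalar(@words), " fields\n";  # 4 fields
print join("|", @words), "\n";      # one|two||three
```

The empty third field is why the word-count example later in this section splits on / +/ instead.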

 

Regular Expression Operators:

Function Operator
Match Regular Expression – m//
Substitute Regular Expression – s///
Transliterate – tr/// (works on lists of characters, not on regular expressions)

 

Match-Operator Precedence:

  • The match operators have a defined precedence. By definition, the =~ and !~ operators have higher precedence than multiplication and division, and lower precedence than the exponentiation operator **.
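The effect of this precedence can be seen in a small sketch. The expressions below are contrived purely to show how Perl groups the operators; they are not something you would normally write:

```perl
use strict;
use warnings;

# =~ binds tighter than *, so this is 2 * ("3" =~ /3/), that is 2 * 1.
my $product = 2 * "3" =~ /3/;

# ** binds tighter than =~, so this is (2 ** 3) =~ /3/, that is "8" =~ /3/.
my $power = 2 ** 3 =~ /3/;

print "product: $product\n";                              # product: 2
print "power matched: ", ($power ? "yes" : "no"), "\n";   # power matched: no
```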

 

Match Operators:

  • Perl defines special operators that test whether a particular pattern appears in a character string.
  • The basic method for applying a regular expression is to use the pattern binding operators =~ and !~.

$result = $var =~ /abc/;

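  • The result of the =~ operation (in scalar context) is one of the following:
    • A nonzero value (1), or true, if the pattern is found in the string
    • An empty string (which counts as 0 in numeric context), or false, if the pattern is not matched
  • The !~ operator returns the opposite: true if the pattern is not found.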
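A short sketch of the values both binding operators return; the variable names and the sample string are assumptions for illustration:

```perl
use strict;
use warnings;

my $var = "xyzabcxyz";

my $found    = $var =~ /abc/;   # pattern present: true (1)
my $notfound = $var =~ /def/;   # pattern absent: false (empty string)
my $negated  = $var !~ /def/;   # !~ inverts the test: true

print "found:    '$found'\n";     # found:    '1'
print "notfound: '$notfound'\n";  # notfound: ''
print "negated:  '$negated'\n";   # negated:  '1'
```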

Example:

#!/usr/local/bin/perl
print("Ask me a question politely:\n");
$question = <STDIN>;
if ($question =~ /please/) {
    print("Thank you for being polite!\n");
}
else {
    print("That was not very polite!\n");
}

 

Special characters in pattern matching:

  • Perl supports a variety of special characters inside patterns, which enable you to match any of a number of character strings.
  • These special characters are what make patterns useful.

1.  +  Character:

  • The special character + means “one or more of the preceding character”.

Example: the pattern /ab+c/ matches any of the following:

abc

abbc

abbbc

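Example: to count the number of words in the input.

#!/usr/bin/perl
$wordcount = 0;
# Read until end of input; <STDIN> returns undef at end of file.
while (defined($line = <STDIN>)) {
    chomp($line);                  # remove the trailing newline only
    @words = split(/ +/, $line);   # one or more spaces separate words
    $wordcount += @words;          # an array in numeric context is its length
}
print("Total words: $wordcount\n");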

 

 

2.  []  Special Characters:

  • The [] special characters enable you to define patterns that match one of a group of alternatives.

Example:

/d[eE]f/

  • This pattern matches def or dEf.
  • We can combine [] with + to match a sequence of characters of any length.

Example:

/d[eE]+f/

# Can match the following strings, among others:
#deef
#deEf
#deeEEf
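To check which strings a pattern such as /d[eE]+f/ accepts, it can be run over a list of candidates. This is a minimal sketch; the candidate list is invented for illustration:

```perl
use strict;
use warnings;

# Hypothetical test strings: the last two contain no e or E between d and f.
my @candidates = ("def", "dEf", "deEf", "df", "dxf");

# grep keeps the elements for which the pattern matches.
my @matched = grep { /d[eE]+f/ } @candidates;

print "matched: @matched\n";   # matched: def dEf deEf
```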

 

3.  *  and  ?  Special Characters:

  • Perl also defines two other special characters that match a varying number of characters: * and ?.
  • The * special character matches zero or more occurrences of the preceding character.

Example:

/de*f/

  • matches df, def, deef, and so on.
  • The ? character matches zero or one occurrence of the preceding character.

Example:

/de?f/

  • matches either df or def.
  • It does not match deef, because the ? character does not match two occurrences of a character.
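The difference between * and ? can be sketched by applying both patterns to the three strings discussed above:

```perl
use strict;
use warnings;

my (@star, @question);
for my $s ("df", "def", "deef") {
    push @star,     $s if $s =~ /de*f/;   # zero or more e
    push @question, $s if $s =~ /de?f/;   # zero or one e
}

print "de*f matches: @star\n";       # de*f matches: df def deef
print "de?f matches: @question\n";   # de?f matches: df def
```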

 

 

4. Escape Sequences for Special Characters:

  • If you want your pattern to include a character that is normally treated as a special character, precede the character with a backslash \.

Example:

# To check for one or more occurrences of * in a string, use the following pattern:

/\*+/

# The backslash preceding the * tells the Perl interpreter to treat the * as an ordinary character, not as the special character meaning "zero or more occurrences".
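A small sketch of escaping in practice. The parentheses capture the matched text (grouping is covered later in this section); the sample strings are assumptions:

```perl
use strict;
use warnings;

# \* is a literal asterisk; the + after it still means "one or more".
my ($stars) = "a**b" =~ /(\*+)/;

# Escaped +: matches the literal text a+b.
my $literal = ("a+b" =~ /a\+b/) ? 1 : 0;

# Unescaped +: "one or more a, then b", which the text a+b does not contain.
my $special = ("a+b" =~ /a+b/) ? 1 : 0;

print "stars: $stars\n";                          # stars: **
print "literal: $literal, special: $special\n";   # literal: 1, special: 0
```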

 

5. Matching Any Letter or Number:

/a[0-9]c/ # a, then any single digit, then c
/[a-z]/ # any lowercase letter
/[A-Z]/ # any uppercase letter

6. Metacharacter & Metasymbol:

  • The ^ metacharacter matches the beginning of the string, and the $ metacharacter matches the end of the string (or just before a newline at the very end).
    • /^$/ # nothing in the string (start and end are adjacent)
    • /\d+$/ # string that ends with one or more digits
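The anchors can be tried out with a few sample strings. This is only a sketch, and the strings are invented for illustration:

```perl
use strict;
use warnings;

my $empty      = (""         =~ /^$/)   ? 1 : 0;  # empty string
my $ends_digit = ("abc123"   =~ /\d+$/) ? 1 : 0;  # digits at the end
my $no_end     = ("123abc"   =~ /\d+$/) ? 1 : 0;  # digits only at the start
my $starts     = ("123abc"   =~ /^\d+/) ? 1 : 0;  # digits at the start
my $newline    = ("abc123\n" =~ /\d+$/) ? 1 : 0;  # $ also matches before a final newline

print "$empty $ends_digit $no_end $starts $newline\n";   # 1 1 0 1 1
```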

 

Special characters in pattern matching (table):

Character Description
. any single character except newline
\s a whitespace character (space, tab, newline)
\S non-whitespace character
\d a digit (0-9)
\D a non-digit
\w a word character (a-z, A-Z, 0-9, _)
\W a non-word character
[abc] matches a single character in the given set
* zero or more of the previous thing
+ one or more of the previous thing
? zero or one of the previous thing
{3} matches exactly 3 of the previous thing
{3,} matches 3 or more of the previous thing
{3,6} matches between 3 and 6 of the previous thing
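The counted quantifiers combine naturally with \d and \w. The following sketch checks a date-like format and a name length; both rules and all sample values are assumptions chosen for illustration:

```perl
use strict;
use warnings;

# Exactly 4 digits, a dash, 2 digits, a dash, 2 digits.
my @dates = ("2024-01-15", "24-1-15", "2024-01-155");
my @valid = grep { /^\d{4}-\d{2}-\d{2}$/ } @dates;
print "valid: @valid\n";   # valid: 2024-01-15

# Between 3 and 6 word characters, and nothing else.
my @names = ("ab", "abcd", "abc_12", "abcdefg");
my @ok = grep { /^\w{3,6}$/ } @names;
print "ok: @ok\n";         # ok: abcd abc_12
```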

 

Special character in pattern matching

Examples:

Expression Matches
ab?c an a followed by an optional b followed by a c; that is, either abc or ac
a.c an a followed by any single character (not newline) followed by a c
a\.c a.c exactly
[abc] any one of a, b and c
[Aa]bc either of Abc and abc
[abc]+ any (nonempty) string of a’s, b’s and c’s (such as a, abba, acbabcacaa)
[^abc]+ any (nonempty) string which does not contain any of a, b and c (such as defg)
\d\d any two decimal digits, such as 42; same as \d{2}
abc\b abc when followed by a word boundary (e.g. in abc! but not in abcd)
perl\B perl when NOT followed by a word boundary (e.g. in perlert but not in perl stuff)
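A few rows of the table can be verified directly. In this sketch the parentheses capture the text that matched, and the input strings are made up:

```perl
use strict;
use warnings;

# [^abc]+ : the first run of characters other than a, b and c.
my ($run) = "abcdefgabc" =~ /([^abc]+)/;

# \d\d : the first two adjacent digits.
my ($two) = "room 42b" =~ /(\d\d)/;

# ab?c : the b is optional, but at most one is allowed.
my @optional = grep { /^ab?c$/ } ("abc", "ac", "abbc");

print "run: $run\n";             # run: defg
print "two: $two\n";             # two: 42
print "optional: @optional\n";   # optional: abc ac
```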

 

Evaluating a Pattern Only Once:

  • The o option enables you to tell the Perl interpreter that a pattern is to be evaluated only once.

Example:

$var = 1;
$line = <STDIN>;
while ($var < 10) {
    $result = $line =~ /$var/o;
    $line = <STDIN>;
    $var++;
}

  • The first time the Perl interpreter sees the pattern /$var/, it replaces the name $var with the current value of $var, which is 1; this means that the pattern to be matched is /1/.
  • Because the o option is specified, the pattern to be matched remains /1/ even when the value of $var changes. If the o option had not been specified, the pattern would have been /2/ the next time through the loop.
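The freezing effect of the o option can be observed without reading input. This sketch matches the fixed string "2" against /$var/ with and without the option; the array names are assumptions for illustration:

```perl
use strict;
use warnings;

my (@with_o, @without_o);
for my $var (1 .. 3) {
    my $line = "2";
    # With o, the pattern stays /1/ after the first pass through the loop.
    push @with_o,    ($line =~ /$var/o) ? 1 : 0;
    # Without o, the pattern follows the current value of $var.
    push @without_o, ($line =~ /$var/)  ? 1 : 0;
}

print "with o:    @with_o\n";      # with o:    0 0 0
print "without o: @without_o\n";   # without o: 0 1 0
```

Because the frozen pattern silently ignores later changes to the variable, the option is best reserved for patterns whose variables really never change.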

 

Substitution Operator:

  • The substitution operator, s///, is really just an extension of the match operator that allows you to replace the matched text with some new text.

Syntax:

s/PATTERN/REPLACEMENT/;

Example:

#!/usr/bin/perl
$string = 'The cat sat on the mat';
$string =~ s/cat/dog/;
# Replaces the first occurrence of cat with dog.
print "Final Result is $string\n";

Result:

Final Result is The dog sat on the mat

 

Substitution Operator Modifiers:

Modifier Description
i Makes the match case insensitive
m Treats the string as multiple lines: ^ and $ match at the start and end of each line (at embedded newlines), not only at the string boundaries
o Evaluates the expression only once
s Allows use of . to match a newline character
x Allows you to use white space in the expression for clarity
g Globally finds all matches
gc Keeps the current search position even after a global match fails (meaningful for m//g)
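The two most commonly used modifiers, g and i, can be sketched as follows. The sample sentence is an assumption; s/// returns the number of substitutions it made:

```perl
use strict;
use warnings;

my $text = "The cat sat on the Cat mat";

# No modifiers: only the first, case-sensitive match is replaced.
my $first = $text;
$first =~ s/cat/dog/;

# g replaces every match; i ignores case, so "Cat" is replaced as well.
my $all   = $text;
my $count = ($all =~ s/cat/dog/gi);

print "$first\n";                 # The dog sat on the Cat mat
print "$all\n";                   # The dog sat on the dog mat
print "replacements: $count\n";   # replacements: 2
```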

 

Translation:

  • Perl also provides another way to substitute one group of characters for another: the tr translation operator.

Syntax:

tr/SEARCHLIST/REPLACEMENTLIST/

Example:

#!/usr/bin/perl
$string = 'I was staying in radisson';
$string =~ tr/a/o/;   # replace every a with o
print "$string\n";

Result:

I wos stoying in rodisson
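Two further uses of tr are worth a sketch: translating whole ranges, and counting characters. tr returns the number of characters it matched, and with an empty replacement list (and no modifiers) the string is left unchanged. The variable names are assumptions for illustration:

```perl
use strict;
use warnings;

my $string = 'I was staying in radisson';

# Count the a's without changing the string.
my $count = ($string =~ tr/a//);

# Translate a range: every lowercase letter becomes its uppercase form.
my $upper = $string;
$upper =~ tr/a-z/A-Z/;

print "count: $count\n";   # count: 3
print "$upper\n";          # I WAS STAYING IN RADISSON
```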

 

Translation Operator Modifiers:

Modifier Description
c Complement SEARCHLIST
d Delete found but unreplaced characters
s Squash duplicate replaced characters.

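Example:

#!/usr/bin/perl
$string = 'the cat sat on the mat.';
# a is replaced by b; b-z have no replacement, so the d modifier deletes them.
$string =~ tr/a-z/b/d;
print "[$string]\n";   # brackets make the remaining spaces visible

Result:

[ b b   b.]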

 

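Example:

#!/usr/bin/perl
$string = 'The cat sat on the mat.';
# The uppercase T is not in a-z, so it is left untouched.
$string =~ tr/a-z/b/d;
print "[$string]\n";   # brackets make the remaining spaces visible

Result:

[T b b   b.]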

 

 

Example:

#!/usr/bin/perl
# With the c modifier, every character NOT in a-z is replaced by b.
$string1 = 'THE cat sat on the mat.';
$string1 =~ tr/a-z/b/c;
$string2 = 'the cat sat on the mat';
$string2 =~ tr/a-z/b/c;
print "string1:$string1\n";
print "string2:$string2\n";

Result:

string1:bbbbcatbsatbonbthebmatb

string2:thebcatbsatbonbthebmat

 

 

Example:

  • The last modifier, s, squashes each run of characters that were translated to the same character into a single copy of that character.

#!/usr/bin/perl
$string1 = 'blood';
$string2 = 'the cat sat on the mat';
$string1 =~ tr/a-z/a-z/s;
$string2 =~ tr/a-z/b/s;
print "string1:$string1\n";
print "string2:$string2\n";

Result:

string1:blod

string2:b b b b b b

 

Matching Boundaries:

  • The \b assertion matches at any word boundary.
  • The \B assertion matches any position that is not a word boundary.
  • \w matches any single word character (a letter, digit, or underscore).
  • \W is the opposite: it matches any non-word character, which normally marks the end of a word.

Example:

$string = "Cats go Catatonic\n When given Catnip";

Results with each expression (the sample phrases in the comments are separate test strings):

/\bcat\b/ # Matches 'the cat sat' but not 'catatonic' or 'polecat'

/\Bcat\B/ # Matches 'verification' but not 'the cat on the mat'

/\bcat\B/ # Matches 'catatonic' but not 'polecat'

/\Bcat\b/ # Matches 'polecat' but not 'catatonic'
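The boundary assertions can be tried against the sample string itself. Since its words are capitalised, this sketch adds the i modifier, which is an assumption beyond the patterns listed above; in list context a match with g returns every matched substring:

```perl
use strict;
use warnings;

my $string = "Cats go Catatonic\n When given Catnip";

# "cat" at the start of a longer word: Cats, Catatonic, Catnip.
my @starts = $string =~ /\bcat\B/gi;

# "cat" as a whole word does not occur in the string.
my $whole = ($string =~ /\bcat\b/i) ? 1 : 0;

# Separate test words for the remaining two combinations.
my $ends   = ("polecat"      =~ /\Bcat\b/) ? 1 : 0;
my $inside = ("verification" =~ /\Bcat\B/) ? 1 : 0;

print "word starts: ", scalar(@starts), "\n";              # word starts: 3
print "whole: $whole, ends: $ends, inside: $inside\n";     # whole: 0, ends: 1, inside: 1
```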

 

Grouping Matching:

  • From a regular-expression point of view, there is no difference in what the following two expressions match, except, perhaps, that the former is slightly clearer.

$string =~ /(\S+)\s+(\S+)/;

OR

$string =~ /\S+\s+\S+/;

  • The benefit of grouping is that it allows us to extract the text matched by each group.
  • When groups are used in substitution expressions, the $1, $2, $3, ... variables can be used in the replacement text, where $1 holds the text matched by the first group, and so on. Thus, we could reformat a date string using this:

 

Example:

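#!/usr/bin/perl
$date = '08/08/1990';
# $1 = month, $2 = day, $3 = year; # is used as the delimiter
# so that the slashes in the pattern need no escaping.
$date =~ s#(\d+)/(\d+)/(\d+)#$3/$1/$2#;
print "$date";

Result:

1990/08/08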
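Groups are equally useful in a plain match. As a minimal sketch (the sample string and variable names are assumptions), a match in list context returns the text captured by each group:

```perl
use strict;
use warnings;

my $string = "hello   world again";

# List context: the captured groups are returned in order.
my ($first, $second) = $string =~ /(\S+)\s+(\S+)/;

# Without groups the same pattern still matches, but nothing is extracted.
my $matched = ($string =~ /\S+\s+\S+/) ? 1 : 0;

print "first: $first, second: $second\n";   # first: hello, second: world
print "matched: $matched\n";                # matched: 1
```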

 

A program that strips every character except digits from a string.

#!/usr/local/bin/perl
$string = "The number 1 appears in this string";
# c complements the list 0-9; d deletes every character in that complement.
$string =~ tr/0-9//cd;
print("$string\n");

Result:

1

 

A simple white space cleanup program.

#!/usr/local/bin/perl
@input = ("This      is a line      of input.\n",
          "Here is      another     line.\n");
foreach $line (@input) {
    $line =~ s/^[ \t]+//;      # remove leading spaces and tabs
    $line =~ s/[ \t]+/ /g;     # collapse each run of spaces and tabs to one space
}
print("Formatted text:\n");
print(@input);

Result:

Formatted text:

This is a line of input.

Here is another line.

 

                                                                                                                         

 
