Perl: Difference between revisions

From Citizendium
imported>Aleksander Stos
mNo edit summary
 
(33 intermediate revisions by 11 users not shown)
Line 1: Line 1:
'''Perl''' is a dynamic, [[Programming language|interpreted programming language]] created by [[Larry Wall]] and was first released in 1987. Wall's original intention was to combine the best features of a variety of other languages, including [[C programming language|C]], [[Shell|Unix shell]] scripting, [[lisp]], [[awk]], [[sed]], and Unix tools such as the [[grep]] family, into a succinct and easy to understand language for system administration.
{{subpages}}
Perl has since evolved into one of the most flexible and powerful scripting languages, with a huge following and professional support.
{{TOC|right}}
The Perl interpreter has been ported to many [[Operating system|operating systems]],
allowing to execute the same Perl code in different environments, usually without any changes.


Perl is widely adopted for its powerful string processing capability, which is mainly owed to Perl's benchmark [[regular expression]] engine [6].  
'''Perl''' is a dynamic, interpreted [[Programming language|programming language]] created by [[Larry Wall]] and first released in 1987. Wall combined features of a variety of other languages, including [[C programming language|C]], [[Shell|Unix shell]] scripting, [[Lisp]], [[awk]], [[sed]], and Unix tools such as the [[grep]] family, into a succinct language for system administration.  
In version 5, Perl introduced a set of important abstractions, which allow the creation, export, and import of objects and methods, leading to a vast public library of well-maintained modules and packages ([[CPAN]]).
Perl evolved into a flexible and powerful scripting language and garnered a substantial following with professional support.
Due to its strengths in manipulating strings and the large amount of publicly available modules, Perl adopts itself easily as a "glue" language between different worlds, for example database access and web programming. Another main use is system administration. Many system scripts of today's Linux distributions are written in Perl, and even some commercial Unix systems install Perl by default.
Perl interpreters now exist for most [[Operating system|operating systems]],  
and programs can usually be moved between different operating systems without needing to be changed.


At the same time Perl strives to be easy, elegant, and fun to use. Perl's motto is "There's more than one way to do it". This is of course true for many programming languages; but while most enforce as much structure as possible, Perl has taken the opposite route: leave as much choice as possible to the programmer and don't require what is not absolutely necessary. As an example, Perl does not enforce prior declaration of variables (but you can choose to do so) or exact data typing. Instead, the interpreter handles variables flexibly as required, and is mostly right about it. Purists may have difficulties with such a concept, but many a C (or C++) programmer never wrote another line in these languages once they migrated to Perl.
One of Perl's advantages is its excellent string processing. Perl's powerful [[regular expression]] engine has become an unofficial benchmark against which other programming languages' engines are measured.  Due to its strong support for strings and the large number of publicly available modules, Perl has been widely used as a "glue" language between different kinds of technologies such as database access and web programming.  Many system scripts for Linux distributions are written in Perl, and some commercial Unix systems install Perl by default. Perl is currently in version 5, a mature version which allows the creation, export, and import of objects and methods, and has an extensive public library of well-maintained modules and packages ([[CPAN]]).
 
Perl won many supporters due to its approach of leaving much choice to the programmer and not requiring anything that is not absolutely necessary. As an example, Perl does not require declaration of variable types prior to use. Instead, the interpreter decides the type of a variable based on how it is being used (and is generally quite successful at doing so).  Such loose typing is nowadays called [[Duck typing]] ("if it walks like a duck, and quacks like a duck, it must be a duck"). Perl's motto became "there's more than one way to do it", and this policy had an important influence on the newer [[Ruby (programming language)]].


==Examples==
==Examples==
Line 38: Line 39:
</table>
</table>


<br>The second statement in the second example shows an aspect that is often discouraged for the sake of clarity, the option to write very compact (''terse'') code. In the middle statement, the 'e' in $g is replaced with an 'a'.
The second statement in the second example shows an aspect that is often discouraged for the sake of clarity, the option to write very compact (''terse'') code. In the middle statement, the 'e' in $g is replaced with an 'a'.
<br>In [[usenet]] days it was customary to sign one's posting in a Perl thread by a one-liner that produced the string "Just another Perl hacker," (JAPH), the master of which was [[Randal L. Schwartz]], author of several Perl books. The third line is one of his simpler examples, but already too involved to explain in the context of an introductory article. You can see the analysis [[Just another perl hacker|here]].
Both examples 2 and 3 contain '''Regular Expression''' matches, which are introduced further down.
Real world Perl programs are usually stored in files and passed as parameters to the Perl interpreter. Some mechanisms of this form of invocation depend largely on the [[perl example 4|host's operating system]].


==Syntax Highlights==
In [[usenet]] days it was customary to sign one's posting in a Perl thread by a one-liner that produced the string "Just another Perl hacker," (JAPH), the master of which was [[Randal L. Schwartz]], author of several Perl books. The third line is one of his simpler examples, but already too involved to explain in the context of an introductory article. You can see the analysis [[Just another perl hacker|here]].
It is impossible to give a complete overview of Perl's syntax here. The "Camel" book [1] has over 1000 pages in its 3rd edition. But a few highlights may give some idea of the character of Perl to the interested programmer.
Both examples 2 and 3 contain ''regular expression'' matches, which are introduced further down.
Real world Perl programs are usually stored in files and passed as parameters to the Perl interpreter. Some mechanisms of this form of invocation depend largely on the host's operating system, see a separate example for more details.
 
==Syntax highlights==
It is impossible to give a complete overview of Perl's syntax here. The "Camel" book [1] has over 1000 pages in its 3rd edition. But a few highlights may give some idea of the character of Perl to the interested reader.


===Variables===
===Variables===
====Data Types====
====Data types====
*'''Scalars''' are the fundamental data type. A scalar stores a single, simple value, usually a string or a number, or a reference to another variable. A scalar is prepended by '$', e.g. <code>$var = 1;</code>
*'''Scalars''' are the fundamental data type. A scalar stores a single, simple value, usually a string or a number, or a reference to another variable. A scalar is prepended by '$', e.g. <code>$var = 1;</code>
*'''Arrays''' are ordered '''lists''' of scalars, where each element can be accessed by an index (integer). An array is prepended by a '@', e.g. <code>@list</code>. All indexing in Perl starts with 0, i.e. <code>$list[0]</code> (scalar) is the first element of <code>@list</code>.
*'''Arrays''' are ordered lists of scalars, where each element can be accessed by an index (integer). An array is prepended by a '@', e.g. <code>@list</code>. All indexing in Perl starts with 0, i.e. <code>$list[0]</code> (scalar) is the first element of <code>@list</code>.
*'''Hashes''' are unordered '''sets''' of key/value pairs, where the value (scalar) is accessed using a key (string). A hash is prepended by '%', e.g. <code>%colors</code>. A value is addressed by using the key in braces, e.g to assign a value to <code>%colors</code> for the key 'ball':<br> <code>$colors{'ball'} = 'green';</code>
*'''Hashes''' are unordered sets of key/value pairs, where the value (scalar) is accessed using a key (string). A hash is prepended by '%', e.g. <code>%colors</code>. A value is addressed by using the key in braces, e.g to assign a value to <code>%colors</code> for the key 'ball':<br> <code>$colors{'ball'} = 'green';</code>
*'''Globs''' (or 'typeglobs') are symbol table entries.  They associate a global name with references to the variables of that name.  A glob is prepended by '*', e.g. <code>*colors</code>.  The most common use of a glob is as a filehandle, such as <code>open *FILE, $filename</code>.  Other uses include aliasing global variables, e.g. <code>*colour = \$color</code>.


All variables in Perl are of these three data types. Other data types are abstractions such as filehandles, subroutines, symbol table entries, etc. Perl keeps variables of each type separately, so that it is always clear which value you want to access. Example: <code>@color, $color, %color</code> are different variables, so $color, $color[2] or $color{'ball'} hold different values.
All variables in Perl are of these four data types. Other data types are abstractions such as filehandles, subroutines, symbol table entries, etc. Perl keeps variables of each type separately, so that it is always clear which value you want to access. Example: <code>@color, $color, %color, *color</code> are different variables, so $color, $color[2] or $color{'ball'} hold different values.
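The separation of same-named variables can be illustrated with a minimal sketch. The variable names and values below are invented for illustration only:

```perl
use strict;
use warnings;

# A scalar, an array and a hash that share the name "color"
# are three independent variables.
my $color = 'red';                              # scalar
my @color = ('red', 'green', 'blue');           # array, indexed from 0
my %color = (ball => 'green', box => 'blue');   # hash, keyed by string

print "$color\n";          # the scalar: red
print "$color[2]\n";       # third array element: blue
print "$color{'ball'}\n";  # hash value for key 'ball': green

# Assigning to one of them leaves the others untouched.
$color{'ball'} = 'yellow';
print "$color $color[1] $color{ball}\n";        # red green yellow
```

The sigil names the type of the value being accessed, which is why a single array element is written <code>$color[2]</code> rather than <code>@color[2]</code>.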


====Scope====
====Scope====
Since version 5, the '''scope''' of a variable can be differentiated through the "lexical" new '<code>my</code>' and '<code>our</code>' declarators. Before, only '''global''' and '''local''', declared by the '<code>local</code>' operator, variables existed. A global variable is visible throughout its '''namespace''', while a local variable will overlay an existing global with a temporary value.   
 
But in Perl, a ''local'' variable has some confusing properties; for example, it can be accessed from a subroutine that is called from within its scope. The reason for this is that the global variable is overlayed in symbol tables, which are hidden, but truly global. The new "lexical" '<code>my</code>' declarator does not create symbol table entries, so anything outside its scope (e.g. a subroutine) cannot know its name. A lexical variable is created when declared, Perl keeps track of its usage. It ceases to exist after all references to it have ended, which is  
Perl has two scopes which determine the visibility and accessibility of variables, 'global' and 'lexical'.  In addition there are special modifications of variables within these scopes:  'local', 'our' and 'state'.
[[Perl closure|usually]] <!-- Needs Work! -->
 
the case when the program exits the scope.
 
=====Global=====
 
A 'global' variable is associated with a 'package' (what other languages would call a 'namespace').  It is visible to all parts of the program.  The full name of a global variable contains its package and name like so:
 
<code>
    $Some::Package::foo = 42;      # a variable named "foo" in the package "Some::Package"
</code>
 
Within its own namespace, the package part can be dropped.
 
<code>
    package Some::Package;
    $foo = 42;                      # equivalent to $Some::Package::foo = 42
</code>
 
If no package is declared the namespace is 'main'.  Unless declared otherwise, all variables and subroutines are global.  This is unfortunate as best practice is to use lexical variables whenever possible.  File-scoped lexicals can replace most uses of global variables.
 
 
=====Lexical=====
 
Lexical scope is determined by the surrounding block. It is designed to encapsulate variable usage so that only the narrow portion of the code which needs access to a variable can access it.  This makes code much easier to read and understand, because the code which could possibly affect a variable is reduced to the block in which it is declared.
 
Any block will do:  the block of a map, grep or sort routine; the braces around an if or else block or while loop. The braces containing a subroutine's code encompass a lexical scope.
 
    sub foo {          # beginning of a lexical scope
        my $var = 42;      # declaring a lexical variable $var
        print $var;        # prints 42
    }                  # end of a lexical scope
   
    print $var;        # $var is now undefined
 
There are exceptions to this rule, mostly for convenience.  For example, the conditional of an 'if' or 'while' is considered to be part of the following block.
 
Lexical scope is wholly separate from namespaces: a lexical variable has no namespace associated with it.  A lexical scope can contain multiple namespaces and vice versa.  Scopes also nest, so the inner scope can see the outer scope's variables.  For example...
 
    {                              # begin outer scope
        my $outer = "outer";            # declare lexical $outer in the outer scope
       
        {                              # begin inner scope
            my $inner = "inner";            # declare lexical $inner in the inner scope
           
            print $inner;                  # prints "inner"
            print $outer;                  # prints "outer"
        }                              # end inner scope
       
        print $inner;                  # $inner is now undefined
        print $outer;                  # prints "outer"
    }                              # end outer scope
   
    print $outer;                  # $outer is now undefined
 
Ultimately, there is an implicit lexical scope around the entire file.  Any lexicals declared outside any enclosing braces are known as "file-scoped lexicals" and can be seen by the entire file from that point down. File-scoped lexicals often replace globals.
 
Variables declared lexically are automatically cleaned up when the scope is left if there are no more references to that variable.
 
Lexical variables cannot normally be accessed by code outside their scope, although there are tricks to get at them, so lexical variables should not be considered secure.
 
Lexical scope was introduced in Perl 5.  It is best practice to declare your variables as lexical in the narrowest possible scope, rather than at the beginning of a routine as in C.
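The modifiers 'local', 'our' and 'state' named at the start of this section can be contrasted with 'my' in a short sketch. All subroutine and variable names here are hypothetical, and 'state' assumes Perl 5.10 or later:

```perl
use strict;
use warnings;

our $level = 'global';      # package (global) variable $main::level

sub show { return $level }  # reads the global at call time

sub with_local {
    local $level = 'local'; # temporary value, restored when the sub returns
    return show();          # show() sees 'local'
}

sub with_my {
    my $level = 'lexical';  # a new lexical; show() cannot see it
    return show();          # show() still sees the global
}

sub counter {
    use feature 'state';    # 'state' needs Perl 5.10 or later
    state $n = 0;           # initialised once, keeps its value between calls
    return ++$n;
}

print with_local(), "\n";   # local
print with_my(), "\n";      # global
print show(), "\n";         # global
print counter(), counter(), "\n";   # 12
```

The difference between the first two results is the practical reason for preferring 'my': a 'local' value leaks into every subroutine called from its scope, while a lexical stays confined to its block.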


===Operators===
===Operators===
Line 68: Line 129:
</code>
</code>


In this example a file is opened, the '&lt;' and '&gt;' around the file handle 'FILE' in the second line are '''one''' operator, telling Perl to read the file line by line and assigning each line to successive elements of the array <code>@line</code>. Thus, <code>$line[0]</code> will contain the complete first line of the file, <code>$line[1]</code> the second, etc.
In this example a file is opened, the '&lt;' and '&gt;' around the file handle 'FILE' in the second line are ''one'' operator, telling Perl to read the file line by line and assigning each line to successive elements of the array <code>@line</code>. Thus, <code>$line[0]</code> will contain the complete first line of the file, <code>$line[1]</code> the second, etc.
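The list-context behaviour of the <code>&lt;...&gt;</code> operator can be tried without a file on disk. The following sketch assumes an in-memory filehandle and made-up content, and uses a lexical filehandle instead of the bareword 'FILE':

```perl
use strict;
use warnings;

# In-memory "file" so the sketch runs without touching the disk;
# the content is made up for illustration.
my $text = "first line\nsecond line\nthird line\n";
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";

my @line = <$fh>;       # list context: read all remaining lines at once
close($fh);

chomp(@line);           # strip the trailing newline from every element
print scalar(@line), " lines\n";    # 3 lines
print "$line[0]\n";                 # first line
print "$line[1]\n";                 # second line
```

In scalar context, for instance <code>my $next = <$fh>;</code>, the same operator returns only one line per call.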


===Statements and Declarations===
===Statements and Declarations===
'''Statements''' are the parts of a script that are executed during runtime. They can be assignments, built-in functions, calls to subroutines, control structures, etc. Statements may be enclosed in '''blocks''' delimited by "curly" braces: '{' '}'. A block may stand by itself, but usually it is dependent on some controlling expression, such as an "if" statement, a loop, or the <code>eval</code> function. A block defines a new '''scope''' within a surrounding scope.  
Statements are the parts of a script that are executed during runtime. They can be assignments, built-in functions, calls to subroutines, control structures, etc. Statements may be enclosed in '''blocks''' delimited by "curly" braces: '{' '}'. A block may stand by itself, but usually it is dependent on some controlling expression, such as an "if" statement, a loop, or the <code>eval</code> function. A block defines a new '''scope''' within a surrounding scope.  
<br>Unless enclosed in braces, Perl statements must be separated from one another by either a comma or a semicolon. The following statements are syntactically correct:
 
Perl statements are usually terminated by a semicolon, though Perl also recognizes a statement's end implicitly in some places, such as at the closing brace of a block.
 
The following statements are syntactically correct:
  <code>
  <code>
  { $a = 1 }      # legal because closing brace follows
  { $a = 1 }      # legal because closing brace follows
  { $a = 1; }      # better, "good practice"
  $a = 1, $b = 2;
  $a = 1, $b = 2;
  $a = 1; $b = 2;
  $a = 1; $b = 2;
Line 81: Line 144:
'''Declarations''' are syntactically similar to statements but are evaluated during compile time, i.e. when the interpreter reads the script and creates its internal representation.
'''Declarations''' are syntactically similar to statements but are evaluated during compile time, i.e. when the interpreter reads the script and creates its internal representation.


Unlike many programming languages, Perl does not require a variable to be declared. It will "spring into existence" (as a global variable) the first time it is used in a statement at runtime. But you can enforce the explicit declaration of variables by the "<code>use strict</code>" '''pragma''' (a compiler directive); any undeclared variable (in the scope of the ''pragma'') will then produce a compile time error:
Unlike many programming languages, Perl does not require a variable to be declared. It will "spring into existence" (as a global variable) the first time it is used in a statement at runtime. But you can enforce the explicit declaration of variables by the "<code>use strict</code>" '''pragma''' (a compiler directive); any undeclared variable (in the scope of the pragma) will then produce a compile time error:
  <code>
  <code>
  ## preventing the accidental invocation of a variable is considered "good practice"
  ## preventing the accidental invocation of a variable is considered "good practice"
Line 93: Line 156:
  }
  }
  </code>
  </code>
A global variable must now be declared by using its full package name explicitly. But the main beneficiary is the "my" operator, and thus cleaner programming. By the gentle force of lazyness a great number of local variables get declared instead of sloppily created globals!
A global variable must now be declared by using its full package name explicitly. But the main beneficiary is the "my" operator, and thus cleaner programming. By the gentle force of laziness a great number of local variables get declared instead of sloppily created globals!


===Subroutines===
===Subroutines===
'''Subroutines''' in Perl are declared by the reserved word 'sub', an (optional) name, and a body enclosed by braces. A named subroutine is global in its namespace. Unnamed ("anonymous") subroutines can obviously not be called elsewhere, but they have benefits for special and rather complicated purposes.
Subroutines in Perl are declared by the reserved word 'sub', an (optional) name, and a body enclosed by braces. A named subroutine is global in its namespace. Unnamed ("anonymous") subroutines can obviously not be called elsewhere, but they have benefits for special and rather complicated purposes.
The calling statement may pass parameters to the subroutine and will receive a return value. If the subroutine was declared before its first use, it can be called by its name, just like a builtin function, otherwise it has to be called with a prepended '&'. Examples:
The calling statement may pass parameters to the subroutine and will receive a return value. If the subroutine was declared before its first use, it can be called by its name, just like a builtin function, otherwise it has to be called with a prepended '&'. Examples:
  <code>
  <code>
Line 112: Line 175:
Perl allows details of function arguments to be declared in subroutine prototypes, e.g. whether the subroutine expects a scalar or a reference. The name is arguably badly chosen, because prototypes do not provide compile-time type checking of function arguments, as programmers may expect<ref name=perl_sins>Raised by Tom Christiansen, discussed in [http://www.perl.com/pub/a/1999/11/sins.html Perl's Sins].</ref>.
Perl allows details of function arguments to be declared in subroutine prototypes, e.g. whether the subroutine expects a scalar or a reference. The name is arguably badly chosen, because prototypes do not provide compile-time type checking of function arguments, as programmers may expect<ref name=perl_sins>Raised by Tom Christiansen, discussed in [http://www.perl.com/pub/a/1999/11/sins.html Perl's Sins].</ref>.
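A small sketch of what a prototype does and does not do. The subroutine name <code>count_of</code> is hypothetical; the <code>($)</code> prototype changes how the call is compiled, but performs no type check:

```perl
use strict;
use warnings;

# Prototype ($): the single argument is evaluated in scalar context.
sub count_of ($) { return $_[0] }

my @items = ('a', 'b', 'c');

# Because of the ($) prototype, @items is evaluated in scalar context,
# so the subroutine receives the element count, not the elements.
print count_of(@items), "\n";           # 3

# No type check takes place: a string is accepted just as readily.
print count_of('not a number'), "\n";   # not a number
```

Note that a prototype only takes effect when the subroutine is declared before the call is compiled and the call is made without a prepended '&'.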


===Regular Expressions===
===Regular expressions===
'''Regular Expressions''' are a well-defined syntax by which a specialized program (the so-called '''Regular Expression engine''') within Perl
Regular expressions are a well-defined syntax by which a specialized program (the so-called ''regular expression engine'') within Perl is directed to process text strings and produce certain side effects based on the findings.  
is directed to process text strings and produce certain side effects based on the findings.  
This process is usually called ''pattern matching''.  Regular Expression engines were built into many of Unix' standard programs from the beginning, but especially in the early days they differed slightly in their features, or sometimes even not so slightly.  
This process is usually called "pattern matching".  Regular Expression engines were built into many of Unix' standard programs  
 
from the beginning, but especially in the early days they differed slightly in their features, or sometimes even not so slightly.  
When Larry Wall created Perl, he unified these flavors and added several "shorthand" patterns that made it easier to define and apply them. In later versions it became possible  
<br>
When Larry Wall created Perl, he unified these flavors and added several "shorthand" patterns that made  
it easier to define and apply them. In later versions it became possible  
to comment patterns inline, to choose "non-greedy" match strategies,
to comment patterns inline, to choose "non-greedy" match strategies,
and [[Unicode]] and [[Posix]] were integrated into the engine. Perl itself (as opposed to the Regular Expression engine) provides several  
and [[Unicode]] and [[Posix]] were integrated into the engine. Perl itself (as opposed to the Regular Expression engine) provides several  
standard variables which will hold certain parts of the latest pattern match, such as <code>$&</code> for the match itself,  
standard variables which will hold certain parts of the latest pattern match, such as <code>$&</code> for the match itself,  
<code>$`</code> for the part before ("left of") the match, <code>$'</code> for the part after it, <code>$1</code>, <code>$2</code>, ... for captured parts of a match, etc. 
<code>$`</code> for the part before ("left of") the match, <code>$'</code> for the part after it, <code>$1</code>, <code>$2</code>, ... for captured parts of a match, etc. 
These, together with some [[Perl enhancements for readability| enhancements]] for readability, make it very easy and efficient to manipulate strings.
These, together with some [[Perl/Enhancements for readability| enhancements]] for readability, make it very easy and efficient to manipulate strings.
As a result, for many years Perl's Regular Expression engine was considered the most  
As a result, for many years Perl's Regular Expression engine was considered the most  
advanced to be found in any programming language.
advanced to be found in any programming language.
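The match variables mentioned above can be inspected with a minimal sketch. The sample string and the pattern are chosen for illustration only:

```perl
use strict;
use warnings;

my $text = 'Just another Perl hacker';
my ($before, $match, $after, $first, $second);

# Two capture groups: a word, a space, then a word starting with "hack".
if ($text =~ /(\w+) (hack\w*)/) {
    ($before, $match, $after) = ($`, $&, $');   # copy the special variables
    ($first, $second)         = ($1, $2);       # copy the captured groups
}

print "match:  [$match]\n";             # [Perl hacker]
print "before: [$before]\n";            # [Just another ]
print "after:  [$after]\n";             # []
print "groups: [$first] [$second]\n";   # [Perl] [hacker]
```

Copying the special variables into named lexicals right after the match is a common precaution, because the next successful match overwrites them.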


In Perl, the application of Regular Expressions can be very casual.
In Perl, the application of regular expressions can be very casual.
The binding operator '<code>=~</code>' applies a match or substitution to a variable "in place"; with a substitution, it modifies the variable much like an assignment operator. 
The binding operator '<code>=~</code>' applies a match or substitution to a variable "in place"; with a substitution, it modifies the variable much like an assignment operator. 
In the above code snippet "<code>$g =~ s/e/a/;</code>",
In the above code snippet "<code>$g =~ s/e/a/;</code>",
Line 145: Line 205:
For those interested in the use of Regular Expressions, [6] is highly recommended.
For those interested in the use of Regular Expressions, [6] is highly recommended.


===Namespace, Scope===
===Namespace and scope===
By declaring a '''package''', a separate symbol table is created for all globals (variables, subroutines, etc.). This '''namespace''' continues until a new package is declared or the file ends. All symbols which are created outside of an explicit package automatically belong to default namespace 'main'. A variable or subroutine of a different namespace can be addressed by the '<code><namespace>::</code>' qualifier.
By declaring a '''package''', a separate symbol table is created for all globals (variables, subroutines, etc.). This '''namespace''' continues until a new package is declared or the file ends. All symbols which are created outside of an explicit package automatically belong to default namespace 'main'. A variable or subroutine of a different namespace can be addressed by the '<code><namespace>::</code>' qualifier.
A '''scope''' is delimited by curly braces. Inside a scope a symbol from outside of this scope may be overlayed by "local" or lexical variables. As soon as the program leaves this scope, the previous scope is valid again.
A '''scope''' is delimited by curly braces. Inside a scope a symbol from outside of this scope may be overlaid by "local" or lexical variables. As soon as the program leaves this scope, the previous scope is valid again.
<br>An example:
<br>An example:
<code>
<code>
Line 177: Line 237:
</code>
</code>


===Modules, Objects===
===Modules, objects===
Although the ''package'' declaration itself goes back to the earlier days of Perl, many of the features introduced in version 5 gave it a whole new significance. It had become much easier to isolate parts of a program from one another and even assign them '''object''' character by "blessing" them into the package, automatically changing the package into a '''class'''. All subroutines inside this package become "methods", even their call behavior changes significantly. Of course, purists are not entirely happy with the way Perl has implemented some of the classic features, but looking at the large number of modules implementing Perl's flavor of object oriented programming, it can't be so wrong.
Although the ''package'' declaration itself goes back to the earlier days of Perl, many of the features introduced in version 5 gave it a whole new significance. It had become much easier to isolate parts of a program from one another and even assign them '''object''' character by "blessing" them into the package, automatically changing the package into a '''class'''. All subroutines inside this package become "methods", even their call behavior changes significantly. Of course, purists are not entirely happy with the way Perl has implemented some of the classic features, but looking at the large number of modules implementing Perl's flavor of object oriented programming, it can't be so wrong.
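How "blessing" turns a package into a class can be sketched as follows. The class name <code>Counter</code>, its methods and its <code>start</code> argument are hypothetical and serve only as an illustration:

```perl
use strict;
use warnings;

package Counter;            # hypothetical class name, for illustration only

sub new {
    my ($class, %args) = @_;
    my $start = exists $args{start} ? $args{start} : 0;
    my $self  = { count => $start };    # plain hash reference
    return bless $self, $class;         # now an object of class Counter
}

sub increment {
    my ($self) = @_;        # the object arrives as the first argument
    return ++$self->{count};
}

package main;

my $c = Counter->new(start => 5);
$c->increment;                  # count is now 6
print ref($c), "\n";            # Counter
print $c->increment, "\n";      # 7
```

The changed call behavior mentioned above is visible here: <code>$c->increment</code> looks up <code>increment</code> in the package the reference was blessed into and passes <code>$c</code> as the first argument.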


<!-- this part is more or less a stump, I'd be more than happy if someone works on this -->
<!-- this part is more or less a stump, I'd be more than happy if someone works on this -->
==External links==
*[http://www.perl.org/ www.Perl.org] - Perl Foundation's website
*[http://www.cpan.org/ www.CPAN.org] - Public Module repository
*[http://www.perlmonks.org/ www.perlmonks.org] - Forums and Solutions
*[http://www.perl.com/ www.perl.com] - O'Reilly's Perl website
*[http://perl.oreilly.com/ perl.oreilly.com] - publishing house for the "Standard Works" of Perl
*[http://www.wall.org/~larry/perl.html Larry Wall's web page]
==Literature==
The literature on Perl is numerous and growing. This is the list that Larry Wall recommends on his web page:
*[1] Larry Wall, Tom Christiansen, Jon Orwant: ''Programming Perl'' - (the Camel Book). O'Reilly Media, Inc.; 3rd edition (July 14, 2000). ISBN 0596000278. The standard reference.
*[2] Randal L. Schwartz, Tom Phoenix, brian d foy: ''Learning Perl'' - (the Llama Book). ISBN 1565922840. First training.
*[3] Tom Christiansen, Nathan Torkington: ''Perl Cookbook'' - (the Ram Book). O'Reilly Media, Inc.; 2nd edition (August 21, 2003). ISBN 0596003137. Perl solutions for standard problems, example applications of Perl features.
*[4] Randal L. Schwartz, Erik Olson, Tom Christiansen: ''Learning Perl on Win32 Systems'' - (the Gecko Book). O'Reilly Media, Inc.; 1st edition (August 1, 1997). ISBN 1565923243.
*[5] Simon Cozens: ''Advanced Perl Programming'' - (the Panther Book). O'Reilly Media, Inc.; 2nd edition (June 1, 2005). ISBN 0596004567. High-level Perl tutorial.
*[6] Jeffrey E. F. Friedl: ''Mastering Regular Expressions'' - (the Owl Book). O'Reilly Media, Inc.; 3rd edition (August 8, 2006). ISBN 0596528124. All you ever need to know about regular expressions, not Perl specific.
Also:
* Johnson, Andrew L.  ''Elements of Programming with Perl.''  Greenwich, CT: Manning Publications, 2000.  ISBN 1884777805.  An excellent Perl text that doubles as an introduction to computer programming--not that programmers widely recommend Perl as a first programming language to study.
* Conway, Damian  ''Object Oriented Perl''. Manning Publications, 2000. ISBN 1884777791. A Comprehensive Guide to Concepts and Programming Techniques.


==Notes and references==
==Notes and references==
<references/>
<references/>[[Category:Suggestion Bot Tag]]
 
 
[[Category:CZ Live]]
[[Category:Computers Workgroup]]

Latest revision as of 16:01, 2 October 2024

This article is developing and not approved.

Perl is a dynamic, interpreted programming language created by Larry Wall and first released in 1987. Wall combined features of a variety of other languages, including C, Unix shell scripting, Lisp, awk, sed, and Unix tools such as the grep family, into a succinct language for system administration. Perl evolved into a flexible and powerful scripting language and garnered a substantial following with professional support. Perl interpreters now exist for most operating systems, and programs can usually be moved between different operating systems without needing to be changed.

One of Perl's advantages is its excellent string processing. Perl's powerful regular expression engine has become an unofficial benchmark against which other programming languages' engines are measured. Due to its strong support for strings and the large number of publicly available modules, Perl has been widely used as a "glue" language between different kinds of technologies such as database access and web programming. Many system scripts for Linux distributions are written in Perl, and some commercial Unix systems install Perl by default. Perl is currently in version 5, a mature version which allows the creation, export, and import of objects and methods, and has an extensive public library of well-maintained modules and packages (CPAN).

Perl won many supporters due to its approach of leaving much choice to the programmer and not requiring anything that is not absolutely necessary. As an example, Perl does not require declaration of variable types prior to use. Instead, the interpreter decides the type of a variable based on how it is being used (and is generally quite successful at doing so). Such loose typing is nowadays called Duck typing ("if it walks like a duck, and quacks like a duck, it must be a duck"). Perl's motto became "there's more than one way to do it", and this policy had an important influence on the newer Ruby (programming language).

Examples

For short programs, a Perl script can be invoked directly from the command line, using the '-e' option:

Each command line (starting with '$') is followed by its output:

$ perl -e 'print "Hello, world!\n"'
Hello, world!
$ perl -e '$g="Hello"; $g=~s/e/a/; print "$g, world!\n"'
Hallo, world!
$ perl -e 'print grep(s/^\d+(.*)/$1 /, sort(split(/ /,"8hacker, 4Perl 1Just 2another")));'
Just another Perl hacker,

The second example shows an aspect of Perl that is often discouraged for the sake of clarity: the option to write very compact (terse) code. In its middle statement, $g=~s/e/a/, the first 'e' in $g is replaced with an 'a'.

In Usenet days it was customary to sign one's posting in a Perl thread with a one-liner that produced the string "Just another Perl hacker," (JAPH). The master of this form was Randal L. Schwartz, author of several Perl books. The third line is one of his simpler examples, but it is already too involved to explain in an introductory article. Both examples 2 and 3 contain regular expression matches, which are introduced further down. Real-world Perl programs are usually stored in files and passed as parameters to the Perl interpreter. Some mechanisms of this form of invocation depend largely on the host's operating system; see a separate example for more details.

Syntax highlights

It is impossible to give a complete overview of Perl's syntax here. The "Camel" book [1] has over 1000 pages in its 3rd edition. But a few highlights may give some idea of the character of Perl to the interested reader.

Variables

Data types

  • Scalars are the fundamental data type. A scalar stores a single, simple value, usually a string or a number, or a reference to another variable. A scalar name is prefixed with '$', e.g. $var = 1;
  • Arrays are ordered lists of scalars, where each element is accessed by an integer index. An array name is prefixed with '@', e.g. @list. All indexing in Perl starts with 0, i.e. $list[0] (a scalar) is the first element of @list.
  • Hashes are unordered sets of key/value pairs, where the value (a scalar) is accessed using a key (a string). A hash name is prefixed with '%', e.g. %colors. A value is addressed by using the key in braces, e.g. to assign a value to %colors for the key 'ball':
    $colors{'ball'} = 'green';
  • Globs (or 'typeglobs') are symbol table entries. A glob associates a global name with references to the variables of each type that share that name. A glob is prefixed with '*', e.g. *colors. The most common use of a glob is as a filehandle, such as open *FILE, $filename. Other uses include aliasing global variables, e.g. *colour = \$color.

All variables in Perl are of these four data types. Other data types are abstractions such as filehandles, subroutines, symbol table entries, etc. Perl keeps variables of each type separately, so that it is always clear which value you want to access. Example: @color, $color, %color, *color are different variables, so $color, $color[2] or $color{'ball'} hold different values.
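The separation of the data types can be sketched with a short script. The variable names and values below are made up for illustration and are not part of any real program.

```perl
use strict;
use warnings;

# Three different variables that merely share the name "color".
my $color = 'red';                               # scalar
my @color = ('cyan', 'magenta', 'yellow');       # array
my %color = (ball => 'green', sky => 'blue');    # hash

# The sigil and the brackets select which variable is meant.
my $third   = $color[2];        # element of @color
my $of_ball = $color{'ball'};   # value in %color
my $count   = scalar @color;    # an array in scalar context yields its length

print "$color $third $of_ball $count\n";   # prints "red yellow green 3"
```

Note that an element access always uses the '$' sigil, because a single element of an array or hash is itself a scalar.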

Scope

Perl has two scopes which determine the visibility and accessibility of variables, 'global' and 'lexical'. In addition there are special modifications of variables within these scopes: 'local', 'our' and 'state'.


Global

A 'global' variable is associated with a 'package' (what other languages would call a 'namespace'). It is visible to all parts of the program. The full name of a global variable contains its package and name, like so:

   $Some::Package::foo = 42;       # a variable named "foo" in the package "Some::Package"

Within its own namespace, the package part can be dropped.

   package Some::Package;
   $foo = 42;                      # equivalent to $Some::Package::foo = 42

If no package is declared the namespace is 'main'. Unless declared otherwise, all variables and subroutines are global. This is unfortunate as best practice is to use lexical variables whenever possible. File-scoped lexicals can replace most uses of global variables.


Lexical

Lexical scope is determined by the surrounding block. It is designed to encapsulate variable usage so that only the narrow portion of the code which needs access to a variable can access it. This makes code much easier to read and understand by reducing the code which could possibly affect a variable down to the block in which it is declared.

Any block will do: the block of a map, grep or sort routine; the braces around an if or else block or while loop. The braces containing a subroutine's code encompass a lexical scope.

   sub foo {           # beginning of a lexical scope
       my $var = 42;       # declaring a lexical variable $var
       print $var;         # prints 42
   }                   # end of a lexical scope
   
   print $var;         # $var is now undefined

There are exceptions to this rule, mostly for convenience. For example, the conditional of an 'if' or 'while' is considered to be part of the following block.

Lexical scope is wholly apart from namespaces: a lexical variable has no namespace associated with it. A lexical scope can contain multiple namespaces and vice versa. Lexical scopes also nest, so the inner scope can see the outer scope's variables. For example:

   {                               # begin outer scope
       my $outer = "outer";            # declare lexical $outer in the outer scope
       
       {                               # begin inner scope
           my $inner = "inner";            # declare lexical $inner in the inner scope
           
           print $inner;                   # prints "inner"
           print $outer;                   # prints "outer"
       }                               # end inner scope
       
       print $inner;                   # $inner is now undefined
       print $outer;                   # prints "outer"
   }                               # end outer scope
   
   print $outer;                   # $outer is now undefined

Ultimately, there is an implicit lexical scope around the entire file. Any lexicals declared outside any enclosing braces are known as "file-scoped lexicals" and can be seen by the entire file from that point down. File-scoped lexicals often replace globals.

Variables declared lexically are automatically cleaned up when the scope is left if there are no more references to that variable.

Lexical variables cannot normally be accessed by code outside their scope, although there are tricks to get at them, so lexical variables should not be considered secure.

Lexical scope was introduced in Perl 5. It is best practice to declare your variables as lexical in the narrowest possible scope, rather than at the beginning of a routine as in C.
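The modifiers 'our', 'local' and 'state' mentioned at the start of this section are not shown in the examples above. The following is a minimal sketch of how they behave; the subroutine and variable names are invented for illustration, and 'state' assumes Perl 5.10 or later.

```perl
use strict;
use warnings;
use feature 'state';   # 'state' variables need Perl 5.10 or later

our $level = 'outer';  # 'our' gives a lexically scoped alias to the global $main::level

sub show_level { return $level }   # reads the global at call time

sub with_local {
    local $level = 'inner';        # temporarily replaces the global's value
    return show_level();           # callees see 'inner' while this sub runs
}

sub counter {
    state $n = 0;                  # initialised once, keeps its value between calls
    return ++$n;
}

my $during = with_local();   # 'inner'
my $after  = show_level();   # 'outer' again: local restored the old value
counter() for 1 .. 2;
my $third  = counter();      # 3

print "$during $after $third\n";   # prints "inner outer 3"
```

The sketch illustrates the main difference between 'local' and 'my': 'local' changes a global for the duration of a call, including everything called from there, while 'my' creates a new variable that only the enclosing block can see.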

Operators

The operators are too numerous to list fully here. Besides the common assignment operator '=' and the arithmetic operators "+ - / *", Perl has a large number of more complex operators, some of which only work in conjunction with certain data types or constructs. As an example, consider this piece of code:

open FILE, "/etc/passwd" or die "cannot open /etc/passwd: $!";  # open file, stop on failure
@line=<FILE>;              # read the complete file in one gulp
close FILE;                # finished

In this example a file is opened. The '<' and '>' around the filehandle 'FILE' in the second line form a single operator that reads from the file. Because the result is assigned to an array (list context), Perl reads the whole file and assigns each line to successive elements of @line. Thus, $line[0] will contain the complete first line of the file, $line[1] the second, etc.
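The behavior of the '<...>' operator depends on the context in which it is used. The sketch below contrasts the two cases. It is an illustration only: it uses an in-memory filehandle with made-up text instead of a real file, and a lexical filehandle ($fh) instead of the bareword FILE.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for a real file so the sketch is self-contained.
my $text = "first line\nsecond line\nthird line\n";
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";

my $first = <$fh>;    # scalar context: reads exactly one line
my @rest  = <$fh>;    # list context: reads all remaining lines
close $fh;

chomp $first;         # strip the trailing newline
chomp @rest;          # chomp on an array strips every element

print "first: $first\n";                    # prints "first: first line"
print "remaining: ", scalar(@rest), "\n";   # prints "remaining: 2"
```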

Statements and Declarations

Statements are the parts of a script that are executed during runtime. They can be assignments, built-in functions, calls to subroutines, control structures, etc. Statements may be enclosed in blocks delimited by "curly" braces: '{' '}'. A block may stand by itself, but usually it is dependent on some controlling expression, such as an "if" statement, a loop, or the eval function. A block defines a new scope within a surrounding scope.

Perl statements are usually terminated by a semicolon, though there are other implicit means in which Perl knows a statement ends, such as the closing of a block.

The following statements are syntactically correct:


{ $a = 1 }       # legal because closing brace follows
$a = 1, $b = 2;
$a = 1; $b = 2;

Declarations are syntactically similar to statements but are evaluated during compile time, i.e. when the interpreter reads the script and creates its internal representation.

Unlike many programming languages, Perl does not require a variable to be declared. It will "spring into existence" (as a global variable) the first time it is used in a statement at runtime. But you can enforce the explicit declaration of variables by the "use strict" pragma (a compiler directive); any undeclared variable (in the scope of the pragma) will then produce a compile time error:


## preventing the accidental creation of a variable is considered "good practice"

$server = 'Citizendium';  ## ok in non-strict surrounding scope
{                         ## start of new scope
  use strict 'vars';      ## pragma
  $page = 'Perl';         ## this will generate an error
  $main::page = 'Perl';   ## this is a legal global variable
  my $var = 1;            ## 'my' declaration makes it legal
}

Under 'use strict', a global variable must be written with its full package name (or be declared with 'our'). But the main beneficiary is the "my" operator, and thus cleaner programming: by the gentle force of laziness, a great number of lexical variables get declared instead of sloppily created globals!
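Because a violation of 'use strict' stops compilation of the whole script, it is hard to show the rejected and the accepted form side by side. The following sketch works around this by compiling two small code strings at run time with a string eval and inspecting the result. The variable names are invented for illustration.

```perl
use warnings;

# String eval compiles its argument at run time, so the compile-time
# error raised by 'use strict' can be caught and inspected.
my $ok = eval q{
    use strict 'vars';
    $undeclared = 'oops';   # no 'my', no 'our', no package name
    1;
};
my $error = $@;   # message starts: Global symbol "$undeclared" requires explicit package name

my $fine = eval q{
    use strict 'vars';
    my $declared = 'ok';    # lexical declaration satisfies strict
    our $shared  = 'ok';    # 'our' declares a package global
    $declared . $shared;
};

print "undeclared: ", ($ok ? "compiled" : "rejected"), "\n";   # prints "undeclared: rejected"
print "declared: $fine\n";                                     # prints "declared: okok"
```

In ordinary programs the pragma is simply placed at the top of the file, so that the whole file is checked.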

Subroutines

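Subroutines in Perl are declared by the reserved word 'sub', an (optional) name, and a body enclosed by braces. A named subroutine is global in its namespace. Unnamed ("anonymous") subroutines have no name to call them by, but a reference to one can be stored in a variable and called through it, which is useful for callbacks and other advanced purposes. The calling statement may pass parameters to the subroutine, which receives them in the array @_, and will receive a return value. A subroutine called with parentheses may be declared anywhere in the file. Only if it was declared before the call can the parentheses be omitted, just like with a builtin function. The older call syntax with a prepended '&' also works in either case. Examples:


sub tst1 { print $_[0], "\n"; }   ## arguments arrive in @_; $_[0] is the first

tst1('tst1');                     ## output 'tst1'
tst1 'tst1';                      ## output 'tst1': parentheses optional, tst1 is already declared
tst2('tst2');                     ## output 'tst2': with parentheses, a later declaration suffices
&tst2('tst2');                    ## output 'tst2': the older '&' call syntax
## tst2 'tst2';                   ## would be a syntax error here: tst2 is not yet declared

my $var = sub { 'hello' };        ## anonymous subroutine, stored as a reference
print $var->(), "\n";             ## output 'hello'

sub tst2 { print $_[0], "\n"; }   ## subroutine declared after the calls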
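Parameter passing and return values can be sketched as follows. The helper max_of and the variable names are hypothetical and chosen only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the largest of its numeric arguments.
sub max_of {
    my ($max, @others) = @_;        # copy the arguments out of @_
    for my $n (@others) {
        $max = $n if $n > $max;
    }
    return $max;                    # explicit return value
}

# An anonymous subroutine is a value: store the reference, call it with ->().
my $double = sub { return 2 * $_[0] };

my $largest = max_of(3, 17, 5);     # 17
my $doubled = $double->($largest);  # 34

print "$largest $doubled\n";        # prints "17 34"
```

Copying @_ into named lexical variables at the top of a subroutine is a common convention, since the elements of @_ are aliases to the caller's values.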

Perl allows details of function arguments to be declared in subroutine prototypes, e.g. whether the subroutine expects a scalar or a reference. But the name seems to be badly chosen, because prototypes do not perform compile-time type checking of function arguments, as programmers may expect[1].

Regular expressions

Regular expressions are a well-defined syntax by which a specialized program (the so-called regular expression engine) within Perl is directed to process text strings and produce certain side effects based on the findings. This process is usually called pattern matching. Regular expression engines were built into many of Unix's standard programs from the beginning, but especially in the early days their features differed, sometimes considerably.

When Larry Wall created Perl, he unified these flavors and added several "shorthand" patterns that made them easier to define and apply. In later versions it became possible to comment patterns inline and to choose "non-greedy" match strategies, and Unicode and POSIX support were integrated into the engine. Perl itself (as opposed to the regular expression engine) provides several standard variables which hold certain parts of the latest pattern match, such as $& for the match itself, $` for the part before (to the "left" of) the match, $' for the part after it, and $1, $2, ... for the captured groups of a match. These, together with some enhancements for readability, make it very easy and efficient to manipulate strings. As a result, for many years Perl's regular expression engine was considered the most advanced to be found in any programming language.

In Perl, the application of regular expressions can be very casual. The binding operator '=~' applies a pattern to a variable, and with the substitution operator 's' it modifies that variable in place. In the code snippet "$g =~ s/e/a/;" shown earlier, the contents of $g are replaced by the result of the substitution applied to the original value. If $g contains 'geek', the first 'e' will be replaced by an 'a', giving 'gaek'. To replace every 'e', write $g =~ s/e/a/g; (the appended 'g' stands for "globally"). Here is a more explicit example; for editing convenience it is stored in a script file 'tst':

# file 'tst'
$string  = 'Hello, world';
$pattern = '[ae]';           ## [] defines a class of characters, 'e' OR 'a' will match
$string  =~ m/$pattern/;     ## 'm' for 'match only', nothing is assigned to $string
print "match: \'$&\' before: \'$`\' after: \'$'\' \n";

Run as

$ perl tst

it will produce

match: 'e' before: 'H' after: 'llo, world'
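The capture variables, the 'g' modifier, inline-commentable patterns and non-greedy matching mentioned above can be sketched together. The sample line and the variable names are made up for this illustration.

```perl
use strict;
use warnings;

my $line = 'user=alice uid=1001 shell=/bin/bash';   # made-up sample input

# The /x modifier permits whitespace (and comments) inside the pattern.
# The parentheses capture into $1 and $2.
my ($name, $uid);
if ($line =~ m/ user=(\w+) \s+ uid=(\d+) /x) {
    ($name, $uid) = ($1, $2);
}

# In list context, /g returns every captured match at once.
my @keys = $line =~ m/(\w+)=/g;          # ('user', 'uid', 'shell')

# Greedy versus non-greedy quantifiers on the same input.
my ($greedy) = $line =~ m/(u.*=)/;       # 'user=alice uid=1001 shell='
my ($lazy)   = $line =~ m/(u.*?=)/;      # 'user='

# s///g replaces every occurrence; the copy leaves $line untouched.
(my $copy = $line) =~ s/=/: /g;

print "$name $uid\n";    # prints "alice 1001"
print "@keys\n";         # prints "user uid shell"
print "$copy\n";         # prints "user: alice uid: 1001 shell: /bin/bash"
```

Copying $1 and $2 into named variables right after the match is a common precaution, because the next successful match overwrites them.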

For those interested in the use of Regular Expressions, [6] is highly recommended.

Namespace and scope

By declaring a package, a separate symbol table is created for all globals (variables, subroutines, etc.). This namespace continues until a new package is declared or the enclosing block or file ends. All symbols which are created outside of an explicit package automatically belong to the default namespace 'main'. A variable or subroutine of a different namespace can be addressed by the '<namespace>::' qualifier. A scope is delimited by curly braces. Inside a scope a symbol from outside of this scope may be overlaid by "local" or lexical variables. As soon as the program leaves this scope, the previous scope is valid again.
An example:

## program tst
$var = 'global';                      ## global: 'main' assumed
{
  package test;
  $var         = 'package global';
  local $local = 'local';
  {                            ## new scope
    my $var = 'private var';   ## will not exist outside
    print "scope: $var \n";
    {
      print "subscope: $var local: $local \n";
    }
  }
  print "test: $test::var main: $main::var var: $var local: $local \n";
}
print "test: $test::var main: $main::var var: $var local: $local \n";

Running this:

$ perl tst

Produces:

scope: private var
subscope: private var local: local
test: package global main: global var: package global local: local
test: package global main: global var: global local:

Modules, objects

Although the package declaration itself goes back to the earlier days of Perl, many of the features introduced in version 5 gave it a whole new significance. It became much easier to isolate parts of a program from one another and even to give them object character: a reference is "blessed" into a package, which turns the package into a class. The subroutines inside this package can then be called as "methods", and their call behavior changes significantly because the object (or the class name) is passed as the first argument. Purists are not entirely happy with the way Perl has implemented some of the classic features, but the large number of modules that use Perl's flavor of object-oriented programming suggests that the approach works in practice.
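A minimal sketch of this mechanism follows. The class name Counter, its methods and the 'start' parameter are hypothetical and serve only to show how a package, a blessed reference and method calls fit together.

```perl
use strict;
use warnings;

package Counter;    # hypothetical class name, chosen for this sketch

sub new {
    my ($class, %args) = @_;                     # the class name arrives first
    my $start = defined $args{start} ? $args{start} : 0;
    my $self  = { count => $start };             # the object's data: a plain hash
    return bless $self, $class;                  # associate the hash reference with the package
}

sub increment {
    my ($self) = @_;            # the object arrives as the first argument
    $self->{count}++;
    return $self;               # returning $self allows chained calls
}

sub count { return $_[0]{count} }

package main;

my $counter = Counter->new(start => 5);
$counter->increment->increment;

print ref($counter), " ", $counter->count, "\n";   # prints "Counter 7"
```

The arrow syntax is what changes the call behavior: Counter->new(...) passes the string 'Counter' as the first argument, and $counter->increment passes the object itself.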


Notes and references

  1. Raised by Tom Christiansen; discussed in "Perl's Sins".