Chapter 1
An Overview of Perl
Contents:
Getting Started
Natural and Artificial Languages
A Grade Example
Filehandles
Operators
Control Structures
Regular Expressions
List Processing
What You Don’t Know Won’t Hurt You (Much)
1.1 Getting Started
We think that Perl is an easy language to learn and use, and we hope to
convince you that we’re right.
One thing that’s easy about Perl is that you don’t have to say much before you
say what you want to say.
In many programming languages, you have to declare the types, variables, and
subroutines you are going
to use before you can write the first statement of executable code. And for
complex problems demanding
complex data structures, this is a good idea. But for many simple, everyday
problems, you would like a
programming language in which you can simply say:
print "Howdy, world!\n";
and expect the program to do just that.
Perl is such a language. In fact, the example is a complete program,[1] and if
you feed it to the Perl
interpreter, it will print “Howdy, world!” on your screen.
[1] Or script, or application, or executable, or doohickey. Whatever.
And that’s that. You don’t have to say much after you say what you want to say,
either. Unlike many
languages, Perl thinks that falling off the end of your program is just a
normal way to exit the program.
You certainly may call the exit function explicitly if you wish, just as you
may declare some of your
variables and subroutines, or even force yourself to declare all your variables
and subroutines. But it’s
your choice. With Perl you’re free to do The Right Thing, however you care to
define it.
There are many other reasons why Perl is easy to use, but it would be pointless
to list them all here,
because that’s what the rest of the book is for. The devil may be in the
details, as they say, but Perl tries
to help you out down there in the hot place too. At every level, Perl is
about helping you get from here to
there with minimum fuss and maximum enjoyment. That’s why so many Perl
programmers go around
with a silly grin on their face.
This chapter is an overview of Perl, so we’re not trying to present Perl to the
rational side of your brain.
Nor are we trying to be complete, or logical. That’s what the next chapter is
for.[2] This chapter presents
Perl to the other side of your brain, whether you prefer to call it
associative, artistic, passionate, or
merely spongy. To that end, we’ll be presenting various views of Perl that will
hopefully give you as
clear a picture of Perl as the blind men had of the elephant. Well,
okay, maybe we can do better than that.
We’re dealing with a camel here. Hopefully, at least one of these views of Perl
will help get you over the
hump.
[2] If you would rather have the complete, logical treatment, skip ahead to Chapter
2, The Gory Details, for maximum information density. If, on the other hand,
you’re looking
for a carefully paced tutorial, you should probably get Randal’s nice book,
Learning Perl
(published by O’Reilly & Associates). But don’t throw out this book just yet.
1.2 Natural and Artificial Languages
Languages were first invented by humans, for the benefit of humans. In the
annals of computer science,
this fact has occasionally been forgotten.[3] Since Perl was designed (loosely
speaking) by an occasional
linguist, it was designed to work smoothly in the same ways that natural
language works smoothly.
Naturally, there are many aspects to this, since natural language works well at
many levels
simultaneously. We could enumerate many of these linguistic principles here,
but the most important
principle of language design is simply that easy things should be easy, and
hard things should be
possible. That may seem obvious, but many computer languages fail at one or the
other.
[3] More precisely, this fact has occasionally been remembered.
Natural languages are good at both because people are continually trying to
express both easy things and
hard things, so the language evolves to handle both. Perl was designed first of
all to evolve, and indeed it
has evolved. Many people have contributed to the evolution of Perl over the
years. We often joke that a
camel is a horse designed by a committee, but if you think about it, the camel
is pretty well adapted for
life in the desert. The camel has evolved to be relatively self-sufficient.[4]
[4] On the other hand, the camel has not evolved to smell good. Neither has
Perl.
Now when someone utters the word “linguistics”, many people think of one of two
things. Either they
think of words, or they think of sentences. But words and sentences are just
two handy ways to “chunk”
speech. Either may be broken down into smaller units of meaning, or combined
into larger units of
meaning. And the meaning of any unit depends heavily on the syntactic,
semantic, and pragmatic context
in which the unit is located. Natural language has words of various sorts,
nouns and verbs and such. If I
say “dog” in isolation, you think of it as a noun, but I can also use the word
in other ways. That is, a noun
can function as a verb, an adjective or an adverb when the context demands it.
If you dog a dog during
the dog days of summer, you’ll be a dog tired dogcatcher.[5]
[5] And you’re probably dog tired of all this linguistics claptrap. But we’d
like you to
understand why Perl is different from the typical computer language, doggone
it!
Perl also evaluates words differently in various contexts. We will see how it
does that later. Just
remember that Perl is trying to understand what you’re saying, like any good
listener does. Perl works
pretty hard to try to keep up its end of the bargain. Just say what you mean,
and Perl will usually “get it”.
(Unless you’re talking nonsense, of course – the Perl parser understands Perl a
lot better than either
English or Swahili.)
But back to nouns. A noun can name a particular object, or it can name a class
of objects generically
without specifying which one or ones are currently being referred to. Most
computer languages make this
distinction, only we call the particular thing a value and the generic one a
variable. A value just exists
somewhere, who knows where, but a variable gets associated with one or more
values over its lifetime.
So whoever is interpreting the variable has to keep track of that association.
That interpreter may be in
your brain, or in your computer.
1.2.1 Nouns
A variable is just a handy place to keep something, a place with a name, so you
know where to find your
special something when you come back looking for it later. As in real life,
there are various kinds of
places to store things, some of them rather private, and some of them out in
public. Some places are
temporary, and other places are more permanent. Computer scientists love to
talk about the “scope” of
variables, but that’s all they mean by it. Perl has various handy ways of
dealing with scoping issues,
which you’ll be happy to learn later when the time is right. Which is not yet.
(Look up the adjectives
“local” and “my” in Chapter 3, Functions, when you get curious.)
But a more immediately useful way of classifying variables is by what sort of
data they can hold. As in
English, Perl’s primary type distinction is between singular and plural data.
Strings and numbers are
singular pieces of data, while lists of strings or numbers are plural. (And
when we get to object-oriented
programming, you’ll find that an object looks singular from the outside, but
may look plural from the
inside, like a class of students.) We call a singular variable a scalar, and a
plural variable an array. Since
a string can be stored in a scalar variable, we might write a slightly longer
(and commented) version of
our first example like this:
$phrase = "Howdy, world!\n";    # Set a variable.
print $phrase;                  # Print the variable.
Note that we did not have to predefine what kind of variable $phrase is. The $
character tells Perl that
phrase is a scalar variable, that is, one containing a singular value. An array
variable, by contrast,
would start with an @ character. (It may help you to remember that a $ is a
stylized “S”, for “scalar”,
while @ is a stylized “a”, for “array”.)
Perl has some other variable types, with unlikely names like “hash”, “handle”,
and “typeglob”. Like
scalars and arrays, these types of variables are also preceded by funny
characters.[6] For completeness,
Table 1.1 lists all the funny characters you’ll encounter.
[6] Some language purists point to these funny characters as a reason to abhor
Perl. This is
superficial. These characters have many benefits: Variables can be interpolated
into strings
with no additional syntax. Perl scripts are easy to read (for people who have
bothered to
learn Perl!) because the nouns stand out from verbs, and new verbs can be added
to the
language without breaking old scripts. (We told you Perl was designed to
evolve.) And the
noun analogy is not frivolous – there is ample precedent in various natural
languages for
requiring grammatical noun markers. It’s how we think! (We think.)
Table 1.1: Variable Syntax
Type        Character  Example    Is a name for:
Scalar      $          $cents     An individual value (number or string)
Array       @          @large     A list of values, keyed by number
Hash        %          %interest  A group of values, keyed by string
Subroutine  &          &how       A callable chunk of Perl code
Typeglob    *          *struck    Everything named struck
1.2.1.1 Singularities
From our example, you can see that scalars may be assigned a new value with the
= operator, just as in
many other computer languages. Scalar variables can be assigned any form of
scalar value: integers,
floating-point numbers, strings, and even esoteric things like references to
other variables, or to objects.
There are many ways of generating these values for assignment.
As in the UNIX shell, you can use different quoting mechanisms to make
different kinds of values.
Double quotation marks (double quotes) do variable interpolation[7] and
backslash interpretation,[8]
while single quotes suppress both interpolation and interpretation. And
backquotes (the ones leaning to
the left) will execute an external program and return the output of the
program, so you can capture it as a
single string containing all the lines of output.
[7] Sometimes called “substitution” by shell programmers, but we prefer to
reserve that
word for something else in Perl. So please call it interpolation. We’re using
the term in the
textual sense (“this passage is a Gnostic interpolation”) rather than in the
mathematical sense
(“this point on the graph is an interpolation between two other points”).
[8] Such as turning \t into a tab, \n into a newline, \001 into a CTRL-A, and
so on, in the
tradition of many UNIX programs.
$answer = 42;                # an integer
$pi = 3.14159265;            # a "real" number
$avocados = 6.02e23;         # scientific notation
$pet = "Camel";              # string
$sign = "I love my $pet";    # string with interpolation
$cost = 'It costs $100';     # string without interpolation
$thence = $whence;           # another variable
$x = $moles * $avocados;     # an expression
$cwd = `pwd`;                # string output from a command
$exit = system("vi $x");     # numeric status of a command
$fido = new Camel "Fido";    # an object
Uninitialized variables automatically spring into existence as needed.
Following the principle of least
surprise, they are created with a null value, either “” or 0. Depending on
where you use them, variables
will be interpreted automatically as strings, as numbers, or as “true” and
“false” values (commonly called
Boolean values). Various operators expect certain kinds of values as
parameters, so we will speak of
those operators as “providing” or “supplying” a scalar context to those
parameters. Sometimes we’ll be
more specific, and say it supplies a numeric context, a string context, or a
Boolean context to those
parameters. (Later we’ll also talk about list context, which is the opposite of
scalar context.) Perl will
automatically convert the data into the form required by the current context,
within reason. For example,
suppose you said this:
$camels = '123';
print $camels + 1, "\n";
The original value of $camels is a string, but it is converted to a number to
add 1 to it, and then
converted back to a string to be printed out as 124. The newline, represented
by “\n”, is also in string
context, but since it’s already a string, no conversion is necessary. But
notice that we had to use double
quotes there – using single quotes to say ‘\n’ would result in a two-character
string consisting of a
backslash followed by an “n”, which is not a newline by anybody’s definition.
So, in a sense, double quotes and single quotes are yet another way of
specifying context. The
interpretation of the innards of a quoted string depends on which quotes you
use. Later we’ll see some
other operators that work like quotes syntactically, but use the string in
some special way, such as for
pattern matching or substitution. These all work like double-quoted strings
too. The double-quote context
is the “interpolative” context of Perl, and is supplied by many operators that
don’t happen to resemble
double quotes.
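The conversions described above can be tried out in a few lines. This is only a sketch: the variable names are made up for illustration, and it borrows the . (concatenation) operator, which supplies a string context, and the built-in length function.

```perl
# A sketch of context-driven conversion; all variable names are illustrative.
my $camels = '123';                 # starts life as a string

my $as_number = $camels + 1;        # + supplies numeric context: 124
my $as_string = $camels . 1;        # . supplies string context: "1231"

my $newline_length = length("\n");  # double quotes: one newline character
my $literal_length = length('\n');  # single quotes: a backslash plus an "n"

print "$as_number $as_string $newline_length $literal_length\n";   # prints "124 1231 1 2"
```

The same scalar yields 124 or "1231" depending only on which operator asks for it, and the two quoting styles produce strings of different lengths from the same two characters of source text.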
1.2.1.2 Pluralities
Some kinds of variables hold multiple values that are logically tied together.
Perl has two types of
multivalued variables: arrays and hashes. In many ways these behave like
scalars. They spring into
existence with nothing in them when needed. When you assign to them, they
supply a list context to the
right side of the assignment.
You’d use an array when you want to look something up by number. You’d use a
hash when you want to
look something up by name. The two concepts are complementary. You’ll often see
people using an array
to translate month numbers into month names, and a corresponding hash to
translate month names back
into month numbers. (Though hashes aren’t limited to holding only numbers. You
could have a hash that
translates month names to birthstone names, for instance.)
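As a rough sketch of that month example (cut down to three months, with variable names of our own choosing), the array and the hash might look like this. The subscript and key syntax is explained in the subsections that follow.

```perl
# Only the first three months are listed, to keep the sketch short.
# Position 0 holds January, so month number = subscript + 1 in this sketch.
my @month_name = ("January", "February", "March");

# The complementary lookup, from name back to number.
my %month_number = ("January" => 1, "February" => 2, "March" => 3);

my $second = $month_name[1];            # look up by number: "February"
my $number = $month_number{"March"};    # look up by name: 3

print "$second is month $month_number{$second}; March is month $number\n";
# prints "February is month 2; March is month 3"
```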
Arrays.
An array is an ordered list of scalars, accessed[9] by the scalar’s position in
the list. The list may contain
numbers, or strings, or a mixture of both. (In fact, it could also contain
references to other lists, but we’ll
get to that in Chapter 4, References and Nested Data Structures, when
we’re discussing multidimensional
arrays.) To assign a list value to an array, you simply group the values
together (with a set of
parentheses):
[9] Or keyed, or indexed, or subscripted, or looked up. Take your pick.
@home = ("couch", "chair", "table", "stove");
Conversely, if you use @home in a list context, such as on the right side of a
list assignment, you get
back out the same list you put in. So you could set four scalar variables from
the array like this:
($potato, $lift, $tennis, $pipe) = @home;
These are called list assignments. They logically happen in parallel, so you
can swap two variables by
saying:
($alpha,$omega) = ($omega,$alpha);
As in C, arrays are zero-based, so while you would talk about the first through
fourth elements of the
array, you would get to them with subscripts 0 through 3.[10] Array subscripts
are enclosed in square
brackets [like this], so if you want to select an individual array element, you
would refer to it as
$home[n], where n is the subscript (one less than the element number) you want.
See the example
below. Since the element you are dealing with is a scalar, you always precede
it with a $.
[10] If this seems odd to you, just think of the subscript as an offset, that
is, the count of how
many array elements come before it. Obviously, the first element doesn’t have
any elements
before it, and so has an offset of 0. This is how computers think. (We think.)
If you want to assign to one array element at a time, you could write the
earlier assignment as:
$home[0] = "couch";
$home[1] = "chair";
$home[2] = "table";
$home[3] = "stove";
Since arrays are ordered, there are various useful operations that you can do
on them, such as the stack
operations, push and pop. A stack is, after all, just an ordered list, with a
beginning and an end.
Especially an end. Perl regards the end of your list as the top of a stack.
(Although most Perl
programmers think of a list as horizontal, with the top of the stack on the
right.)
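A minimal sketch of those stack operations, reusing the @home array from above (the extra "lamp" item and the $top and $next names are ours):

```perl
my @home = ("couch", "chair", "table", "stove");

push(@home, "lamp");         # add to the end: @home now ends with "lamp"
my $top  = pop(@home);       # remove and return the last element: "lamp"
my $next = pop(@home);       # and again: "stove"

print "$top $next @home\n";  # prints "lamp stove couch chair table"
```

Both operations work only on the end of the array, which is why the end plays the role of the top of the stack.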
Hashes.
A hash is an unordered set of scalars, accessed[11] by some string value that
is associated with each
scalar. For this reason hashes are often called “associative arrays”. But
that’s too long for lazy typists to
type, and we talk about them so often that we decided to name them something
short and snappy.[12]
The other reason we picked the name “hash” is to emphasize the fact that
they’re disordered. (They are,
coincidentally, implemented internally using a hash-table lookup, which is why
hashes are so fast, and
stay so fast no matter how many values you put into them.) You
can’t push or pop a hash though,
because it doesn’t make sense. A hash has no beginning or end. Nevertheless,
hashes are extremely
powerful and useful. Until you start thinking in terms of hashes, you aren’t
really thinking in Perl.
[11] Or keyed, or indexed, or subscripted, or looked up. Take your pick.
[12] Presuming for the moment that we can classify any sort of hash as
“snappy”. Please
pass the Tabasco.
Since the keys to a hash are not automatically implied by their position, you
must supply the key as well
as the value when populating a hash. You can still assign a list to it like an
ordinary array, but each pair
of items in the list will be interpreted as a key/value pair. Suppose you
wanted to translate abbreviated
day names to the corresponding full names. You could write the following list
assignment.
%longday = ("Sun", "Sunday", "Mon", "Monday", "Tue", "Tuesday",
            "Wed", "Wednesday", "Thu", "Thursday", "Fri",
            "Friday", "Sat", "Saturday");
Because it is sometimes difficult to read a hash that is defined like this,
Perl provides the => (equal sign,
greater than) sequence as an alternative separator to the comma. Using this
syntax (and some creative
formatting), it is easier to see which strings are the keys, and which strings
are the associated values.
%longday = (
    "Sun" => "Sunday",
    "Mon" => "Monday",
    "Tue" => "Tuesday",
    "Wed" => "Wednesday",
    "Thu" => "Thursday",
    "Fri" => "Friday",
    "Sat" => "Saturday",
);
Not only can you assign a list to a hash, as we did above, but if you use a
hash in list context, it’ll convert
the hash back to a list of key/value pairs, in a weird order. This is
occasionally useful. More often people
extract a list of just the keys, using the (aptly named) keys function. The key
list is also unordered, but
can easily be sorted if desired, using the (aptly named) sort function. More on
that later.
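Here is a small sketch of both ideas, using a three-day version of the hash to keep it short (the @pairs and @abbrev names are ours):

```perl
my %longday = (
    "Sun" => "Sunday",
    "Mon" => "Monday",
    "Tue" => "Tuesday",
);

my @pairs  = %longday;            # list context: key, value, key, value, ... in no particular order
my @abbrev = sort keys %longday;  # just the keys, sorted as strings: ("Mon", "Sun", "Tue")

print scalar(@pairs), " items; keys: @abbrev\n";   # prints "6 items; keys: Mon Sun Tue"
```

The order of @pairs can differ from run to run, which is why the sketch prints only its length and relies on sort for anything that has to be predictable.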
Because hashes are a fancy kind of array, you select an individual hash element
by enclosing the key in
braces. So, for example, if you want to find out the value associated with Wed
in the hash above, you
would use $longday{"Wed"}. Note again that you are dealing with a scalar value,
so you use $, not
%.
Linguistically, the relationship encoded in a hash is genitive or possessive,
like the word “of” in English,
or like “‘s”. The wife of Adam is Eve, so we write:
$wife{"Adam"} = "Eve";
1.2.2 Verbs
As is typical of your typical imperative computer language, many of the verbs
in Perl are commands:
they tell the Perl interpreter to do something. On the other hand, as is
typical of a natural language, the
meanings of Perl verbs tend to mush off in various directions, depending on the
context. A statement
starting with a verb is generally purely imperative, and evaluated entirely for
its side effects. We often
call these verbs procedures, especially when they’re user-defined. A frequently
seen command (in fact,
you’ve seen it already) is the print command:
print "Adam's wife is ", $wife{'Adam'}, ".\n";
This has the side effect of producing the desired output.
But there are other “moods” besides the imperative mood. Some verbs are for
asking questions, and are
useful in conditional statements. Other verbs translate their input parameters
into return values, just as a
recipe tells you how to turn raw ingredients into something (hopefully) edible.
We tend to call these
verbs functions, in deference to generations of mathematicians who don’t know
what the word
“functional” means in natural language.
An example of a built-in function would be the exponential function:
$e = exp(1); # 2.718281828459, or thereabouts
But Perl doesn’t make a hard distinction between procedures and functions.
You’ll find the terms used
interchangeably. Verbs are also sometimes called subroutines (when
user-defined) or operators (when
built-in). But call them whatever you like – they all return a value, which may
or may not be a
meaningful value, which you may or may not choose to ignore.
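To make that concrete, here is a sketch with one user-defined verb. The square subroutine is hypothetical, invented for this illustration, and the sub and my syntax it uses is covered in later chapters.

```perl
# square is a hypothetical user-defined verb, not a Perl built-in.
sub square {
    my ($n) = @_;              # the first argument passed in
    return $n * $n;
}

my $area = square(4);          # used as a function: the return value is kept
square(5);                     # used as a procedure: the return value is ignored
my $printed = print "";        # built-ins return values too; print returns true on success

print "$area\n";               # prints "16"
```

Nothing in the definition of square marks it as a function or a procedure. The caller decides, simply by using or discarding what comes back.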
As we go on, you’ll see additional examples of how Perl behaves like a natural
language. But there are
other ways to look at Perl too. We’ve already sneakily introduced some notions
from mathematical
language, such as addition and subscripting, not to mention the exponential
function. But Perl is also a
control language, a glue language, a prototyping language, a text-processing
language, a list-processing
language, and an object-oriented language. Among other things.
But Perl is also just a plain old computer language. And that’s how we’ll look
at it next.
1.3 A Grade Example
Suppose you had a set of scores for each member of a class you are teaching.
You’d like a combined list
of all the grades for each student, plus their average score. You have a text
file (imaginatively named
grades) that looks like this:
Noël 25
Ben 76
Clementine 49
Norm 66
Chris 92
Doug 42
Carol 25
Ben 12
Clementine 0
Norm 66
…
You can use the following script to gather all their scores together, determine
each student’s average, and
print them all out in alphabetical order. This program assumes, rather naively,
that you don’t have two
Carols in your class. That is, if there is a second entry for Carol, the
program will assume it’s just another
score for the first Carol (not to be confused with the first Noël).
By the way, the line numbers are not part of the program, any other
resemblances to BASIC
notwithstanding.
 1  #!/usr/bin/perl
 2
 3  open(GRADES, "grades") or die "Can't open grades: $!\n";
 4  while ($line = <GRADES>) {
 5      ($student, $grade) = split(" ", $line);
 6      $grades{$student} .= $grade . " ";
 7  }
 8
 9  foreach $student (sort keys %grades) {
10      $scores = 0;
11      $total = 0;
12      @grades = split(" ", $grades{$student});
13      foreach $grade (@grades) {
14          $total += $grade;
15          $scores++;
16      }
17      $average = $total / $scores;
18      print "$student: $grades{$student}\tAverage: $average\n";
19  }
Now before your eyes cross permanently, we’d better point out that this example
demonstrates a lot of
what we’ve covered so far, plus quite a bit more that we’ll explain presently.
But if you let your eyes go
just a little out of focus, you may start to see some interesting patterns.
Take some wild guesses now as to
what’s going on, and then later on we’ll tell you if you’re right.
We’d tell you to try running it, but you may not know how yet.
1.3.1 How to Do It
Gee, right about now you’re probably wondering how to run a Perl program. The
short answer is that you
feed it to the Perl language interpreter program, which coincidentally happens
to be named perl (note the
case distinction). The longer answer starts out like this: There’s More Than
One Way To Do It.[13]
[13] That’s the Perl Slogan, and you’ll get tired of hearing it, unless you’re
the Local Expert,
in which case you’ll get tired of saying it. Sometimes it’s shortened to
TMTOWTDI,
pronounced “tim-toady”. But you can pronounce it however you like. After all,
TMTOWTDI.
The first way to invoke perl (and the way most likely to work on any operating
system) is to simply call
perl explicitly from the command line. If you are on a version of UNIX and you
are doing something
fairly simple, you can use the -e switch (% in the following example represents
a standard shell prompt,
so don’t type it):
% perl -e 'print "Hello, world!\n";'
On other operating systems, you may have to fiddle with the quotes some. But
the basic principle is the
same: you’re trying to cram everything Perl needs to know into 80 columns or
so.[14]
[14] These types of scripts are often referred to as “one-liners”. If you ever
end up hanging
out with other Perl programmers, you’ll find that some of us are quite fond of
creating
intricate one-liners. Perl has occasionally been maligned as a write-only
language because of
these shenanigans.
For longer scripts, you can use your favorite text editor (or any other text
editor) to put all your
commands into a file and then, presuming you named the script gradation (not to
be confused with
graduation), you’d say:
% perl gradation
You’re still invoking the Perl interpreter explicitly, but at least you don’t
have to put everything on the
command line every time. And you don’t have to fiddle with quotes to keep the
shell happy.
The most convenient way to invoke a script is just to name it directly (or
click on it), and let the
operating system find the interpreter for you. On some systems, there may be
ways of associating various
file extensions or directories with a particular application. On those systems,
you should do whatever it is
you do to associate the Perl script with the Perl interpreter. On UNIX systems
that support the #!
“shebang” notation (and most UNIX systems do, nowadays), you can make the first
line of your script be
magical, so the operating system will know which program to run. Put a line
resembling[15] line 1 of our
example into your program:
[15] If perl isn’t in /usr/bin, you’ll have to change the #! line accordingly.
#!/usr/bin/perl
Then all you have to say is
% gradation
Of course, this didn’t work because you forgot to make sure the script was
executable (see the manpage
for chmod(1))[16] and in your PATH. If it isn’t in your PATH, you’ll have to
provide a complete
filename so that the operating system knows how to find your script. Something
like
[16] Although Perl has its share of funny notations, this one must be blamed on
UNIX.
chmod(1) means you should refer to the manpage for the chmod command in section
one of
your UNIX manual. If you type either man 1 chmod or man -s 1 chmod (depending
on your flavor of UNIX), you should be able to find out all the interesting
information your
system knows about the command chmod. (Of course, if your flavor of UNIX
happens to be
“Not UNIX!” then you’ll need to refer to your system’s documentation for the
equivalent
command, presuming you are so blessed. Your chief consolation is that, if an
equivalent
command does exist, it will have a much better name than chmod.)
% ../bin/gradation
Finally, if you are unfortunate enough to be on an ancient UNIX system that
doesn’t support the magic
#! line, or if the path to your interpreter is longer than 32 characters (a
built-in limit on many systems),
you may be able to work around it like this:
#!/bin/sh -- # perl, to stop looping
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
    if 0;
Some operating systems may require variants on this to deal with /bin/csh, DCL,
COMMAND.COM, or
whatever happens to be your default command interpreter. Ask your Local Expert.
Throughout this book, we’ll just use #!/usr/bin/perl to represent all these
notions and notations,
but you’ll know what we really mean by it.
A random clue: when you write a test script, don’t call your script test. UNIX
systems have a built-in test
command, which will likely be executed instead of your script. Try try instead.
A not-so-random clue: while learning Perl, and even after you think you know
what you’re doing, we
suggest using the -w option, especially during development. This option will
turn on all sorts of useful
and interesting warning messages, not necessarily in that order. You can put
the -w switch on the
shebang line, like this:
#!/usr/bin/perl -w
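As a small illustration of the kind of mistake -w catches, consider this sketch. The variable names are ours, and the exact wording of the warning varies between Perl versions, so it is not reproduced here.

```perl
#!/usr/bin/perl -w

my $total;                # declared but never given a value
my $sum = $total + 1;     # runs either way; with -w, Perl warns about the uninitialized value
print "$sum\n";           # prints "1"
```

Without -w the program quietly prints 1, which may or may not be what you meant. With -w you are at least told that $total was used before it held anything.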
Now that you know how to run your own Perl program (not to be confused with the
perl program), let’s
get back to our example.
1.4 Filehandles
Unless you’re using artificial intelligence to model a solipsistic philosopher,
your program needs some
way to communicate with the outside world. In lines 3 and 4 of our grade
example you’ll see the word
GRADES, which exemplifies another of Perl’s data types, the filehandle. A
filehandle is just a name you
give to a file, device, socket, or pipe to help you remember which one
you’re talking about, and to hide
some of the complexities of buffering and such. (Internally, filehandles are
similar to streams from a
language like C++, or I/O channels from BASIC.)
Filehandles make it easier for you to get input from and send output to many
different places. Part of
what makes Perl a good glue language is that it can talk to many files and
processes at once. Having nice
symbolic names for various external objects is just part of being a good glue
language.[17]
[17] Some of the other things that make Perl a good glue language are: it’s
8-bit clean, it’s
embeddable, and you can embed other things in it via extension modules. It’s
concise, and
networks easily. It’s environmentally conscious, so to speak. You can invoke it
in many
different ways (as we saw earlier). But most of all, the language itself is not
so rigidly
structured that you can’t get it to “flow” around your problem. It comes back
to that
TMTOWTDI thing again.
You create a filehandle and attach it to a file by using the open function.
open takes two parameters: the
filehandle and the filename you want to associate it with. Perl also gives you
some predefined (and
preopened) filehandles. STDIN is your program’s normal input channel, while
STDOUT is your
program’s normal output channel. And STDERR is an additional output channel so
that your program can
make snide remarks off to the side while it transforms (or attempts to
transform) your input into your
output.[18]
[18] These filehandles are typically attached to your terminal, so you can type
to your
program and see its output, but they may also be attached to files (and such).
Perl can give
you these predefined handles because your operating system already provides
them, one
way or another. Under UNIX, processes inherit standard input, output, and error
from their
parent process, typically a shell. One of the duties of a shell is to set up
these I/O streams so
that the child process doesn’t need to worry about them.
Since you can use the open function to create filehandles for various purposes
(input, output, piping),
you need to be able to specify which behavior you want. As you would do on the
UNIX command line,
you simply add characters to the filename.
open(SESAME, "filename");               # read from existing file
open(SESAME, "<filename");              # (same thing, explicitly)
open(SESAME, ">filename");              # create file and write to it
open(SESAME, ">>filename");             # append to existing file
open(SESAME, "| output-pipe-command");  # set up an output filter
open(SESAME, "input-pipe-command |");   # set up an input filter
As you can see, the name you pick is arbitrary. Once opened, the filehandle
SESAME can be used to
access the file or pipe until it is explicitly closed (with, you guessed it,
close(SESAME)), or the
filehandle is attached to another file by a subsequent open on the same
filehandle.[19]
[19] Opening an already opened filehandle implicitly closes the first file,
making it
inaccessible to the filehandle, and opens a different file. You must be careful
that this is
what you really want to do. Sometimes it happens accidentally, like when
you say
open($handle,$file), and $handle happens to contain the null string. Be sure to
set $handle to something unique, or you’ll just open a new file on the null
filehandle.
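To see three of these modes working together, here is a minimal sketch that writes a file, appends to it, and reads it back. The filename sesame_demo.txt is made up for illustration, and the "or die" checks borrow a logical operator that is explained a little later in this chapter.

```perl
my $file = "sesame_demo.txt";    # hypothetical scratch file, not from the text

open(SESAME, ">$file") or die "Can't create $file: $!\n";
print SESAME "first line\n";
close(SESAME);

open(SESAME, ">>$file") or die "Can't append to $file: $!\n";
print SESAME "second line\n";
close(SESAME);

open(SESAME, "<$file") or die "Can't read $file: $!\n";
my $first  = <SESAME>;           # each read returns the next line
my $second = <SESAME>;
close(SESAME);

print $first, $second;
unlink($file);                   # remove the scratch file again
```

Reusing the name SESAME for each open is deliberate: once the handle has been closed, nothing ties the name to the previous mode.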
Once you’ve opened a filehandle for input (or if you want to use STDIN), you
can read a line using the
line reading operator, <>. This is also known as the angle operator, because of
its shape. The angle
operator encloses the filehandle (<STDIN>) you want to read lines from.[20] An
example using the
STDIN filehandle to read an answer supplied by the user would look something
like this:
[20] The empty angle operator, <>, will read lines from all the files specified
on the
command line, or STDIN, if none were specified. (This is standard behavior for
many
UNIX filter programs.)
print STDOUT "Enter a number: ";          # ask for a number
$number = <STDIN>;                        # input the number
print STDOUT "The number is $number\n";   # print the number
Did you see what we just slipped by you? What’s the STDOUT doing in those print
statements there?
Well, that’s one of the ways you can use an output filehandle. A filehandle may
be supplied as the first
argument to the print statement, and if present, tells the output where to go.
In this case, the filehandle is
redundant, because the output would have gone to STDOUT anyway. Much as STDIN
is the default for
input, STDOUT is the default for output. (In line 18 of our grade example, we
left it out, to avoid
confusing you up till now.)
We also did something else to trick you. If you try the above example, you may
notice that you get an
extra blank line. This happens because the read does not automatically remove
the newline from your
input line (your input would be, for example, “9\n”). For those times when you
do want to remove the
newline, Perl provides the chop and chomp functions. chop will indiscriminately
remove (and return)
the last character passed to it, while chomp will only remove the end of record
marker (generally, “\n”),
and return the number of characters so removed. You’ll often see this idiom for
inputting a single line:
chop($number = <STDIN>); # input number and remove newline
which means the same thing as
$number = <STDIN>; # input number
chop($number); # remove newline
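To compare the two functions without having to type anything, here is a small sketch that uses literal strings in place of input lines. The variable names are ours, chosen only for illustration.

```perl
my $with_newline    = "9\n";
my $without_newline = "9";

my $removed = chomp($with_newline);     # removes the "\n" and returns 1
my $nothing = chomp($without_newline);  # nothing to remove, so returns 0

my $text    = "9";
my $chopped = chop($text);              # removes the "9" itself and returns it

print "chomp removed $removed, then $nothing character(s)\n";
print "chop removed '$chopped', leaving '$text'\n";
```

The last two lines show why chomp is the safer habit: if the final line of your input happens to lack a newline, chop will eat a character you wanted to keep.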
1.5 Operators
As we alluded to earlier, Perl is also a mathematical language. This is true at
several levels, from
low-level bitwise logical operations, up through number and set manipulation,
on up to larger predicates
and abstractions of various sorts. And as we all know from studying math in
school, mathematicians love
strange symbols. What’s worse, computer scientists have come up with their own
versions of these
strange symbols. Perl has a number of these strange symbols too, but take
heart, most are borrowed
directly from C, FORTRAN, sed(1) or awk(1), so they’ll at least be familiar to
users of those languages.
Perl’s built-in operators may be classified by number of operands into unary,
binary, and trinary
operators. They may be classified by whether they’re infix operators or prefix
operators. They may also
be classified by the kinds of objects they work with, such as numbers, strings,
or files. Later, we’ll give
you a table of all the operators, but here are some to get you started.
1.5.1 Arithmetic Operators
Arithmetic operators do exactly what you would expect from learning them in
school. They perform
some sort of mathematical function on numbers.
Table 1.2: Some Binary Arithmetic Operators
Example    Name            Result
$a + $b    Addition        Sum of $a and $b
$a * $b    Multiplication  Product of $a and $b
$a % $b    Modulus         Remainder of $a divided by $b
$a ** $b   Exponentiation  $a to the power of $b
Yes, we left subtraction and division out of Table 1.2. But we suspect you can
figure out how they should
work. Try them and see if you’re right. (Or cheat and look in the index.)
Arithmetic operators are
evaluated in the order your math teacher taught you (exponentiation before
multiplication, and
multiplication before addition). You can always use parentheses to make it come
out differently.
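A quick sketch of that ordering, using numbers we picked arbitrarily:

```perl
my $x = 2 + 3 * 4 ** 2;      # 4 ** 2 is 16, then 3 * 16 is 48, then 2 + 48
my $y = (2 + 3) * 4 ** 2;    # parentheses force the addition to happen first
my $z = 17 % 5;              # remainder of 17 divided by 5

print "$x $y $z\n";          # prints 50 80 2
```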
1.5.2 String Operators
There is also an “addition” operator for strings that does concatenation.
Unlike some languages that
confuse this with numeric addition, Perl defines a separate operator (.) for
string concatenation:
$a = 123;
$b = 456;
print $a + $b; # prints 579
print $a . $b; # prints 123456
There’s also a “multiply” operation for strings, also called the repeat
operator. Again, it’s a separate
operator (x) to keep it distinct from numeric multiplication:
$a = 123;
$b = 3;
print $a * $b; # prints 369
print $a x $b; # prints 123123123
These string operators bind as tightly as their corresponding arithmetic
operators. The repeat operator is a
bit unusual in taking a string for its left argument but a number for its right
argument. Note also how Perl
is automatically converting from numbers to strings. You could have put all the
literal numbers above in
quotes, and it would still have produced the same output. Internally though, it
would have been
converting in the opposite direction (that is, from strings to numbers).
A couple more things to think about. String concatenation is also implied by
the interpolation that
happens in double-quoted strings. When you print out a list of values, you’re
also effectively
concatenating strings. So the following three statements produce the same
output:
print $a . ' is equal to ' . $b . "\n";   # dot operator
print $a, ' is equal to ', $b, "\n";      # list
print "$a is equal to $b\n";              # interpolation
Which of these you use in any particular situation is entirely up to you.
The x operator may seem relatively worthless at first glance, but it is quite
useful at times, especially for
things like this:
print "-" x $scrwid, "\n";
which draws a line across your screen, presuming your screen width is in
$scrwid.
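Here is a small runnable sketch along those lines. The width of 20 and the header text are assumptions made up for the demonstration. Because x binds as tightly as multiplication and the dot only as tightly as addition, the header needs no parentheses.

```perl
my $scrwid = 20;                                  # assumed width, just for the demo
my $rule   = "-" x $scrwid;                       # a string of 20 hyphens
my $header = "=" x 3 . " Report " . "=" x 3;      # x is applied before the dots

print "$header\n$rule\n";
```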
1.5.3 Assignment Operators
Although it’s not exactly a mathematical operator, we’ve already made extensive
use of the simple
assignment operator, =. Try to remember that = means “gets set to” rather than
“equals”. (There is also a
mathematical equality operator == that means “equals”, and if you start out
thinking about the difference
between them now, you’ll save yourself a lot of headache later.)
Like the operators above, assignment operators are binary infix operators,
which means they have an
operand on either side of the operator. The right operand can be any expression
you like, but the left
operand must be a valid lvalue (which, when translated to English, means a
valid storage location like a
variable, or a location in an array). The most common assignment operator is
simple assignment. It
determines the value of the expression on its right side, and sets the variable
on the left side to that value:
$a = $b;
$a = $b + 5;
$a = $a * 3;
Notice the last assignment refers to the same variable twice; once for the
computation, once for the
assignment. There’s nothing wrong with that, but it’s a common enough operation
that there’s a shortcut
for it (borrowed from C). If you say:
lvalue operator= expression
it is evaluated as if it were:
lvalue = lvalue operator expression
except that the lvalue is not computed twice. (This only makes a difference if
evaluation of the lvalue has
side effects. But when it does make a difference, it usually does what you
want. So don’t sweat it.)
So, for example, you could write the above as:
$a *= 3;
which reads “multiply $a by 3”. You can do this with almost any binary operator
in Perl, even some that
you can’t do it with in C:
$line .= "\n";   # Append newline to $line.
$fill x= 80;     # Make string $fill into 80 repeats of itself.
$val ||= "2";    # Set $val to 2 if it isn't already set.
Line 6 of our grade example contains two string concatenations, one of which is
an assignment operator.
And line 14 contains a +=.
Regardless of which kind of assignment operator you use, the final value is
returned as the value of the
assignment as a whole. (This is unlike, say, Pascal, in which assignment is a
statement and has no value.)
This is why we could say:
chop($number = <STDIN>);
and have it chop the final value of $number. You also frequently see assignment
as the condition of a
while loop, as in line 4 of our grade example.
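The following sketch exercises a few of these assignment operators on throwaway variables (the names and values are ours), and captures the value that an assignment returns:

```perl
my $line = "total";
$line .= "\n";                  # $line is now "total\n"

my $fill = "ab";
$fill x= 3;                     # $fill is now "ababab"

my $val;                        # starts out undefined, which is false
$val ||= "2";                   # so $val becomes "2"
$val ||= "7";                   # "2" is true, so this one changes nothing

my $count  = 5;
my $result = ($count *= 3);     # $count becomes 15, and so does $result

print "$fill $val $count $result\n";   # prints ababab 2 15 15
```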
1.5.4 Autoincrement and Autodecrement Operators
As if $variable += 1 weren’t short enough, Perl borrows from C an even shorter
way to increment a
variable. The autoincrement and autodecrement operators simply add (or
subtract) one from the value of
the variable. They can be placed on either side of the variable, depending on
when you want them to be
evaluated (see Table 1.3).
Table 1.3: Unary Arithmetic Operators
Example      Name           Result
++$a, $a++   Autoincrement  Add 1 to $a
--$a, $a--   Autodecrement  Subtract 1 from $a
If you place one of the auto operators before the variable, it is known as a
pre-incremented
(pre-decremented) variable. Its value will be changed before it is referenced.
If it is placed after the
variable, it is known as a post-incremented (post-decremented) variable and its
value is changed after it is
used. For example:
$a = 5; # $a is assigned 5
$b = ++$a; # $b is assigned the incremented value of $a, 6
$c = $a--; # $c is assigned 6, then $a is decremented to 5
Line 15 of our grade example increments the number of scores by one, so that
we’ll know how many
scores we’re averaging the grade over. It uses a post-increment operator
($scores++), but in this case it
doesn’t matter, since the expression is in a void context, which is just a
funny way of saying that the
expression is being evaluated only for the side effect of incrementing the
variable. The value returned is
being thrown away.[21]
[21] The optimizer will notice this and optimize the post-increment into a
pre-increment,
because that’s a little more efficient to execute. (You didn’t need to know
that, but we hoped
it would cheer you up.)
1.5.5 Logical Operators
Logical operators, also known as “short-circuit” operators, allow the program
to make decisions based on
multiple criteria, without using nested conditionals. They are known as
short-circuit because they skip
evaluating their right argument if evaluating their left argument is sufficient
to determine the overall
value.
Perl actually has two sets of logical operators, a crufty old set borrowed from
C, and a nifty new set of
ultralow-precedence operators that parse more like people expect them to parse,
and are also easier to
read. (Once they’re parsed, they behave identically though.) See Table 1.4 for
examples of logical
operators.
Table 1.4: Logical Operators
Example     Name  Result
$a && $b    And   $a if $a is false, $b otherwise
$a || $b    Or    $a if $a is true, $b otherwise
! $a        Not   True if $a is not true
$a and $b   And   $a if $a is false, $b otherwise
$a or $b    Or    $a if $a is true, $b otherwise
not $a      Not   True if $a is not true
Since the logical operators “short circuit” the way they do, they’re often used
to conditionally execute
code. The following line (from our grade example) tries to open the file
grades.
open(GRADES, "grades") or die "Can't open file grades: $!\n";
If it opens the file, it will jump to the next line of the program. If it can’t
open the file, it will provide us
with an error message and then stop execution.
Literally, the above message means “Open grades or die!” Besides being another
example of natural
language, the short-circuit operators preserve the visual flow. Important
actions are listed down the left
side of the screen, and secondary actions are hidden off to the right.
(The $! variable contains the error
message returned by the operating system; see “Special Variables” in Chapter
2.) Of course, these
logical operators can also be used within the more traditional kinds of
conditional constructs, such as the
if and while statements.
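Because these operators return one of their operands rather than a bare yes or no, they are also handy for supplying defaults. Here is a short sketch with made-up values:

```perl
my $given = "";                          # pretend the user typed nothing
my $name  = $given || "anonymous";       # left side is false, so the right side is returned
my $both  = "yes" && "also yes";         # left side is true, so the right side is returned

my $zero    = 0;
my $stopped = $zero && die "never reached\n";  # left side is false, so die is never evaluated

print "name=$name both=$both stopped=$stopped\n";
```

The third assignment is the short circuit in action: the program survives only because the right argument is skipped.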
1.5.6 Comparison Operators
Comparison, or relational, operators tell us how two scalar values (numbers or
strings) relate to each
other. There are two sets of operators – one does numeric comparison and the
other does string
comparison. (In either case, the arguments will be “coerced” to have the
appropriate type first.) Table 1.5
assumes $a and $b are the left and right arguments, respectively.
Table 1.5: Some Numeric and String Comparison Operators
Comparison          Numeric  String  Return Value
Equal               ==       eq      True if $a is equal to $b
Not equal           !=       ne      True if $a is not equal to $b
Less than           <        lt      True if $a is less than $b
Greater than        >        gt      True if $a is greater than $b
Less than or equal  <=       le      True if $a is not greater than $b
Comparison          <=>      cmp     0 if equal, 1 if $a greater, -1 if $b greater
The last pair of operators (<=> and cmp) are entirely redundant. However,
they’re incredibly useful in
sort subroutines (see Chapter 3).[22]
[22] Some folks feel that such redundancy is evil because it keeps a language
from being
minimalistic, or orthogonal. But Perl isn’t an orthogonal language; it’s a
diagonal language.
By which we mean that Perl doesn’t force you to always go at right
angles. Sometimes you
just want to follow the hypotenuse of the triangle to get where you’re going.
TMTOWTDI is
about shortcuts. Shortcuts are about efficiency.
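Picking the wrong set of operators is an easy mistake to make, so here is a small sketch of how the two sets disagree. The values are arbitrary, and the sort blocks are only a preview of what Chapter 3 covers.

```perl
my $left  = 10;
my $right = 9;

my $numeric = $left <  $right ? "true" : "false";   # compares 10 and 9 as numbers
my $string  = $left lt $right ? "true" : "false";   # compares "10" and "9" as strings

my @by_number = sort { $a <=> $b } (10, 9, 100);
my @by_string = sort { $a cmp $b } (10, 9, 100);

print "numeric: $numeric, string: $string\n";   # prints numeric: false, string: true
print "@by_number\n";                           # prints 9 10 100
print "@by_string\n";                           # prints 10 100 9
```

As strings, "10" sorts before "9" because the comparison goes character by character, and "1" comes before "9".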
1.5.7 File Test Operators
The file test operators allow you to test whether certain file attributes are
set before you go and blindly
muck about with the files. For example, it would be very nice to know that the
file /etc/passwd already
exists before you go and open it as a new file, wiping out everything that was
in there before. See Table
1.6 for examples of file test operators.
Table 1.6: Some File Test Operators
Example  Name       Result
-e $a    Exists     True if file named in $a exists
-r $a    Readable   True if file named in $a is readable
-w $a    Writable   True if file named in $a is writable
-d $a    Directory  True if file named in $a is a directory
-f $a    File       True if file named in $a is a regular file
-T $a    Text File  True if file named in $a is a text file
Here are some examples:
-e "/usr/bin/perl" or warn "Perl is improperly installed\n";
-f "/vmunix" and print "Congrats, we seem to be running BSD Unix\n";
Note that a regular file is not the same thing as a text file. Binary files
like /vmunix are regular files, but
they aren’t text files. Text files are the opposite of binary files, while
regular files are the opposite of
irregular files like directories and devices.
There are a lot of file test operators, many of which we didn’t list. Most of
the file tests are unary Boolean
operators: they take only one operand, a scalar that evaluates to a file or a
filehandle, and they return
either a true or false value. A few of them return something fancier,
like the file’s size or age, but you can
look those up when you need them.
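For a self-contained sketch that does not depend on what is installed on your machine, the following creates its own scratch file (filetest_demo.txt is a made-up name) and then interrogates it. It also uses -s, one of the fancier tests, which returns the file's size in bytes.

```perl
my $file = "filetest_demo.txt";          # hypothetical scratch file

open(DEMO, ">$file") or die "Can't create $file: $!\n";
print DEMO "hello\n";
close(DEMO);

print "$file exists\n"            if -e $file;
print "$file is a regular file\n" if -f $file;
print ". is a directory\n"        if -d ".";

my $size = -s $file;                     # size in bytes rather than a plain true/false
print "$file holds $size bytes\n";

unlink($file);                           # remove the scratch file
print "$file is gone\n" unless -e $file;
```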
1.6 Control Structures
So far, except for our one large example, all of our examples have been
completely linear; we executed each
command in order. We’ve seen a few examples of using the short circuit
operators to cause a single
command to be (or not to be) executed. While you can write some very
useful linear programs (a lot of CGI
scripts fall into this category), you can write much more powerful programs if
you have conditional
expressions and looping mechanisms. Collectively, these are known as control
structures. So you can also
think of Perl as a control language.
But to have control, you have to be able to decide things, and to decide
things, you have to know the
difference between what’s true and what’s false.
1.6.1 What Is Truth?
We’ve bandied about the term truth,[23] and we’ve mentioned that certain
operators return a true or a false
value. Before we go any further, we really ought to explain exactly what we
mean by that. Perl treats truth a
little differently than most computer languages, but after you’ve worked with
it awhile it will make a lot of
sense. (Actually, we’re hoping it’ll make a lot of sense after you’ve read the
following.)
[23] Strictly speaking, this is not true.
Basically, Perl holds truths to be self-evident. That’s a glib way of saying
that you can evaluate almost
anything for its truth value. Perl uses practical definitions of truth that
depend on the type of thing you’re
evaluating. As it happens, there are many more kinds of truth than there are of
nontruth.
Truth in Perl is always evaluated in a scalar context. (Other than that, no
type coercion is done.) So here are
the rules for the various kinds of values that a scalar can hold:
1. Any string is true except for “” and “0”.
2. Any number is true except for 0.
3. Any reference is true.
4. Any undefined value is false.
Actually, the last two rules can be derived from the first two. Any reference
(rule 3) points to something with
an address, and would evaluate to a number or string containing that address,
which is never 0. And any
undefined value (rule 4) would always evaluate to 0 or the null string.
And in a way, you can derive rule 2 from rule 1 if you pretend that everything
is a string. Again, no coercion
is actually done to evaluate truth, but if a coercion to string were done, then
any numeric value of 0 would
simply turn into the string “0”, and be false. Any other number would not turn
into the string “0”, and so
would be true. Let’s look at some examples so we can understand this better:
0            # would become the string "0", so false
1            # would become the string "1", so true
10 - 10      # 10-10 is 0, would convert to string "0", so false
0.00         # becomes 0, would convert to string "0", so false
"0"          # the string "0", so false
""           # a null string, so false
"0.00"       # the string "0.00", neither empty nor exactly "0", so true
"0.00" + 0   # the number 0 (coerced by the +), so false.
\$a          # a reference to $a, so true, even if $a is false
undef()      # a function returning the undefined value, so false
Since we mumbled something earlier about truth being evaluated in a scalar
context, you might be
wondering what the truth value of a list is. Well, the simple fact is, there is
no operation in Perl that will
return a list in a scalar context. They all return a scalar value
instead, and then you apply the rules of truth to
that scalar. So there’s no problem, as long as you can figure out what any
given operator will return in a
scalar context.
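If you would rather let Perl settle the matter, a sketch like the following asks it directly. The list of test values is our own selection, and the array lines rely on the fact that an array evaluated in a scalar context yields its length.

```perl
my @results;
foreach my $value (0, "0", "", "0.00", "0.00" + 0, "00") {
    push @results, ($value ? "true" : "false");
}
print "@results\n";        # prints false false false true false true

my @empty = ();
my @three = ("a", "b", "c");
my $count = @three;        # scalar context: the number of elements, 3

print "an empty array is false\n"        unless @empty;
print "an array of $count items is true\n" if @three;
```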
1.6.1.1 The if and unless statements
We saw earlier how a logic operator could function as a conditional. A slightly
more complex form of the
logic operators is the if statement. The if statement evaluates a truth
condition, and executes a block if the
condition is true.
A block is one or more statements grouped together by a set of braces. Since
the if statement executes a
block, the braces are required by definition. If you know a language like C,
you’ll notice that this is different.
Braces are optional in C if you only have a single line of code, but they are
not optional in Perl.
if ($debug_level > 0) {
    # Something has gone wrong. Tell the user.
    print "Debug: Danger, Will Robinson, danger!\n";
    print "Debug: Answer was '54', expected '42'.\n";
}
Sometimes, just executing a block when a condition is met isn’t enough. You may
also want to execute a
different block if that condition isn’t met. While you could certainly use two
if statements, one the negation
of the other, Perl provides a more elegant solution. After the block, if
can take an optional second condition,
called else, to be executed only if the truth condition is false. (Veteran
computer programmers will not be
surprised at this point.)
Other times, you may even have more than two possible choices. In this case,
you’ll want to add an elsif truth
condition for the other possible choices. (Veteran computer programmers may
well be surprised by the
spelling of “elsif”, for which nobody here is going to apologize. Sorry.)
if ($city eq "New York") {
    print "New York is northeast of Washington, D.C.\n";
}
elsif ($city eq "Chicago") {
    print "Chicago is northwest of Washington, D.C.\n";
}
elsif ($city eq "Miami") {
    print "Miami is south of Washington, D.C. And much warmer!\n";
}
else {
    print "I don't know where $city is, sorry.\n";
}
The if and elsif clauses are each computed in turn, until one is found to be
true or the else condition is
reached. When one of the conditions is found to be true, its block is executed
and all the remaining branches
are skipped. Sometimes, you don’t want to do anything if the condition is true,
only if it is false. Using an
empty if with an else may be messy, and a negated if may be illegible; it
sounds weird to say “do something
if not this is true”. In these situations, you would use the unless statement.
unless ($destination eq $home) {
    print "I'm not going home.\n";
}
There is no “elsunless” though. This is generally construed as a feature.
1.6.2 Iterative (Looping) Constructs
Perl has four main iterative statement types: while, until, for, and foreach.
These statements allow a Perl
program to repeatedly execute the same code for different values.
1.6.2.1 The while and until statements
The while and until statements function similarly to the if and unless
statements, in a looping fashion. First,
the conditional part of the statement is checked. If the condition is met (if
it is true for a while or false for an
until) the block of the statement is executed.
while ($tickets_sold < 10000) {
    $available = 10000 - $tickets_sold;
    print "$available tickets are available. How many would you like: ";
    $purchase = <STDIN>;
    chomp($purchase);
    $tickets_sold += $purchase;
}
Note that if the original condition is never met, the loop will never be
entered at all. For example, if we’ve
already sold 10,000 tickets, we might want to have the next line of the program
say something like:
print "This show is sold out, please come back later.\n";
In our grade example earlier, line 4 reads:
while ($line = <GRADES>) {
This assigns the next line to the variable $line, and as we explained earlier,
returns the value of $line so
that the condition of the while statement can evaluate $line for truth. You
might wonder whether Perl will
get a false negative on blank lines and exit the loop prematurely. The answer
is that it won’t. The reason is
clear, if you think about everything we’ve said. The line input operator leaves
the newline on the end of the
string, so a blank line has the value “\n”. And you know that “\n” is not one
of the canonical false values.
So the condition is true, and the loop continues even on blank lines.
On the other hand, when we finally do reach the end of the file, the line input
operator returns the undefined
value, which always evaluates to false. And the loop terminates, just when we
wanted it to. There’s no need
for an explicit test against the eof function in Perl, because the input
operators are designed to work smoothly
in a conditional context.
In fact, almost everything is designed to work smoothly in a conditional
context. For instance, an array in a
scalar context returns its length. So you often see:
while (@ARGV) {
    process(shift @ARGV);
}
The loop automatically exits when @ARGV is exhausted.
1.6.2.2 The for statement
Another iterative statement is the for loop. A for loop runs exactly like the
while loop, but looks a good deal
different. (C programmers will find it very familiar though.)
for ($sold = 0; $sold < 10000; $sold += $purchase) {
    $available = 10000 - $sold;
    print "$available tickets are available. How many would you like: ";
    $purchase = <STDIN>;
    chomp($purchase);
}
The for loop takes three expressions within the loop’s parentheses: an
expression to set the initial state of the
loop variable, a condition to test the loop variable, and an expression to
modify the state of the loop variable.
When the loop starts, the initial state is set and the truth condition is
checked. If the condition is true, the
block is executed. When the block finishes, the modification expression is
executed, the truth condition is
again checked, and if true, the block is rerun with the new values. As long as
the truth condition remains
true, the block and the modification expression will continue to be executed.
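To make the correspondence with while concrete, here is a sketch that adds up the numbers 1 through 5 both ways. The variable names are ours.

```perl
# A for loop...
my $sum_for = 0;
for (my $i = 1; $i <= 5; $i++) {
    $sum_for += $i;
}

# ...and a while loop that does the same steps in the same order.
my $sum_while = 0;
my $j = 1;                       # initial state
while ($j <= 5) {                # condition
    $sum_while += $j;
    $j++;                        # modification, run after the block
}

print "$sum_for $sum_while\n";   # prints 15 15
```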
1.6.2.3 The foreach statement
The last of Perl’s main iterative statements is the foreach statement. foreach
is used to execute the same
code for each of a known set of scalars, such as an array:
foreach $user (@users) {
    if (-f "$home{$user}/.nexrc") {
        print "$user is cool... they use a perl-aware vi!\n";
    }
}
In a foreach statement, the expression in parentheses is evaluated to produce a
list. Then each element of the
list is aliased to the loop variable in turn, and the block of code is executed
once for each element. Note that
the loop variable becomes an alias for the element itself, rather than a copy
of the element. Hence,
modifying the loop variable will modify the original array.
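A three-element array (the numbers are invented) is enough to watch this happen:

```perl
my @prices = (10, 20, 30);

foreach my $price (@prices) {
    $price *= 2;            # changes the element of @prices itself, not a copy
}

print "@prices\n";          # prints 20 40 60
```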
You find many more foreach loops in the typical Perl program than for loops,
because it’s very easy in Perl
to generate the lists that foreach wants to iterate over. A frequently seen
idiom is a loop to iterate over the
sorted keys of a hash:
foreach $key (sort keys %hash) {
In fact, line 9 of our grade example does precisely that.
1.6.2.4 Breaking out: next and last
The next and last operators allow you to modify the flow of your loop. It is
not at all uncommon to have a
special case; you may want to skip it, or you may want to quit when you
encounter it. For example, if you are
dealing with UNIX accounts, you may want to skip the system accounts (like root
or lp). The next operator
would allow you to skip to the end of your current loop iteration, and start
the next iteration. The last
operator would allow you to skip to the end of your block, as if your test
condition had returned false. This
might be useful if, for example, you are looking for a specific account and
want to quit as soon as you find it.
foreach $user (@users) {
    if ($user eq "root" or $user eq "lp") {
        next;
    }
    if ($user eq "special") {
        print "Found the special account.\n";
        # do some processing
        last;
    }
}
It’s possible to break out of multi-level loops by labeling your loops and
specifying which loop you want to
break out of. Together with statement modifiers (another form of conditional we
haven’t talked about), this
can make for very readable loop exits, if you happen to think English is
readable:
LINE: while ($line = <ARTICLE>) {
    last LINE if $line eq "\n";   # stop on first blank line
    next LINE if $line =~ /^#/;   # skip comment lines
    # your ad here
}
You may be saying, “Wait a minute, what’s that funny ^# thing there inside the
leaning toothpicks? That
doesn’t look much like English.” And you’re right. That’s a pattern match
containing a regular expression
(albeit a rather simple one). And that’s what the next section is about. Perl
is above all a text processing
language, and regular expressions are at the heart of Perl’s text processing.
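Before moving on, here is one more sketch of loop labels, this time with two loops nested inside each other, since that is where labels earn their keep. The label OUTER and the little search for a product of 6 are made up for the demonstration.

```perl
my $found = "";

OUTER: foreach my $x (1, 2, 3) {
    foreach my $y (1, 2, 3) {
        next OUTER if $y > $x;       # give up on this $x and go on to the next one
        if ($x * $y == 6) {
            $found = "$x*$y";
            last OUTER;              # leave both loops at once
        }
    }
}

print "found $found\n";              # prints found 3*2
```

Without the label, next and last would apply only to the inner loop.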
1.7 Regular Expressions
Regular expressions (aka regexps, regexes or REs) are used by many UNIX
programs, such as grep, sed
and awk,[24] editors like vi and emacs, and even some of the shells. A regular
expression is a way of
describing a set of strings without having to list all the strings in your set.
[24] A good source of information on regular expression concepts is the
Nutshell Handbook
sed & awk by Dale Dougherty (O’Reilly & Associates). You might also keep an eye
out for
Jeffrey Friedl’s forthcoming book, Mastering Regular Expressions (O’Reilly &
Associates).
Regular expressions are used several ways in Perl. First and foremost, they’re
used in conditionals to
determine whether a string matches a particular pattern. So when you see
something that looks like
/foo/, you know you’re looking at an ordinary pattern-matching operator.
Second, if you can locate patterns within a string, you can replace them with
something else. So when
you see something that looks like s/foo/bar/, you know it’s asking Perl to
substitute “bar” for “foo”,
if possible. We call that the substitution operator.
Finally, patterns can specify not only where something is, but also where it
isn’t. So the split operator
uses a regular expression to specify where the data isn’t. That is, the regular
expression defines the
delimiters that separate the fields of data. Our grade example has a couple of
trivial examples of this.
Lines 5 and 12 each split strings on the space character in order to return
a list of words. But you can
split on any delimiter you can specify with a regular expression.
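As a sketch of that idea, here are three splits on three different delimiters. The sample strings are invented.

```perl
my @words  = split(/ /,  "one two three");    # split on a single space
my @fields = split(/:/,  "lp:x:7:7");         # split on a colon
my @items  = split(/, /, "sed, awk, grep");   # split on a comma followed by a space

print scalar(@words), " words, ", scalar(@fields), " fields, ",
      scalar(@items), " items\n";             # prints 3 words, 4 fields, 3 items
```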
(There are various modifiers you can use in each of these situations to do
exotic things like ignore case
when matching alphabetic characters, but these are the sorts of gory details
that we’ll cover in Chapter 2.)
The simplest use of regular expressions is to match a literal expression. In
the case of the splits we just
mentioned, we matched on a single space. But if you match on several characters
in a row, they all have
to match sequentially. That is, the pattern looks for a substring, much as
you’d expect. Let’s say we want
to show all the lines of an HTML file that are links to other HTML files (as
opposed to FTP links). Let’s
imagine we’re working with HTML for the first time, and we’re being a little
naive yet. We know that
these links will always have “http:” in them somewhere. We could loop through
our file with this:[25]
[25] This is very similar to what the UNIX command grep ‘http:’ file would do.
On MS-DOS you could use the find command, but it doesn’t know how to do more
complicated regular expressions. (However, the misnamed findstr program of
Windows NT
does know about regular expressions.)
while ($line = <FILE>) {
    if ($line =~ /http:/) {
        print $line;
    }
}
Here, the =~ (pattern binding operator) is telling Perl to look for a match of
the regular expression
http: in the variable $line. If it finds the expression, the operator returns a
true value and the block
(a print command) is executed. By the way, if you don’t use the =~ binding
operator, then Perl will
search a default variable instead of $line. This default space is really just a
special variable that goes
by the odd name of $_. In fact, many of the operators in Perl default to
using the $_ variable, so an expert
Perl programmer might write the above as:
while (<FILE>) {
    print if /http:/;
}
(Hmm, another one of those statement modifiers seems to have snuck in there.
Insidious little beasties.)
This stuff is pretty handy, but what if we wanted to find all the links, not
just the HTTP links? We could
give a list of links, like “http:”, “ftp:”, “mailto:”, and so on. But that list
could get long, and what
would we do when a new kind of link was added?
while (<FILE>) {
    print if /http:/;
    print if /ftp:/;
    print if /mailto:/;
    # What next?
}
Since regular expressions are descriptive of a set of strings, we can just
describe what we are looking for:
a number of alphabetic characters followed by a colon. In regular expression
talk (Regexpese?), that
would be /[a-zA-Z]+:/, where the brackets define a character class. The a-z and
A-Z represent all
alphabetic characters (the dash means the range of all characters between the
starting and ending
character, inclusive). And the + is a special character which says “one or more
of whatever was before
me”. It’s what we call a quantifier, meaning a gizmo that says how many
times something is allowed to
repeat. (The slashes aren’t really part of the regular expression, but rather
part of the pattern match
operator. The slashes are acting like quotes that just happen to contain a
regular expression.)
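As a rough sketch of that idea, the snippet below keeps every line that contains something shaped like a link scheme. The sample lines in @lines are made up for illustration and stand in for lines read from a real file.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the contents of an HTML file.
my @lines = (
    qq{<a href="http://www.perl.com/">Perl</a>\n},
    qq{<a href="ftp://ftp.example.com/pub/">Files</a>\n},
    qq{<p>No links here</p>\n},
);

# Keep any line containing one or more letters followed by a colon.
my @links = grep { /[a-zA-Z]+:/ } @lines;
print @links;
```

Bear in mind that the pattern is still naive: it is just as happy with a line of ordinary text containing “Note:” as it is with a real link.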
Because certain classes like the alphabetics are so commonly used, Perl defines
special cases for them.
See Table 1.7 for these special cases.
Table 1.7: Regular Expression Character Classes
Name            Definition      Code
Whitespace      [ \t\n\r\f]     \s
Word character  [a-zA-Z_0-9]    \w
Digit           [0-9]           \d
Note that these match single characters. A \w will match any single word
character, not an entire word.
(Remember that + quantifier? You can say \w+ to match a word.) Perl also
provides the negation of
these classes by using the uppercased character, such as \D for a non-digit
character.
(We should note that \w is not always equivalent to [a-zA-Z_0-9]. Some locales
define additional
alphabetic characters outside the ASCII sequence, and \w respects them.)
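Here is a small sketch of the classes in Table 1.7 and one of their negations at work. The helper classes_in and the sample strings are our own inventions for this illustration, not anything built into Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: report which character classes occur in a string.
sub classes_in {
    my ($s) = @_;
    my @found;
    push @found, "word char"  if $s =~ /\w/;   # letter, digit, or underscore
    push @found, "digit"      if $s =~ /\d/;
    push @found, "whitespace" if $s =~ /\s/;
    push @found, "non-digit"  if $s =~ /\D/;   # negation of \d
    return @found;
}

for my $sample ("camel", "42", "a b") {
    print "'$sample': ", join(", ", classes_in($sample)), "\n";
}
```

Note that “42” counts as containing word characters, since digits are part of \w.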
There is one other very special character class, written with a “.”, that will
match any character
whatsoever.[26] For example, /a./ will match any string containing an “a” that
is not the last character
in the string. Thus it will match “at” or “am” or even “a+”, but
not “a”, since there’s nothing after the
“a” for the dot to match. Since it’s searching for the pattern anywhere in the
string, it’ll match “oasis”
and “camel”, but not “sheba”. It matches “caravan” on the first “a”. It could
match on the second
“a”, but it stops after it finds the first suitable match, searching from left
to right.
[26] Except that it won’t normally match a newline. When you think about it, a
“.” doesn’t
normally match a newline in grep(1) either.
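To see where /a./ actually lands, the sketch below runs it over the words mentioned above and records the offset at which each match starts. The %start hash is just a name we picked for this example.

```perl
use strict;
use warnings;

my %start;   # word => offset where /a./ matched (absent if no match)

for my $word (qw(at am a+ a oasis camel sheba caravan)) {
    if ($word =~ /a./) {
        my $offset = $-[0];   # $-[0] is the offset where the last match began
        $start{$word} = $offset;
        print "$word: matched at offset $offset\n";
    } else {
        print "$word: no match\n";
    }
}
```

For “caravan” the recorded offset is that of the first “a”, which is the leftmost place the pattern can succeed.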
1.7.1 Quantifiers
The characters and character classes we’ve talked about all match single
characters. We mentioned that
you could match multiple “word” characters with \w+ in order to match an entire
word. The + is one
kind of quantifier, but there are others. (All of them are placed after the
item being quantified.)
The most general form of quantifier specifies both the minimum and maximum
number of times an item
can match. You put the two numbers in braces, separated by a comma. For
example, if you were trying to
match North American phone numbers, /\d{7,11}/ would match at least seven
digits, but no more
than eleven digits. If you put a single number in the braces, the number
specifies both the minimum and
the maximum; that is, the number specifies the exact number of times the item
can match. (If you think
about it, all unquantified items have an implicit {1} quantifier.)
If you put the minimum and the comma but omit the maximum, then the maximum is
taken to be
infinity. In other words, it will match at least the minimum number of times,
plus as many as it can get
after that. For example, /\d{7}/ will only match a local (North American) phone
number (7 digits),
while /\d{7,}/ will match any phone number, even an international one (unless
it happens to be
shorter than 7 digits). There is no special way of saying “at most” a certain
number of times. Just say
/.{0,5}/, for example, to find at most five arbitrary characters.
Certain combinations of minimum and maximum occur frequently, so Perl defines
special quantifiers for
them. We’ve already seen +, which is the same as {1,}, or “at least one of the
preceding item”. There is
also *, which is the same as {0,}, or “zero or more of the preceding item”, and
?, which is the same as
{0,1}, or “zero or one of the preceding item” (that is, the preceding item is
optional).
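The following sketch tries the brace quantifiers against a made-up run of thirteen digits and records what each one actually matched, then exercises the three shorthand quantifiers. The sample strings and the %matched hash are assumptions made for the illustration.

```perl
use strict;
use warnings;

my $digits = "1234567890123";   # 13 made-up digits
my %matched;                    # quantifier => the text it actually matched

# $& holds the text matched by the last successful pattern match.
$matched{'{7,11}'} = $& if $digits =~ /\d{7,11}/;   # between 7 and 11 digits
$matched{'{7}'}    = $& if $digits =~ /\d{7}/;      # exactly 7 digits
$matched{'{7,}'}   = $& if $digits =~ /\d{7,}/;     # 7 or more digits

for my $q ('{7,11}', '{7}', '{7,}') {
    print "\\d$q matched ", length($matched{$q}), " digits\n";
}

# The shorthand quantifiers: + is {1,}, * is {0,}, and ? is {0,1}.
my @shorthand = (
    "color"  =~ /colou?r/ ? 1 : 0,   # the "u" is optional
    "colour" =~ /colou?r/ ? 1 : 0,
    ""       =~ /\d*/     ? 1 : 0,   # zero digits is fine for *
    ""       =~ /\d+/     ? 1 : 0,   # but + needs at least one
);
print "@shorthand\n";
```

Because these patterns can match anywhere in the string, /\d{7}/ also succeeds inside a longer run of digits; it simply stops after seven.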
There are a couple things about quantification that you need to be careful of.
First of all, Perl quantifiers
are by default greedy. This means that they will attempt to match as much as
they can as long as the
entire expression still matches. For example, if you are matching /\d+/ against
“1234567890”, it will
match the entire string. This is something to especially watch out for when you
are using “.”, any
character. Often, someone will have a string like:
spp:Fe+H20=FeO2;H:2112:100:Stephen P Potter:/home/spp:/bin/tcsh
and try to match “spp:” with /.+:/. However, since the + quantifier is greedy,
this pattern will match
everything up to and including “/home/spp:”. Sometimes you can avoid this by
using a negated
character class, that is, by saying /[^:]+:/, which says to match one or more
non-colon characters (as
many as possible), up to the first colon. It’s that little caret in there that
negates the sense of the character class.[27]
[27] Sorry, we didn’t pick that notation, so don’t blame us. That’s just how
regular expressions are customarily written in UNIX culture.
The other point to be careful about is that regular expressions will
try to match as early as
possible. This even takes precedence over being greedy. Since scanning happens
left-to-right, this means
that the pattern will match as far left as possible, even if there is some
other place where it could match
longer. (Regular expressions are greedy, but they aren’t into delayed
gratification.) For example, suppose
you’re using the substitution command (s///) on the default variable space
(variable $_, that is), and
you want to remove a string of x’s from the middle of the string. If you say:
$_ = "fred xxxxxxx barney";
s/x*//;
it will have absolutely no effect. This is because the x* (meaning zero or more
“x” characters) will be
able to match the “nothing” at the beginning of the string, since the null
string happens to be zero
characters wide and there’s a null string just sitting there plain as day
before the “f” of “fred”.[28]
[28] Even the authors get caught by this from time to time.
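Both pitfalls can be checked with a few lines. In this sketch, $& is used to see what each pattern matched; the s/x+// at the end is only one possible fix that we chose for the illustration, not the only way to remove the x’s.

```perl
use strict;
use warnings;

my $entry = "spp:Fe+H20=FeO2;H:2112:100:Stephen P Potter:/home/spp:/bin/tcsh";

# $& holds the text matched by the last successful pattern match.
$entry =~ /.+:/;
my $greedy = $&;      # runs all the way to the last colon
$entry =~ /[^:]+:/;
my $first = $&;       # stops at the first colon
print "greedy: $greedy\n";
print "first:  $first\n";

$_ = "fred xxxxxxx barney";
s/x*//;               # matches the empty string before "f"; nothing changes
my $unchanged = $_;
s/x+//;               # needs at least one x, so it finds the run of x's
my $changed = $_;
print "after s/x*//: $unchanged\n";
print "after s/x+//: $changed\n";
```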
There’s one other thing you need to know. By default quantifiers apply to a
single preceding character, so
/bam{2}/ will match “bamm” but not “bambam”. To apply a quantifier to more than
one character, use
parentheses. So to match “bambam”, use the pattern /(bam){2}/.
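A quick sketch of the difference, using the two strings from the paragraph above. The %result hash is just a convenient place to keep the answers.

```perl
use strict;
use warnings;

my %result;   # string => "ungrouped/grouped" match results

for my $s (qw(bamm bambam)) {
    my $single  = $s =~ /bam{2}/   ? "yes" : "no";   # {2} applies to "m" only
    my $grouped = $s =~ /(bam){2}/ ? "yes" : "no";   # {2} applies to "bam"
    $result{$s} = "$single/$grouped";
    print "$s: /bam{2}/ $single, /(bam){2}/ $grouped\n";
}
```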
1.7.2 Minimal Matching
If you were using an ancient version of Perl and you didn’t want greedy
matching, you had to use a
negated character class. (And really, you were still getting greedy matching of
a constrained variety.)
In modern versions of Perl, you can force nongreedy, minimal matching by use of
a question mark after
any quantifier. Our same username match would now be /.*?:/. That .*? will now
try to match as
few characters as possible, rather than as many as possible, so it stops at the
first colon rather than the
last.
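Here is a minimal comparison of the greedy and nongreedy forms on the same password-file line, again peeking at $& to see what was matched.

```perl
use strict;
use warnings;

my $entry = "spp:Fe+H20=FeO2;H:2112:100:Stephen P Potter:/home/spp:/bin/tcsh";

# $& holds the text matched by the last successful pattern match.
$entry =~ /.*:/;
my $greedy = $&;      # as much as possible: stops at the last colon
$entry =~ /.*?:/;
my $minimal = $&;     # as little as possible: stops at the first colon

print "greedy:  $greedy\n";
print "minimal: $minimal\n";
```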
1.7.3 Nailing Things Down
Whenever you try to match a pattern, it’s going to try to match in every
location till it finds a match. An
anchor allows you to restrict where the pattern can match. Essentially, an
anchor is something that
matches a “nothing”, but a special kind of nothing that depends on its
surroundings. You could also call it
a rule, or a constraint, or an assertion. Whatever you care to call it, it
tries to match something of zero
width, and either succeeds or fails. (If it fails, it merely means that the
pattern can’t match that particular
way. The pattern will go on trying to match some other way, if there are any
other ways to try.)
The special character string \b matches at a word boundary, which is defined as
the “nothing” between a
word character (\w) and a non-word character (\W), in either order. (The
characters that don’t exist off
the beginning and end of your string are considered to be non-word characters.)
For example,
/\bFred\b/
would match both “The Great Fred” and “Fred the Great”, but would not match
“Frederick the Great” because the “de” in “Frederick” does not contain a word
boundary.
In a similar vein, there are also anchors for the beginning of the string and
the end of the string. If it is the
first character of a pattern, the caret (^) matches the “nothing” at the
beginning of the string. Therefore,
the pattern /^Fred/ would match “Frederick the Great” and not “The Great Fred”,
whereas /Fred^/
wouldn’t match either. (In fact, it doesn’t even make much sense.) The dollar
sign ($) works like the
caret, except that it matches the “nothing” at the end of the string instead
of the beginning.[29]
[29] This is a bit oversimplified, since we’re assuming here that your string
contains only
one line. ^ and $ are actually anchors for the beginnings and endings of lines
rather than
strings. We’ll try to straighten this all out in Chapter 2 (to the extent that
it can be
straightened out).
So now you can probably figure out that when we said:
next LINE if $line =~ /^#/;
we meant “Go to the next iteration of LINE loop if this line happens to begin
with a # character.”
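The anchors can be tried out on the three names used above. This is only a sketch: the labels “word”, “start”, and “end” and the sample input lines are ours, chosen for the illustration.

```perl
use strict;
use warnings;

my @names = ("The Great Fred", "Fred the Great", "Frederick the Great");
my %hits;   # name => which anchored patterns matched

for my $name (@names) {
    my @found;
    push @found, "word"  if $name =~ /\bFred\b/;   # Fred as a whole word
    push @found, "start" if $name =~ /^Fred/;      # Fred at the beginning
    push @found, "end"   if $name =~ /Fred$/;      # Fred at the end
    $hits{$name} = "@found";
    print "$name: $hits{$name}\n";
}

# Skipping comment lines, in the style of the loop quoted above.
my @kept;
LINE: for my $line ("# a comment\n", "real data\n") {
    next LINE if $line =~ /^#/;
    push @kept, $line;
}
print @kept;
```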
1.7.4 Backreferences
We mentioned earlier that you can use parentheses to group things for
quantifiers, but you can also use
parentheses to remember bits and pieces of what you matched. A pair of
parentheses around a part of a
regular expression causes whatever was matched by that part to be remembered
for later use. It doesn’t
change what the part matches, so /\d+/ and /(\d+)/ will still match as many
digits as possible, but
in the latter case they will be remembered in a special variable to be
backreferenced later.
How you refer back to the remembered part of the string depends on where you
want to do it from.
Within the same regular expression, you use a backslash followed by an integer.
The integer
corresponding to a given pair of parentheses is determined by counting left
parentheses from the
beginning of the pattern, starting with one. So for example, to
match something similar to an HTML tag
(like “<B>Bold</B>”), you might use /<(.*?)>.*?<\/\1>/. This forces the two
parts of the
pattern to match the exact same string, such as the “B” above.
Outside the regular expression itself, such as in the replacement part of a
substitution, the special variable
is used as if it were a normal scalar variable named by the integer. So, if you
wanted to swap the first two
words of a string, for example, you could use:
s/(\S+)\s+(\S+)/$2 $1/
The right side of the substitution is really just a funny kind of double-quoted
string, which is why you
can interpolate variables there, including backreference variables. This is a
powerful concept:
interpolation (under controlled circumstances) is one of the reasons Perl is a
good text-processing
language. The other reason is the pattern matching, of course. Regular
expressions are good for picking
things apart, and interpolation is good for putting things back together again.
Perhaps there’s hope for
Humpty Dumpty after all.
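Both kinds of backreference can be seen in a short sketch. The sample strings here are made up for the purpose.

```perl
use strict;
use warnings;

# Inside the pattern, \1 must match whatever the first parentheses matched.
my $html = "<B>Bold</B>";
my $tag;
if ($html =~ /<(.*?)>.*?<\/\1>/) {
    $tag = $1;                     # outside the pattern, use $1
    print "matched a <$tag> element\n";
}

# A closing tag that differs from the opening tag does not satisfy \1.
my $mismatch = "<B>Bold</I>" =~ /<(.*?)>.*?<\/\1>/ ? "match" : "no match";
print "mismatched tags: $mismatch\n";

# In the replacement part of s///, $1 and $2 interpolate like any scalar.
my $phrase = "world hello again";
(my $swapped = $phrase) =~ s/(\S+)\s+(\S+)/$2 $1/;
print "$swapped\n";
```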
1.8 List Processing
Much earlier in this chapter, we mentioned that Perl has two main contexts,
scalar context (for dealing
with singular things) and list context (for dealing with plural things). Many
of the traditional operators
we’ve described so far have been strictly scalar in their operation. They
always take singular arguments
(or pairs of singular arguments for binary operators), and always produce a
singular result, even in a list
context. So if you write this:
@array = (1 + 2, 3 - 4, 5 * 6, 7 / 8);
you know that the list on the right side contains exactly four values, because
the ordinary math operators
always produce scalar values, even in the list context provided by the
assignment to an array.
However, other Perl operators can produce either a scalar or a list value,
depending on their context.
They just “know” whether a scalar or a list is expected of them. But how will
you know that? It turns out
to be pretty easy to figure out, once you get your mind around a few key
concepts.
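Two familiar cases of this are an array and the localtime function, either of which gives a different answer depending on what is expected of it. The variable names in this sketch are ours.

```perl
use strict;
use warnings;

my @guys  = ("Fred", "Barney");
my @copy  = @guys;    # list context: the elements themselves
my $count = @guys;    # scalar context: the number of elements

# localtime returns nine values in list context:
# (sec, min, hour, mday, mon, year, wday, yday, isdst)
my @parts = localtime(0);
# ... and a single human-readable string in scalar context.
my $stamp = localtime(0);

print "copy: @copy, count: $count\n";
print scalar(@parts), " values in list context\n";
print "scalar context gave: $stamp\n";
```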
First, list context has to be provided by something in the “surroundings”. In
the example above, the list
assignment provides it. If you look at the various syntax summaries scattered
throughout Chapter 2 and
Chapter 3, you’ll see various operators that are defined to take a LIST as an
argument. Those are the
operators that provide a list context. Throughout this book, LIST is used as a
specific technical term to
mean “a syntactic construct that provides a list context”. For example, if you
look up sort, you’ll find the
syntax summary:
sort LIST
That means that sort provides a list context to its arguments.
Second, at compile time, any operator that takes a LIST provides a list context
to each syntactic element
of that LIST. So every top-level operator or entity in the LIST knows that it’s
supposed to produce the
best list it knows how to produce. This means that if you say:
sort @guys, @gals, other();
then each of @guys, @gals, and other() knows that it’s supposed to produce a
list value.
Finally, at run-time, each of those LIST elements produces its list in turn,
and then (this is important) all
the separate lists are joined together, end to end, into a single list. And
that squashed-flat,
one-dimensional list is what is finally handed off to the function that wanted
a LIST in the first place. So
if @guys contains (Fred,Barney), @gals contains (Wilma,Betty), and the other()
function
returns the single-element list (Dino), then the LIST that sort sees is
(Fred,Barney,Wilma,Betty,Dino)
and the LIST that sort returns is
(Barney,Betty,Dino,Fred,Wilma)
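The flattening is easy to watch in action. In this sketch, other() is a stand-in we wrote ourselves so that the example is complete.

```perl
use strict;
use warnings;

my @guys = qw(Fred Barney);
my @gals = qw(Wilma Betty);

# Hypothetical stand-in for other(): returns a single-element list.
sub other { return ("Dino") }

# The three lists are flattened into one before sort ever sees them.
my @sorted = sort @guys, @gals, other();
print "@sorted\n";   # Barney Betty Dino Fred Wilma
```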
Some operators produce lists (like keys), some consume them (like print), and
some transform lists into
other lists (like sort). Operators in the last category can be considered
filters; only, unlike in the shell, the
flow of data is from right to left, since list operators operate on their
arguments passed in from the right.
You can stack up several list operators in a row:
print reverse sort map {lc} keys %hash;
That takes the keys of %hash and returns them to the map function, which
lowercases all the keys by
applying the lc operator to each of them, and passes them to the sort function,
which sorts them, and
passes them to the reverse function, which reverses the order of the list
elements, and passes them to the
print function, which prints them.
As you can see, that’s much easier to describe in Perl than in English.
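To try the stacked operators with predictable output, here is a sketch with a small made-up %hash. We capture the list in @result before printing so the separate elements are easy to see.

```perl
use strict;
use warnings;

my %hash = (Fred => 1, BARNEY => 2, wilma => 3);   # made-up sample data

# Read right to left: keys, then lowercase, then sort, then reverse.
my @result = reverse sort map { lc } keys %hash;
print "@result\n";   # wilma fred barney
```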
1.9 What You Don’t Know Won’t Hurt You (Much)
Finally, allow us to return once more to the concept of Perl as a natural
language. Speakers of a natural
language are allowed to have differing skill levels, to speak different subsets
of the language, to learn as
they go, and generally, to put the language to good use before they know the
whole language. You don’t
know all of Perl yet, just as you don’t know all of English. But that’s
Officially Okay in Perl culture. You
can work with Perl usefully, even though we haven’t even told you how to write
your own subroutines
yet. We’ve scarcely begun to explain how to view Perl as a system management
language, or a rapid
prototyping language, or a networking language, or an object-oriented language.
We could write chapters
about some of these things. (Come to think of it, we already did.)
But in the end, you must create your own view of Perl. It’s your privilege as
an artist to inflict the pain of
creativity on yourself. We can teach you how we paint, but we can’t teach you
how you paint. There’s
More Than One Way To Do It.
Have the appropriate amount of fun.
Chapter 2
- The Gory Details
Contents:
Lexical Texture
Built-in Data Types
Terms
Pattern Matching
Operators
Statements and Declarations
Subroutines
Formats
Special Variables
This chapter describes in detail the syntax and semantics of a Perl program.
Individual Perl functions are
described in Chapter 3, Functions, and certain specialized topics such as
References and Objects are
deferred to later chapters.
For the most part, this chapter is organized from small to large. That is, we
take a bottom-up approach.
The disadvantage is that you don’t necessarily get the Big Picture before
getting lost in a welter of details.
But the advantage is that you can understand the examples as we go along. (If
you’re a top-down person,
just turn the book over and read the chapter backward.)
2.1 Lexical Texture
Perl is, for the most part, a free-form language. The main exceptions to this
are format declarations and
quoted strings, because these are in some senses literals. Comments are
indicated by the # character and
extend to the end of the line.
Perl is defined in terms of the ASCII character set. However, string literals
may contain characters
outside of the ASCII character set, and the delimiters you choose for various
quoting mechanisms may
be any non-alphanumeric, non-whitespace character.
Whitespace is required only between tokens that would otherwise be confused as
a single token. All
whitespace is equivalent for this purpose. A comment counts as whitespace.
Newlines are distinguished
from spaces only within quoted strings, and in formats and certain
line-oriented forms of quoting.
One other lexical oddity is that if a line begins with = in a place where a
statement would be legal, Perl
ignores everything from that line down to the next line that says =cut. The
ignored text is assumed to be
POD, or plain old documentation. (The Perl distribution has programs that will
turn POD commentary
into manpages, LaTeX, or HTML documents.)
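A tiny program can show several of these lexical points at once. This is only a sketch; the variable $total and the text inside the POD block are made up for the illustration.

```perl
use strict;
use warnings;

my $total = 1 +     # a comment counts as whitespace, even mid-expression
            2;

=pod

Everything from the line beginning with = down to the =cut line is skipped.
$total = 100;   (this is documentation, not code)

=cut

print "total is $total\n";   # prints "total is 3"
```

Note that the = must be the first character on its line for Perl to treat the text as POD.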