File & Directory Manipulation


Now that the basics are out of the way, let's get into some useful things about Perl. Since it was originally designed to do text processing and the like, there are some very basic, yet very powerful, file operations built into Perl. The style of these operations is more C than it is Java (no fancy wrapper classes to deal with), so pay close attention.

Reading a file is a very basic operation that takes only a few steps: open the file, lock it, read it line by line, release the lock, and close it.

Here is a basic example, which we will pick apart a little:

      #!/usr/bin/perl -w
 
      use Fcntl ':flock';

      $file = "data.txt";

      open(INFILE, $file) or die "File Not Found";

      flock(INFILE, LOCK_EX);

      while (<INFILE>) {
        print "Line: ", $_;
      }

      flock(INFILE, LOCK_UN);

      close(INFILE); 
      

The first thing that you will notice is the line use Fcntl ':flock';. This is sort of like a C #include or a Java import. It brings the Fcntl module into play, which holds constants for file control; the ':flock' tag imports the LOCK_* constants that flock needs.

Our second step is to define a filename, which we will store in a scalar. The line $file = "data.txt"; will do this for us.

The third step is our first piece of real work: we need to open the file. The line open(INFILE, $file) or die "File Not Found"; will do that for us. The INFILE part is our 'file handle'; this is how we will refer to the file from now on. We also want to stop if the file cannot be opened, and that is what the or die "File Not Found" part does. If open fails (for example, because the file doesn't exist or you are not allowed to read it), die prints the message and exits. The special variable $! holds the system's reason for the failure, so it is a handy thing to include in the message.

flock(INFILE, LOCK_EX);, as the name suggests, is used to lock files. The first argument is the file handle, and the second is the 'type' of lock you want. Here are the common lock types:

Name      Description
LOCK_SH   Request a shared lock
LOCK_EX   Request an exclusive lock
LOCK_UN   Release a previously requested lock
LOCK_NB   Combined (using |) with LOCK_SH or LOCK_EX to make the request 'non blocking'

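If you are ever unsure of what lock to use, use the exclusive one. It is by far the safest lock, since only one program can hold it at a time; a shared lock is enough when everyone is only reading. Keep in mind that these locks are advisory: they only hold back other programs that also call flock.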
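
Here is a rough sketch of how LOCK_NB might be combined with a lock type, so that a program can give up instead of waiting. The helper name try_shared_lock and the scratch file lock_demo.txt are made up for this example, and it uses the three-argument form of open with a 'my' file handle rather than the INFILE style shown above:

```perl
#!/usr/bin/perl -w
use strict;
use Fcntl ':flock';

# Hypothetical helper: ask for a shared lock without waiting.
# Returns 1 if the lock was granted, 0 if it could not be taken right now.
sub try_shared_lock {
    my ($fh) = @_;
    return flock($fh, LOCK_SH | LOCK_NB) ? 1 : 0;
}

# Build a small scratch file so the sketch can run on its own.
my $file = "lock_demo.txt";
open(my $out, '>', $file) or die "Cannot create $file: $!";
print $out "some data\n";
close($out);

open(my $in, '<', $file) or die "Cannot open $file: $!";
if (try_shared_lock($in)) {
    print "Got the shared lock\n";
    flock($in, LOCK_UN);
} else {
    print "File is busy, try again later\n";
}
close($in);
unlink($file);
```

The | joins the two constants into a single value. Without LOCK_NB, flock would simply wait until the lock became available.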

The while loop that is featured next reads the file one line at a time, until the EOF (end of file) occurs. The print statement, print "Line: ", $_; is kind of tricky though. You will notice a strange symbol, $_, in the statement. This is Perl's 'default variable'. When a file read is the only thing in a while condition, each line is stored in $_ automatically, and many built-in functions work on $_ when you give them no argument. Its cousin, @_, is how arguments are passed to subroutines. We will see both again later.

After we are done reading the file, we want to be polite and release the lock with flock(INFILE, LOCK_UN);, and close the file when we are all done with close(INFILE);. Easy huh? No file wrappers, catch statements, strange stream errors, etc. Just simple plug and chug programming. Perl is about as close to English as you are going to get.
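
If $_ feels too magical, the same loop can be written with a named variable. The following is only a sketch: the helper prefix_lines and the scratch file default_var_demo.txt are invented for the example, and it uses three-argument open with a 'my' file handle, plus $! in the error message, instead of the style shown above:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: return each line of $file with "Line: " in front.
sub prefix_lines {
    my ($file) = @_;
    my @out;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    while (my $line = <$fh>) {    # $line plays the role of $_
        push @out, "Line: " . $line;
    }
    close($fh);
    return @out;
}

# Build a small scratch file so the sketch can run on its own.
my $file = "default_var_demo.txt";
open(my $out, '>', $file) or die "Cannot create $file: $!";
print $out "alpha\nbeta\n";
close($out);

print prefix_lines($file);    # prints "Line: alpha" then "Line: beta"
unlink($file);
```

Both forms stop at the end of the file; the named variable just makes it more obvious where each line lives.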

Another important note is that you can read items from a file and store them in a data structure such as an array. Assume that the file separates data items with a space and records with a newline, like this:

      Record1 1 2 3
      Record2 4 5 6
      Record3 7 8 9
      

You can store these items into an array using the program below. The split(pattern, string) function breaks a string into tokens wherever the pattern matches; here we want to split on a space. After splitting, we have the items of one record in an array. To store everything, we can build a multidimensional array (really an array of array references). The push function will aid us in this process:

      #!/usr/bin/perl -w
 
      use Fcntl ':flock';

      $file = "data.txt";

      open(INFILE, $file) or die "File Not Found";

      flock(INFILE, LOCK_EX);

      while (<INFILE>) {
  
        #First we need to read the data, and store it in 
        # an array.  It will need to be split

        chomp;    # strip the newline so it does not end up in the last field
        @record = (split(/ /,$_));
  
        #Then we can use the 'push' function to add it to
        # another array, these are understood to enter
        # in order.

        push @students, [@record];

      }

      flock(INFILE, LOCK_UN);

      #You can access by the standard [][] method...

      for($x = 0; $x <= $#students; $x++) {
        for($y = 0; $y <= $#{$students[$x]}; $y++) {
          print $students[$x][$y], " ";
        }
        print "\n";
      }
      print "\n";


      close(INFILE);
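
Once the records are split up, each row can be unpacked and worked on. As a small sketch (the helper parse_record is made up here, and the three sample lines are written directly into the program instead of being read from data.txt), this adds up the numbers in each record:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: turn "Record1 1 2 3\n" into a reference
# to the array ("Record1", 1, 2, 3).
sub parse_record {
    my ($line) = @_;
    chomp $line;                     # drop the trailing newline
    return [ split(/ /, $line) ];
}

my @students;
foreach my $line ("Record1 1 2 3\n", "Record2 4 5 6\n", "Record3 7 8 9\n") {
    push @students, parse_record($line);
}

foreach my $row (@students) {
    my ($name, @values) = @$row;     # first field is the name, the rest are numbers
    my $total = 0;
    $total += $_ foreach @values;
    print "$name total: $total\n";
}
```

This prints a total of 6 for Record1, 15 for Record2 and 24 for Record3.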
      

Writing a file is similar to reading. I will not explain every line of the code, but here is an example:

      #!/usr/bin/perl -w

      use Fcntl ':flock';

      $file = ">output.txt";

      open(OUTFILE, $file) or die "Writing File Not Found";

      flock(OUTFILE, LOCK_EX);

      $number1 = 10;
      while ($number1 >= 1) {
        print OUTFILE $number1--, "...";
      }
      print OUTFILE "BOOM\n";

      flock(OUTFILE, LOCK_UN);

      close(OUTFILE); 
      
      
The output will be stored in a file called output.txt, and it should look like this:
 
      10...9...8...7...6...5...4...3...2...1...BOOM   
      

The only real difference you will notice is the filename: $file = ">output.txt";. Why is that > symbol there? It works like a shell 'redirect' and tells open that we want to write to the file rather than read from it. Basically, if the file doesn't exist we don't want this thing to die on us. With the > present as the first character of the name, the file will be created if it is missing (as long as you are allowed to write in that directory). What would happen if we were to use > on a file that already exists? It will overwrite the entire file. If that is not what you want, try this instead: $file = ">>output.txt";. The >> symbol works like an append, so all items will be placed at the end of the file. Here is a handy little chart:

Name            Description
< filename      Open the file for reading only (the default when no symbol is given)
> filename      Create the file, or overwrite (clobber) it if it exists
>> filename     Create the file, or append to it if it exists
+< filename     Open an existing file for both reading and writing
+> filename     Create or clobber the file, then allow reading and writing
+>> filename    Create the file or append to it, and allow reading
| command       Start the command; whatever you print to the handle becomes its input
command |       Start the command; reading from the handle gives you its output
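
To see the difference between > and >> in action, here is a minimal sketch. The helpers write_with_mode and read_all and the scratch file mode_demo.txt are made up for the example. It also assumes the three-argument form of open, where the symbol is passed separately from the filename instead of being glued onto the front of it:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: open $file with the given mode symbol and write $text.
sub write_with_mode {
    my ($mode, $file, $text) = @_;
    open(my $fh, $mode, $file) or die "Cannot open $file: $!";
    print $fh $text;
    close($fh);
}

# Hypothetical helper: return the whole file as one string.
sub read_all {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    local $/;                 # no line separator, so one read gets everything
    my $content = <$fh>;
    close($fh);
    return $content;
}

my $file = "mode_demo.txt";
write_with_mode('>',  $file, "first\n");
write_with_mode('>',  $file, "second\n");   # clobbers "first"
write_with_mode('>>', $file, "third\n");    # lands after "second"
print read_all($file);                      # prints "second" then "third"
unlink($file);
```

Keeping the symbol separate also means a filename that happens to start with > or | cannot change how the file is opened.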

The last two items are particularly interesting, because they involve the usage of pipes. Pipes are a communication method in *NIX and some other operating systems. Basically you can use a pipe to allow different programs to interact. In the following example, I open a pipe to the ps command (a list of currently running programs) and have its output appended to a file. After this, I open a second pipe that uses the cat command to display the file and grep to narrow down the output; I only wish to display the lines that mention ps (i.e. what I did in the previous step). Take a look and see if you can follow:

      #!/usr/bin/perl -w

      # Run ps and let the shell append (>>) its output to ps.txt
      $file = ">>ps.txt";

      $cmd = "| ps ";

      open(OUTFILE, $cmd . $file) or die "Cannot start ps: $!";

      # Closing a pipe waits for the command to finish
      close(OUTFILE);

      # Run cat ps.txt | grep ps; the result goes to the screen
      $file2 = "ps.txt";

      $cmd2 = "| cat ";

      $cmd3 = " | grep ps";

      open(INFILE, $cmd2 . $file2 . $cmd3) or die "Cannot start cat: $!";

      close(INFILE);
      
The output (at the time) was:
 
      2925 pts/3    00:00:00 ps
      

If you are having a hard time, try running these commands from a command prompt; they do basically the same thing:

      ps >& ps.txt
      
      cat ps.txt | grep ps
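
The other direction, the 'command |' form from the chart, lets the program read a command's output directly, with no file in between. Here is a minimal sketch under a few assumptions: the helper lines_matching is made up, and the filtering is done with Perl's own grep function instead of the external grep program:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: run $cmd and return its output lines that match $pattern.
sub lines_matching {
    my ($cmd, $pattern) = @_;
    open(my $pipe, "$cmd |") or die "Cannot run $cmd: $!\n";
    my @hits = grep { /$pattern/ } <$pipe>;   # read every line, keep the matches
    close($pipe);
    return @hits;
}

# eval keeps the sketch from dying on a machine that has no ps command.
my @hits = eval { lines_matching("ps", qr/ps/) };
if ($@) {
    print "Could not list processes: $@";
} else {
    print @hits;
}
```

The trailing | in the string is what tells open to read from the command rather than from a file of that name.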
      

We will do more with system programs in the next section. For now, let's worry about the last topic here, which is directories. Reading a directory is simple: all you need to do is supply the directory name (it can be relative to where you are, or it can be absolute), and read it much like a file (directories after all are just files), using opendir, readdir and closedir in place of open, <> and close. Here is a small example:

      #!/usr/bin/perl -w

      $dir = ".";

      opendir(INDIR, $dir) or die "Directory Not Found";

      @dir_contents = readdir(INDIR);

      closedir(INDIR); 

      foreach $file (@dir_contents) {
        print $file, "\n";
      }

      
The output (at the time I ran this) is:
      .
      .. 
      hw.pl
      data.txt
      hist.txt
      var.pl
      present
      outline.txt
      PerlTimeline.pdf
      HelloWorld.pl
      var2.pl
      out.txt
      read.pl
      write.pl
      readdir.pl       
      

The first step, as always, is to make our directory name. I have chosen to read '.', which is the 'current directory'. I could have put any name here really. Instead of doing a file open, we do opendir(INDIR, $dir) or die "Directory Not Found";. The next step is a neat shortcut: we can store the entire contents of the directory in an array, simply by using the @dir_contents = readdir(INDIR); statement. Note that the list includes the special entries . and .. and comes back in no particular order. After that, we just print each element using a foreach loop.
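
In practice you will often want to skip the . and .. entries and sort the rest. Here is one way that might look; the helper name list_entries is made up for this sketch, and it uses a 'my' directory handle instead of INDIR:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: return the sorted names in $dir, without . and ..
sub list_entries {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @names = sort grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    return @names;
}

# Print the current directory, marking subdirectories with a trailing slash.
foreach my $name (list_entries(".")) {
    print $name, (-d $name ? "/" : ""), "\n";
}
```

The -d test asks whether a name is a directory. It is checked relative to where the program is running, which is why it lines up with '.' here.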


Created By: Jason Zurawski
Last Modified: Feb. 29, 2004