Perl Project
Building Code Analysis and Instrumentation Tools and Trace Analysis Tools

CISC 470/670 - Programming Languages
Fall 2000

DUE DATES:
	Part 1: Code Analysis and Instrumentation Tool (Originally Due 9/12/00 at start of class, now due 9/14/00)
	Part 2: Trace Generation and Analysis Tool (Due 9/19/00 at start of class)

NOTE: This is a group project.  You will be assigned to a group to work on
this project.  To promote everyone's participation in the group, 
the project will be graded,
and then each individual's grade will be computed by weighting the project
grade by a percentage that is determined to reflect your individual
effort on the project. 
The weight for your individual effort on the project will be 
determined through peer and
self reviews.
If your reviews indicate that you did not participate at all on the
project meetings, discussions, codings, etc, then your weight will be 0,
and your grade on the project will be 0.  If your reviews indicate that you
participated actively in the meetings and coding, etc, then your weight 
will
be 100, and your grade on the project will be the project grade itself.
Similarly, it is possible to get a percentage between 0 and 100, depending on
your individual effort.
Each member of the group can contribute in a variety of ways, but each should
be involved in some part of the coding in order to gain the Perl experience.
This should be kept in mind when the responsibilities are divided among
the group.

PURPOSE: This project has several different purposes.
  - to gain some experience writing programs in a scripting language, Perl 
  - to apply regular expressions to build a useful tool related to programming languages
  - to gain the experience of building code analysis and instrumentation
tools and execution trace analysis tools
  - to gather and examine data on real object-oriented programs to gain
    some insight into what programmers really do in this paradigm 

MAJOR TASKS:
The project is divided into two pieces of software.

	1. The Code Analysis and Instrumentation Tool
	
	2. The Program Trace Generation and Analyzer 

TARGET PROGRAMS FOR ANALYSIS:
You may either target Java or C++ programs for this analysis.  
We will supply some programs for you to analyze, in addition to ones
you may be able to dig up from the internet.

Code Analysis and Instrumentation Tool.

Your Code Analysis and Instrumentation Tool (let's call it CAN_IT for short) should have a switch on the command line that indicates whether the user wants to perform (1) static analysis, (2) code instrumentation for dynamic analysis, or (3) both. CAN_IT should then act accordingly. CAN_IT should be able to work with a set of files that make up the application, but need not analyze any libraries called by the user program. The static analyzer component counts the number of occurrences of syntactic constructions from the list below: (1) lines of code (This is usually called LOC in the literature, and does not contain comments and blank lines.) (2) number of classes (3) number of occurrences of explicit exception handling coded by the user (4) number of loops (in total, and number of each of the different kinds of loops in the language; e.g., number of for's, number of while's,...) (5) number of conditionals (6) number of method calls (7) number of instances of print statements (8) number of comment lines (9) number of variables declared of primitive types (int, boolean,...) (10) number of occurrences of the new construct All of these counts are called static because it is only an indication of how many exist in the program, not how many actually get executed on a given execution of the program. Your task for static analysis: Undergraduate groups should implement their CAN_IT to count the items numbered 1-5 above, while the graduate groups should implement theirs to count the items numbered 1, 6-10 above. A nicely formatted file containing a report of the final values for each counter (nicely labeled) should be produced by the static analyzer. Extra Credit: Also count the following:
(1) number of calls to user-defined routines versus library routines
(2) number of potentially polymorphic calls
(3) number of classes that are extended from other classes (can look for a:b in C++ or extended keywords in Java.

The code instrumentation part of CAN_IT is responsible for inserting counter statements into the code to create an instrumented program that will be compiled and run to obtain dynamic statistics. Note that the code instrumentation statements should not be included in the static analysis statistics, so the static analysis should occur first on the original program. While CAN_IT could insert counters for every instruction to find out which instruction was executed most frequently, we are not interested in quite that much detail. The specific counters to be inserted include: (1) a counter at the start of each method to count how many times that method was executed. (2) a counter immediately before each method call to count how many times that method call was executed. (3) counters at each branch of each conditional to count how many times the true branch is taken versus the false branch. In order to create a readable and informative report with the final values of the various counters, each counter that is created by your tool needs to have an associated line number or name, indicating the method, method call, or conditional which it is profiling. You should use your judgement in what to maintain for each counter and how to produce a nicely formatted report of the profiling information. CAN_IT should insert print statements to cause the final results of each counter to be printed to a separate profile file, separate from the program's normal output. You might want to look at the output of gprof as an example of such a report.

Program Trace Generator and Analyzer.

There are some times when it is helpful to obtain a trace of part or all of the execution of the program. Different kinds of events can be traced. The trace gives information about the ordering of the events of interest, in addition to the frequency of the individual events. Your task is to enhance CAN_IT to have a switch for tracing. The switch indicates a single method to be traced each time it is called. When tracing is turn on, CAN_IT should do the following in addition to its other actions: 1. Insert print's into the program to trace all method calls and returns of the program in order to produce a trace file that contains the dynamic call chain. 2. Insert print's into the program to trace the execution of the method that was indicated by the switch, say method SUB. The trace should output to a separate trace file a trace that contains for each call to SUB, a line saying "enter SUB", the line number of every branch instruction with T or F to indicate whether it was evaluated to true or false, and then, an "exit SUB" line. A single trace file for SUB and a trace file for the method call chain should be produced in addition to the program's normal ouput and other dynamic analysis results. Extra Credit: Create a trace analyzer that scans the call chain trace and detects any recursion in the trace. Recursion would cause a call chain that contains the same name more than once before a return from the repeating method, e.g., A->B->C->D->return D->B->... has recursion involving B. A report that shows any recursive chains (starting from the repeating method) should be created. If none exist, then the report should indicate no recursion executed.

What to Hand in:

You may hand in your assignment by either setting up a password protected web site for the TA to use (and email the password to the TA and instructor) or emailing the TA with a set of attachments. For the first due date, you need to include the following: (1) the perl program for CAN_IT (2) a README file that explains what your tool does, how to invoke it, any other nuances in order to use it properly, and your group members' names and contributions listed. (3) a table of your static analysis results and a table of your dynamic analysis results for your set of programs, at least 5 different programs run and analyzed. Your tables should show the results from your analyzer, with each program as a separate line of the table. (4) a scripted run of your tool demonstrating how it works for one program For the second due date, you need to include the following: (1) your updated CAN_IT (2) a README file that explains your changes, how to invoke it, any other nuances in order to use it properly, and your group members' names and contributions listed. (3) a scripted run demonstrating how it works for one program
Last Change: September 4, 2000 / pollock@cis.udel.edu