CISC 672 Advanced Compiler Construction
Frequently Asked Questions

(Spring 2004)
General questions

Q1. What is the purpose of this page?

A1. This page is a list of the most frequently asked questions (hence the name FAQ) received by the instructor/TA for the assignments. Before sending us e-mail with a question regarding specification or implementation details of the assignment, you should check this FAQ page to see if the question has already been answered. If the question is not on the page, or if the explanation on the page is unclear, feel free to e-mail me with the question.
Questions on Assignment PA1

Question: 1. stack2.test has an expression: 12 2 + 3 40 * - ad Contents of stack before this command are 4 10. So after at the time of execution of a, stack contains 4 10 12 2 3 40 * - To execute this expression, we need to perform the evaluation recursively. Thus 'a' is not same as repeatedly performing 'e'.

Answer: the 'a' command is not exactly repeatedly performing 'e'. It is an evaluation of the whole stack. 'a' stands for 'all'. It is a recursive evaluation.

Question: 2. This brings me to my second question: Is 'e' command supposed to accept and execute a statement like 2 3 4 + *. I mean do we need to recursively evaluate expressions while evaluating an 'e' command too? Because the sepc just says, pop nest 2 integers and push the result of the operation performed

Answer: 'e' would need to have 's' for swap to evaluate this expression. 'e' is not expected to evaluate this expression.

Question: 3. The result of example of stack2.test (written in 1), brings a negative value. So are negative values allowed in results of the operation?

Answer: Any integer could be on the stack, negative or positive or 0.

Question: 4. Can I use temporary another stack for display (because the spec says that stack machine is a machine which uses a single stack for storage).

Answer: No, you should have only one stack working.

Question: I am confused by the sentence "The newline signals that the interpreter should execute the command on the top of the stack" in the "Program Specification" section. From my understanding there are no commands on the stack, but the stack is manipulated by them. My current implementation reads the entered string (command stream) from left to right and executes each command immediately, but this does not conform with this sentence. e.g.:

> 1 2+ed 3 [ push(1), push(2), push(+), eval, display ]

Pushing e,a,d,x on the stack first, and execute only the top one on a newline is inconsistent with the language specification, as every token is referred to as "command", and it makes no sense to push "push(int)" on the stack. e.g.:

Answer: You are correct. The spec is misleading. I am removing that line from the spec. You should execute the commands as you go.

The 672 materials are on stimpy.eecis.udel.edu. /usa/pollock/public/cool02

One question I just got: I had a doubt in project 1. Can the integers be of more than 1 digit numbers? I mean, should our interpreter be able to execute a command like 12+15s? Because after swapping it becomes 1215+ and then I need some other way to differentiate the 2 operands.

Answer: yes they can be more than 1 digit. You need to think about how you are representing the numbers on the stack so swapping works out ok in this case. Numbers should not be kept as strings. Try also to use oop programming style.

Question: 1. Do I need to take care of operator precedence in the stack machine? Answer: Operator precedence is taken care of by the semantics of each command as given, ie, how they do their operation based on top of stack.

Question: 2. By multiple instructions in a single line you mean instructions like 34+56sded or 34+56seed 1+2seed?

Answer: Each command is considered an instruction. So, the first example you give is what is meant by multiple instructions. Look at stack.test in the PA1 directory for an example of one command/instruction per line.

Question: Should our programs be able to handle compound statements of the form abc+*, i.e. a*(b+c) in infix?

Answer: e does the evaluation based on what is on top of the stack. Just follow the semantics of each command as it is given in the table.

Question: That actually brings me to my second question: what forms of input are allowed? Is it intended that we use infix notation with the 's' command to swap the ordering as shown on the handout, or could input look like: 123 456 + ...using spaces to separate the numbers and entering the expression directly in postfix notation?

Answer: Yes, this is legal. Either postfix or infix could be used, with proper use of s for swap when needed so things are in the right order for e. You can take a look at stack2.test I put in the PA1 directory as a second example. Spaces are delimiters between numbers.

Questions on Assignment PA2

We need another change to the permissions in order for me to be able to grade your pa2 submissions. Please execute the following command:

setfacl -m user:bloodgoo:r-x,mask:r-x directorypathtoyourproject

where directorypathtoyourproject will be something like:

~yourusername/672_submit_PA2

or

~yourusername/672_submit_PA2J if you did the java version.

For example, if a student with username smith did the project using java, he would execute: setfacl -m user:bloodgoo:r-x,mask:r-x ~smith/672_submit_PA2J.

N.B.: in the above example, the period is not part of the command (it's just the end of my sentence).

In order for the TA to access your pa2 assignments that you submit using the turnin script it will be necessary for you to give me permission to at least cd into your home directory. One way to accomplish this would be to use the chmod command by doing the following. cd to your home directory and then type:

chmod 711 .

at the prompt.

Question: Either yyunput() or unput() is not defined in #include files. I guess that library for C++ is not listed in #include files or C++ uses another function to do so. If there is any other function can also put back characters onto the input stream, could you please give any hints about it?

Answer:

I think unput is predefined for C++. It would not be in include files because it is predefined like yytext and yyleng. Doing google on flex and C++, I found http://wwwwbs.cs.tu-berlin.de/user-taipan/kraxel/gnuinfo/flex/C++.html which shows an example C++ flex spec using unput. There is also yymore and yyless to use.

Question: Then I need clarification to case insensitivity of the scanner in PA2. Are mixed case keywords accepted? For example "CaSe" instead of just "CASE" and "case"? Hope not, it is not easy to get flex support them, if some parts of the specification still requires case sensitivity.

Answer: The manual says the keywords are case insensitive. That means that any of Case, CASE, CAse, CaSE,... are legal and mean the case keyword. This is handled easily by thinking about [Cc] as legal for the first symbol, etc.

Question: If the lexer encounters a string containing \c, where c is any character other than b, t, n, or f, should we strip the \ from the string before returning the string, or does some other phase of the compiler handle this?

Answer: The lexer is typically in charge of sending back the correct lexemes. So, the lexer should strip off the \.

Question: Should the lexer should strip the " symbols from around strings?

Answer: Yes, it should strip off the " to save space since it is not part of the string constant.

There is an executable lexer for you to check your output against at pollock/public/cool02/bin/lexer-test.

I created a directory ~pollock/672tests, which has subdirectories for each part of the compiler you are building. If you want to share your test cases with the whole class, please put them in the appropriate directory there. Let me know if permissions don't allow this. For the scanner, ~pollock/672tests/scanner is the directory to dump test cases.

Question: I found LET_STMT is defined with all other tokens. But since it looks like a syntax problem, I am not sure it is expected to be handled in this project. Could you please clarify whether we need to deal with it now, and if so what are expected to do for it?

Answer: No, you do not have to do anything with LET_STMT in the lexer.

Question: How does \0 the null character end up in a string in a test case?

Answer: Thanks to Merten, here is the correct answer! The intended meaning is, that it is not allowed for a literal ASCII NUL to appear in a string. So you'll need a hexeditor to write a test case for that. This complies with the behaviour of "lexer-test" that I observed with a real, non-escaped ASCII NUL:

bash-2.05$ ~pollock/public/cool02/bin/lexer-test test_string.cl #name "test_string.cl" #1 ERROR "String contains null character."

Question: Is there any limit on the size of integers in cool02? I was looking at prev years FAQs where you have said that we can keep it as 2^32. But, when I assigned a value of 1000000000000000 to an integer, coolc compiler did not give any error message or warning. However while running, it displayed some other number. So should we truncate the int at scanning itself, or keep it as it is?

Answer: You can choose a reasonable limit, truncate the int as you scan it, emit the error message, and move on.

Question: It seems the executable test lexer does not perform any recovery. Is there some way I can test my results after recovery?

Recovery is a very subjective fuzzy area. To test yours, just look at what you think is reasonable to do after you find each error. The goal is to detect as many errors as you can in one compile, with useful error messages whenever possible. You can compare your output to others in the class also.

Choose any reasonable limit for the integer limit.

Question:

Should I perform some recovery from errors like if a newline is encountered in string then truncate the string at that point. In such case, what should my program return. If I return an ERROR, is there any way I can store the truncated string?

Answer: Recovery is subjective. But, you could return the string as a string constant with the lexeme being the string you got until the newline. THen, start scanning again after the newline, and return whatever that matches, posibly another string constant because it is legal from then on. An alternative is to scan beyond the newline and grab the whole thing, deleting thenewline as one string constant. Either case, an error needs to be emitted.

Question:

In section 8 of the handout for PA2, the command for submitting our work is not there. I wish I could say I expect to need it soon - but that will not be the case!

Answer:

The turnin instructions are given in the readme and makefile files in the files you get. There is a turnin script that gets run.
Questions on PA3

Question: We have largely finished our parser, though we have a question concerning the output it generates. If we compare the output of your sample parser and ours, we see different line numbers only. It looked to us like the numbers after the #'s are line numbers of the input. We were wondering what affect, if any, this has on the later stages of the compiler. We compiled code using our parser and everything worked (the generated executable code ran normally). Are these differences important? Also, do you have any idea what would cause them?

Answer: This is fine to be different in line numbers.

Question and Answer: The Let statement: Several people are still asking about this.

let a:int, b:int in a<_a)

can be handled as:
let letlist
where letlist is recursively defined as either objectid : typeid optional_init in expr or objectid : typeid optional_init , letlist
The let(symbol,symbol,expr,expr) works fine with this.

When you create new nonterminals, you declare them with %type.

Comment: Take a look at your dfa you are creating. YOu have to do this. It is sooo cool. Use the debug flag on yacc/bison/jcup to see that file. Check out those items and conflicts and how they look! Look familiar! The flag is -v on yacc.

Another hint on debugging: You can set the start symbol to grammar symbols lower in the hierarchy and test parts of your grammar at a time, with test files that only cover those parts of the grammar. Sometimes this helps to focus the testing.

The groups should turn in one project, one turnin per group. Also, each member of the groups will fill out a peer evaluation form as seen on the course website, in class. Each project will be graded as a single project, then the individual grades will be made as a percentage of that grade, based on feedback to make sure that everyone is participating and balancing the load among the group members. A nicely balanced group will both receive the same grade, the original grade of the project.

Question: For this first part (due March 12), you say to "Your parser should accept correct Cool programs, and print error messages for syntactically erroneous Cool programs." For the "printing error messages" portion, I notice the parser already does that to an extent. Are we going to actually have to do anything for error detection in this first part?

Answer: You just need to be sure you are NOT accepting syntactically bad programs as correct ones, and you are not REJECTING good programs. Thus, be sure the grammar spec fits the language as defined.

Question: If we create new non-terminals, do we need to declare their types (as with the normal non-terminals). I'm guessing that we do, even though our parser works fine (as far as this first part is supposed to) without declaring types for these new non-terminals. I'd appreciate any help.

Answer: Yes, you will need types declared for each nonterminal when you actually start the tree building semantic actions.

Question: I am having some confusions about the procesisng of the let statement.

class Main { main() : Int { (let a: Int, b: Int in a<-a) }; };

2. I am not finding the right function to call for processing the let statement. The let function in cool-tree.h has following prototype

Expression let(Symbol, Symbol, Expression, Expression);

But how to use this function prototype when we declare multiple variables? I can not find any prototype that shows a list of let initializations.

Answer: You need to have the grammar rules set up for a let_list to treat the above as an optional_initialization, let_list so the recursion will take care of being about to call let(objectid value, typeid value,optionalinitvalue,expr value) where value is the semantic value from yylval in each case (ie on the parallel semantic stack).

Question: 3. Should we use the same function prototype (neg (Expression)) for both ~ and not operators?

Answer: neg is used for ~ and comp is used for not.

Answer: Question: for the first deadline, do we need to turn in just the grammar without any semantic actions?

Answer: The last section of the assignment spec explains the two deadlines. You do not need to have any semantic actions for the first deadline.

Questions on Parsing Homework

Question: What is meant by right associativity for the ^ operator?

Answer: Just ignore this remark. The operator is a unary operator, not binary.

Replacement of last part of last question: Please reprint the assignment sheet to get the NEW, MUCH SIMPLER question for the last part of the last question on LR and LALR parsing. The original one had a ridiculous number of states.

Questions on PA4

Comment: The example semant and mysemant are in pollock/public/cool02/bin now.

Question: 1) For the semantic analyzer, why exactly is there a need for links from the leaves of the AST to the symbol table entries? (we haven't quite run into a reason yet, since the leaves of the AST can store the type information for everything)

Answer: You just need a way to get to any info you would like to have about a user-defined name, from any leaf that has that name. It can be a link from the leaves to the appropriate symbol table entry, or put the info into the ast nodes. It is more space efficient to put in one symbol table entry with links as you would be duplicating the same info about each occurrence of the same name declaration otherwise in the ast each time the name is used in expressions.

Question: 2) The due date for the project has been pushed back to next Friday, 4/23, right?

Answer: Yes, it is friday midnight, 4/23.

Question: Could you please let me know what is the type of "no_expr"? If the block is empty, I have to know the type of "no_expr" to infer the type of the block.

Answer: no_expr is no_expr_class defined in cool-tree.h/cc. Question: What is prim_slot for in installbasicclasses?

Answer: prim_slot is a class known to the code generator. It stands for primitive type.

Question:
About symtab.h, README file says not to modify symtab.h, and the assignment hand out for PA4 says we are free to modify symtab.h. Which is right?

Answer: You may change this file, by addition as long as you don't change the interface of the current methods.

Question:
I am trying to use the STL map and list containers, but the globals defined in the STL are clashing with the globals in the AST and list.h. For instance there is a list constructor called plus, and an STL function called plus. It's not feasible to go and change the names of variables in the tree. The only solution I have come up with is using namespaces to hide the globals of PA4 from the STL (I haven't even gotten this to work yet), or to not use the STL. Is there any other way around this that you know?

Answer: Anyone have a suggestion for this? This is the first time that this has come up on this project.

Questions on PA5

Question: What is the load immediate value just before abort instructions in the assembly code?

Answer: The constant is the line number.

Question: Is cool using call by refernce or call by value parameter passing?

ANswer: The manual says "formals are bound to actuals" and the method is evaluated. The cool compiler implements call by refernce by just pushing references to the objects onto the stack frame, not making copies of new objects. So, it is implementing call by reference. You should do the same. --------- Here are some hints on variable and method bindings for generating code: // VarBinding and MethodBinding methods // // From the point of view of code generation, there are five distinct // classes of names in a method: // // locals are stored in the temporary area of the stack frame // formal parameters are stored in the actuals are of the frame // self is in the SELF register // attribute is at an offset from the address given by SELF // method address is at an offset from the dispatch table // // See the discussion about frame layout under function_prologue/epilogue. // For the first four categories, there are distinct conventions for generating // code for allocation, reference, and update. For methods, each method name // refers to a method in some ancestor class. // // VarBinding is the base class of three derived classes: // // AttributeBinding Records the offset of an attribute in the object. // // LocalBinding Records the offset from the frame pointer of // a local variable or formal parameter. Because // locals and formals are layed out in different // areas relative to the frame pointer, there are // separate methods for creating local and formal // bindings. // // SelfBinding The self object. // // A VarBinding has virtual functions for generating code for referencing and // updating variables of each kind. // // // MethodBindings are pairs consisting of a method's name and the class in // which it is defined. This information is needed for each method of a // class to define the class' dispatch table. //

Question: How to test the output of my code generator? Its different from the one the test code generator outputs. By just running it, I might miss some errors that might be in the code. Is there some direct way of doing so?

Answer: You are too ambitious! Just kidding. Without going crazy with the details of the assembler code checking, the best way to test is: - run the generated program and be happy it gives the same output as the cool compiler's code does when it is run - insert more printing of variables etc in the cool program (input to your compiler) so you get more output when the generated code is run - compare your output code to the cool output and other group's output to see whether it is similar in nature for specific constructs - it helps alot if you output a comment in your assembly code before outputing the code for each construct so you can easily identify which construct caused which assembly code to be output. you may leave these comments in your final output. It makes debugging a lot easier for you also. So, the testing is mostly based on input/ouput of the executed, generated code, so lots of cool programs for different constructs is the best way to test as thoroughly as possible. trade test cases again.

Question: I recall you mentioning 3 places to store things, and I wanted to confirm that you were talking about the heap, the stack, and the static storage area. Is that correct?

Answer: Yes, they are the three possible areas: the static data area is after the .data, referenced by labels. The heap is accessed by offsets from fp register, and the heap is referenced as references to objects allocated space through the Object.copy() call, with offsets to the correct fields within that object's prototype layout. Check out traphandler in /usa/pollock/public/cool02/lib to see how it works.

Question: I'm wondering about the coolc compiler code created for isvoid. For objects other than Int, Bool, and String, it simply checks to see if the address of the object is 0. I thought that a void object was what you got when you created an object with new, and didn't initialize it (section 5.1 of the Cool Reference Manual). Since new really does create an object, which doesn't have an address of 0, how can the coolc code ever conclude that an object was void? Obviously there's something wrong with my understanding, but I don't know what.

A void value is used as the default initialization for variables with no initialization supplied by the user. This is referring to the declaration of a variable, not the new of an object.

Question: I have another question. If I change the link in my directory for semant, to point to my semantic analyzer from PA4, then that is the semantic analyzer used by mycoolc. But I don't see how I can get access to the scopes I defined in PA4, since I'm not actually linking my PA5 code with the PA4 code. How do I get access to the scope I created in semant for PA4, in my PA5 code? I'm thinking that, since the only way that semant communicates with the code generator is through the AST, then I have to put pointers to the scope info in the AST. This still isn't clear to me though, because I thought that semant printed the AST to a pipe that was read by the code generator. That would mean that, if the pointers to the scope info aren't printed out, then the code generator still will not see them.

Question: You are welcome to use your own frontends for your code generators or the cool frontend, that is, all the rest of the phases. We just need to have all of the files in order to compile, link, and run correctly for grading. It will help if you are available at least via email during the week of the grading, next week. (which you should all be given the date of the final!) Anyways, you can handle scoping in the code generator in a number of ways. 1. you can basically build tables similar to your symbol table as you generate code during code generation, which allows your code generator to work with the cool front end. That is actually what the cool code generator does. 2. you can have your semantic analyzer store enough info in the ast nodes of the declarations of variables and have links from uses of variables to those declaration nodes, so that the code generato gets the direct link to the right declaration, where all info about type and now storage (where is the offset of that variable and is it on the stack or heap?...) so all info is stored in the ast and accessible from each ref of the ast 3. you can have pointers from declaration nodes of the ast to symbol table entry info and links directly to these entries from the refs in the ast to that variable 4. you could keep your symbol table around, which is more difficult in our case with the way the phases are piped together.