Spring 2018 CISC 471/672
PROGRAMMING ASSIGNMENT 2 - LEXICAL ANALYZER
----------------------------------------------------
GRADING INSTRUCTIONS
--------------------

Name:_____________________________   Date of Submission:_________

Total grade ____ - late points ____ = Final grade (out of 100)_____
Days Late:____   Due Date: March 5, 2018.

(85) Correctness: Show point deductions or check below (partial credit
     given where applicable):

Token Recognition: recognizes tokens correctly.
(50: 2.5 pts each for undergrads; 2 pts each for grads)

If the lexer recognizes some, but not all, of the items on a given line
below, give 1 pt; give 2 pts if it recognizes everything described on the
line, or 0 if it recognizes none of them. Where points are taken off, give
the names of the test cases that do not go through correctly.

Graduate and undergraduate students must do the following 18 items.

___ blanks, tabs, newlines, comments and delimiter symbols act as
    delimiters only.
___ reserved word tokens are properly recognized, and not recognized as
    ids, and case insensitive except for TRUE and FALSE
___ properly formed id's recognized: letters, digits, underscore
___ self and SELF_TYPE treated as id's, not keywords by lexer
    (note: string table handles same id not returned for 2 different names)
    (note: string table handles same id returned for same name in
     different places)
___ legal symbols recognized properly:
    [ ] ( ) : <- @ . , { } + - * / ~ < <= =
___ legal string constants recognized properly:
    ___ strings: enclosed in double quotes.
    ___ strings: \c denotes character c, with the exception of
        \b, \t, \n, \f.
    ___ strings: nonescaped \n cannot appear in string
    ___ strings: EOF cannot be in string
    ___ strings: null \0 cannot be in string
    ___ string that looks like "" is a null string, and is legal
___ legal integer constants recognized properly (nonempty string of
    digits, nonnegative)
___ upper and lower case indistinguishable in reserved words
___ case sensitivity in id's
___ no signed integers accepted
___ TYPESYM's have uppercase first letter
___ OBJECTSYM's have lowercase first letter

Only Graduate students must do the following 8 items. (2 pts each for grads)

___ -- comments handled correctly until newline or EOF.
___ nonnested (* *) comments handled correctly - does not pass legal
    comment back as string of tokens
___ nested (* *) treated correctly as nested comment, not returning
    errors or tokens.
___ keywords true and false accepted only with first letter lowercase;
    the other letters can be either lower or upper case.

(unlimited number of identifiers and constants allowed in a program.)
(done for them by the use of the string table)

Token Attribute Handling: (15 : 5 pts each)

Again, give partial credit based on the percentage of the kinds of items
on each line that are handled properly.

___ correct integer values are returned for integer constants.
___ each id is added to the string table with unique index (pointer) put
    in yylval, by making an add_table call.
___ correct string constant referenced by yylval - should not include
    quotes, but ok if it does.

Error Detection and Recovery: (18 : grads 2 pts each, undergrad 4.5 pts each)

Give partial credit if partially working (works sometimes and not others).

Both graduate and undergraduate students must do the following 5 items.

___ integer overflow is detected without scanner bombing (done in a
    general way with reasonable MAXINT defined)
___ invalid character on input (does not match any pattern)
___ illegal (* *) comments result in emitting error message: recovery
    could vary here. Important to emit error message and continue.
      - end of file reached, not matched.
___ illegal strings
___ descriptive error message printed with line numbers

Only graduate students must do the following 4 items.

___ intelligent recovery - continue to scan rest of input, attempting to
    pass correct tokens in some reasonable way
___ can handle more than one error in the code and continue to eof
___ string 'xxx' is illegal as single quotes are illegal
___ 1234like could be handled as two legal tokens, int followed by id, or
    an error. Either is acceptable during scanning. It will get caught
    during parsing.

Any other errors found? Any fancy recovery? If so, give 2 extra points for
any extras in error detection or recovery.

(5) Program Structure
___ Readable, concise, simple structure.
___ Most work done in regular expressions, not all in the action code.

(5) Efficiency
___ Efficiency of the overall solution
___ Efficiency in the handling of individual tasks (string table
    lookup/insert)

(5) Documentation
___ Internal documentation (comments throughout spec: explain unusual
    reg exprs and start condition use)
___ External documentation in README

Any other restrictions of the program not mentioned above?

Any additional features, mentioned in the README file?

General Comments:
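
Grader reference for the (* *) comment items: when a submission's behaviour on nested or unterminated comments is unclear, the expected outcome can be cross-checked with a small script. The sketch below is only an illustration in Python, not the lexer specification students are expected to submit. The function name skip_nested_comment and the -1 return convention are assumptions made for this example.

```python
def skip_nested_comment(text, pos):
    """Return the index just past the (* ... *) comment that opens at pos.

    Hypothetical helper for graders. Assumes text[pos:pos+2] == "(*".
    Returns -1 if end of input is reached before the comment is closed,
    which is the case where the rubric expects an error message.
    """
    depth = 0
    i = pos
    n = len(text)
    while i < n:
        pair = text[i:i + 2]
        if pair == "(*":
            # An opener, including the first one, goes one level deeper.
            depth += 1
            i += 2
        elif pair == "*)":
            # A closer ends the innermost open comment.
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    # Ran out of input with at least one comment still open.
    return -1
```

Calling skip_nested_comment(source, 0) on a test file that starts with a comment gives the offset where tokens should resume. A result of -1 means the submission should have reported an unterminated comment instead of returning tokens.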