Lecture 6

---

Administrative

- Today's office hour is moved to Friday (Sept 27), same time.

---

Top-down parsers may need to backtrack when they select the wrong production.
To avoid backtracking, the parser looks ahead in the input stream.

Predictive parsing

Example:

grammar 1
    <expr> ::= <term> + <expr> | <term> - <expr> | <term>
    <term> ::= number | id

grammar 2
    <expr>  ::= <term> <etail>
    <etail> ::= + <expr> | - <expr> | epsilon
    <term>  ::= number | id

The two grammars accept the same language, and both are free of left
recursion. What is the difference?

Grammar 1 suffers from a problem, common prefixes: looking ahead one token
cannot decide which production to apply.

Like left recursion, common prefixes can be removed mechanically. A standard
technique is left-factoring:

    A ::= <prefix> <suffix1> | <prefix> <suffix2> | <prefix> <suffix3>

transforms to

    A ::= <prefix> B
    B ::= <suffix1> | <suffix2> | <suffix3>

which is now an LL(1) grammar. That is how grammar 1 is transformed into
grammar 2. (A left-factoring sketch and a recursive-descent parser for
grammar 2 appear at the end of these notes.)

Caveat: eliminating left recursion and left-factoring do not guarantee that an
arbitrary CFG can be transformed into an LL(1) grammar. Infinitely many CFGs
are not LL(1) grammars.

Bottom-up parsing (constructs a rightmost derivation, in reverse)

Example:

grammar
    <S> ::= a <A> <B> e
    <A> ::= <A> b c | b
    <B> ::= d

string = abbcde, parsed as

    <S> => a<A><B>e => a<A>de => a<A>bcde => abbcde

(this is a rightmost derivation, in reverse)

    stack       input     action
    ----------------------------------------------
    $           abbcde    shift
    $a          bbcde     shift
    $ab         bcde      reduce  (<A> ::= b)
    $a<A>       bcde      shift
    $a<A>b      cde       shift
    $a<A>bc     de        reduce  (<A> ::= <A>bc)
    $a<A>       de        shift
    $a<A>d      e         reduce  (<B> ::= d)
    $a<A><B>    e         shift
    $a<A><B>e             reduce  (<S> ::= a<A><B>e)
    $<S>                  accept

(A small shift-reduce driver that reproduces this trace appears at the end of
these notes.)

Conflicts:
- shift/reduce conflicts (often caused by an ambiguous grammar)
    o modify the grammar
    o resolve in favor of shifting
- reduce/reduce conflicts
    o modify the grammar
    o look ahead

LR grammars are the most general grammars parsable by a non-backtracking,
shift-reduce parser. Virtually all context-free languages of practical
interest can be described by an LR(1) grammar.

Grammar classes

Unambiguous grammars:

    LL(0) < LR(0) < SLR < LALR(1) < LR(1) < ... < LR(k)

    LL(0) < LL(1) <             ...             < LL(k)

    (each LL(k) class is contained in the corresponding LR(k) class)

LR is more powerful than LL: an LL(k) parser has to predict which production
to use after seeing only the first k tokens of its right-hand side, while an
LR(k) parser can postpone the decision until it has seen the input tokens
corresponding to the entire right-hand side of the production (and k more
input tokens beyond).

Context-sensitive grammars

The left-hand side of a production may contain context around a nonterminal,
e.g.

    a <X> b ::= ...
    c <X> d ::= ...
    <X>     ::= ...
    <Y>     ::= ...

Chomsky hierarchy

    regular expressions         <=>  finite automata
    context-free grammars       <=>  push-down automata
    context-sensitive grammars  <=>  linear-bounded automata
    unrestricted grammars       <=>  Turing machines

Syntax vs. Semantics
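
The left-factoring transformation described above can be carried out
mechanically. Below is a minimal sketch (in Python) for the simple case shown
in the notes, where every alternative of A shares one common prefix; a complete
algorithm would group alternatives by common prefix and repeat until none
remain. The function name left_factor and the representation of productions as
tuples of symbols are choices made for this sketch, not part of the notes.

# Left-factor a set of alternatives for one nonterminal.
def left_factor(lhs, alternatives, new_nonterminal):
    """alternatives: list of tuples of grammar symbols (the alternatives of
    lhs).  Returns the factored productions as (lhs, rhs) pairs."""
    # Longest prefix common to all alternatives.
    prefix = []
    for column in zip(*alternatives):
        if all(sym == column[0] for sym in column):
            prefix.append(column[0])
        else:
            break
    if not prefix:
        return [(lhs, alt) for alt in alternatives]   # nothing to factor
    # A ::= <prefix> B ;  B ::= the leftover suffixes (epsilon if empty).
    factored = [(lhs, tuple(prefix) + (new_nonterminal,))]
    for alt in alternatives:
        suffix = alt[len(prefix):]
        factored.append((new_nonterminal, suffix if suffix else ("epsilon",)))
    return factored

# Applying it to grammar 1's <expr> rules yields grammar 2's <expr>/<etail>:
print(left_factor("<expr>",
                  [("<term>", "+", "<expr>"),
                   ("<term>", "-", "<expr>"),
                   ("<term>",)],
                  "<etail>"))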
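
To see why grammar 2 is LL(1), here is a minimal recursive-descent
(predictive) parser sketch for it in Python: each procedure corresponds to a
nonterminal, and a single token of lookahead selects the production, so no
backtracking is needed. The token kinds ("NUMBER", "ID", "+", "-", "EOF") and
the tiny tokenize helper are assumptions made for this sketch.

def tokenize(text):
    """Hypothetical whitespace-separated lexer, just enough for the example."""
    tokens = []
    for word in text.split():
        if word in ("+", "-"):
            tokens.append((word, word))
        elif word.isdigit():
            tokens.append(("NUMBER", word))
        else:
            tokens.append(("ID", word))
    tokens.append(("EOF", ""))
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):                       # the single token of lookahead
        return self.tokens[self.pos][0]

    def eat(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, got {self.peek()}")
        self.pos += 1

    def expr(self):                       # <expr> ::= <term> <etail>
        self.term()
        self.etail()

    def etail(self):                      # <etail> ::= + <expr> | - <expr> | epsilon
        if self.peek() in ("+", "-"):     # lookahead picks the production
            self.eat(self.peek())
            self.expr()
        # otherwise take the epsilon production: consume nothing

    def term(self):                       # <term> ::= number | id
        if self.peek() == "NUMBER":
            self.eat("NUMBER")
        else:
            self.eat("ID")

# Usage: Parser(tokenize("x + 3 - y")).expr() succeeds without backtracking,
# because one token of lookahead always selects a unique production.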
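
The shift-reduce trace for the bottom-up example can also be reproduced
programmatically. The sketch below is a simplified driver: a real LR parser
makes its shift/reduce decisions from a table of states, whereas here a
hand-computed FOLLOW set per nonterminal plus one token of lookahead is enough
for this tiny grammar. The production representation and the FOLLOW sets are
assumptions worked out for this sketch.

GRAMMAR = [                      # (left-hand side, right-hand side)
    ("S", ("a", "A", "B", "e")),
    ("A", ("A", "b", "c")),
    ("A", ("b",)),
    ("B", ("d",)),
]

FOLLOW = {"S": {"$"}, "A": {"b", "d"}, "B": {"e"}}   # computed by hand

def show(stack, rest, action):
    print(f"{'$' + ''.join(stack):10}{''.join(rest):10}{action}")

def parse(tokens):
    stack, rest = [], list(tokens) + ["$"]
    while True:
        lookahead = rest[0]
        # Reduce if some right-hand side matches the top of the stack and the
        # lookahead token may legally follow that production's left-hand side.
        for lhs, rhs in sorted(GRAMMAR, key=lambda p: -len(p[1])):
            if tuple(stack[-len(rhs):]) == rhs and lookahead in FOLLOW[lhs]:
                show(stack, rest, f"reduce {lhs} ::= {' '.join(rhs)}")
                del stack[-len(rhs):]
                stack.append(lhs)
                break
        else:                                        # no reduction applies
            if stack == ["S"] and lookahead == "$":
                show(stack, rest, "accept")
                return True
            if lookahead == "$":
                raise SyntaxError("cannot shift or reduce")
            show(stack, rest, "shift")
            stack.append(rest.pop(0))

parse("abbcde")    # prints a trace matching the table above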