Toward a Morphosyntactic User Model for Language Analysis and Generation: A PhD Proposal

Lisa N. Michaud
michaud@cis.udel.edu
Computer and Information Sciences Department
University of Delaware
Newark, DE 19716
September 9, 1999

Abstract

This proposal paper is being presented in partial fulfillment of the Ph.D. requirements of the Department of Computer and Information Sciences at the University of Delaware. In this paper, I discuss a user modeling architecture for ICICLE, a natural language system intended for use as a writing tutor for deaf learners of written English. This proposed design, intended to model dynamic aspects of a learner over the passage of time, the acquisition of new knowledge, and multiple sessions with the system, includes components to track the history of interaction with a given user as well as a very complex, dynamic model of user interlanguage grammar and domain knowledge. It has been based on research in language acquisition and in the acquisition of cognitive skills. The focus of the work described in this proposal is the development of the model of interlanguage status, which will be used in the analysis of user language production and in the generation of user-tailored explanations.

Contents

1 Introduction
  1.1 The ICICLE System: Motivation and Goals
  1.2 The User Model: A Proposal
    1.2.1 The Demand for a Model
    1.2.2 Components of the Model
  1.3 Guide to this Proposal
2 Related Work
  2.1 Early Explanation Systems
    2.1.1 XPLAIN
    2.1.2 TEXT
    2.1.3 EES
    2.1.4 Discussion
  2.2 Toward User Modeling in Explanation Systems
    2.2.1 TAILOR
    2.2.2 Meno-tutor
    2.2.3 EDGE
    2.2.4 Discussion
  2.3 Computer-Assisted Language Learning
    2.3.1 HyperTutor
    2.3.2 Mr. Collins
    2.3.3 German Tutor
    2.3.4 Discussion
  2.4 Summary
3 ICICLE System Overview
  3.1 Architecture
    3.1.1 Error Identification
    3.1.2 Response Generation
    3.1.3 The User Model
    3.1.4 The Domain Knowledge Base
    3.1.5 The User Interface
  3.2 Motivation
    3.2.1 A Cyclic Approach
    3.2.2 Teaching a Second Language
  3.3 Implementation Status
4 Text Generation in ICICLE
  4.1 Planner Overview
    4.1.1 Content
    4.1.2 Method
    4.1.3 Form
    4.1.4 History
    4.1.5 Manner
  4.2 Operationalizing a Multi-Phasic Text Planner
    4.2.1 Method: a Brief Sketch
    4.2.2 Form
    4.2.3 History: a Revision Approach
    4.2.4 Manner
  4.3 Realizing the System Response
    4.3.1 Comprehensible Input
    4.3.2 Using FUF
  4.4 Presenting the Explanation to the User
  4.5 Recovering from Failed Explanations
5 Proposal: A User Model for ICICLE
  5.1 Reviewing the Demands on the Model
  5.2 Modeling Second Language Acquisition
    5.2.1 Interlanguage
    5.2.2 Focusing on the Frontier of Acquisition: the ZPD
    5.2.3 Toward Modeling the Interlanguage
    5.2.4 SLALOM: A Proposed Model Architecture
  5.3 Modeling Explicit Language Knowledge
  5.4 The History Models
  5.5 Representing a Changing User
    5.5.1 Initialization
    5.5.2 Retrieving the Information
    5.5.3 Updating
6 Summary and Future Directions
  6.1 Completing the User Knowledge Model Architecture
  6.2 Implementation Goals
    6.2.1 Error Identification Using the Model
    6.2.2 Knowledge Model Updating after Text Analysis
    6.2.3 Pruning the Error List
    6.2.4 Response Planning
  6.3 Evaluation
  6.4 Conclusion
  6.5 Acknowledgments

Chapter 1 Introduction

Approaches to explanation planning and generation in natural language systems have generally moved from an origin in simple, highly restrictive techniques to those with greater flexibility in accommodating the context of the generation activity. Generating explanations which are sensitive to their context has been a goal explicitly or implicitly and to varying extents in many systems, but while many text generation systems define "context" as the preceding dialogue alone, in this work I prefer to see the context as encompassing a much broader scope, including: concepts in the domain which can be compared to the topic at hand; the user's skills in the domain; his or her knowledge about domain topics and their supporting concepts (important if the system wishes to select the depth of its explanation or to generate additional explanatory material at need); and the suitability of different tutorial techniques to the strengths of the user. While the relationships of concepts in the domain can be assumed to be static, the other aspects of this redefined context are dynamic (see Figure 1.1), and they form an ever-changing atmosphere which must be taken into account when generating explanations if the result is to be maximally effective with this particular user. Because these dynamic context elements are all artifacts of the user, to be aware of them the system must model the user and account for how he or she changes over time.
This work addresses the user modeling issues entailed by the ICICLE system, a natural language system under development which uses both language analysis and language generation to tutor users on their English writing skills. I will focus on the following topics with respect to ICICLE's user modeling: what must be modeled, how it will be modeled, why it can be modeled that way, and where I intend to take this design in the scope of my dissertation work.

1.1 The ICICLE System: Motivation and Goals

The name ICICLE represents "Interactive Computer Identification and Correction of Language Errors" and is the name of an intelligent tutoring system currently being developed at the University of Delaware (McCoy and Masterman (Michaud), 1997; Michaud and McCoy, 1998; Schneider and McCoy, 1998; Michaud and McCoy, 1999). The system's primary goal is to employ natural language processing and generation to tutor deaf students on their written English. Of paramount importance to this goal is the correct analysis of the source and nature of user-generated language errors and the production of tutorial feedback on student performance which is both correct and individualized, taking into account the language knowledge, proficiency, and learning style of the student, as well as the context of previous explanations and related concepts in the domain.

ICICLE's interaction with the user takes place primarily through a cycle of user input and system response. The cycle begins when a user submits a piece of writing for review by the system. The system then performs an analysis on this writing, determines its grammatical errors, and constructs a response in the form of tutorial feedback. This feedback is aimed toward making the student aware of the nature of the errors found in the writing and toward giving him or her the information needed to correct them.
When the student makes those corrections and/or other revisions to the piece, it is resubmitted for analysis and the cycle begins again. As ICICLE is intended to be used by an individual over time and across many pieces of writing, the cycle will be repeated many times.

Figure 1.1: Elements of context.

Since ASL is a distinct and vastly different language from English (Baker and Cokely, 1980), we view the acquisition of written English skills to be a task in second language acquisition for these learners (Michaud and McCoy, 1998). While providing this instruction, ICICLE will therefore try to satisfy the deaf learner's need for understandable second language input. With poor or no aural capabilities, deaf learners receive nearly all of their English input through written material, often academic texts aimed at the comprehension level of their hearing peers (Anderson, 1993). Since the consensus among most researchers in Second Language Acquisition (cf. Krashen, 1985) holds that second language input at or near the learner's level of existing proficiency is most beneficial for learning, we would like to address this poverty of suitable input in our system-generated explanations. The intent for the surface form of our generated text is to focus upon grammatical constructions which involve those aspects of English the student is currently attempting to master, providing positive examples at a level of accessibility our target learners do not always have access to.

Another way in which ICICLE will address the unique needs of the deaf population is by providing the user with corrections on his or her errors without involving a human teacher. Because this form of instruction may entail less "loss of face" for the learner than a situation with a human tutor, the hope is that this will put the students more at ease and encourage them to write more.
Furthermore, it is our hope that the presentation of the feedback will also allow a student to further explore concepts which he or she has not fully understood; in the evaluation of other systems producing user-tailored output, users found the system more accessible than the human authority they would otherwise be consulting (Carenini et al., 1994; Moore and Mittal, 1996).

1.2 The User Model: A Proposal

The current status of the ICICLE system is a functioning text parser with the ability to recognize and label morphosyntactic errors, delivering "canned" one-sentence explanations of each error (see Section 3.3 for more details). In order to extend this system to obtain more accurate parses [The system currently chooses the first grammatical parse of any list of multiple parses for a sentence, or the first of all of the parses if there is no grammatical possibility.] and to involve the generation of original explanations in a manner tailored to the individual learner, the system must be able to collect and refer to information about that learner. It requires a very complex user model which can store and maintain information about a student across multiple sessions of system interaction in order to adapt itself to the changing needs of a student across the learning journey. The purpose of this paper is to motivate and outline a proposed design for that model and to detail how the development and the implementation of part of that model will proceed as part of my doctoral work. Part of the user model design has been sketched in earlier work, including McCoy et al. (1996) and Michaud and McCoy (1999), but this paper will be the most comprehensive description of the current design, what remains to be developed, and what questions still need to be answered.

1.2.1 The Demand for a Model

Both the error analysis and the system response processes in the ICICLE architecture place demands on a model of the system user.
This section addresses those demands in order to motivate the components of the model being proposed.

In order to obtain a correct analysis of the source and nature of user errors, the error identification module needs to choose between multiple parses or interpretations of a sentence. Some of these parses represent different structural representations of the text, and in the case of ungrammaticality may place the "blame" for the error on different constituents. Other parses may involve the same violated grammar constituent but with different "sources" for the error. Since determining the nature and cause of student errors is an integral step in deciding how to approach instruction (Matz, 1982), the parser must be able to make principled decisions between these options. For instance, if the phrase "My brother like to go..." [This example has been taken from our corpus of deaf writing samples.] has occurred in the writing of a student, there are several possible situations that could have led to this mistake: the student could be entirely unaware of the English rule for subject/verb agreement; the student could know about the rule but have applied it incorrectly here due to incomplete knowledge; or the student could have simply mistyped. To determine which of these possibilities is correct, it is necessary for the error analysis component to have at its disposal a model of the student's grammatical proficiency which indicates his or her mastery of such language rules, or features, as the concept of subject/verb agreement (McCoy et al., 1996). This knowledge would also aid in choosing between structurally differentiated parses by providing information on which grammatical constructs the user can be expected to use correctly or incorrectly.

Another responsibility of the error analysis component is to pass a list of errors to the tutorial response component for the generation of instructive text.
It is our wish that ICICLE give instruction only on those language components which are at the user's current level of acquisition; errors on those above this level are likely to be beyond the user's understanding, while errors on those which are well established are likely to be simple mistakes which do not require instruction. This places an additional demand on the user model: not only must it show the user's depth of knowledge on a given feature, but it also must indicate a "current level" to which the features may be compared. With such knowledge, the error analysis component may trim away those errors outside this indicated realm of accessible and productive learning.

Another part of the system requiring user modeling is the system response module. It is our goal to generate explanations which are individualized, taking into account a broad spectrum of factors which constitute the context of the generation activity, the components of which were outlined at the beginning of this paper: related concepts in the domain, the user's knowledge about the topic and supporting concepts, the dialogue history, and the user's history of system use. A need has already been established for a model of the user's grammar proficiency; added to this now is a hierarchical model of the user's domain knowledge, that is, metalinguistic knowledge of the terms and concepts used in grammatical explanations. For instance, an explanation about subject-verb agreement requires at the very least an understanding of the concepts subject and verb, and furthermore may require an understanding of the person property of nouns. This model will need to represent both the user's knowledge of these concepts and the relationships between them.
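The trimming of the error list described above can be illustrated with a small sketch. This is not ICICLE's implementation; it is a minimal illustration under stated assumptions. The three-way status labels, the feature names other than subject-verb agreement, the `DetectedError` record, and the `prune_errors` function are all hypothetical, and the choice to keep errors on features the model knows nothing about is an assumption made only for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Hypothetical acquisition status of one grammar feature."""
    ACQUIRED = "acquired"      # well established; errors are likely slips
    ACQUIRING = "acquiring"    # at the learner's current level
    UNACQUIRED = "unacquired"  # beyond the learner's current level


@dataclass(frozen=True)
class DetectedError:
    feature: str   # grammar feature the error violates
    sentence: str  # text in which the error was found


def prune_errors(errors, grammar_model):
    """Keep only errors on features at the learner's current level.

    Assumption for this sketch: a feature missing from the model is
    treated as if it were at the current level, so its errors are kept.
    """
    kept = []
    for err in errors:
        status = grammar_model.get(err.feature, Status.ACQUIRING)
        if status is Status.ACQUIRING:
            kept.append(err)
    return kept


# Hypothetical learner: has mastered plural marking, is working on
# subject-verb agreement, and has not yet reached relative clauses.
example_model = {
    "plural -s": Status.ACQUIRED,
    "subject-verb agreement": Status.ACQUIRING,
    "relative clauses": Status.UNACQUIRED,
}

example_errors = [
    DetectedError("plural -s", "I have two cat."),
    DetectedError("subject-verb agreement", "My brother like to go..."),
    DetectedError("relative clauses", "The man which I saw him."),
]

for err in prune_errors(example_errors, example_model):
    print(err.feature, "->", err.sentence)
```

In this sketch only the subject-verb agreement error would be passed on for instruction; the other two fall outside the "current level" and are trimmed.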
Another need of the response module is to have history models which not only store the dialogue history in order to facilitate contextual references to recent explanations and to avoid repetition, but which also track how different types of explanations have succeeded or failed with this user. This information would be used when choosing between different explanation types in order to maximize the learner's potential for understanding the explanation.

Finally, Section 1.1 established that one of ICICLE's goals is to provide generated text whose surface form is at an accessible level of syntactic complexity for the user, using grammatical constructs from the "current level" of acquisition in order to aid learning through the provision of positive examples. The final phase of the response generation therefore also needs to make use of a user model, obtaining from it information about which constructs are at this level in order to weight its surface-level generation decisions more heavily toward them.

1.2.2 Components of the Model

In cataloguing the demands which the ICICLE system architecture places on a dynamic user model, I have established that this model must have the following components:

o Knowledge Models
  - A representation of the user's grammar competence in terms of individual morphosyntactic constructions.
  - A representation of the user's knowledge of domain concepts underlying the constructions mentioned above.
o History Models
  - A dialogue history model containing a representation of all of the explanations which have been provided to the user during the current system session.
  - A system history model holding information about what tutorial approaches have been attempted with this user and their relative success rates over all sessions.

In this paper, the term "user model" will generally refer to the large knowledge base spanning all four of these components.
Where appropriate, the terminology will be refined to "user grammar model," "domain knowledge model," "dialogue history," and "system history." In some cases, the process of modeling the user may be referred to in terms of the knowledge models alone; these are the largest, most complex elements of the user model as a whole, and they will be the primary focus of this proposal and my subsequent research.

1.3 Guide to this Proposal

The rest of this paper will proceed as follows. I will discuss the relevant previous work in the field of user modeling within tutoring and explanation systems in Chapter 2. In Chapter 3, I will then give a short overview of the architecture and approach of the ICICLE system as a whole. In Chapter 4, I will focus upon the generation aspect of the system, outlining an intensely context-aware text planner which will be making use of the user model. Finally, in Chapter 5 I will discuss the specifics of my user model design and address the implementation issues for placing this model within the ICICLE system. Chapter 6 contains a summary which outlines my plan of attack on this work.

Chapter 2 Related Work

This chapter overviews the efforts of previous explanation-generation systems both within and outside of the field of Computer-Assisted Language Learning. My main intent is to sketch the evolutionary direction of systems which provide tutorial instruction and to compare this direction against ICICLE's design and goals.

2.1 Early Explanation Systems

As mentioned in the Introduction, the tendency of explanation-generation systems has been to move from relatively inflexible beginnings to systems with higher levels of adaptivity to context. In particular, while domain knowledge bases have been a required source of information from the beginning of generation efforts, the extent to which the systems have modeled user-specific information such as the user's knowledge and the dialogue history has increased greatly over time.
This section briefly overviews early explanation-generation systems in order to illustrate this progression.

2.1.1 XPLAIN

William Swartout gave his XPLAIN system (Swartout, 1983) the task of explaining how an expert consulting system arrived at conclusions or why it asked the user certain questions. Its primary goal was to allow a user to understand the reasoning behind an expert system's actions in order to ensure that the user had faith in the recommendations made by the system. XPLAIN acquired this capacity to explain an expert system by providing the programmer with an environment in which to design the expert system. During the design process it tracked how the programmer connected the descriptive domain model (containing facts in the domain of the expert system) with prescriptive domain principles (containing the heuristics and methods for operating in that domain) and then stored these connections for reference when it needed to explain the methods or heuristics. XPLAIN was implemented as part of a reimplementation of Digitalis Therapy Advisor, a medical advising program for doctors.

XPLAIN's generation process was essentially what is called "database tracing," where the generator iterates through a relevant portion of a database and outputs phrases whose organization mirrors that which is hard-coded into the knowledge representation of the system. Each step of reasoning encoded in the system was transformed into a phrase, and the only way in which XPLAIN was able to diverge from the structure of the database was to omit the phrase explaining a given step in a process. One situation in which the system did this was when the step was deemed redundant; if the user was asking about that step, the statement that the step occurred was deemed unnecessary. The system also used a notation called a "viewpoint" marked on each element in the database to determine inclusion or exclusion in an explanation.
In the actual implementation of the system, the only viewpoint which was relevant was the "computer" viewpoint, which was used to indicate that the step was deemed to be important only to the internal workings of the system. This was the case when the step was an artifact from the process of breaking down the procedure into a computer algorithm, and was therefore at too primitive a level to be relevant to an explanation presented to a human. Beyond this, the only viewpoint the system made use of was that of a medical professional, the intended audience for the Digitalis Therapy Advisor program.

The design of XPLAIN did not consider the user as an individual, although the "viewpoints" were intended to be extended in that direction. There was no attempt by the system to establish, maintain, or reference a model of user knowledge; instead, the system assumed a "perfect learner" who understood everything that was explained. This was typical of early explanation work, as shall be illustrated in the next few sections.

2.1.2 TEXT

Another early explanation strategy is the well-known founding work in natural language generation, Kathleen McKeown's schema approach (McKeown, 1985). Operating from the premise that a generation system can use the same discourse strategies humans use in structuring their discourse, McKeown cataloged the rhetorical techniques humans use to present information as rhetorical predicates. Examples included analogy with a known concept and evidence supplied for a given fact. She combined these predicates into four schemata which generalized the paragraph structures she found in naturally occurring text: Attributive, Identification, Constituency, and Compare and Contrast. Essentially, the four schemata represented patterns of rhetorical predicates belonging to coherent paragraphs. They were used for generation in the TEXT system when answering user questions about the structure of a database.
The resultant text, structured by these schemata, was organized in a manner independent of the database structure, freeing the database to be laid out in the manner most suited for the system's internal representation of the data, while the schemata could build from this the structures most suitable for human consumption. The schemata also enabled a generator to produce far more variety than the database traces of Swartout's system. Different purposes for the explanation could result in entirely different structures, not just a change in the level of abstraction, and there were decision points within the schemata where focus constraints could select between options to vary the structure. The result was a certain level of flexibility both with respect to the previous dialogue and with respect to the question that needed to be answered.

There was no user model in TEXT; it assumed a "static, casual and naive user." This user was taken into account, since the text was structured specifically to present the information in a way tailored to a generic human user, but no individuality was acknowledged. As in XPLAIN, no allowance was made for the user misunderstanding the text, either; because the user was assumed to have always understood, the system always trimmed subsequent explanations to avoid details which had been stated earlier.

2.1.3 EES

In the next step of abstraction from the rigidity of database tracing we find the Explainable Expert System (EES) (Moore and Paris, 1989; Moore and Paris, 1992), originally implemented in the Program Enhancement Advisor system (PEA), a tool which assisted users in writing better Lisp programs. Instead of laying out entire paragraph structures as in McKeown's approach, Moore and Paris used Rhetorical Structure Theory (Mann and Thompson, 1988) to recursively structure text with the nucleus/satellite structure defined in RST, where intentional relations linked "spans" of text together.
Unlike the approaches described above, EES modeled its user; the representation it used was a collection of beliefs about the domain and goals within the domain. Its user was dynamic, learning as time passed, and imperfect, capable of misunderstanding an explanation. This imperfection was handled through a text planning approach which stored detailed information about the decisions that were made so that explanations could be reattempted with intelligent modifications that took into account the likely causes of an explanation's failure.

The EES planner used an agenda-based mechanism to post communicative goals represented as effects the system desired to have on the beliefs and/or goals of the user. A library of planning operators was available to apply "linguistic resources," or rhetorical techniques similar to McKeown's predicates, to meet a given goal. In that way, each operator was a kind of miniature schema, detailing some short sequence of actions to achieve a given communicative goal. The planner began its process by posting a general communicative goal on the agenda and then searching for planning operators which solved that goal. Selection of a given operator depended on the satisfaction of its constraints, which referenced the domain database, the user model, and the dialogue history in order to limit the situations in which it could be applied. While some of these constraints were considered "rigid," the constraints on the user model were treated in a loose fashion; if nothing in the user model gave any information about the user's knowledge concerning a specific topic, the constraint was considered satisfied and the assumption that the user knew this topic (in the absence of information to the contrary) was recorded in the plan if this operator was chosen. Once selected, the operator's subgoals would be examined.
Since the operators were based on RST relations, the subgoals were classified as nucleus and possible satellites, between which was a specific RST intentional relation. Each subgoal was either a semantic specification of some primitive speech act or an additional communicative goal to be posted to the agenda. The planner continued until all goals from the agenda had been processed and refined down to speech acts.

The hierarchical plan built by EES maintained the intentional relationships between each nucleus and satellite, from the upper level spanning the entire explanation to the lowest level between each semantic proposition. McKeown's schemata had also represented intentions but only at the top level spanning all of the schema's text; by associating intentions with the smaller pieces of text in the hierarchy, the planner of EES could reason about what effect each part of the text was intended to achieve. In the case of explanation failure, that information allowed the system to reevaluate just the relevant pieces of the text when generating a new explanation. Another part of the plan which allowed for reevaluation was the recorded user model assumptions; since there was no strong evidence in favor of such an assumption, it was considered suspect on explanation failure.

2.1.4 Discussion

The earliest machine-generated explanations, as Swartout (1983) points out, were canned, pre-typed text that was presented to the user. The flexibility of this approach is of course nil; not only is there one and only one explanation available for a given concept with no variation, but all of the user's explanation needs must be anticipated during the design of the system. Similar to early text adventure games where you could not PUT FISH IN BOWL unless the game creator anticipated that you would try that action, those early systems could not provide any explanation the programmers had not foreseen that you would need.
XPLAIN and its contemporaries generated text by iterating through a database and converting the data into text as it was found. This freed the system to generate explanations on any aspect of the system without having to pre-store the explanations first, but this approach produced text which was entirely dependent upon the structure of the database; the knowledge engineer predetermined the text's structure by the way he or she designed that database, and there was no variation from this structure. No account was made for the possibility of different explanations serving different purposes, except for rough modifications like producing the explanation at different levels within the database hierarchy.

McKeown's work was a significant step forward from this because it enabled the system to impose a structure on the text which was independent of the way it was represented in the database. Her schemata allowed for different structures to be used according to the purpose of the explanation. She was taking the user and the system's interaction with the user into account, letting the context influence more of the generation process. However, her user was generic, unchanging from one individual to the next and static in his or her knowledge.

In systems like EES, we see the emergence of the idea of accounting for the user as a unique person who has individual needs that can influence how the system addresses him or her. The result is the introduction of a user model containing what the system believes are the beliefs and goals of the individual, acknowledging that these beliefs and goals may differ from those held by another individual. With the introduction of the user model, however, comes a host of issues that the EES work did not address. While the EES planner made use of a model, the issues of model initialization, updating, and correction were outside of the scope of that work. In the following section, I discuss systems which explored these issues in more detail.
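To make the contrast with database tracing concrete, the agenda-based planning attributed to EES in Section 2.1.3 can be sketched roughly as follows. This is a heavily simplified illustration and not the EES implementation: it omits the RST nucleus/satellite structure and the dialogue history, and the `Operator` record, the goal names, the example sentences, and the dictionary-based user model are all hypothetical. What it does show is the loose treatment of user model constraints: a concept about which the model says nothing counts as satisfied, and that assumption is recorded so that it can be revisited if the explanation fails.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    goal: str             # communicative goal this operator can achieve
    requires_known: list  # concepts the user is expected to know
    subgoals: list        # further goal names or ("say", text) speech acts


def choose(goal, operators, user_model):
    """Return the first operator for `goal` whose user-model constraints
    are not contradicted. A concept absent from the model does not block
    the operator (the 'loose' constraint treatment)."""
    for op in operators:
        if op.goal != goal:
            continue
        if all(user_model.get(c, True) for c in op.requires_known):
            return op
    return None


def plan(top_goal, operators, user_model):
    """Refine goals from the agenda until only speech acts remain."""
    agenda = [top_goal]
    speech_acts, assumptions = [], []
    while agenda:
        goal = agenda.pop(0)
        if isinstance(goal, tuple):  # primitive speech act
            speech_acts.append(goal[1])
            continue
        op = choose(goal, operators, user_model)
        if op is None:
            raise ValueError(f"no applicable operator for goal {goal!r}")
        # Record every constraint that was satisfied only by assumption.
        assumptions.extend(c for c in op.requires_known if c not in user_model)
        agenda[0:0] = op.subgoals  # expand in place to preserve text order
    return speech_acts, assumptions


# Hypothetical operator library for one grammar explanation.
operators = [
    Operator("explain-agreement", ["subject"],
             [("say", "A verb must agree with its subject.")]),
    Operator("explain-agreement", [],
             ["define-subject", ("say", "A verb must agree with its subject.")]),
    Operator("define-subject", [],
             [("say", "The subject is who or what the sentence is about.")]),
]

print(plan("explain-agreement", operators, {}))
print(plan("explain-agreement", operators, {"subject": False}))
```

With an empty user model the short explanation is chosen and the assumption that the user knows "subject" is recorded; when the model marks "subject" as unknown, the planner falls back to the operator that first defines the term.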
2.2 Toward User Modeling in Explanation Systems

The primary goal of user modeling could be defined as assessment of the user's changing knowledge over time in order to adapt the communication of knowledge to that user (Spada, 1993). ICICLE's design certainly espouses this view, since we desire to accommodate a dynamic, learning individual and to tailor the delivery of knowledge to that individual as closely as we can. Spada also categorizes user models into three basic types: ideographic, modeling one specific person; prototypic, modeling a population of individuals with no variation; and individualized, starting out with population assumptions and adjusting for individual variation.

One could argue that XPLAIN and TEXT were using prototypic models that were built into the systems themselves; with no individual variation, all decisions based on the population of users could be predetermined and hard-coded into the planning mechanism of the system. EES, on the other hand, used an ideographic user model, representing the individual as a collection of beliefs and goals without taking the population into account. In this section I will introduce several explanation systems whose approaches span both the ideographic approach and, later, the individualized approach, augmenting the modeling of an individual with information about the population; it is this type of modeling system which ICICLE is going to use for its own purposes.

2.2.1 TAILOR

Cécile Paris's TAILOR system (Paris, 1987; Paris, 1988) presented the idea that the user's domain knowledge should be used to select the discourse strategy for structuring the text. A common approach predating her work was to accommodate user knowledge by simply changing the amount of information presented to the user, such as the approach in XPLAIN and TEXT, which left out elements the user was assumed to know because they occurred in recent discourse.
Paris stressed the need to affect the type of information as well, pointing out that different kinds of explanations are needed for experts and naive users; their different domain knowledge leads them to be capable of understanding different representations of the information available on the topic. In order to implement her approach, she designed two discourse strategies. The constituency schema, based on McKeown's work, was a strategy designed to describe an object by its constituent parts; the process trace explained the processes associated with the object. She also developed the distinction between the "declarative" constituency approach, which structured text according to the abstract organization in the schema, and the "procedural" process trace, which followed the database structure closely by way of following directives on how to access the knowledge. The choice between these two strategies depended partly on what information was available in the database, but mostly on what the user knew about the topic, as described below. As in previous approaches, TAILOR also used the user model to control the information selected for inclusion in the description, pruning information that was well-known or easily inferred and inserting that which was unknown. There were two types of knowledge she represented in the user model: knowledge about objects in the domain, and knowledge about the basic concepts underlying the domain. The knowledge base represented objects in a way that included the related mechanical processes, and an expert's knowledge included the functionality of most objects and processes in the domain, while a naive user did not know about the specific objects and did not understand the underlying basic concepts. Since a given learner could be anywhere between this definition of expert and naive, the user's knowledge was represented by listing which of the objects and underlying concepts were known.
In that way, she could represent a "continuum" of expertise along which the user resides. This design is what is termed an "overlay" model, where the user's knowledge is represented as some subset of the knowledge represented in the system. TAILOR executed the discourse strategies by iterating through augmented transition networks representing the two strategies. The entry into the network at the initial level was based partly on the availability of information in the database (if there existed no process information connected with the given concept, the constituency schema had to be chosen), but the decision rested mostly on the user's domain expertise. If the user had no local expertise (knowledge about the specific object), the system chose a process trace; otherwise, the system could opt for a constituent description on the basis that the more advanced user would be able to infer the processes involved without explicit explanation. The constituent schema was also chosen if the user had local knowledge of most or all of the functioning subparts of the object. At key points in the networks, when a new part or superordinate was introduced, the process recursed and could choose either strategy for the new object. As the network was traversed, information was retrieved from the database to fill out semantic propositions. Roughly utterance-sized, these semantic propositions would be translated into text at the completion of network traversal. TAILOR did not establish the initial state of its user model; it was given as a series of parameters in its input. Because of this, the system did not need to reason in any complex way about the possibly incomplete nature of the model; it operated from the assumption that what it was given was correct. It did, however, update the model over time; embracing the perfect learner assumption, TAILOR changed a user model item to "known" whenever it was explained.
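The strategy choice and the perfect-learner update described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not TAILOR's actual implementation: the `DomainObject` structure, the function names, the telephone example, and the 0.5 cutoff standing in for "most or all" subparts are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DomainObject:
    """A knowledge-base entry (hypothetical structure, for illustration only)."""
    name: str
    subparts: list = field(default_factory=list)  # names of functioning subparts
    has_process_info: bool = True                 # is process information available?


def choose_strategy(obj, known, most=0.5):
    """Pick a discourse strategy for describing obj.

    known is the overlay: the set of object names marked as known to the user.
    most is an assumed cutoff for "most or all" subparts; the source gives no figure.
    """
    if not obj.has_process_info:
        return "constituency"      # nothing to trace, so the schema is forced
    if obj.name in known:
        return "constituency"      # the user has local expertise
    if obj.subparts:
        known_fraction = sum(p in known for p in obj.subparts) / len(obj.subparts)
        if known_fraction > most:
            return "constituency"  # the user knows most of the subparts
    return "process_trace"         # default for a user without local expertise


def mark_explained(obj, known):
    """Perfect-learner update: anything explained is afterwards treated as known."""
    known.add(obj.name)


# Hypothetical usage: the same object is described differently once explained.
phone = DomainObject("telephone", subparts=["transmitter", "receiver"])
user_known = set()
first = choose_strategy(phone, user_known)   # user knows nothing yet
mark_explained(phone, user_known)
second = choose_strategy(phone, user_known)  # object is now marked known
```

Representing the overlay as a plain set of known names keeps the sketch close to the "listing which items are known" description; a real system would also have to handle the recursion into newly introduced parts.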
2.2.2 Menotutor

Beverly Woolf's Menotutor system (Woolf, 1984; Woolf and McDonald, 1984), although predating the work described above, was more sophisticated in the way it modeled its user's knowledge and incorporated that model into its planning decisions. Woolf stressed the importance of planning text which was "context-dependent," adapting to the context of the student and the discourse history. As with previous systems, she based her instructive approaches on observations of human strategies, but one important difference from Paris' work was her emphasis on the responsibility of the system as teacher, not only to make high-level distinctions between different methods of explanation, but also to have a level of flexibility detailed enough to avoid teaching above or below a user's current level of understanding. An important and novel aspect of Woolf's user modeling work was her concern with model establishment and updating; she set goals for Menotutor to model student knowledge accurately, to update that model over time, and to use the model to find the most effective presentation method for that individual. This model was not given as input as in TAILOR, and it was updated over time by the system both to become more accurate and to reflect changing user expertise. Although one could describe Menotutor's user model as an overlay model, it had several qualities that differed from the standard. The student's knowledge was modeled as an annotation of a domain knowledge base which included both correct concepts in the domain and misconceptions that a student could possess.
[Some definitions of the "overlay" approach would restrict it to describing a model which represents the user's knowledge as a subset of the system's correct knowledge about the domain; I extend it here to include both correct knowledge and misconceptions because the user's knowledge is still considered to be a subset of a predetermined set of facts, including incorrect facts; i.e., the user cannot be noted as having a misconception outside of those provided in the model.] Also, instead of merely taking note of which items were known by the student, Woolf's system labeled each item with a numerical Expected Competence rating indicating the strength of the user's knowledge. This number was set to a default value at the beginning of interaction with a student, and then revised over time as the system interacted with the user and garnered more knowledge about his or her competence; Woolf held that student-made errors were "powerful clues" for the tutor who was able to use them to determine the strengths and weaknesses in the student's knowledge. If the student answered questions at that level of competence successfully, the value went up, but if he or she failed to answer questions on that or lower levels, the value was altered downward. The value could, therefore, dip below the default value to which it was initialized, if that value had overestimated the user's knowledge. This model allowed for the system to ignore isolated errors, because if a user had a high level of Expected Competence on a concept, a mistake answering a question was likely to have been just a mistake and not a true indication of error. Also, this detailed rating allowed the system to tailor instruction to the precise level of the student, avoiding topics which were below the student's "threshold of learning" because they would be too easy, and avoiding those which were above because they would be too hard.
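A minimal sketch of a rating of this kind is given here, under loudly stated assumptions: the 0 to 10 scale, the default of 5, the step of 1, and the slip threshold of 8 are all invented for illustration, and the sketch simplifies the "that or lower levels" condition to a single rating per concept. It is not Menotutor's actual mechanism.

```python
DEFAULT_COMPETENCE = 5   # assumed starting value on an assumed 0..10 scale
SLIP_THRESHOLD = 8       # assumed: at or above this, an error is read as a slip


class CompetenceModel:
    """Per-concept competence ratings (hypothetical, for illustration only)."""

    def __init__(self, concepts):
        # Every concept starts at the same default, as described for Menotutor.
        self.rating = {c: DEFAULT_COMPETENCE for c in concepts}

    def is_probable_slip(self, concept):
        """An error on a highly rated concept is likely an isolated mistake."""
        return self.rating[concept] >= SLIP_THRESHOLD

    def record_answer(self, concept, correct):
        """Raise the rating on success, lower it on failure, clamped to 0..10."""
        step = 1 if correct else -1
        self.rating[concept] = max(0, min(10, self.rating[concept] + step))


# Hypothetical usage with placeholder concept names.
model = CompetenceModel(["concept_a", "concept_b"])
for _ in range(3):
    model.record_answer("concept_a", correct=True)
slip = model.is_probable_slip("concept_a")        # rating has reached the threshold
model.record_answer("concept_b", correct=False)   # rating dips below the default
```

Because the rating can move in both directions, an initial default that overestimated the student is corrected by later evidence rather than fixed in place.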
The text generation process in Menotutor iterated through forty network-like states, progressing through three distinct planning levels in a discourse management network, or DMN. In a way, Woolf's DMN echoed the schemata approach, laying out basic patterns from discourse with choice points at various locations where the context could affect which alternative was taken. At each step in this multilevel network, information described what text was to be produced and what paths extended from that point. Although there were "default" paths to take from each point, representing the "context-independent" way of implementing the chosen method of discourse, one of twenty meta-rules could be fired at any point by matching conditions with the context (the student model or the discourse history) in order to divert the explanation's development along a "context-dependent" path to a new state somewhere else in the network. The conditions they tested involved the user's command of certain topics, the system's confidence ratings, and the existence of related topics in the domain. Since the user model was assumed to be possibly incomplete, when the system did not have a high certainty of user knowledge, it questioned the user in order to determine how to proceed. The first level in the network, the pedagogic level, involved selecting a general tutorial approach. This choice established the overall expository style, determining the number of times the system would allow the user to interrupt and the amount of questioning the system would perform. The two possibilities were "Socratic" versus "coach-like," translating to a style which would involve a high level of interaction or a style with a low level. Chosen at the beginning of the extended discourse, the strategic selection would remain active until the system perceived troubles with the student, in which case the pedagogic approach could be switched in hopes that the other method would be more effective with this individual.
The second level entailed the construction of a strategy to implement the pedagogy. Choices here might be between questioning the student, describing a concept, or choosing a new topic, driven partly by the pedagogic approach, partly by the user knowledge, and partly by the discourse history. At the third level, the tactical choice was between the speech patterns and language structures that implemented the strategy. Over the long term, Menotutor tracked the success of its chosen pedagogical approaches, and it could change which strategies it used if the current one was not succeeding with a given student. Planning terminated according to the length of the explanation, ending when it reached a certain size.

2.2.3 EDGE

Alison Cawsey's work with EDGE (Cawsey, 1990; Cawsey, 1993) continued the emphasis on what she termed informative explanations: explanations which take into account the user's knowledge, linking with the user's existing understanding and leaving out superfluous information. As Paris and Woolf had also concluded, Cawsey defined the actions necessary for accomplishing informative explanations to be deciding what material should be included in the explanation and choosing between different ways of structuring the text. Her agenda-based planner operated in the domain of electronic systems and consulted its model of the user's domain knowledge when selecting a discourse strategy and when deciding on exclusion or inclusion of information, as is described below. Like Menotutor, EDGE both established and updated its model over time. Cawsey held that an initial user knowledge model, based on some gross generalization about the user, was going to be inherently inaccurate, but that an interactive system could improve that accuracy over time by continually making note of new information about the user and marking it down. The model also needed to change in order to reflect growth in the user's knowledge.
Her model was therefore initialized according to the level of expertise the user assigned himself. Each concept in the user model bore a rating to show which of the four levels of expertise was stereotypically required for knowing it, so with a broad categorization of the user's expertise, the system could make certain guesses about specific user knowledge on each concept until more information came in. As the tutorial dialogue progressed between human and computer, information about the user's knowledge on specific concepts was derived from that dialogue and marked in the model, overwriting the initial guesses. This was similar to Woolf's approach except that Cawsey's model took the individual into account at least at a rough level when assigning the initial settings. The model in which this information was recorded was a database overlay whose precision lay somewhere between those of TAILOR and Menotutor; backing off from a numerical scale which might introduce finer detail than the system could accurately support, the tags on each concept consisted of the basic known and unknown plus maybe-known and undecided to indicate different levels of system confidence. The contents of the model were hierarchically organized into topics and subtopics, from general subjects (e.g., how certain types of devices work) to specific knowledge (such as what a certain indicator on a certain device means). This was supplemented by a general estimate of the user's overall expertise level, starting at the level the user assigned himself at the beginning of system use and changing as the system made it more accurate or updated it when the user progressed. In the course of system-user interaction, the user both answered questions posed by the system to determine his knowledge and posed questions of his own. These actions gave the system new data about the knowledge of the user, and old data stored in the model was overwritten, improving the model's accuracy.
User actions were not the only instigator of model updating, though; changes were motivated by system actions as well. As in earlier systems, EDGE wished to believe that once a concept had been explained, it was known; but taking into account a possibly imperfect learner, the system only went so far as to revise the tag on a concept to maybe-known status once an explanation had been delivered. In some cases, the system needed to know information on a concept about which it had no explicit judgment yet. In those cases, the generalization hierarchy could be used to infer the level of user knowledge according to specific inference rules. For instance, if all subconcepts were known, then the parent concept could be inferred as known; on a more general and less reliable level, if the "concept difficulty" rating was greater than the current level of general user expertise, the concept was inferred to be unknown. Since the assumptions on which these deductions were made changed over time, implicit information was not recorded in the user model; it was derived from the current model on each occasion that it was needed. In cases where the rules of inference did not give the system any more data about a particular concept and it really needed to know the user's knowledge about it in order to proceed, it questioned the user directly to determine the user's level of knowledge before planning the explanation. Since the user's general level of expertise was used not only in model initialization but also for these implicit decisions, it was also dynamic, and as with Menotutor, the level of expertise could be revised either upward or downward; if the user answered difficult questions, the system increased its estimation of the user's knowledge, and if the user asked questions about easy concepts this led to a downward revision.
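The two inference rules and the post-explanation update described above might be sketched as follows. This is an illustration only: the `Concept` structure, the rule ordering, the use of `None` for "no explicit judgment," the "ask-user" return value, and the example concepts are assumptions rather than details of EDGE. The sketch also counts only explicitly tagged subconcepts as known.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A node in the concept hierarchy (hypothetical structure)."""
    name: str
    difficulty: int                       # expertise level (1..4) needed to know it
    children: list = field(default_factory=list)
    tag: Optional[str] = None             # explicit judgment, or None if absent


def infer_status(concept, expertise):
    """Derive a status on demand; the result is never stored in the model."""
    if concept.tag is not None:
        return concept.tag                # explicit information always wins
    if concept.children and all(c.tag == "known" for c in concept.children):
        return "known"                    # all subconcepts known, so parent known
    if concept.difficulty > expertise:
        return "unknown"                  # harder than the user's general level
    return "ask-user"                     # rules are silent: question the user


def mark_explained(concept):
    """Imperfect-learner update: an explained concept is only maybe-known."""
    concept.tag = "maybe-known"


# Hypothetical usage with invented concepts and a user at expertise level 2.
resistor = Concept("resistor", 1, tag="known")
diode = Concept("diode", 2, tag="known")
components = Concept("components", 2, children=[resistor, diode])
amplifier = Concept("amplifier", 4)
switch = Concept("switch", 1)

status_components = infer_status(components, 2)
status_amplifier = infer_status(amplifier, 2)
status_switch = infer_status(switch, 2)
mark_explained(amplifier)
status_after = infer_status(amplifier, 2)
```

Computing the inferred status on each call, rather than caching it, mirrors the point that the assumptions behind these deductions change as the model changes.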
This complex user model was used in three aspects of planning in Cawsey's system: the planner referred to it to determine which strategy to use for structuring the overall explanation, what level of detail to use, and what background or optional information to include. The planner was agenda-based, starting with the overall goal to describe a given circuit and posting this on the agenda, and then searching for content-planning rules to accomplish this goal. EDGE had 25 of these rules, each representing a pattern of subgoals which defined one possible way of describing some aspect or aspects of the circuit; for instance, one rule made a comparison to a similar circuit the user was familiar with, while a different one identified the type of the device, listed its components, and explained its function. These rules made a distinction between "subgoals" and "preconditions," where subgoals always had to be satisfied by planning text but preconditions only resulted in text if not already satisfied by the user model. In the case of the rule comparing the circuit to another one, that was a subgoal and would always result in text; in the case of the other explanation, each of the three subparts was a precondition, so if the user already knew the components of the device, they would not be listed. As a result, the details already known to the user were not reiterated unnecessarily.
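The subgoal/precondition distinction can be illustrated with a small sketch. The rule format, the goal names, and the ordering of the planned goals are hypothetical; EDGE's actual content-planning rules are not reproduced here.

```python
def expand_rule(rule, user_knows):
    """Expand one content-planning rule into goals that will produce text.

    Subgoals always produce text. A precondition produces text only when the
    user model does not already satisfy it. The skipped preconditions are
    returned as well, so that each omission remains inspectable.
    """
    planned = list(rule["subgoals"])
    skipped = []
    for goal in rule["preconditions"]:
        if goal in user_knows:
            skipped.append(goal)      # already satisfied by the user model
        else:
            planned.append(goal)      # must be explained in the text
    return planned, skipped


# Two hypothetical rules mirroring the two examples in the text.
compare_rule = {"subgoals": ["compare-to-known-circuit"], "preconditions": []}
describe_rule = {
    "subgoals": [],
    "preconditions": ["device-type", "components", "function"],
}

user_knows = {"components"}
compare_plan, compare_skipped = expand_rule(compare_rule, user_knows)
describe_plan, describe_skipped = expand_rule(describe_rule, user_knows)
```

With this user model, the comparison is always planned, while the component listing is dropped from the second rule's expansion.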
The planner tracked every time it opted not to expand a goal into text because of the possibility that the decision was based on faulty information; if the user's subsequent actions indicated that the explanation was not fully understood, the model-based assumptions that had led to omissions in the text were the first suspects on the list of reasons why the user failed to understand. [Note the difference between "assumptions" in EES and EDGE; in EES, "assumptions" were recorded when the system made a decision not based on the user model because it was incomplete; in EDGE, they were recorded when the system made any decision based on the user model because it might be incorrect.] Another way in which the planner referred to the model was to decide whether to plan out dialogue actions in the case of dubious user model information. I mentioned earlier that the system was uncertain at times whether a user was familiar with a given concept. The planner would poll the user's understanding first in those cases rather than explain something the user already knew. When the planner reached a primitive, it was generated at that time, so EDGE employed an incremental form of realization; each time a goal was refined down to text, that text was passed through a level of discourse planning (where discourse markers were added for coherence) and then realized by way of filling in utterance templates in the planning rules. One reason EDGE did this was that the explanations tended to be long and involved, and the system might interrupt itself at any time to ask questions of the user; the user might also ask questions along the way. Since this interaction could result in different user model information along the way, executing dialogue actions as they arrived prevented the system from making extensive plans that would have to be scrapped in light of new information.
2.2.4 Discussion

These three systems have significance for the work on ICICLE both for their user modeling efforts and for their generation techniques. In the area of user modeling, all three of these systems used variations on the overlay design, a representation that allows a system to model users not as merely belonging to rough categories of experience but as unique individuals anywhere between complete naïveté and total expertise, possessing knowledge about some concepts in the domain but not others, a level of flexibility important in a system designed to adapt closely to its user. Menotutor augmented this concept by not only modeling the user's knowledge on a set of correct beliefs about the domain, but also on a set of misconceptions. EDGE extended the concept by structuring its model to allow for reasoning which not only reflected the individual but also a population of learners of which the user was a member, achieving the type of model which Spada calls individualized. EDGE and Menotutor introduced the idea of a user model which must be established and maintained by the system rather than taken as given. Since any conclusion such a system drew about the user had the potential to be incorrect or incomplete, those systems also had to deal with the possibility of an imperfect or incomplete model. EDGE coped with incompleteness through a hierarchical organization of the domain concepts and rules which allowed the system to infer what it did not know about the user's knowledge from what it did know (an improvement over EES' less principled approach, which decided anything undecided was known by the user), and both systems also had the ability to question the user directly to find out more. They dealt with incorrectness by revising the model according to new data as it came in. In this way, they were also able to deal with a model that started out correct but became incorrect because the user's knowledge changed.
The user knowledge modeling aims of ICICLE would be well served by basing our model on the work discussed here. Like Menotutor and EDGE, ICICLE needs to be able to establish an initial model, and I believe that EDGE's choice to base the initialization on information about the user is a step in the right direction, rather than just initializing everyone the same way as in Menotutor. ICICLE will also need to maintain and update the model over time through information obtained from the user, namely his or her language performance. This model will have two categories of knowledge represented within it: grammar proficiency and metalinguistic underlying domain information. In each of these, we have a specific set of items about which we want to know the status of user knowledge, so an overlay design would function well for our needs (see Section 5.2.3 for a discussion). The explicit information marked directly in the model would be the data drawn from the utterances entered for analysis. We would also like to infer implicit information from relationships between the concepts as in EDGE, however. In the domain knowledge model we should be able to arrange the concepts into a hierarchy of concepts and subconcepts, allowing for rules of inference to be applied when reasoning about related concepts. In the grammar model, we may be able to draw implicit conclusions as well if we can establish certain observations about the acquisition of grammatical forms. This will be discussed more in Chapter 5. Another way in which ICICLE benefits from this earlier work is by examining the way in which the user model affects the text generation process. All three systems focused on using the knowledge of the user as a primary decision factor in multiple areas of text planning, including affecting what is said and in what manner it is expressed.
This is essential in a system which desires to modify its approach for a wide variety of learners, some of whom may require different types of information presentation to excel at the learning task. Woolf's idea of tracking the success of particular tutorial approaches in order to use the one most successful with a user is particularly important. If ICICLE is to succeed as a tutoring system, it will need to choose its pedagogical tools wisely and to produce explanations which are cohesive and meaningful to the individual. The text planner we propose for ICICLE will be covered further in Chapter 4.

2.3 Computer-Assisted Language Learning

ICICLE as a system fits into many categories; while it is an explanation generation system and a user modeling system, it is also a Computer-Assisted Language Learning (CALL) system. In this section I will present brief descriptions of some contemporary systems in the field of CALL, selected because of their relevance to the goals and approach of the ICICLE project. In these descriptions, "L1" is used to denote the learner's first or native language, and "L2" the language he or she is trying to acquire, also called the target language.

2.3.1 HyperTutor

The HyperTutor system (Schuster and Burckett-Picker, 1996) is a learning tool for Spanish-speaking ESL learners of reasonable proficiency. It interacts with the student through a series of translation tasks, presenting Spanish sentences which the student then translates into English. It gives the student notification of whether the English sentence is right or wrong, and in the latter case gives an explanation about the error. Its goals, like those of ICICLE, are to be able to correctly identify the source of an error in order to focus individualized, appropriate instruction. The authors characterize the HyperTutor user model as an interlanguage, or "language-in-process," a concept I will discuss further in Section 5.2.
The essential nature of their model is a store of the language learning strategies the system has observed the student using, where the possible strategies are:

o Direct use of the L1 instead of the L2 when the L2 construct is unknown.
o Negative transfer of L1 grammar into the L2 (using Spanish grammar rules for English constructs to which they do not apply).
o Simplification, where the nonmeaningful words in English have been omitted from the English utterance.
o Reduction of redundancy, where errors in morphology reflect the speaker's deletion of what he or she considers to be redundant.
o Positive transfer of the L1 structure into L2 (when it is grammatical in the L2 as well).
o Overgeneralization of an L2 construct to a larger set than is appropriate.

The model containing these strategies is dynamic, growing as the user performs the translation tasks. Whenever the user commits an error which can be attributed to one of the six strategies, the system makes note of this by adding the strategy to the model, and then generates a message describing the error from the point of view of the strategy the user was observed to be using. In this way, the feedback presented to the user can be specific to the real cause of the error, a quality which ICICLE would like to emulate as well.

2.3.2 Mr. Collins

The CALL system Mr. Collins ["Mr. Collins" actually refers to just the user modeling component of a larger system, but the name is also used for the entire system for simplicity.] (Bull, 1997) addresses the learning processes of English speakers acquiring Portuguese, specifically within the restricted domain of Portuguese pronoun usage. Its primary goal is to interact with the student through exercises and discussion, instructing him or her on the efficient use of learning strategies to bolster second language learning.
The strategies it teaches include the positive transfer strategy mentioned above, and also deduction, inferencing, grouping, and actively looking up answers in the resources provided by the system. Because of the restricted domain of L2 constructs being acquired, the instruction of Mr. Collins is almost entirely centered on these strategies and on how they might improve the student's performance. Most of the exercises in Mr. Collins involve Portuguese sentences being presented to the user without the object pronoun. The student must find the correct position for the pronoun in the sentence. The system passively observes the student navigating through the information space available and solving the exercises, only providing instruction when the student requests it or when the system decides the student's performance is suffering due to poor strategy use. With its focus on explicit discussion of strategy use and its restricted domain of instruction, Mr. Collins does not seem to be particularly related to the goals of ICICLE. However, one trait of the system which is of great interest is the flexibility which it brings to bear when presenting its instruction to the user. Mr. Collins presents its material in a variety of different formats, sometimes quoting relevant sentences illustrating its point, sometimes presenting the relevant grammar rules explicitly, and at other times making direct comparisons to the L1. This variety will be discussed further below and also plays an important part in ICICLE's generation goals.

2.3.3 German Tutor

German Tutor (Heift and McFetridge, 1999) is a CALL system under development designed for the instruction of learners of German. Its current implementation is designed for native English speakers and accepts single sentences, parses them, and provides the user with feedback on single errors found. The student modeling architecture in German Tutor is similar to the one described for Menotutor (Woolf, 1984) in the previous section.
It utilizes a database containing all of the grammatical constraints the parser can recognize as met or broken, holding a score from 0 to 30 representing the user's knowledge on each constraint. A score from 0 to 9 represents expert knowledge, 10 to 20 is intermediate, and 21 to 30 is novice. The score for each constraint is initialized to 15 at the beginning of a session with the user, and is incremented with each failure and decremented with each success. This model is not stored from one session to the next. Both the parser and the feedback generation process make use of the student model. The parser selects between multiple possible parses by averaging the student's proficiency score across all constraints, yielding a general proficiency level, and comparing this against an ordered list of possible parses; the proficiency level will place the student among the possible parses, from those using simplistic forms to those attempting more complex possibilities. When selecting the subject matter for instruction, German Tutor prioritizes the errors in the sentence and selects the most frequent or relevant one for the purpose of feedback. The feedback the system generates is presented in three formats representing three levels of expertise, varying in the level of abstraction discussing the violated constraint from specific knowledge to be presented to a novice to abstract knowledge for an expert. The student's knowledge level on the topic is used to choose between the three, and the result is presented to the user in English.

2.3.4 Discussion

All of the systems discussed in this section contain some part which matches the goals of the ICICLE system. HyperTutor's claim that the user model is capturing the interlanguage state of the user closely matches what I discussed in the Introduction as being the goal of the user knowledge model for ICICLE.
However, simply storing the strategies the user is executing to build this interlanguage seems to be an insufficient technique if one wishes to really model the user's internal language hypothesis; HyperTutor's efforts yield no information about the user's knowledge on specific concepts. The authors do not address the possibility of errors whose source is ambiguous, or sentences which hold more than one error; furthermore, the revision of the model over time merely adds strategies, not taking into account the possibility of the user's changing proficiency leading to different strategies being used rather than just more strategies. In ICICLE's model of the L2 knowledge, what is in the interlanguage will take precedence over how it is being built, yielding us much more specific information on which to base instruction. While HyperTutor does provide the possibility of different explanations depending on what the source of the error is deemed to be, it does not match the flexibility of Mr. Collins' varied explanation presentation facility or of the explanation systems discussed in the previous section. The ability to present information in different forms, originating with TEXT and later reflected in the other systems such as TAILOR and EDGE, is very desirable in an instructional system which desires to reflect the individuality of its user, since, as Paris asserted, different people may benefit from different types of information presentation. ICICLE will most certainly embrace this goal as well, to a much larger extent than HyperTutor or German Tutor have accomplished. Finally, German Tutor's user modeling technique most closely resembles that which has been proposed for the ICICLE user knowledge model, by representing a kind of language-element overlay model with varied markings according to user proficiency level.
The main drawback to the approach as implemented in German Tutor is that instead of using the individual ratings to make decisions and relying on a general estimate of expertise only in the absence of other information (as in Menotutor), German Tutor lumps all of its data on user language proficiency into a single average for use in selecting the appropriate parse. This does not seem to be a very accurate way of selecting between parses; a more selective approach that uses the individual markings rather than an overall judgment of user proficiency will be discussed in Chapter 5.

2.4 Summary

In the first two sections, I reviewed some explanation systems which have led up to and contributed to the design of ICICLE's user modeling component and its proposed text planner. We wish for ICICLE to establish and maintain a complex, dynamic user model which is highly important to the text planner, affecting the planning decisions in many ways so as to produce text which is maximally tailored to the individual. In order to accomplish this, we will draw from several aspects of the systems I discussed, including the overlay-based, hierarchical knowledge model design and the use of both direct and indirect information stored in the model. In the third section of this chapter, I briefly described some of ICICLE's contemporaries in the field of Computer-Assisted Language Learning in order to illustrate how the modeling and generation techniques reviewed earlier could add to and improve upon the current state of the art in that area. It is our intent that ICICLE prove more flexible, more wide-reaching, and more informative than these other systems. In the next three chapters I will put forth the essence of how ICICLE will accomplish these goals. Chapter 3 outlines the general architecture of the ICICLE system as a whole.
Then, in Chapter 4, I will discuss the proposed text planning element for the system in order to motivate the main thrust of my research, the user knowledge modeling component, which will be the focus of Chapter 5.

Chapter 3 ICICLE System Overview

This chapter overviews the ICICLE system architecture as a whole, and gives a brief status report of its current state of implementation. The purpose of this is to provide a view of the larger picture in which the user model will play a part.

3.1 Architecture

To accomplish its goals, ICICLE will use a multi-component architecture represented as a conceptual drawing in Figure 3.1. The primary active components of ICICLE's design are those which accept the user's input (the error identification module) and provide the response (the response generation module). Both of these draw from two knowledge base components: the first is a domain knowledge base containing information on English, ASL, and the errors recognized by the system; the second is the user model I have previously described, capturing the user's grammar and domain knowledge, the dialogue history, and the history of the specific user's interaction with the system.

3.1.1 Error Identification

The analysis of a student's errors is accomplished in ICICLE via a chart-based parser with a coverage of English that has been augmented by error-production rules or mal-rules (Sleeman, 1982; Weischedel et al., 1978) which were derived from an error taxonomy compiled out of actual writing samples from deaf students (Suri, 1993; Suri and McCoy, 1993). These additional rules enable the grammar to recognize syntactic and morphological constituents containing errors produced by the target population. In Section 1.2.1 I addressed the fact that such a parser can and will produce multiple possible parses of a given input sentence, and that the selection of which parse to use requires the use of the user model.
Given a possibly large set of parses, the error identification module will select the single one whose grammatical and ungrammatical constituents most closely match the grammar model's representation of what constructs the user can be expected to use with or without error. This choice must also take into account multiple possible accounts for why an error occurred, and thus must have some flexibility with respect to how closely the parse matches the model; i.e., the system cannot necessarily throw out a parse which contains an error in a construct the user knows well, for that error could be a simple mistake and not a true reflection of the user's grammatical competence. Lastly, since the same "erroneous" constituent may have one of a list of causes, the parser must not only identify that an error exists, but must "tag" it with a note indicating its nature (e.g., incorrect because it is beyond the learner's understanding, incorrect because of faulty knowledge, or incorrect because of a simple mistake).

Figure 3.1: ICICLE system architecture.

Once a single parse for the sentence has been selected and the errors it contains have been fully identified, those errors are passed back to the user interface so that sentences containing problems may be highlighted. The error identification component also consults the user model to create a pruned list of errors, containing only those which are relevant for tutoring, and passes those to the response generation component. The determination of relevance relies on the model's representation of those language structures which are currently within the student's grasp to learn about.

3.1.2 Response Generation

The response generation module is charged with creating the tutorial feedback which will enable the user to correct the errors that have been found. To do this, it will present to the student a natural language explanation of the errors, after which changes to the text will be encouraged.
The goals of our response generation module are: to be capable of producing a wide variety of tutorial approaches as discussed in Chapter 2; to choose between these approaches, plan their structures, and determine their information content according to the learning styles and knowledge of the student; and to enrich its text with relevant information from the dialogue history and the student's domain knowledge. To accomplish this, a multi-level library of planning operators will apply the information resources of the system (both the user model and a large database of system knowledge about the domain) toward forming and revising a hierarchically structured text plan. This module of the ICICLE system is the focus of Chapter 4. The completed text plan will consist of semantic-level utterance specifications which can be fed into a surface text generator. This generator will produce the actual English text that will be displayed to the user through the user interface.

3.1.3 The User Model

The ICICLE user model has already been presented as a complex, dynamic model of user grammar and domain knowledge, dialogue history, and system use. As is indicated in Figure 3.1, this model has bidirectional information flow with both the error identification module and the tutorial response module. The error identification module relies upon the grammar model to select the most appropriate parse of an input sentence; when the parse of a given writing sample is complete, it will then send information back to the user model in order for the grammar model to be updated with new statistics and for the system history model to receive information so it may analyze the success or failure of tutorial methods which it is tracking. In turn, the response generation module also consults both the knowledge and history parts of the user model in order to plan its explanations, and then sends the completed plans back to be stored in the dialogue history.
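To make the two directions of this information flow concrete, the following is a minimal sketch in Python, not the design proposed in Chapter 5. It assumes a hypothetical `GrammarModel` that stores, for each grammatical construct, how often the user has produced it correctly or with an error; the class, its method names, the construct labels, and the smoothed scoring rule are all illustrative assumptions. A parse is represented simply as a list of (construct, has_error) pairs.

```python
from dataclasses import dataclass, field


@dataclass
class GrammarModel:
    """Hypothetical grammar model: construct -> (correct uses, erroneous uses)."""
    counts: dict = field(default_factory=dict)

    def p_correct(self, construct: str) -> float:
        """Smoothed estimate that the user produces this construct correctly."""
        ok, bad = self.counts.get(construct, (0, 0))
        # Add-one smoothing keeps every estimate strictly between 0 and 1, so a
        # parse with an error in a well-known construct is never ruled out.
        return (ok + 1) / (ok + bad + 2)

    def score(self, parse) -> float:
        """How well a parse matches what the model expects of this user."""
        result = 1.0
        for construct, has_error in parse:
            p = self.p_correct(construct)
            result *= (1.0 - p) if has_error else p
        return result

    def select_parse(self, parses):
        """Model -> error identification: pick the best-matching parse."""
        return max(parses, key=self.score)

    def update(self, parse) -> None:
        """Error identification -> model: record statistics from the chosen parse."""
        for construct, has_error in parse:
            ok, bad = self.counts.get(construct, (0, 0))
            self.counts[construct] = (ok, bad + 1) if has_error else (ok + 1, bad)
```

In use, the error identification module would call `select_parse` on the candidate parses for a sentence and then pass the winner to `update`. Because the estimates are smoothed, an unexpected error only lowers a parse's score, which leaves room for treating it as a simple mistake.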
Model establishment and maintenance issues will be discussed further in the central part of this proposal in Chapter 5.

3.1.4 The Domain Knowledge Base

While the user model is dynamic in ICICLE, its counterpart, the domain knowledge base, is considered static, as illustrated in Figure 3.1. Although both of the "active" modules draw information from this knowledge source, neither makes modifications to it, as it is assumed that the parsing grammar and the grammatical concepts discussed by the system are unchanging entities. The "Domain Knowledge Base" is a store of the system's domain knowledge and should not be confused with the domain knowledge component of the user model, which stores information about the user's domain knowledge.

The Domain Knowledge Base contains two main components. The first is the augmented parsing grammar used to cover ungrammatical input, as discussed above. The purpose and function of the other element of this knowledge base (labeled Database of Grammatical Concepts in the figure) is to supply information to the explanation generation process. This component stores the domain knowledge from which the system's explanations about English grammar are generated, so it must include information about how to define all of the grammatical forms recognized by the parser. For instance, if the parser can identify errors in subject/verb agreement and in preposition placement, this database must include information about how to explain those errors and the concepts involved in those explanations.

We are investigating whether the concepts in this domain lend themselves to a generalization hierarchy in which children inherit parts of their definitions from their parents (such as one representing German Shepherds and Collies as Dogs, which in turn with Rabbits are Mammals, which are Vertebrates, etc.). In any case, the relationships between the concepts do need to be represented.
Because explaining these concepts may involve mentioning other concepts which the user must also understand in order to absorb the explanation, this definitional dependency relationship needs to be noted in some way by indexing the concepts on which an explanation depends from the definition information stored in a concept node. Also, the system may wish to draw comparisons between related concepts, so the database must also include information on the features certain concepts have in common, or those which contrast. The organization of this component will be correlated to that of the domain knowledge component of the user model mentioned above and discussed in more depth in Section 5.3, but again the two knowledge sources will remain distinct because of their vastly different purposes. The exact design of this database has not been fully determined and is a topic of future research.

Figure 3.2: A cycle of user input, system response.

3.1.5 The User Interface

The interface component of the system is responsible for accepting the user's text and passing it to the error identification module, and for displaying the results of that analysis (in the form of highlighting those sentences which have errors) back to the user. It also displays the tutorial text generated by the response generation module and allows the user to make corrections based on the explanations or to request additional information if an explanation has not satisfied him or her. This last function of the user interface involves handling one of the possibilities for initiating a re-planned explanation. Re-planned explanations in ICICLE will be initiated in two ways: when the student accepts the explanation, but then fails to improve his or her performance with respect to the concept involved; or when the student does not accept the information immediately, asking instead for additional/different explanations. This will be addressed further in Chapter 4.
3.2 Motivation

Having presented the essential architecture of the system, I would like to take a moment to outline our general approach and compare it against literature relating to tutoring systems and second language instruction.

3.2.1 A Cyclic Approach

As mentioned in the Introduction, ICICLE's interaction with the user has a cyclic nature; the user submits text to the system for review, the system presents the user with constructive feedback, and the user can make revisions and submit new text. This cycle is portrayed in Figure 3.2. It reflects the two tasks of a tutoring system which were laid out by (Glaser et al., 1987): that of the diagnostician, who must discover the nature and extent of the student's knowledge (in our system by accepting and analyzing user-produced second language text), and that of the strategist, who must plan a response to this discovery (manifested in ICICLE when the system plans and produces tutorial feedback tailored to the learner).

Note that the two participants in this cycle (the user and the system) essentially take discrete turns; the user completes his or her composition before giving it to the system to analyze, and the system controls most of the session during the delivery of tutorial feedback. This approach to tutorial instruction, where a user completes a task before receiving any instruction on his or her performance, is motivated by the theory that the cognitive demands of some tasks are so intense that learning is hampered during their execution (Owen and Sweller, 1985; Sweller, 1988), necessitating a post-completion review. It is our belief that the composition of original text in a non-native language is a task of this level of cognitive difficulty.
Researchers in the field of computer-aided learning have found that post-performance review or reflection is a powerful strategy for learning, and that computer-based learning tools are ideally suited to such approaches since they can perfectly capture the user performance and then review any aspect of it (Collins and Brown, 1988). ICICLE therefore endeavors to utilize its input/response cycle to provide an optimal learning environment. It maximizes the knowledge derived from the composition experience through such a strategy, enabling the user to execute self-correction through review and instruction.

3.2.2 Teaching a Second Language

Having outlined the structural nature of the system interaction with the user, I should also address the content of that interaction. ICICLE is a system whose functionality is based on giving a second language learner explicit feedback on the nature of his or her errors, and yet researchers in the field of second language acquisition have questioned the effectiveness of explicit instruction in the acquisition of new language forms (Krashen, 1981; Beck et al., 1995; Carroll, 1995). These researchers draw a distinction between positive or "Type 1" data, which is exposure to the language being acquired, and negative, explicit, or "Type 2" data, which is explicit instruction on what forms are and are not part of the target language. Krashen claims that explicit (Type 2) data results only in the modification of a "Monitor" which can correct an utterance before or after realization in speech, and he takes a strong stance on the distinction between genuine "acquisition" of second language grammar and the superficial "learning" which results from explicit instruction (Krashen, 1981). The result of this learning is also called "Learned Linguistic Knowledge" and is held to be a separate, distinct area of knowledge representation in the mind (Schwartz, 1993; Beck et al., 1995).

However, the stance against the explicit approach is not absolute.
When examining whether or not Type 2 input leads to the restructuring of the learner's internalized second language grammar, (Carroll, 1995) finds that input must be recognized as corrective and not communicative in order to be effective, and must present novel data to the learner in order to initiate restructuring, but that the metalinguistic capability of experienced learners may be well-developed enough to make use of explicit, specific correction. (Cook, 1991) cautiously points out that explicitly-taught learners can and do achieve fluency in practice.

Krashen also dissolves some of the absoluteness of the Monitor Theory in (Krashen, 1981), where he holds that formal classroom learning does indeed result in acquisition when the classroom is a "high intake" environment. He defines "intake" as input which aids acquisition. In Section 4.3.1 I will discuss Krashen's Comprehensible Input Theory in more detail, but to briefly describe it, he maintains that the input which aids acquisition is that which occurs just beyond the learner's current level of proficiency. Therefore, when explicit classroom instruction on grammatical forms takes place using the target language at that accessible level for the learner, it results in acquisition --- but of the forms being used in the instruction itself, not of the forms being taught. Krashen goes further to state that for an adult learner, the formal classroom situation is more likely to provide the intake needed for acquisition than informal conversational situations, so that the classroom may actually excel over informal situations for adult learners (Krashen, 1982).
Since ICICLE's generation component will be taking care to provide the "intake" that Krashen describes, it will therefore be contributing to acquisition even if one espouses the view that the actual content of the explanations will only be resulting in "learned" knowledge; and even those researchers who draw this distinction admit that both are present in the language performance of any learner, and some further state that both aspects are required for high literacy in either a first or second language (cf. (Bialystok, 1981; Vygotsky, 1986)). Furthermore, an Interface Theory of second language acquisition (Bialystok, 1978; Ellis, 1993) holds that these two areas of knowledge are not entirely segregated, and that explicitly-taught knowledge can become internalized knowledge over time, although the acquisition may be constrained by "learnability" concerns tied to the natural order in which learners acquire forms --- i.e., a learner will only acquire what he is taught when he is developmentally ready to acquire it.

The conclusion I draw from this is that the prevailing research supports rather than undermines ICICLE's approach to second language instruction. Not only will ICICLE be designed to provide intake by focusing its language production on the level of comprehensible input for the user, but since it will constrain the topics of its instruction to those which the user is ready to acquire (see Section 5.2), it will satisfy learnability constraints as well, so both the content of its message and the form it takes should lead to positive effects on the learner's production of English.

3.3 Implementation Status

ICICLE's error identification component has been implemented with partial functionality. We have developed an augmented grammar for a parser which is descended from the one presented in (Allen, 1995). Our implementation makes use of the COMLEX Syntax 2.2 lexicon (Grishman et al., 1994).
Since there is no user model at this time, choices between multiple parses found by the system are made arbitrarily. A Tcl/Tk window-based interface allows the user to type in or load a text file, request an analysis, and view the results. Sentences containing problems are highlighted in colors corresponding to the type of error, and "canned" one-sentence explanations of the error can be accessed. The existing system makes no attempt to model the user or the domain, and does not employ actual text generation.

The user model I am outlining in this proposal should lead to a revised system in which a bi-directional flow of data has been established between the error identification component and the grammar model, basing the parse selection on data in the model and then updating the grammar model according to error statistics from the analysis of the student's text. The user model will also provide the foundation for our text generation module, which is discussed further in Chapter 4.

Chapter 4 Text Generation in ICICLE

In Chapter 2 I reviewed the work of many explanation systems which were driving toward a certain common goal --- that of generating text which was "informative," or "context-driven." This same goal motivates our work with ICICLE. The essential goal behind the text generation component we are developing is to be highly sensitive to the context of the generation activity. As the Introduction established, "context" is defined in this work as encompassing all of the following components: the preceding dialogue, the related concepts in ICICLE's tutoring domain, the user's domain skills and underlying knowledge, and the user's history of system use. In order to achieve sensitivity to such diverse context, it is important that the generation component we propose employ a high level of interactivity with the knowledge bases which provide information about the user and the domain.
It is our hope that we are proposing a text planner design that would be able to accomplish this level of interactivity. This chapter will overview our design for a text planner which relies heavily on multiple sources of knowledge in order to make its planning decisions. This discussion precedes that of the focus of the proposal (the user modeling component) in order to clearly designate how we will need to design the user model in order that it may provide the needed information to the planner.

The goals of this planner were outlined in Chapter 3 and I will repeat them here. We wish the planner to be capable of: producing a wide variety of tutorial approaches; choosing between these approaches, planning their structures, and determining their information content according to the learning styles and knowledge of the student; and enriching its text with relevant information from the dialogue history and the student's domain knowledge. By accomplishing this, it will not only meet the unique needs of the individual learner, but it will promote an environment of "meaningful learning" (Brown, 1994) where related information is tied together to form stronger and more permanent associations.

4.1 Planner Overview

What follows is an overview of the model which has been developed for planning the tutorial responses of the ICICLE system, previously described in (Michaud and McCoy, 1998). The complexity of the information exchange in the model suggests a need for breaking down the planning process into many stages, combining both bottom-up and top-down processing techniques. In the bottom-up phase, the model first organizes the explanations to be generated into a linear order based on the topic of each; in the top-down phase, each explanation is fleshed out one at a time and then revised before realization.
The multiple phases of processing are driven by successive foci of attention, represented in an "Anatomy of a Response" [My thanks to Chris Pennington, who originated the Anatomy of a Response idea.] which consists of:

o content: the error or errors being discussed in a given system action (explanation)
o method: the pedagogical approach employed in discussing the content
o form: the semantic structure of sentence specifications (each containing a specific rhetorical force) that will eventually realize the method
o history: the discourse-level modifications and annotations that result in an explanation which explicitly realizes its context in the domain and relevant domain concepts
o manner: [Please note that the meaning of the term "manner" referred to in the previous publication (Michaud and McCoy, 1998) has since shifted; the "history" phase now covers what was termed the "manner" component in the past.] preprocessing performed directly preceding surface text generation to establish a linear order of propositions

The subsequent sections will illustrate how these phases have been conjoined to form the framework of an elaborate explanation planner to fit the needs of the ICICLE system. An illustration of the phases showing both the bottom-up and the top-down processing can be found in Figure 4.1.

4.1.1 Content

The content of an ICICLE explanation, in terms of this initial phase of explanation planning, is expressed at the most general level: it identifies the specific error or errors from the user's text which are being discussed. The error analysis phase of ICICLE passes to the tutorial response generator a list of errors the user has committed; these are the seeds from which the generator will construct its feedback. In a context-aware system, these seeds cannot be treated as autonomous content units, since part of the context of an individual unit is made of those units that precede and follow it.
They must be arranged and ordered so that the resulting text presented to the user is cohesive at the large level (over all of the individual responses). Research in language pedagogy (Anderson, 1993) and empirical studies on learning from written texts (Hayes-Roth and Thorndike, 1979) both suggest that grouping together related information would be more effective (in terms of the learner's absorption of the information) than explaining each error in the order in which it occurs in the essay. The first phase of explanation planning, therefore, is to group related content units and to give them an overall order. This will be accomplished through referring to the domain knowledge base; it will contain information about the errors recognized by the system as well as possible grouping strategies for clustering them according to shared features. As part of this clustering, errors of identical type will be merged into one explanation to avoid duplication of effort. Next, the order of the clusters will be determined using information on how to best structure the overall discussion flow, completing the bottom-up phase.

4.1.2 Method

Once the explanations have been placed in order, the first can be sent to be processed by the method phase. This next part of the planning process (and the first part of building the top-down plan) selects a tutorial approach for addressing each error. Given the high-level goal of instructing the user about a given error, the system must now begin building the top-down plan to accomplish that goal through text. ICICLE will have several possible tutorial methods at its disposal, based on research in second language pedagogy. Each of these approaches may appeal to a different style of language learner. Among the possibilities may be:

o To simply provide a corrected form of the sentence.
o To explain the grammar construction that was used incorrectly.
o To provide examples of sentences that illustrate proper usage of the faulty grammatical construction.
o To compare and contrast the grammatical construction involved with its corresponding construction in ASL.

Figure 4.1: The Anatomy of a Response, separated into bottom-up, top-down, and revision phases.

The choice between these methods will be motivated by the user model discussed earlier in this paper and detailed more specifically in Chapter 5. The domain knowledge model provides information on what concepts in the domain the user knows and what he or she is likely to understand, and the system history informs the planner on the user's long-term performance given what methods have been attempted in the past. Over time, the system should be able to make principled decisions on what style of instruction is best suited to this individual.

4.1.3 Form

The selection of the tutorial method sets a general course for the explanation, but in the determination of the specific structure, or form, there are still details to be processed. One could see the method selection as having chosen a general schema for the explanation; the form phase processes the options within that pattern according to the domain knowledge of the user. Of primary interest here is the requisite knowledge for understanding an explanation. A method which involves explicitly discussing certain grammatical concepts, for instance, would have been chosen only if the student either knows these concepts or is in a position to be able to grasp the concepts if instructed on them. In the latter case, the form constructed by the system must include additional explanatory material to fill in the requisite knowledge. At points in the explanation where an unknown but learnable concept is named, a recursive explanation must be generated.

To show an example, take the method of stating the grammar rule that has been broken. If the user is familiar with the rule, the system can just inform the user of the type of error, and generate a sentence like: "This sentence contains an error in subject-verb agreement."
Alternatively, the user model may indicate that the user is not familiar with the concept of the grammar rule, but is ready to learn about it. The system would then generate a short explanation of what this rule is in English: "This sentence contains an error in subject-verb agreement. In English, third-person singular subjects require a present tense verb to have the agreement marker S at the end." If the user model indicates the user could use a reminder of what third-person singular subjects are, a second level of recursion could be added: "This sentence contains an error in subject-verb agreement. In English, third-person singular subjects (pronouns like HE, SHE, and IT, singular noun phrases like THE DOG, and names like JOHN) require a present tense verb to have the agreement marker S at the end."

A particular quality of the recursion here is that the additional propositions may also follow the same structure possibilities defined at the top level; the first extra sentence was just another simple definition of a concept, where the second was a list of examples. Because of the recursive nature, the same structuring decisions could be used at each level. To avoid an infinitely recursive explanation of this type, the method selection will need to calibrate its decision metrics so that the user's lack of domain knowledge is not so profound that the planner needs to define the terms in every definition. Intuitively, this should not be the case if this topic is in the "current" realm of learning for this user; the vast majority of the prerequisite concepts should already be acquired in order for this topic to be deemed "learnable."
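The recursive expansion described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `known` and `learnable` sets stand in for what the user model would report, the `DEFINITIONS` and `SUBCONCEPTS` tables stand in for the domain knowledge base, the function names are hypothetical, and propositions are plain English strings rather than the semantic-level specifications the form phase would actually produce. An explicit depth limit is included as one simple guard against runaway recursion.

```python
# Hypothetical stand-ins for the domain knowledge base.
DEFINITIONS = {
    "subject-verb agreement": (
        "In English, third-person singular subjects require a present tense "
        "verb to have the agreement marker S at the end."
    ),
    "third-person singular subject": (
        "Third-person singular subjects are pronouns like HE, SHE, and IT, "
        "singular noun phrases like THE DOG, and names like JOHN."
    ),
}
SUBCONCEPTS = {"subject-verb agreement": ["third-person singular subject"]}


def explain(concept, known, learnable, depth=0, max_depth=2):
    """Return explanatory propositions for a concept, recursing into subconcepts.

    Known concepts need no explanation; concepts outside the learnable set are
    skipped, since the method phase should not have chosen to discuss them.
    """
    if concept in known or concept not in learnable or depth > max_depth:
        return []
    propositions = [DEFINITIONS[concept]]
    for sub in SUBCONCEPTS.get(concept, []):
        propositions.extend(explain(sub, known, learnable, depth + 1, max_depth))
    return propositions


def plan_form(rule, known, learnable):
    """Name the broken rule, then add whatever prerequisite material is needed."""
    opening = f"This sentence contains an error in {rule}."
    return [opening] + explain(rule, known, learnable)
```

Called with the rule already in `known`, `plan_form` yields only the opening sentence; with the rule in `learnable` and its subconcept in `known`, it adds one definition; with both merely learnable, it adds a second level. These correspond to the three variants of the example above.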
Note that the examples written out above are not intended to prescribe the exact order the propositions containing these concepts will occur in, or to indicate exact sentence structure and complexity; the form phase will only generate propositions at the semantic level, and the manner phase will have the responsibility of ordering the propositions, while the realization process will be selecting between the syntactic choices. As mentioned in the Introduction, syntactic complexity will be influenced by those grammatical constructions the learner is attempting to acquire. It is therefore divorced from semantic content decisions. While making those semantic decisions, however, the form phase will retain the rhetorical connections between the propositions it generates for the purposes of revision in the history and manner phases, as will be discussed later.

The form phase obviously makes heavy use of the user domain knowledge model. It does not expect the user model to always be infallible or complete, so it will take into account the possibility of explanation failure and the need to repair. The issues behind handling possibly incorrect user models will be discussed briefly in Section 4.5.

The text plan at the completion of the form phase contains a first draft of the basic propositional structure of the discourse, molded by the tutorial method chosen and fleshed out with prerequisite data. The next step is to finish informing the explanation of its context by adding explicit contextual references.

4.1.4 History

The primary job of the history component is to take the dialogue history into account. This involves both tying new knowledge into existing knowledge through references to the recent and established past, and making certain that history does not repeat itself in the form of redundant explanations.
In order to achieve these goals in a text plan which is otherwise mostly prepared for sending to a surface realizer, our approach views the history phase as a revision of the existing plan to add this contextual material. It has been observed (Moore, 1993; Rosenblum and Moore, 1993) that comparison and contrast to recent and established material is a powerful tool of human-generated explanations, and essential for generating comprehensible tutorial discourse. At this time, therefore, the text planner needs to begin to make adjustments to its plan in order to insert comparisons and references where possible. The domain context which needs to be exploited for this step is found in three tiers of proximity: the information discussed within this group of explanations, the information discussed earlier in a session, and established information the user has already learned.

The strategies for referring to this context through revision need to accept the propositions planned by the form phase and, operating on their relational structure and the sources of relevant information on hand, perform modifications to accommodate established knowledge. In some cases, this involves modifying propositions which exactly duplicate explanatory material from earlier in this group of explanations; exact repetition is unnecessary, or at least it should be explicitly marked as a repetition. In other cases, the actions taken at this step will involve generating additional propositions comparing parts of the explanation to previous ones and to related domain information known by the user. The revised structure is of the same format as that coming from the form phase: semantic propositions linked by their rhetorical relationships, ready for the final processing before surface realization.
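As a rough illustration of this revision step, the Python sketch below takes the propositions produced by the form phase and revises them against two of the three tiers of context (the current group of explanations and the earlier session). Everything here is an assumption made for illustration: the `Proposition` class, the relation labels "inform", "reminder", and "reference", the wording of the inserted reference, and the policy of marking duplicates rather than deleting them.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Proposition:
    """Hypothetical semantic proposition with a rhetorical relation label."""
    concept: str
    content: str
    relation: str = "inform"


def revise_with_history(plan, group_history, session_history):
    """Revise a form-phase plan so that it reflects the dialogue context.

    - Propositions that exactly duplicate material from the current group of
      explanations are kept but explicitly marked as repetitions.
    - The first new proposition about a concept already discussed earlier in
      the session is followed by a reference back to that discussion.
    """
    seen_in_group = {p.content for p in group_history}
    session_concepts = {p.concept for p in session_history}
    referenced = set()
    revised = []
    for prop in plan:
        if prop.content in seen_in_group:
            revised.append(replace(prop, relation="reminder"))
            continue
        revised.append(prop)
        if prop.concept in session_concepts and prop.concept not in referenced:
            referenced.add(prop.concept)
            revised.append(Proposition(
                prop.concept,
                f"Recall that we discussed {prop.concept} earlier in this session.",
                "reference",
            ))
    return revised
```

The output has the same shape as the input, a list of propositions with relation labels, which mirrors the requirement that the revised structure keep the format produced by the form phase.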
4.1.5 Manner

While the previous phase produces a plan which is semantically developed and contextually appropriate, at this point in the planning the plan's semantic propositions are not yet constrained to a linear order, which is a necessary step before realization. It is the job of the manner phase to make the final decisions about the linear flow of the explanation, serializing the propositions for generation. This step may also involve some preprocessing of the utterances to be sent to the realizer so that the clause structure reflects the syntactic goals of providing the user with example constructions from his or her current level of syntactic proficiency. Once this phase has completed these adjustments, the plan is ready to be sent to the surface generator, realized into English text, and displayed back to the user.

4.2 Operationalizing a Multi-Phasic Text Planner

An integral facet of the proposed planner that has just been outlined is its multi-phasic nature. Division of language generation into phases is not novel, although many systems have differed on how to divide the process. Woolf's division of planning levels between selecting a pedagogical method and choosing a framework for implementing that method is similar to ours (Woolf, 1984). Likewise, Cawsey's EDGE system made a distinction between choosing the content and mapping out the type of explanation (Cawsey, 1993). As in our division between selecting a method and planning out the form, both systems addressed the need to first decide on a basic method, and then what to include or leave out. Most systems seem to have divided the process into only two phases, but division into more than two levels of planning for multisentential text was previously implemented by other systems including (Rambow, 1990).

As introduced above, the planning process for ICICLE's tutorial response generation will be five-tiered, mapped out in an "Anatomy of a Response."
First, the content planning phase will group and order the content units passed to the response generator from the error identification module. The method planning phase will then apply a first set of planning operators to select appropriate tutorial techniques for discussing each unit of content in order. The form planning phase will address the goals posted by the method selection by selecting planning operators to flesh out a hierarchical text plan using that technique, and the history phase will revise the plan to add context from the dialogue and domain information. Finally, the manner phase will provide specific preprocessing in order to establish a linear order for surface realization, and the resulting plan will then be ready to be fed into a text realizer and passed to the user interface.

    Information about this Explanation

    (CORRECTFORM ?original ?correction)
        The correct form of the sentence ?original is ?correction.
    (BROKENRULE ?original ?rule)
        The broken grammar rule in ?original is ?rule.

    Information about the User

    (GOODMETHOD ?hearer ?method)
        The system history model indicates that ?method is a good tutorial method to use with this user, or at least does not contradict that possibility, where ?method can be one of: CORRECT, TELLRULE, EXAMPLES, or COMPARE.
    (KNOWS ?hearer ?concept)
        The domain knowledge model indicates that ?concept is in the user's well-established knowledge.
    (LEARNING ?hearer ?concept)
        The user model indicates that ?concept is in the hearer's zone of variation in his or her domain knowledge; that is, it is currently being learned.
    (UNFAMILIAR ?hearer ?concept)
        The user model indicates that ?concept is beyond the hearer's current zone of acquisition.
    (CANCORRECT ?hearer ?original)
        The hearer is competent to correct the problems in the original sentence.
    (CANLEARN ?hearer ?concept)
        The hearer is competent to learn this concept at this time; this means that the language feature ?concept is at the border of the hearer's zone of acquisition, or more specifically that a majority of the subconcepts involved in explaining ?concept are known.

    Domain Information

    (EXAMPLESOF ?rule ?exs)
        The list ?exs contains correct examples of the language feature represented by ?rule, preferably culled from the user's own work.
    (SUBCONCEPTS ?concept ?subs)
        The subconcepts involved in a definition of ?concept are ?subs.
    (DEFINITION ?concept ?def)
        The definition of ?concept is ?def.
    (ASLEQUIVALENT ?rule ?aslequiv)
        The closest ASL equivalent to the English grammar rule ?rule is ?aslequiv.

    Goals and Speech Acts

    (INFORM ?speaker ?hearer ?proposition)
        A speech act goal to inform the hearer that a certain proposition holds true.
    (COMPARELANGS ?rule ?aslequiv)
        A goal to compare and contrast the English grammar rule ?rule and the ASL grammar feature ?aslequiv.

Figure 4.2: Propositions and Goals Used in the Method and Form Operators.

This section contains the basic designs of parts of the library of planning operators that will implement these multiple phases of planning in the ICICLE response generation module. The operators are presented at the depth to which they have been developed so far, in enough detail that the needs they place on the user model will be clear. Because the content level does not reference the user model, its implication will not be addressed. The operator design is largely inspired by the operators used in the EES system (Moore and Paris, 1992). They are designed for hierarchical agenda-based planning with constraints that reference multiple sources of knowledge and the ability to recursively post subgoals, as discussed in Section 2.1.3. The subgoals are in the form of a CORE and possible CONTRIBUTORS.
As with the EES work, the operators therefore represent not only one or more spans of text [Here we are using the RST definition of "span."], but also one "relation" connecting the core to each contributor in the operator. Note that the subgoals of EES were called NUCLEUS and SATELLITE, but we have not used these names. Although RST formed the original basis for Moore's work, she has since taken issue with some of the shortcomings of RST relations (Moore and Pollack, 1992). A proposal for an improved discourse analysis theory was put forth in (Moser and Moore, 1996) and we intend to make use of this work. One of the issues addressed in the 1996 work is the need to maintain both intentional and informational structure. In (Moore, Unpublished) she defines this informational structure as containing "contentbearing relationships between the propositions express in discourse elements," and she notes that the organization of certain types of text follows this informational structure rather than the intentional structure, where the "natural" structure is to place the "core" element before its "contributing" definition. This representation is more pertinent to ICICLE's needs than an intentional one, as intentional relations will have very little variation in our type of explanation structure (Hobbs, 1996). The conceptual relationship between entities mentioned in the propositions, by contrast, can be used not only to determine the relationships between the contents of different propositions, but also by the realizer to determine the linear order of the clauses (Moore, Unpublished). For this reason, we have chosen to label the subgoals as CORE and CONTRIBUTORS, and to maintain informational rather than intentional links between subgoals generated by the same operator. This information will be used by both the history and manner phases when preprocessing before realization.
ICICLE's operator design will not be homogeneous across the tiers; since each phase has a different objective, the design of the operators available to each must be different. The method and form operators discussed in the following sections are specified by the value of each of their fields, which is notated in a LISP-like format. For the interpretation of these values, Figure 4.2 lists the propositions used and gives a brief definition of their meanings. The fields that these propositions will occur in are:

o EFFECT: the proposition this operator can be applied toward making true
o CONSTRAINTS: those propositions which must be true or must be satisfied in order for the operator to be applied
o CORE: the core subgoal or speech act to achieve the effect of the operator
o CONTRIBUTORS: any additional subgoals which will assist the core in achieving the effect

Some systems have drawn a distinction between "constraints" which must be satisfied without additional planning by the system, and "prerequisites" which can motivate additional planning to satisfy in order to select this operator (Littman and Allen, 1987; Moore and Paris, 1992). There is no "prerequisites" field in the design we propose, but the subgoals posted by the CONTRIBUTORS field express more or less the same idea, since they are optional as far as whether or not they are placed on the agenda. These goals must be satisfied in order to satisfy the main goal of the operator, but if they are already satisfied by the user model the system will not place them on the agenda. The CORE, on the other hand, must always be satisfied through planning. Since the planner is not constraining the linear order of the propositions generated until the manner phase, there is no need for the distinction between prerequisites and subgoals, both of which are CONTRIBUTORS. The rhetorical links between planned propositions will lead to an appropriate order when the time comes.
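The treatment of CORE and CONTRIBUTORS subgoals can be illustrated with a minimal Python sketch. It is not part of the ICICLE implementation: the class name Operator, the function subgoals_to_post, the operator name SKETCH, and the use of fully instantiated propositions stored in plain sets (rather than variables bound through unification) are all assumptions made only for the sake of the example.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    """One planning operator with the four fields EFFECT, CONSTRAINTS, CORE, CONTRIBUTORS."""
    name: str
    effect: tuple        # proposition the operator can make true
    constraints: list    # propositions that must already hold
    core: tuple          # subgoal that is always planned
    contributors: list = field(default_factory=list)  # optional subgoals


def subgoals_to_post(op, facts, user_model):
    """Return the agenda entries for op, or None if a constraint fails.

    facts and user_model are hypothetical stand-ins (sets of ground
    propositions) for the knowledge sources the constraints consult.
    """
    if not all(c in facts for c in op.constraints):
        return None
    agenda = [("CORE", op.core)]  # the core is always planned
    for goal in op.contributors:
        if goal not in user_model:  # already-satisfied goals are skipped
            agenda.append(("CONTRIBUTOR", goal))
    return agenda


# Hypothetical operator instance built from the propositions of Figure 4.2.
broken = ("BROKENRULE", "s1", "sv-agreement")
subs = ("SUBCONCEPTS", "sv-agreement", ("subject", "third-person-singular"))
op = Operator(
    name="SKETCH",
    effect=("KNOWS", "hearer", broken),
    constraints=[subs],
    core=("INFORM", "system", "hearer", broken),
    contributors=[("KNOWS", "hearer", "subject"),
                  ("KNOWS", "hearer", "third-person-singular")],
)
agenda = subgoals_to_post(op, {subs}, {("KNOWS", "hearer", "subject")})
```

In this example the user model already contains (KNOWS hearer subject), so only the core and the one unsatisfied contributor reach the agenda. The list order carries no meaning here, since linear order is left to the manner phase.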
Following are the prototype operators for the method and form tiers, and brief discussions of the design issues behind developing the history and manner phases.

4.2.1 Method: a Brief Sketch

The complete specification of the method tier of operators will require research and analysis of second language instructional discourse in order to determine the general structure of typical explanations of this type. This kind of analysis has been used by other researchers in generation (e.g., (McKeown, 1985; Paris, 1987; Moore and Paris, 1992)) to develop schemata on which to base their explanations. The method and form operators of ICICLE would be somewhat similar to schemata, where the method selection would entail basically choosing between schemata, and the form phase would plan out the alternatives within that general structure. As a token example, we will postulate four operators along the guidelines of the original four possibilities for methods in ICICLE, which are restated here:

o To simply provide a corrected form of the sentence.
o To explain the grammar construction that was used incorrectly.
o To provide examples of sentences that illustrate proper usage of the faulty grammatical construction.
o To compare and contrast the grammatical construction involved with its corresponding construction in ASL.

These four choices are sketched in Figures 4.3 and 4.4. Note that all but the first post goals that will need to be further refined; an INFORM statement in the first operator indicates that the system will merely inform the user of a fact. At this point, none of these operators is recursive and none is set up to combine more than one method into a single explanation; but these are merely sketches from which we may proceed in specifying what type of information the user model would need to supply for this tier of operators.

    NAME:         CORRECT (Give a corrected form of the sentence.)
    EFFECT:       (CANCORRECT ?hearer ?original)
    CONSTRAINTS:  (AND (BROKENRULE ?original ?rule)
                       (GOODMETHOD ?hearer CORRECT))
    CORE:         (INFORM ?speaker ?hearer (CORRECTFORM ?original ?correction))
    CONTRIBUTORS: nil

    NAME:         TELLRULE (Tell the user which grammar rule was broken.)
    EFFECT:       (CANCORRECT ?hearer ?original)
    CONSTRAINTS:  (AND (BROKENRULE ?original ?rule)
                       (OR (KNOWS ?hearer (CONCEPT ?rule))
                           (AND (UNFAMILIAR ?hearer (CONCEPT ?rule))
                                (CANLEARN ?hearer (CONCEPT ?rule))))
                       (GOODMETHOD ?hearer TELLRULE))
    CORE:         (KNOWS ?hearer (BROKENRULE ?original ?rule))
    CONTRIBUTORS: nil

Figure 4.3: Sketches of the Method Operators, part I.

4.2.2 Form

The primary function of the form operators is to take the main goal posted by the chosen method operator and to refine that goal down to speech acts by generating subgoals as needed to provide additional explanatory material. This approach is similar to that described in (Moore and Paris, 1992), where the system opportunistically defines new terms if necessary at the point of generating specifications for their surface generator. Instead of waiting until that point, however, these form operators will recursively specify clause-level semantic propositions to provide the needed material during this earlier phase of planning. The two operators suggested in Figure 4.5 are possibilities for addressing the subgoal posted by the TELLRULE method operator from Figure 4.3. Here you can see an additional constraint that did not appear in the method operators: (CORE) or (CONTRIBUTOR), depending on whether the operator may be used to satisfy a goal posted in the core or the contributor of an operator. This constraint has been added in order to separate the operator which plans the core part of the explanation from that which recursively generates explanations of relevant subconcepts.
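The division of labor between a core-role operator and a contributor-role operator can be sketched in Python as follows. This is an illustration under stated assumptions rather than ICICLE's design: the functions state_rule and explain_concept, the two dictionaries, the concept names, and the placeholder definitions are all hypothetical, and the flat list returned here is a simplification of the hierarchical plan that the form phase is meant to build.

```python
# Hypothetical domain data; the real knowledge base is not specified here.
SUBCONCEPTS = {
    "agreement-rule": ["subject-verb agreement"],
    "subject-verb agreement": ["third-person singular subject"],
    "third-person singular subject": [],
}
DEFINITIONS = {
    "subject-verb agreement": "the verb form must match its subject",
    "third-person singular subject": "a subject such as he, she, or it",
}


def explain_concept(concept, known, acts):
    """Contributor role: define a concept, then recurse on its subconcepts."""
    acts.append(("INFORM", ("DEFINITION", concept, DEFINITIONS[concept])))
    known.add(concept)  # guards against defining the same concept twice
    for sub in SUBCONCEPTS.get(concept, []):
        if sub not in known:
            explain_concept(sub, known, acts)


def state_rule(original, rule, user_knows):
    """Core role: state the broken rule, then post its unknown subconcepts."""
    known = set(user_knows)  # copy so the caller's model is not modified
    acts = [("INFORM", ("BROKENRULE", original, rule))]
    for sub in SUBCONCEPTS.get(rule, []):
        if sub not in known:
            explain_concept(sub, known, acts)
    return acts


novice = state_rule("s1", "agreement-rule", set())
expert = state_rule("s1", "agreement-rule", {"subject-verb agreement"})
```

With an empty knowledge set the sketch produces the rule statement followed by two definitions; when the first subconcept is already known, only the rule statement remains. Treating a known concept as a reason not to descend into its subconcepts is an assumption of this sketch.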
In this case, the first operator that would be chosen would be the RULESTATEMENT operator, whose core is:

    (INFORM ?speaker ?hearer (BROKENRULE ?original ?rule))

To use an example from the Introduction, this could generate the simple statement: "This sentence contains an error in subject-verb agreement." [The exact syntactic structure of this utterance would depend upon the language level of the user. For instance, a more advanced learner who is trying to perfect relative clause usage may benefit more from the sentence, "This sentence contains an error that occurs in subject-verb agreement." See Section 4.3 for more details.] The CONTRIBUTORS field of this operator can then post subgoals: that for any subconcept involved in this statement, the user must know this subconcept. Because these subconcepts need to be known in order for the overall concept to be known, these are the kinds of "prerequisites" mentioned above. If the user domain knowledge model indicates that a subconcept is known, this goal is not put on the agenda; otherwise, it is, and additional operators will be needed to handle it.

    NAME:         EXAMPLES (Provide examples which illustrate this language feature's correct usage.)
    EFFECT:       (CANCORRECT ?hearer ?original)
    CONSTRAINTS:  (AND (BROKENRULE ?original ?rule)
                       (GOODMETHOD ?hearer EXAMPLES))
    CORE:         (KNOWS ?hearer (EXAMPLESOF ?rule ?exs))
    CONTRIBUTORS: nil

    NAME:         COMPARE (Compare the language feature with its nearest equivalent in ASL.)
    EFFECT:       (CANCORRECT ?hearer ?original)
    CONSTRAINTS:  (AND (BROKENRULE ?original ?rule)
                       (ASLEQUIVALENT ?rule ?aslequiv)
                       (GOODMETHOD ?hearer COMPARE))
    CORE:         (COMPARELANGS ?rule ?aslequiv)
    CONTRIBUTORS: nil

Figure 4.4: Sketches of the Method Operators, part II.

For the purely recursive part of the definition, the second operator applies.
To further our example, if it is indicated that the user needs a definition of "subject-verb agreement," the generator could expand its explanation to this: "This sentence contains an error in subject-verb agreement. In English, third-person singular subjects require a present tense verb to have the agreement marker S at the end." This could continue, recursively defining other concepts such as "third-person singular subjects." As mentioned in the beginning of Section 4.2, we have based our operators on the designs discussed in (Moore and Paris, 1992) but plan to substitute "informational" relations for the "intentional" relations used in the original work. Informational relations are more relevant for connecting the individual semantic units generated by form operators, since the primary function of the propositions generated by this phase is to produce text which is related by the information it conveys. The implementations of the operators sketched roughly in Figure 4.5 will need to be augmented to build a data structure with explicit representations of the semantic links between the concepts mentioned in a proposition generated by the operator and those additional definitions spawned from those concepts. It is this informational relationship between clauses which would be most relevant to a realizer which is ordering and configuring the surface structure. It would also be relevant to the history and manner phases, as will be discussed later. In addition to these semantic relationships which imply a general ordering, the planner may also wish to note a preferred ordering between the subgoals generated by a given operator. These sibling subgoals do not have informational relationships between them, and yet the general flow of the explanation will impose upon them a preferred order of satisfaction.
Therefore, at the time of generating subgoals, additional relationships should be noted so that the order of the speech acts generated by sibling subgoals reflects the order in which concepts are discussed in the overall utterance.

    NAME:         RULESTATEMENT
    EFFECT:       (KNOWS ?hearer (BROKENRULE ?original ?rule))
    CONSTRAINTS:  (AND (SUBCONCEPTS (CONCEPT ?rule) ?subs)
                       (CORE))
    CORE:         (INFORM ?speaker ?hearer (BROKENRULE ?original ?rule))
    CONTRIBUTORS: (FORANY ?subs (KNOWS ?hearer ?subs))

    NAME:         RECURSIVEEXPLANATION
    EFFECT:       (KNOWS ?hearer ?concept)
    CONSTRAINTS:  (AND (SUBCONCEPTS ?concept ?subs)
                       (CONTRIBUTOR))
    CORE:         (INFORM ?speaker ?hearer (DEFINITION ?concept ?def))
    CONTRIBUTORS: (FORANY ?subs (KNOWS ?hearer ?subs))

Figure 4.5: Possible Form Operators.

The product of this phase of planning will therefore be the semantic specification of utterances connected by informational relations; this structure of semantic units is otherwise ready for the realization phase, but will first be sent through a revising process to add additional contextual information. This revision is handled by the history component.

4.2.3 History: a Revision Approach

The history phase modifies the existing text plan to adjust the explanations to accommodate the context in the dialogue history and the user's domain knowledge. Its two responsibilities are to eliminate redundancy and to generate explicit references between related concepts. The latter is an essential task to enable the student to see connections between the new knowledge being presented and knowledge previously or even recently discussed, or between the new knowledge and established knowledge he or she already has. The exploration of human explanation strategies for the Sherlock (Moore, 1993; Rosenblum and Moore, 1993) and Migraine (Carenini and Moore, 1993) systems [Sherlock and Migraine are applications in which the EES planner has been implemented.] led the authors to develop a taxonomy to classify the different types of contextual effects to be found in human explanations, of which there were four:

o explicit reference to a previous explanation to point out similarities or differences
o omission of previously explained material to avoid distracting the student from novel information being presented
o explicit marking of repeated material to distinguish it from new material
o elaboration of previous material in forms of generalization, more detail, or justification

Because of the bulkiness encountered by the other systems when attempting to plan contextual references concurrently with planning the core structure of the text, we propose to implement some of these effects as revision techniques in ICICLE's penultimate planning phase. Revision in text generation can be described as the process of building an initial draft of the text at some level of representation and then changing that text before presenting it to the user. Existing systems primarily focus on performing the changes subsequent to syntactic and surface realization decisions, since the objective of the revision is to make syntactic changes only, improving style and/or readability while leaving the semantic content the same. Such systems include the Yh system (Gabriel, 1988), which combined propositions in order to improve readability, Rambow's Joyce system (Rambow, 1990), and the REVISOR system (Callaway and Lester, 1997). These approaches are therefore not highly applicable to ICICLE's revision goals, since we desire to add to the semantic content of the plan, and furthermore the semantic inclusion choices are divorced from surface realization concerns, which are driven by the user's language level.
The choices of what additional information to include must be driven by the context of the performance of the user in terms of what he or she knows about the domain, and the recent dialogue history; the realization choices must be driven by the user's input needs and level of reading comprehension. Therefore, ICICLE's revision process, both semantically motivated and semantically operating, must focus on the propositional structure built by the previous stages, working before (and independent of) syntactic realization choices. One existing system focuses on content elaboration. Robin's STREAK system (Robin, 1993; Robin, 1994) generates newspaper-style summaries of sports scores from a database of basketball game results. After its initial planning process STREAK uses "revision operators" to take existing structure and produce an altered structure adding pieces of historical context, such as the recent history of a given player or team. STREAK's goal of adding content (specifically, adding historical context) makes its goals very close to those of ICICLE; for this reason, the general implementation idea of revision operators which produce changes on a planned structure has been adopted for our history component. However, STREAK, like the other revision systems, plans its revisions after realization decisions have been made, and it currently only generates a single sentence. ICICLE's history operators will therefore be largely different from those which Robin implemented. ICICLE's revision goals are unique: it is adding content rather than revising syntax, and its inclusion decisions must be independent of surface realization concerns. It needs to draw both from the dialogue context like Migraine and Sherlock, and from the domain knowledge base like STREAK. To satisfy all of these goals together, our approach to revision must be novel.
To complement the planning operators for determining the first pass of rhetorical information, our planner will therefore also include a library of revision operators that function entirely on the semantic level and the "informational" links addressed earlier to transform that structure into the final form for passing to the realizer. In the following list of proposed techniques based on the taxonomy developed for Sherlock and Migraine (Moore, 1993; Rosenblum and Moore, 1993; Carenini and Moore, 1993), "context" is to be read as the large context containing the user's established knowledge in addition to what has been mentioned in recent or past explanations:

o If a concept mentioned in an explanation is similar to a concept in the context, make explicit reference to the similarities and distinctions between the two.
o If an explanation that has been planned is reiterating all or part of an explanation done earlier in this session [See next section for why this may be possible], modify the latter explanation so that it does not repeat all of the information mentioned before. How much of the information it repeats depends on the system's estimate of how well the user understood the previous explanation (i.e., is it completely new data or a reinforced explanation? Was the user able to perform the correction the last time?). If there is repeated information remaining, include specific mention that the information has already been discussed.
o When information in an explanation is being repeated from an explanation in an earlier session, ensure that the repeated information is explicitly marked to indicate that this is old and not new.

Figure 4.6: Expanding and then revising a definition.

The revision operators of ICICLE will take an existing completed semantic structure (the output from the form phase) and the information in the user model and the domain database to create a new, modified structure.
This structure may have added new propositions which do not change the existing propositions (see the simplistic example in Figure 4.6), or it may have added specifications to existing propositions so that the realizer will know to add phrases like "Remember..." or "As I was saying before when I talked about X..." in order to explicitly mark repeated information. There are several issues involved with further developing the history process and its operators. The history-planning mechanism needs to make principled decisions on when to add comparisons to related domain concepts. Some of these principles may be based on the complexity of the existing explanation; if there are multiple branches in the informational structure, indicating many concepts and subconcepts being defined, a comparison may overcomplicate the structure. Also, it may be most useful to draw comparisons only to the main topic of the explanation and not the subtopics. These constraints will need to be built into the operator library or the operator-selection mechanism. Furthermore, when an explanation item is selected for this kind of revision, the planner must be able to find something relevant for the comparison. In order to develop the part of the planner which scans the user's domain knowledge for relevant items, we will need to investigate methods of determining similarity between objects in the domain. One possibility is to follow the lead in (Rosenblum and Moore, 1993; Lemaire and Moore, 1994) and investigate case-based reasoning such as that described in (Aleven and Ashley, 1992) to select legal cases which share or contrast characteristics in specific ways. Note that Figure 3.1 included a database of language features in the domain knowledge base; it is this knowledge source which will supply information on the properties of the different domain concepts for the comparison/contrast actions. At the conclusion of this phase, the explanation is fully developed semantically.
Propositions which represent utterances are linked with informational relations in a hierarchical text plan which is almost ready for surface realization. The manner phase will provide the final step of preprocessing and then the explanations can be realized and presented to the user.

4.2.4 Manner

In this last step of processing, our generation system needs to serialize the hierarchical text plan. This process will rely upon the informational links connecting the planned utterances in order to place them in a linear order to be fed to the surface generator. As mentioned above, the core/contributor relationships will be instrumental in deciding part of this order; other information will come from the relationships of propositions in the hierarchy to their parents, and order-preference relationships between siblings not separated by the core/contributor distinction.

4.3 Realizing the System Response

I mentioned in the Introduction that one of our goals is to tailor the syntactic level of the surface output to the acquisition level of the learner. In this section, I will discuss more deeply why this is a goal for ICICLE, and overview how it might be accomplished using an available text realizer.

4.3.1 Comprehensible Input

A serious concern for a tutoring system teaching a second language is to produce instruction that can be understood by the learner. Because ICICLE is currently unable to conduct any instruction in the learner's native language, all instruction must be written in English and care must be taken that the level of syntactic complexity does not overwhelm the user. We are particularly concerned about this for our target learner audience, who (as mentioned in the Introduction) typically have very little access to text which is at their syntactic level (Anderson, 1993). We are therefore looking into how we can design the system to produce text at a syntactic complexity appropriate for the learner using our system.
It has been observed that humans unconsciously "simplify" or otherwise modify their speech when addressing second language learners (Schinke-Llano, 1994; Snow and Hoefnagel-Hohle, 1982; Krashen, 1982), but questions have been raised as to whether oversimplification is counterproductive in a learning environment (Schinke-Llano, 1994). Because of this, we do not wish to design ICICLE to simply produce very simple syntax regardless of the learner's actual proficiency; we wish for the level to vary along with the learner's progress. Stephen Krashen's ideal of "comprehensible input" (Krashen, 1985) forms a basis for our approach. His theory of acquisition holds that a second language learner acquires the target language through being exposed to grammatical forms which are slightly beyond his current level of proficiency; that is, if the learner is currently at level i, the forms to which he must be exposed are at level i+1. With the help of extralinguistic information such as context and content, the learner can still understand the input even though he has not acquired the grammar it contains. Krashen holds that this input is not only helpful to acquisition, but essential (Krashen, 1981; Krashen, 1982; Krashen, 1985). There has been some argument that tailoring input to exactly level i+1 is not helpful (Krashen, 1981; Ellis, 1992). Natural "foreigner speech" hits not only level i+1 but its surrounding levels as well (Krashen, 1981; Krashen, 1982; Krashen, 1985). It may even be harmful to consciously tailor speech to that specific level (Schinke-Llano, 1994), partly because it may distort the natural communication (Krashen, 1982). However, these observations have been made based on human-human interaction and what we can accomplish unconsciously compared to what happens when we concentrate on a specific task; an automated text generation system cannot accomplish anything unconsciously, nor will its actions be affected by the direction of its attention.
Therefore, we conclude that directly designing our system to provide level i+1 input is a desirable approach. We do not need to constrain the realizer to produce only input at this level; this would be both very difficult and unnatural. Krashen suggests a "shotgun approach" where the speaker aims for i+1 and acknowledges that the actual area hit will be larger than this. The advantage to this approach is that it not only provides comprehensible input, but also review of known forms and some input which is a little beyond what the user is ready to acquire, an effect which Krashen calls "anticipation." Note that a central aspect of this approach is to be able to tell where a user is on the road to syntactic fluency and, specifically, what forms are at, above, and below his or her level. This will be discussed more thoroughly in Chapter 5.

4.3.2 Using FUF

Our current goal is to modify FUF, a functional unification-based system (Elhadad, 1993), for our realization purposes. Specific exploration into the details of this modification has not yet been made, but the semantic specifications produced by the form phase, revised by the history phase, and linearized by the manner phase would be essentially FUF specifications, clause-sized semantic propositions connected in a representation of the informational structure of the discourse as described in Section 4.2.2. The modified realization engine would take these specifications and produce appropriate English output. If we order alternative possible syntactic structures for a given sentence according to the user language acquisition model, those constructions that are at or near the user's learning level can be preferred over other potential realizations.
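As a rough illustration of such a preference ordering, the following Python sketch ranks candidate realizations by their distance from level i+1. It does not use FUF, and it rests on a strong simplifying assumption that is not part of this proposal: that the complexity of a construction and the learner's proficiency can each be summarized by a single integer. The function name order_realizations and the level values are hypothetical.

```python
def order_realizations(candidates, learner_level):
    """Order (text, level) pairs so those nearest level i+1 come first."""
    target = learner_level + 1
    return sorted(candidates, key=lambda cand: abs(cand[1] - target))


# Hypothetical complexity levels for two phrasings of the same content.
candidates = [
    ("This sentence contains an error in subject-verb agreement.", 2),
    ("This sentence contains an error that occurs in subject-verb agreement.", 4),
]
for_beginner = order_realizations(candidates, 1)
for_advanced = order_realizations(candidates, 3)
```

Because the candidates are ordered rather than filtered, forms above and below the target remain available, which is in keeping with the "shotgun approach" described above.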
This ordering would be dynamically generated upon entry to the realizer; if the user model is updated after analysis of a text and the acquisition model changes, a new ordering of the alternatives would reflect this change so that each system response reflects the current user status. In this way, ICICLE will help provide the understandable input so needed by this learner population. The fully realized response will be presented to the user through the interface as described in Section 3.1.5. Once an explanation has been executed, what remains is to handle its success or failure. This is detected through subsequent user performance and is discussed in the next sections.

4.4 Presenting the Explanation to the User

When the ICICLE interface presents a fully generated explanation to the user, the window in which the explanation appears will contain several possibilities for user response. The primary function of the explanation window is to provide the user with the opportunity to make a correction to the sentence(s) involved and then to continue with the system response; the window will, however, also make allowances for an explanation which has not completely satisfied the user. This approach is partially motivated by a view of explanation systems as engaging in a "negotiation" of meaning, proposed by (Pollack et al., 1982). It is similar to that in the Migraine and Sherlock systems, where it is expected that a first explanation is rarely sufficient, and that most students will ask for more information (Carenini et al., 1994; Moore and Mittal, 1996). Migraine uses a practice of highlighting selected terms or topics when the mouse cursor is over them; clicking on these terms brings the user a menu of possible questions to ask on this topic. In this way, it allows for the user to access information which the system decided not to include in the explanation.
TAILOR (Paris, 1987; Paris, 1988) also allowed the user to request information that has been omitted from a description. In our implementation, the user interface will give the student multiple opportunities to request additional information or a different presentation of the given content. Where the form phase decided not to define a term used in an explanation, that term will be underlined to indicate a kind of "hyperlink" to the definition which was not generated. The method decision of the generator will also be alterable, with buttons on the interface to allow the user to request an explanation of a different type, at which time the explanation will be re-planned accordingly. Although both of these types of user response call into question the validity of the decisions made by the planning system, the interaction of the user with these interface components will not negatively affect the user model in any way; it will not cause a downward revision of the model's indication of user knowledge. This is motivated by a desire to avoid penalizing an aggressive learner as described in Cawsey's evaluation of EDGE (Cawsey, 1993). It is our belief, however, that a successful explanation may result in increased user knowledge, and therefore there may be an upward revision in the model as a result of user interface actions. I will discuss this possibility in more detail in Section 5.5.3. An additional question remains regarding whether or not the system should allow a user to "give up." If the user cannot gain any additional insight from exploring definitions or alternative ways of explaining the error, it may be advisable to allow the user to leave the sentence as is rather than trap him or her in a continuing cycle of fixes until the correct form is stumbled upon by trial and error. This is a topic for further research.
4.5 Recovering from Failed Explanations The discussion of ICICLE's generation system thus far has not yet included any attempt to address the issue of explanation failure, and yet it is a very real possibility that ICICLE's user model is inaccurate at the time of generating a given explanation. The decisions based on an inaccurate model will be faulty, and the explanation is not likely to succeed in making the user understand the grammatical concept it describes. Even if a correction is made at the time of the explanation presentation, in the long run the performance of the individual will reflect that the concept has not been learned. The response generator, therefore, needs a mechanism for tracking its handiwork over the long term. Part of the planning process will require comparing the errors addressed in a given system response with those from earlier ones in order to see if the same grammatical concepts are being explained again. This is an indication that the system will need to adjust its explanatory approach. The EES system exhibited a similar awareness of its own success and failure. When a user posed a question after an explanation, the system built a communicative goal to instruct on that information and then checked to see if this goal had been attempted by the system before (Moore and Paris, 1992). In this way, the system could see that the previous attempt failed and is being retried, and plan accordingly. Likewise, Menotutor (Woolf, 1984) had the ability to shift tutorial approaches when one was not succeeding. In ICICLE, the error identification phase produces a rich amount of user performance information, and this source can also be used to judge the success or failure of past explanations. This will be discussed more in Section 5.5.1. 
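The comparison of newly addressed errors against earlier responses can be illustrated with a small sketch. This is not ICICLE's implementation; the class name `ExplanationLog`, its methods, and the feature name "negation" are hypothetical, and the sketch assumes that each response can be reduced to a list of (grammatical concept, method) pairs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExplanationLog:
    """Hypothetical record of the concepts explained in earlier responses."""

    # concept name -> methods used for it so far, oldest first
    past: Dict[str, List[str]] = field(default_factory=dict)

    def record(self, concept: str, method: str) -> None:
        """Note that `concept` was explained using `method`."""
        self.past.setdefault(concept, []).append(method)

    def repeated(self, concepts: List[str]) -> Dict[str, List[str]]:
        """Return the concepts in a new response that were explained before,
        together with the methods already tried on them."""
        return {c: self.past[c] for c in concepts if c in self.past}


log = ExplanationLog()
log.record("3rd person +s", "TELLRULE")
# A later response addresses two concepts; one of them is a repeat.
again = log.repeated(["3rd person +s", "negation"])
```

A non-empty result is the signal described above: the same concept is being explained again, so the planner may want to adjust its approach rather than repeat the earlier attempt unchanged.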
An explanation in ICICLE could fail due to a number of possible causes, including having selected the wrong explanation method for this user, having left out too much prerequisite information during the form decisions, or simply needing to reinforce what was previously said. The decision to leave out prerequisite information is based on the underlying domain knowledge model and what information the user supposedly knows. If this model is incorrect, it will need to be fixed; information for updating it will come from the error identification module, and will be discussed further in Section 5.5.3. Another failure situation, however, arises when the system history's representation of which tutorial approaches to use with a user is wrong. If certain error features come up repeatedly for explanation, it must be concluded that the explanatory method being used is at fault. The system history model will need to be updated accordingly. This will also be discussed in more detail in the user modeling chapter, particularly in Section 5.4.

The different types of repair situations I have described are analogous to those discussed in (Ringle and Bruce, 1981), where different types of failure and types of repair moves are modeled. The "failure cues" they list are:

o Direct Assertion / Question: The listener makes a direct assertion about his or her knowledge or lack thereof. In ICICLE, this would be similar to the student explicitly requesting that the interface give him or her additional explanatory material, although as asserted above, in ICICLE this is not necessarily an assertion that the student lacks this knowledge.

o Point Extension: The listener attempts to extend the point made by the system but fails to do so appropriately. This is analogous to the ICICLE user taking an explanation generated by the TELLRULE method and failing to extend that point properly toward corrections or future performance.

o Inference Assertion: The listener makes an incorrect inference from the presented material. In our system, this would be when the student performs in such a way as to indicate that he or she has drawn an incorrect conclusion from an EXAMPLES explanation.

o Analogue Assertions: The listener tries to apply an analogy made by the system. For ICICLE, this would be when the student mistakes the point made when the system drew an analogy between language features in ASL and English (the COMPAREASL method).

These four types cover almost all of the possible method failures in ICICLE, and I believe we should follow the philosophy of having multiple possible treatments of the failures. Those repair moves listed by Ringle and Bruce which are relevant to ICICLE are:

o Inference Explication: When the listener has failed to make the desired inference, the speaker can draw the inference explicitly. In ICICLE this would address a failure of an EXAMPLES explanation. Note that the framework for building this explicit repair move has not been covered by the method or form operators discussed before; if we are to enable the system to produce this kind of explanation, additional operators will be necessary.

o Knowledge-base Expansion: When the listener is missing the necessary background knowledge, the speaker can make a digression to cover it. This would be the situation in which the user clicks on an underlined topic for additional information and the system defines that concept.

o Analogy and Example: When the listener is unfamiliar with a concept, the speaker can use analogies and/or examples to introduce it. In some cases of the above-mentioned information request, the planner may wish to provide additional information of this sort, though this may depend largely on the ability to combine methods.

o Rephrase Explicit Definition: The speaker may rephrase his or her point entirely. This is equivalent to replanning an explanation with a different method.
Recovery from failed explanations is therefore a joint effort between the two action modules of the system, error identification and response generation. In the case of response generation, the creation of any response must first take into account the success or failure of past efforts on this topic. If there are failures, normal choices may be preempted in favor of specific repair moves. This is another topic for further development. Chapter 5 Proposal: A User Model for ICICLE It has been argued (Sparck Jones, 1991; Cawsey, 1993) that creating and maintaining a detailed user model in a generation system is a very difficult task. Sparck Jones in particular argues that concrete information about the user is hard (if not impossible) to obtain, and therefore modeling should be very general and very conservative. She holds that evidence for modeling in a dialogue system is likely to be "poor in both quantity and quality," and that "fancy modeling chasing the real person is unnecessary." Nevertheless, the user model concept I proposed in Section 1.2 is a rich and complicated architecture which, far from being general and conservative, must be specific and ambitious in order to sustain both the processes of accurate text analysis and of text generation as detailed in the preceding chapter. Sparck Jones' approach, while sufficient for the kinds of dialogue systems she is concerned with, would fall far short of the needs of a language tutoring system dealing with a wide range of knowledge and expertise, particularly when that system's primary purpose is individualized instruction. Despite Sparck Jones' pragmatic conclusions that complex user modeling is most likely impossible if not at least impractical, she does hold that user modeling is a beneficial effort with respect to text generation goals, and that a certain level of tailoring the text to the user is both desirable and useful. 
As I have shown in the preceding chapters, in order for ICICLE to be what we intend it to be, user modeling is also necessary. In this chapter I will introduce the proposed ICICLE user model design, focusing on the knowledge models, which are my main interest for my thesis work.

5.1 Reviewing the Demands on the Model

In Section 1.2 of the Introduction I outlined how the goals of ICICLE require a four-component user modeling architecture. I will review the specifics here by identifying the queries that the ICICLE system will pose to the user model in the course of its operation cycle and what part of the user model that I have proposed will provide the answers.

o During error analysis the error identification module will query the user grammar model regarding user proficiency on specific grammatical forms in order to correctly choose between parses and assign error tags according to the source of the error.

o Between error analysis and response generation the error identification module will ask the user grammar model which of the errors found in the text are within the user's current realm of learning so that it may pass them on for correction.

o At the method phase of response generation the response generation module will choose which tutorial method to use based on the system history model's indication of which is the strongest method for this user.

o At the form phase the response generation module will query the domain knowledge model to determine the user's command of the requisite material for understanding the explanation to be generated.

o At the history phase the response generation module will check the current plan against the dialogue history in order to eliminate or explicitly mark repetition. It will also query the domain knowledge model to determine the user's command of related topics for drawing analogies and comparisons.

o During realization the surface generation process will ask the user grammar model for the ideal syntactic constructs to use in generating the explanations.

This interaction is also displayed below in Figure 5.1. The component which is at the center of my interest is the user grammar model, and for that reason I have chosen it as the focus of my work. Although I will overview the basic aims of the user domain knowledge model and the history components below, the majority of the rest of Chapter 5 is devoted to discussing and motivating the proposed design for the grammar model and overviewing the implementation issues involved with it.

Figure 5.1: Interaction of the user model components with the ICICLE system modules.

5.2 Modeling Second Language Acquisition

The main purpose of ICICLE is to be a tool for improving literacy skills (more specifically, writing skills) in deaf students. Literacy figures for this population are poor to the extent of being shocking. With this problem impacting every aspect of a deaf student's education, the search for an approach to improving the situation demands a second look at the unique needs of this learner population and at how a natural language system may be designed to meet them. American Sign Language is a communication form used by many deaf individuals, and we have chosen to focus on native or near-native users of ASL as our target learner group. While having a strong native language base is a great benefit in the acquisition of a second language, broad differences between the two languages such as those between ASL and English also pose challenges for the learner attempting to transfer general language knowledge from one to the other. For instance, ASL is a visual-gestural language whose grammar is distinct from and independent of the grammar of English or any other spoken language (Baker and Cokely, 1980).
The sign-order rules of ASL are not the same as the word-order rules of English, and ASL syntax includes systematic modulations to signs as well as nonmanual behavior (e.g., posture and facial expression) that achieve a simultaneous mode of communication not possible with the completely sequential nature of written English (Baker and Cokely, 1980). We discussed these differences plus the differences in cognitive processing techniques between spoken and manual languages in (Michaud and McCoy, 1998). Our conclusion is that written English is, for ASL natives, a distinctly different and challenging language, motivating the need to view the process of acquiring fluency in written English as second language acquisition and to incorporate that view in a strategy for facilitating the learning process. This view affects how we model the grammatical proficiency of the user in the system. The previous chapters already established that ICICLE needs a model containing information about the user's mastery of certain language constructs. Specifically, since the system parses the user's language production and identifies errors using a modified English grammar (see Section 3.1.1), the user grammar model must represent the user's command of those constructs which the parsing grammar recognizes as being used correctly or incorrectly. Our design for this model and the linguistic theories underlying that design are discussed below.

5.2.1 Interlanguage

In previous work on the ICICLE user knowledge model, we established its essential nature as a representation of the user's location along the path toward acquiring written English as a second language (Michaud and McCoy, 1999). To design this representation, we have looked into the Interlanguage Theory of second language acquisition (Selinker, 1971; Ellis, 1994). In this theory, a learner produces utterances in the second language from an internalized grammar which represents his or her hypothesis of that language.
This hypothesis is revised systematically over time as the learner acquires the language. The initial interlanguage model formed by the learner is based on very little target language knowledge and is largely incorrect, but as the learner progresses, more of the interlanguage correctly models the target language, and less reflects incorrect assumptions. As this progression is made, the learner perceives a difference between his or her productions and the input which he or she receives, leading to a revision or restructuring of the interlanguage grammar in favor of the target-like form (Brown and Hanlon, 1970; Selinker, 1971; Carroll, 1995). (Ellis, 1994) characterizes each "step" of the interlanguage grammar as sharing "rules" with the previous step, but differing in that some rules have been added or revised. He further claims that mastery of a grammatical structure begins with free variation between forms during the "acquisition phase," after which the variation settles down into appropriate use during the "reorganization" phase. Therefore, forms which are on the cusp of being fully acquired tend to exhibit variation in use, much of which will be grammatically inappropriate in its syntactic context. Figure 5.2: Progression of an interlanguage. It is our goal, therefore, to represent the user's internalized grammar as a moment in the progression of this interlanguage. The contents of the user grammar model should reveal to the system the status of the learner's acquisition of the grammatical structures recognized by the system. In order to create this model, we need to examine the nature of interlanguage transitions so that we may describe the path we expect the user to follow, and we must determine a way in which we can identify where on this path the user's current proficiency lies. 
5.2.2 Focusing on the Frontier of Acquisition: the ZPD

Lev Vygotsky's research in cognitive skill acquisition and education concluded that as a learner masters a subject, there is always some subset of that material that is currently within his or her grasp to acquire. Intuitively, it is this area that he or she is currently in the process of learning. He termed this subset the Zone of Proximal Development (ZPD) (Vygotsky, 1986). The general idea has been applied to second language acquisition by researchers such as (Washburn, 1994) and (Krashen, 1982). Krashen states that when the learner is at some level i in acquiring the target grammar, there is some part of the grammar at level i+1 that the learner is "due to acquire" (Krashen, 1982). In our domain, therefore, the ZPD corresponds to the portion of the interlanguage that is in the process of making a transition to the target grammar. The identification of the ZPD for a given second language learner would be an ideal indication of the next language features he or she will acquire --- those features on which instruction would be most beneficial because they are neither well-established nor beyond his or her ability to learn at this time. Focusing on this shifting frontier between "acquired" and "unknown" concepts has been the goal of other instructional systems such as LEAP (Linton et al., 1996), whose designers stress that instruction outside of this area is wasteful of time and effort because it does not result in learning. It is also consistent with the views expressed in (Glaser et al., 1987), whose authors stress that the assessment of learning needs to take into account the changes in the learner's performance over time and the systematic misconceptions in his or her knowledge. A graphical representation of this focal area changing as grammatical elements move into the "acquired" realm by way of the ZPD can be seen in Figure 5.2.
The goal of identifying the ZPD in a second language acquisition context is aided by the suggestion made by other researchers that the second language errors committed by a learner systematically change over time (Dulay and Burt, 1974). Since (Ellis, 1994) holds that features on the edge of acquisition experience variation before settling into appropriate use, and empirical observation of language acquisition has shown the emergence of grammatical structures in "immature" or "transitional" forms which appear for a time before they are fully acquired (Brown and Hanlon, 1970; Krashen, 1983; Ellis, 1994), it is possible to conclude that the changing errors observed in second language learners are indicative of the frontier of their acquisition process. Since there is research supporting a typical sequence of acquisition for language features (Bailey et al., 1974; Dulay and Burt, 1975; Larsen-Freeman, 1976; Krashen, 1982), identifying the errors made by a learner could not only indicate the frontier of his or her learning, but also what lies ahead, and what has already been acquired.

Other research is supportive of this approach. John Anderson, in his system called Adaptive Control of Thought (ACT) (Anderson, 1982; Anderson et al., 1980) (based on the work of (Fitts, 1964)), lists stages of cognitive skill acquisition in which a learner progresses from "declarative" knowledge, a shallow representation which results in performance errors, to "procedural" knowledge, which is established and functional and can be used without error. Anderson further states that the transition from declarative to procedural knowledge is done through systematically purging the errors in the representation, a description consistent with our description of interlanguage revision.
This declarative/procedural distinction has been explicitly applied to language acquisition (Fitts, 1964; Anderson et al., 1980) and is supported by the language assessment work of (de Jong and Verhoeven, 1992) and psychological studies such as (Chi et al., 1981). The fact that the use of declarative knowledge results in the production of errors ties this knowledge-level distinction into our ZPD-based theory, since as established above, the frontier (or Zone) is the area in which one should expect variation in language performance to occur due to incomplete acquisition. Features that have been acquired previously should occur without significant variation or error, and features beyond the ZPD should either be absent from the learner's language production because of avoidance or be used with consistent error because they cannot be avoided.

5.2.3 Toward Modeling the Interlanguage

I have established the idea of modeling a user's interlanguage status as part of a second language tutoring system; we shall accomplish this through representing the user's linguistic mastery of the target language in a database of morphosyntactic grammar elements --- essentially, those grammatical structures which make up the "rules" of the target language as understood by the ICICLE system. I have chosen to design this architecture as an overlay model. The term "overlay" was introduced in Section 2.2 and refers to a user model which represents what the user knows as a subset of a predetermined set of facts or concepts. This approach is most useful in a system which has access to a complete set of concepts relevant to the domain and/or the system --- in our case, this set is made of those grammatical structures about which the system needs to know user competence level. Since there is a finite set of grammatical structures of English implemented in the parser as either "correct" rules or "mal" rules, that is our predetermined set.
In an overlay model, the user's actual knowledge is indicated by "tagging" the user's level of knowledge on each item in the model, often with the labels "known" and "unknown." In the case of the user grammar model for ICICLE, we will tag each grammatical structure with the labels "acquired," "ZPD," or "not acquired" depending on the system's observation of user performance on each particular item. In my proposed design, not every item in the knowledge base will actually possess a tag, for reasons I will discuss later. The design of the model, however, allows information about untagged elements to be derived from tags directly recorded in the model. This will be discussed further in Section 5.5.2. After the model is initialized according to the first performance analysis of the student, it would be expected that over time those features [Following the terminology of Rod Ellis, I will occasionally use the term "features" to refer to morphosyntactic constructions or grammatical "rules." ] originally tagged as part of the ZPD would be later tagged as "acquired" when they are used with consistent correctness, and features that were tagged "not acquired" would move into the ZPD once the learner begins acquiring them. This design answers the demands placed on the user grammar model as I have stated them. The error identification phase could use it when selecting a parse for a given portion of text. Because of the correspondence between the structures in the model and those in the grammar, the tags in the model could cast a reflection upon the grammar and influence which parses are selected, as mentioned earlier. The parser can assume that structures tagged as "acquired" in the model will be used correctly with consistency, while those within the ZPD will occur with variation, and those which are "not acquired," or beyond the user's knowledge, will be either absent or consistently used with error. 
When choosing a parse, the system should favor one using rules which correspond to the user's expected mastery of the features involved. Thus the correct parse and source of error can be determined by comparing the possibilities against what constructions the user is expected to use correctly or incorrectly according to the model. A model of this type would also provide vital information needed for transforming a list of errors into the tutorial response. As established above, instruction and corrective feedback on aspects of the knowledge within the ZPD may be beneficial, while instruction dealing with that outside of the Zone is likely to be ineffective or even detrimental. Tutoring on material outside the ZPD which has already been mastered by the student is likely to bore them; tutoring on material beyond the grasp of the student at this time is likely to produce confusion or frustration. Further argument for aiming at the ZPD also comes from the Learnability Theory, which constrains the ability of second language learners acquiring language features via direct instruction to acquiring only those features they are ready to learn (Ellis, 1993). Since we have already equated Krashen's "ready to learn" area with the ZPD in our model, it seems obvious that ICICLE would reap the most benefit by restricting its instructive efforts to those features in the ZPD. When passing the error list to the response module, therefore, the error identification module can consult the tags on the user grammar model in order to prune the errors so that the tutorial responses are focused only on those errors at the user's current level of language acquisition (in the ZPD). 
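As a rough illustration of how these tags could serve both parse selection and error pruning, consider the following sketch. It is only one possible reading of the description above: the names `Tag`, `prune_errors`, and `parse_plausibility`, the dictionary-based overlay, the (feature, is_mal_rule) parse format, and the feature names other than "3rd person +s" are all assumptions made for the example.

```python
from enum import Enum


class Tag(Enum):
    ACQUIRED = "acquired"
    ZPD = "ZPD"
    NOT_ACQUIRED = "not acquired"


def prune_errors(errors, model):
    """Keep only the errors whose feature the model places in the ZPD."""
    return [e for e in errors if model.get(e["feature"]) is Tag.ZPD]


def parse_plausibility(parse, model):
    """Count the rule uses in a parse that match the user's expected mastery.

    `parse` is a list of (feature, is_mal_rule) pairs (hypothetical format).
    Untagged features contribute nothing either way.
    """
    score = 0
    for feature, is_mal_rule in parse:
        tag = model.get(feature)
        if tag is Tag.ZPD:
            score += 1  # variation is expected, so either form is plausible
        elif tag is Tag.ACQUIRED and not is_mal_rule:
            score += 1  # acquired features should be used correctly
        elif tag is Tag.NOT_ACQUIRED and is_mal_rule:
            score += 1  # unacquired features should show consistent error
    return score


# Hypothetical overlay for one user.
model = {
    "3rd person +s": Tag.ZPD,
    "plural +s": Tag.ACQUIRED,
    "relative clause": Tag.NOT_ACQUIRED,
}
candidates = [[("plural +s", True)], [("relative clause", True)]]
preferred = max(candidates, key=lambda p: parse_plausibility(p, model))
```

Here the second candidate is preferred, because an error on an unacquired feature is more consistent with the model than an error on an acquired one. A real scoring scheme would presumably weigh these cases more carefully.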
Lastly, as discussed in Section 4.3.1, these tags would also provide the realizer with the knowledge of which grammatical forms to focus on in order to provide comprehensible input for the user by focusing roughly on his or her level i+1 area when choosing the surface forms of the text --- in other words, by choosing structures that the model indicates are in the ZPD. In formalizing our user grammar model design, I therefore need to capture three aspects of language competence: the past, the present, and the future. The model must be able to indicate those features of language the user has already mastered, those features he is presently attempting to acquire, and those features that are above his current level. Because judgments on user competence must be derived from partial information (the system may not always have empirical data on user performance covering all features in the model), I must also establish a method by which the system can infer a fuller description of user proficiency than is directly displayed in his or her use of language forms. Lastly, I need to establish how the model will be initialized and updated, and how it relates to the model of the user's explicit domain knowledge, which is also needed for the generation component of the system. Figure 5.3: SLALOM: Steps of Language Acquisition in a Layered Organization Model. 5.2.4 SLALOM: A Proposed Model Architecture The architecture proposed to serve as the user grammar model of ICICLE is SLALOM (Steps of Language Acquisition in a Layered Organization Model). What I discuss in this paper is based on the design originally proposed in (McCoy et al., 1996) and is an expansion of that discussed in (Michaud and McCoy, 1999). A very simplified representation of SLALOM's basic structure can be found in Figure 5.3. 
It captures the stereotypic linear order of acquisition within certain hierarchies of morphological and/or syntactic features of language, such as negation, noun phrase construction, or relative clause formation. Within a hierarchy, depicted as a vertical stack of boxes in the figure, a given morphosyntactic feature (represented by a box in the diagram) is expected to be acquired subsequent to those below it, and prior to those above it, according to the natural order of a stereotypical learner from this particular native language learning English. [Although some research has shown high correlation between the orders of learners from different native languages (NLs), we wish to represent the most likely order possible and thus will be specific to learners from our given NL when necessary.] Dashed lateral connections between features in different categories indicate those which we expect to be acquired concurrently; they align the acquisition stages in different categories. As mentioned above, this model would represent a given user by tagging the features as acquired, within the ZPD, or not acquired according to observations of the user's language performance on texts analyzed by the system. The tags may be revised over time as more data about the learner is collected or the learner's proficiency develops. In the latter case, those features formerly tagged as being within the ZPD would move to "acquired" status, and new features from the not-yet-acquired area would move into the ZPD. Because the SLALOM architecture represents an expected order of acquisition, the likely path of the ZPD would be to move "up" in the stacks. Although the following sections will address many of the issues pertaining to the interaction of the SLALOM model with the rest of the ICICLE system architecture, the specifics of the knowledge base's contents and organization have not yet been developed.
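To make the role of the ordering concrete, the sketch below shows one way a single SLALOM stack might support conclusions about untagged features. The inference rule is an assumption made for illustration: it simply reads the expected order of acquisition literally. The function name `infer_tag`, the list representation of a stack, and the placeholder feature names `neg-1` through `neg-4` are hypothetical, and lateral connections between stacks are not modeled.

```python
def infer_tag(feature, stack, tags):
    """Guess a tag for `feature` from the recorded tags in its stack.

    `stack` lists the features of one hierarchy from bottom (acquired
    earliest) to top; `tags` holds only the directly recorded tags.
    Returns None when the recorded tags give no evidence.
    """
    if feature in tags:
        return tags[feature]
    i = stack.index(feature)
    # Assumption: if anything above is acquired or in progress, this
    # feature, being earlier in the order, has already been acquired.
    if any(tags.get(f) in ("acquired", "ZPD") for f in stack[i + 1:]):
        return "acquired"
    # Assumption: if anything below is still in progress or unacquired,
    # this later feature has not been acquired yet.
    if any(tags.get(f) in ("ZPD", "not acquired") for f in stack[:i]):
        return "not acquired"
    return None


negation = ["neg-1", "neg-2", "neg-3", "neg-4"]  # placeholder feature names
recorded = {"neg-2": "ZPD"}  # the only tag observed directly
below = infer_tag("neg-1", negation, recorded)
above = infer_tag("neg-3", negation, recorded)
```

With a single ZPD tag recorded in the stack, the feature below it is taken to be acquired and the feature above it to be not yet acquired, which mirrors the expectation that the ZPD moves "up" over time.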
Part of the task of developing the knowledge base will be to determine how to capture the relationship between the model and the grammar in order to allow for the needed conclusions about user competence to be drawn. We will also need to establish the specific organization of these features along the general lines I have already defined. I will address these issues further in Chapter 6.

Figure 5.4: The explicit domain knowledge model: A definitional dependency hierarchy and its relationship to SLALOM.

5.3 Modeling Explicit Language Knowledge

Section 3.2.2 discussed the distinction drawn between explicit language knowledge, also described as "metalinguistic" knowledge, and the implicit grammar knowledge represented in a learner's interlanguage. We choose to part from Krashen where his Monitor Theory is concerned and instead embrace Bialystok's Interface Theory (Bialystok, 1978; Ellis, 1993), which holds that there is an exchange which occurs between a learner's bodies of implicit and explicit knowledge --- particularly, that instruction which results in explicit knowledge can be converted into interlanguage knowledge over time and with practice of the language forms, and that implicit grammar knowledge can result in explicit knowledge through conscious inferencing. We still feel the need, however, to model implicit and explicit knowledge separately within the ICICLE system. While interacting, these are still two slightly different realms of knowledge: namely, the learner's ability to use a language feature, versus his or her ability to talk about it (or to understand when someone else talks about it). The previous chapter detailed the need of the form phase to have access to a model of the user's knowledge of the grammatical concepts which are mentioned in the explanations generated by the system. At the topmost level, the concepts being explained are the rules or features of English which are displayed in SLALOM.
However, there are underlying concepts mentioned in these explanations; for example, an explanation on subject/verb agreement may need to mention the properties of person and number in a noun which is the subject of a sentence. The underlying concepts support rules in the interlanguage but are not actually grammatical structures in and of themselves, and therefore would not be in the SLALOM model. However, in order for the form phase to be able to generate explanations using these terms, it needs to have access to the user's understanding of them. Therefore, we need, in addition to the SLALOM interlanguage model, a model of the user's underlying domain knowledge which captures his or her knowledge of the concepts behind the grammatical constructions referred to in SLALOM. Although they are two separate stores of knowledge, the architecture of this model and that of the Database of Grammatical Concepts described in Chapter 3 correspond highly. Both need to represent concepts as part of a branching hierarchy with the terms used in each definition indicated as preconditions for understanding the concept in whose definition they occur (represented by listing the preconditions as "children" in the hierarchy); however, while the Database of Grammatical Concepts holds information on how to generate the definitions of these concepts, the underlying domain knowledge model is a store of the user's knowledge on each concept, and thus does not need to carry any definitional information. Since some concepts may appear in more than one or even several constructs' definitions, the nodes of the hierarchy may have multiple parents to illustrate all of the grammatical constructs which depend on a given node. 
A representation of this can be found in Figure 5.4, which depicts not only the definitional dependency relationship (shown with arrows pointing from each concept to those concepts involved in its definition), but also the relationship of elements of this model to elements in the SLALOM model. A given SLALOM feature, in this example "3rd person +s" (adding +s to the verb in a sentence with a third person singular subject), would link across to the explicit knowledge of that concept in this model. In turn, that concept is linked to the concepts involved in its definition and in their definitions. The system will need to be able to determine the user's knowledge of all of these concepts in order to make decisions on what information to include in an appropriate description of an error concerning 3rd person +s. As in the SLALOM model, the information in the underlying domain knowledge model will be stored in a database overlay design. The tags I have chosen for this model will correspond to the SLALOM tags: known, ZPD, and unknown. The assignment and use of these tags will be discussed further in later sections.

This proposed design for the user domain knowledge model is in the earliest stages of development, but it holds promise with respect to the needs and goals of the ICICLE system. The construction of the system response could easily consult a model with this structure and markings in order to determine the user's depth of knowledge on the features being discussed so that the form phase may elect to include background information and definitions as needed. This aspect of the user knowledge component of the user model needs a great deal of further development, though, and that development is outside the scope of my work. The underlying domain knowledge model will therefore exist as a working hypothesis based on certain assumptions I will discuss in later sections, but my focus will remain on SLALOM.
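The way the form phase might consult such a hierarchy can be sketched as follows. This is an illustrative assumption rather than the proposed design: the dependency table is invented for the example (it does not reproduce Figure 5.4), the function name `concepts_needing_definition` is hypothetical, and untagged concepts are treated as "unknown" purely for simplicity.

```python
def concepts_needing_definition(concept, deps, knowledge):
    """Collect every concept beneath `concept` that is not tagged "known".

    `deps` maps a concept to the concepts used in its definition;
    `knowledge` is the overlay of tags ("known", "ZPD", "unknown").
    A concept reachable through several parents is visited only once.
    """
    needed, seen, pending = [], set(), [concept]
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        # Assumption: a concept with no recorded tag counts as unknown.
        if current != concept and knowledge.get(current, "unknown") != "known":
            needed.append(current)
        pending.extend(deps.get(current, []))
    return sorted(needed)


# Hypothetical dependencies; "noun" has three parents.
deps = {
    "3rd person +s": ["subject", "person", "number"],
    "subject": ["noun"],
    "person": ["noun"],
    "number": ["noun"],
}
knowledge = {"subject": "known", "person": "ZPD", "number": "known", "noun": "known"}
to_define = concepts_needing_definition("3rd person +s", deps, knowledge)
```

In this example only "person" would be a candidate for an added definition when explaining an error on 3rd person +s. Whether a concept tagged ZPD should be defined or merely reinforced is a decision left to the form phase.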
5.4 The History Models

The history models are also outside of my focus, so I will give only a very brief overview of them here. The history component of the ICICLE user model consists of the dialogue history model and the system history model. The former is a storehouse of the text plans generated by the system during the current session with the user. The generation component will need to be able to search through these stored plans in order to explicitly take previously-said material into account during the history phase of generating a new explanation. Since we do not wish to make reference to dialogue from previous sessions, the dialogue history will be initialized as empty at the beginning of each session of system use, and will not carry over from one session to another.

The system history component is both more complex and more permanent. Since it stores the system's observations concerning the success or failure of different method choices, and this information is pertinent over the long term and across the various sessions, it needs to be saved and restored each time the system is used by the individual. Its contents are divided into two parts: a store of evaluation scores for each of the method selections and a concise log of explanations attempted by the system.

Figure 5.5: Scored tutorial methods in the system history model.

The evaluation scores are akin to the user models of Menotutor (Woolf, 1984) and German Tutor (Heift and McFetridge, 1999) as discussed in Sections 2.2.2 and 2.3.3 respectively. A numerical score will be recorded for each method ICICLE has at its disposal, indicating its strength with regard to that particular user. This score would be initialized to a middle value across all methods, indicating that as far as ICICLE knows, they are all equally strong for this learner.
The idea behind the score is that when an explanation executed in that method is followed by correct use of the language feature involved, the score increases; if, on the other hand, repeated explanations on the same feature are required because of errors in subsequent performance, the score decreases. Ideally, these decreases should be relatively small, so that the user can have a concept reinforced in the same way some number of times before the system gives up on that method. If the score does drop too low, however, the system should change which tutorial approach it uses, because the method that has failed will no longer score well enough to be considered the best choice. This aspect of the system history model is shown in Figure 5.5, using as examples the method selections outlined in Section 4.2.1. The "default" markings, indicated by the tick marks at the halfway points, show the initialization point of the scores. The highest-scoring method, shown by the dashed box, would be the one most likely to be preferred by the system when generating the next explanation, although (as mentioned in the previous chapter) some of that selection relies on other information as well.

The second part of the information stored in this model is the log of attempted explanations. In a way, this repeats information stored in the dialogue history. However, the dialogue history has much more data in it (making searching through it much more difficult), while this log only needs to record that an explanation about feature X was generated using method Y (stating, for each feature that has been explained by the system, the most recent method used). Since the dialogue history is also erased at the end of every session, but we do not necessarily want this information to disappear, we are further justified in duplicating this "summarized" information elsewhere.
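The two parts of the system history model described above, together with a score adjustment driven by later performance, might be sketched as follows. This is a minimal sketch under stated assumptions: the 0 to 100 scale, the step sizes, the method names, and the choice to count a feature as failed whenever it appears among the errors are all hypothetical, since the proposal does not fix any of them.

```python
class SystemHistory:
    """Hypothetical sketch: per-method scores plus a log of the latest method per feature."""

    def __init__(self, methods, initial=50, reward=10, penalty=5):
        # Every method starts at the same middle value (assumed scale: 0..100).
        self.scores = {method: initial for method in methods}
        self.last_method = {}  # feature -> most recent method used to explain it
        self.reward = reward
        self.penalty = penalty  # kept small so one failure does not discard a method

    def log_explanation(self, feature, method):
        """Record that `feature` was most recently explained using `method`."""
        self.last_method[feature] = method

    def record_performance(self, correct_features, error_features):
        """Adjust method scores from the features seen in a new writing sample."""
        for feature, method in self.last_method.items():
            if feature in error_features:
                # Assumption: any error on an explained feature counts against the method.
                self.scores[method] = max(0, self.scores[method] - self.penalty)
            elif feature in correct_features:
                self.scores[method] = min(100, self.scores[method] + self.reward)

    def preferred_method(self):
        """Return the currently highest-scoring method."""
        return max(self.scores, key=self.scores.get)


# Illustrative use with placeholder method names.
history = SystemHistory(["method_a", "method_b"])
history.log_explanation("3rd person +s", "method_a")
history.record_performance(correct_features={"3rd person +s"}, error_features=set())
scores_after_success = dict(history.scores)
history.record_performance(correct_features=set(), error_features={"3rd person +s"})
scores_after_error = dict(history.scores)
```

Because the log keeps only the most recent method per feature, it stays small and can be saved between sessions along with the scores.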
At the end of an error analysis, the performance data from the most recent sample of the user's writing will be sent to the user model (see Section 5.5.3 below). The system history model will compare the features used correctly and those used with error against the log of explanations. If a feature has been explained using method Y and it has occurred correctly in the new sample, the score for Y should be increased. If the feature was used incorrectly in the sample, the score should be decreased. In that way, the system can track how well the method choices are performing over the long term.

5.5 Representing a Changing User

As discussed in Chapter 2, most natural language systems which base their generation choices on a user model are more concerned with using the model than with initializing it or updating it over time (Ringle and Bruce, 1981; Paris, 1987; Moore and Paris, 1992; Carenini and Moore, 1993). In those systems, the user model is often static and assumed "given" from some outside source. Since ICICLE's user model needs to capture fine details about the user derived from user performance, and ICICLE will be used by an individual over time and across the development of new skills, we must establish the principles by which the system will garner its initial judgments about the user, store them, and update them over the course of time. Other systems with these concerns include Menotutor (Woolf, 1984) and EDGE (Cawsey, 1993). As is done in EDGE, I wish to base ICICLE's techniques for building and updating its user model on the sources of user information proposed by the user modeling theories of Wahlster and Kobsa (1986), including:

o Initial individual models stored from previous sessions

If a user has accessed ICICLE before, then his or her user model information will be retrieved from system storage and used to initialize the state of the user knowledge models.
As mentioned above, this would pertain to all of the components except for the dialogue history model, which is started from an empty state at the beginning of each session.

o Assumptions based on user input which provide explicit (direct) or implicit (indirect) inferences

Analysis of user-inputted texts will provide the SLALOM architecture with direct markings based on user performance; if a user consistently uses a certain grammatical construct correctly, that construct will be marked "acquired" in the user knowledge model. If there is inconsistency, or variation in the use of a construct, it will be marked "ZPD." If the user consistently makes an error on that feature, it should be marked "unacquired." If the error identification phase is performing the very first parse for a particular user, the performance statistics from that piece will become the initial state of the user knowledge model.

Indirect information that can be derived from user input includes the success or failure of method choices as discussed in the previous section. This is not directly exhibited in user language production, but can be derived from patterns of success or failure over time and will thus affect the system history component as discussed. Another aspect of indirect information to be derived from input affects the model of underlying domain knowledge and will be addressed later.

o Assumptions based on the system's contribution toward solving its communicative goals, which involve a change in the user's knowledge

When the system has generated a definition or explanation about some concept in the underlying domain knowledge model, there is a possibility that the user's knowledge concerning this concept has successfully increased as a result of the explanation. This should be indicated in the markings on the model. This is discussed further in Section 5.5.3.
o Default assumptions from stereotypic information or general knowledge about the user

The stereotypic information which ICICLE has at its disposal is represented in the hierarchical layout of both the SLALOM and the underlying domain knowledge models. In SLALOM, the order-of-acquisition and simultaneous-acquisition links represent knowledge about stereotypical learners, and can sometimes be used when sparse performance data results in no tags being placed on certain language features. If an untagged feature is "below" an acquired feature or a ZPD feature on a hierarchy, then the system can infer that it is probably also acquired, even if it has not occurred thus far in the user's language production. Likewise, links of concurrent acquisition can propagate their own tags to untagged items, and features "above" unacquired items or the ZPD can be inferred as unknown. These indirect conclusions are not as strong as those based on actual empirical data from the specific user, but in the absence of stronger data they can be used to make planning decisions. The way this kind of inferring would work in the model of underlying domain knowledge is less well-developed at this time, but I will address it in Section 5.5.2.

Following is a more in-depth discussion of each of these aspects of model establishment and maintenance.

5.5.1 Initialization

Since the initialization of the history models was already discussed, this section will focus on how the knowledge models are initialized in the first session of interaction between a new user and the ICICLE system.

SLALOM

The initialization data for the user grammar model comes from the first text analysis performed by the error identification component and its first analysis of the user's language feature usage. Once this information is received by the user model, it can assign tags in SLALOM according to how the user performed on his or her first writing sample and all of the language structures it contains.
One serious drawback to this is that the first parse of the first piece of writing by any new user must be performed without the use of SLALOM to help the system select between parses. This appears to be a nontrivial problem. Unlike some other systems such as EDGE (Cawsey, 1993), which base the initial state of their user models on a general classification of the user, ICICLE cannot reliably classify the language ability of a new user in such a way as to provide useful default reasoning for the parser. In the domain of language and writing proficiency, self-assigned or even teacher-assigned classifications are likely to be highly subjective and variable. We are therefore faced with the problem of needing information on expected user performance without yet having any data on exhibited user performance. This problem has not been solved in our work to date and will be one of the issues I will address in my future research.

Thankfully, the situation improves once the first analysis is complete and the initial tags are placed on the features in the SLALOM model. It is our belief that the user grammar model of ICICLE has a very rich source of input and thus avoids Sparck Jones' feared problem with "chasing" the user; a multi-sentential sample of a user's writing is a uniquely rich source of language proficiency information. In comparison to polling user knowledge, where one question is only likely to reveal one point of data (either the user understands or does not understand the concept asked about), even a short piece of writing is going to offer many points of data per utterance. Every grammatical construct successfully or unsuccessfully used, from determiner choice to verb tense, provides information about the user.
These points can be correlated to provide a map of those constructs used correctly, those which are experiencing variation, those which are occurring with error, and those which are absent. Therefore, even after only a small number of interactive cycles, we are working from a rich source of directly-derived data about this individual, particularly compared to what we could obtain from questioning the student. Furthermore, the student's answers to explicit questions about grammar would reflect the information in their domain knowledge model more than the possibly unconscious knowledge in their interlanguage (this will be addressed below).

The Underlying Domain Knowledge Model

Because this aspect of the user model is not my primary focus, I have not developed a robust method for assigning the initial tags in the domain knowledge model. Just as I have argued that the Interface Theory allows us to take the stance that explicit instruction can and will result in interlanguage development, we can also use it to state that the user's implicit grammar knowledge can surface in explicit knowledge as well. One argument for this is that a learner can draw inferences from his or her intuitive knowledge to develop conscious conclusions about the way the grammar of the target language works (Ellis, 1993). Another argument is based on the fact that our particular learner population is acquiring English exclusively through classroom instruction; with little or no aural capabilities, there is very little opportunity for this group of learners to acquire English through naturalistic acquisition. One could claim (as we do) that written English can provide some of the "intake" they would require, but it has already been stated that most of their written input comes from texts aimed not at their level of acquisition, but at that of their hearing peers (Anderson, 1993), which would not satisfy the requirements for "intake" as stated by Krashen (1981).
Since Ellis (1993) states that a learner's body of explicit knowledge can be a "buffer" through which explicit instruction passes to become implicit knowledge, one could conclude that all internalized grammar knowledge of a population of learners under these constraints was represented as explicit knowledge first.

Under this assumption, the technique I have chosen for the assignment of tags for the underlying domain knowledge model is to "project" them from the SLALOM model. The grammar model contains the conclusions the system has drawn directly from user performance; since there are no user actions which can give us conclusive evidence about their underlying knowledge, we must derive that information via a more indirect route, through the assumption stated above and the hypothesis that a learner's performance of a feature reflects the level to which he or she has mastered the concepts associated with it.

This "projection" of the tags from SLALOM onto the parts of the domain knowledge model connected with the explanations of each feature in SLALOM proceeds as follows. After the conclusion of SLALOM's initialization, topics linked from features in the ZPD project a "ZPD" tag on the concepts involved in their definition; topics linked from features which are acquired are tagged "known"; those linked from unacquired features are tagged "unknown." The projection of tags should trickle down from parent to child, but in the case of a concept receiving a "projection" from multiple sources in SLALOM or multiple parents, the higher projection should be used; in other words, if one of a concept's parents projects down the tag "unknown," but another projects "known," then "known" should be recorded. This is represented in Figure 5.6.

Figure 5.6: The "projection" of tags from SLALOM onto the underlying domain knowledge model.

Not all features in SLALOM will always be tagged at all times.
A feature which has not occurred in the user's production to date will have an empty tag, and as mentioned above its tag will be indirectly inferred at need using the relationships between items in the SLALOM hierarchies. Therefore, it is possible that some items in the underlying domain knowledge model will not receive any projections. In this case, we will have to design a series of inference rules similar to those described for the EDGE system (Cawsey, 1993) in Section 2.2.3. A rule of this type might read: if all of the subconcepts are known, then the parent concept can be inferred as known.

A possibility which I mentioned earlier is the use of direct user interaction (i.e., questioning the user) to discern the extent of user knowledge. Although that idea may be further investigated at some other time, it would entail considerable development, and at this stage we consider the "projection" approach to be sufficient for dealing with the domain knowledge model in its working-hypothesis status.

5.5.2 Retrieving the Information

Once the information about a user has been established in the various components of the user model, the various ICICLE processes will be able to retrieve it by querying the model at need. In this section I will address in greater detail how retrieval of information from the user grammar model will proceed, since the other components have been sufficiently sketched out to this point.

Information about the user is retrieved from SLALOM at three points in ICICLE's cycle of input and response: when selecting between parses of an input sentence, when pruning the list of errors to be discussed to just those in the ZPD, and when selecting ZPD-area surface structures for text realization. The SLALOM model supplies two kinds of data to these processes: direct data on the user's performance, and indirect data which can be inferred at need from the relationships of untagged items to tagged ones in the SLALOM hierarchies.
The latter data will be less reliable, and there is a precedent in some explanation systems (e.g., EDGE (Cawsey, 1993)) for making note of unreliable assumptions about the user so that they may be corrected if the action based upon them fails. In none of these three cases, however, do we see the possibility for feedback indicating failure to return to the system, and in the current ICICLE design the system is not envisioned as trying to repair these actions. Therefore, ICICLE's current design does not distinguish between decisions based on direct data and those based on indirect data.

Figure 5.7: The retrieval/update cycle between the error analysis phase and the user model.

The explicit recognition of "assumptions" not based directly on empirical data may have more relevance in the use of the domain knowledge model and its relationship to the function of the form phase of the planning model. This is outside the scope of my work.

5.5.3 Updating

The relationship of the user model to the error analysis phase is represented in Figure 5.7. I have just established that once the model is initialized, the error analysis phase will draw "current" information about the user from the components of the model in order to perform its parses. The information presently in the model will represent that information about the user derived from performance up until the present piece of writing. Once the analysis process is complete, the system has new (and potentially different) observations of user performance to add to the user grammar model, and that information is sent along so that it may cause the model to update itself accordingly. The tags in SLALOM may be deemed incorrect for one of two reasons: an existing tag might have been based on a small sample size of user performance and more data might show that it is wrong, or the user's proficiency may simply be changing, as we hope it will over time.
ICICLE must therefore have a mechanism for reevaluating these tags and updating their values. A model that can be overwritten over time gives rise to the question of whether new data should always take precedence over the old, as in the EDGE system, which always overwrites previously recorded tags (Cawsey, 1993). The outline given thus far of what observations are recorded in the model has been fairly vague: features used "consistently correctly" are acquired concepts, those used "with variation" are in the ZPD, and those which appear "consistently incorrectly" have not yet been acquired. These judgments may change when the amount of data increases as the system goes through more than one piece of the student's work, particularly if one or more of the pieces is too short to contain several instances of certain language features, or a feature is simply not a commonly used one. Therefore, it makes sense for the model to track certain figures (the number of times a feature has been attempted, and the number of times it was executed without error) across more than one piece of writing and to make distinctions between figures collected within the most recent piece of writing and those collected across others in the past (since the user's proficiency will not change within a given piece, but there may be change across a selection of them). This will allow the system to examine as much data as possible, strengthening its ability to make these judgments.

In this view, the user's writing is seen as a continuum of performance events over time from the first session to the most recent. But since the user's proficiency is also changing, the system should not always compute performance statistics which include events stretching back to the beginning of his or her use of the system, when the performance levels may have been different.
Therefore, we recommend that the system maintain a "sliding window" of performance history across writing samples from which to update the user grammar model at each new analysis (see Figure 5.8). Ideally, this window would include enough data to be robust, and yet be small enough to capture only the "current" statistics. This latter requirement is particularly important for the system's self-evaluation and for deciding whether recent explanatory attempts have succeeded. Determining what size such a window should be is a realm of future research. Related issues are whether or not it should adjust its size according to the circumstance, and what statistics of successful execution would be sufficient for judging a feature to be "consistent" in its use.

Figure 5.8: The "sliding window" of performance history.

The Underlying Domain Knowledge Model

With the status of this model as a "working hypothesis" and not the focus of my efforts, the primary update activities of the underlying domain knowledge model will derive from the updates of SLALOM: whenever SLALOM is changed, the domain knowledge model should be updated as well by "projections" which work in the same fashion as those described for its initialization. When upgrading a former ZPD concept in this model to "known" to reflect a concept now marked "acquired" in the SLALOM model, we work from two assumptions. First, since the concepts in the ZPD are the topics of instruction, the underlying concepts will have been discussed with the user one or more times by the time the associated feature has been acquired. Second, the improvement in the user's performance indicates that the explanations were successful in affecting the user's grammatical proficiency. Since we assume that explicit instruction passes through explicit knowledge before becoming implicit knowledge, we can therefore mark the explicit knowledge of this concept as known. Further development may make this model more independent from SLALOM, however.
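The "projection" used for both initializing and updating the domain knowledge model might be sketched as follows. This is an illustration under stated assumptions rather than a specification: the dictionary-based representation, the function name, and the example features are hypothetical; the sketch tags the linked concept itself as well as the concepts in its definition; and it assumes that "ZPD" ranks between "unknown" and "known" when projections compete, which the text implies but states only for the "unknown" versus "known" case.

```python
RANK = {"unknown": 0, "ZPD": 1, "known": 2}  # assumed ordering of competing projections
SLALOM_TO_DOMAIN = {"unacquired": "unknown", "ZPD": "ZPD", "acquired": "known"}


def project(slalom_tags, links, definition_of):
    """Project SLALOM tags onto domain concepts, keeping the higher tag on conflict.

    slalom_tags:   feature -> "acquired" | "ZPD" | "unacquired" | None (untagged)
    links:         feature -> name of the concept it is linked to
    definition_of: concept -> list of concepts used in its definition
    """
    domain = {}

    def push(concept, tag):
        current = domain.get(concept)
        if current is not None and RANK[current] >= RANK[tag]:
            return  # an equal or higher projection already reached this concept
        domain[concept] = tag
        for child in definition_of.get(concept, []):
            push(child, tag)  # projections trickle down from parent to child

    for feature, tag in slalom_tags.items():
        if tag is None:
            continue  # untagged features project nothing
        push(links[feature], SLALOM_TO_DOMAIN[tag])
    return domain


# Illustrative data; the features and definitions are placeholders.
slalom_tags = {"3rd person +s": "ZPD", "plural +s": "acquired", "passive": None}
links = {name: name for name in slalom_tags}
definition_of = {"3rd person +s": ["person", "number"], "plural +s": ["number"]}
domain_tags = project(slalom_tags, links, definition_of)
```

Here "number" receives "ZPD" from one parent and "known" from another, so "known" is recorded, while the concept linked from the untagged feature receives no projection and would be left to inference rules.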
It has been proposed that we add the tag maybe-known to this model to reflect the status of a concept following its definition or explanation by the system, either directly through the original explanation or through the user requesting the definition through one of the interface's hyperlinks. Using the maybe-known label in this way would mirror its use in EDGE (Cawsey, 1993) to indicate the possibility that the system's explanation has resulted in an increase of knowledge, as described in Section 2.2.3. How this additional tag would interface with those derived from SLALOM, and how these tags will affect the form phase decisions, are questions for future work.

Chapter 6 Summary and Future Directions

I will now summarize what the previous chapters have established and how the contents of this proposal relate to the goals of my PhD thesis research, in terms of what parts of the ICICLE system I plan to develop and what issues I expect to face.

The currently existing ICICLE system as described in Section 3.3 performs syntactic parses of user-inputted text without the use of any model of the user's language competence, domain knowledge, or history of interaction with the system. The generation component has not yet been implemented, but Chapter 4 described the proposed model for the generator in detail so as to make clear the demands it will place upon the user model.

I have proposed a four-part model consisting of a history component (the system history and the dialogue history) and a knowledge component (the user grammar and the domain knowledge). My focus in this work will be on the knowledge component, which has the greatest impact on the text analysis and generation processes. I also wish to further refine this focus to the user grammar aspect of this component. The overall goal of my work, therefore, will be to develop a viable architecture for the SLALOM model and its interaction with the other system components.
My SLALOM implementation will be supported by the domain knowledge model functioning in a "working hypothesis" format, based on the research I have cited but not fully developed within the scope of my work. I will also implement these models within the currently existing natural language parsing system and I will demonstrate how they will be used by the proposed generation component. Further details follow below.

6.1 Completing the User Knowledge Model Architecture

Although some progress has been made toward establishing the hierarchical structure of SLALOM, the design is still very general and will need to be further specified. Part of this specification will be a determination of the exact relationship between the parsing grammar and what needs to be represented in the SLALOM model, in terms of the "features" we should put in the model and how they correspond to the different parsing "rules," particularly in the case where more than one rule covers a given syntactic structure, including multiple mal-rules.

A second aspect of specifying the SLALOM architecture is to determine the organization of these contents. Although the division into typological categories or "hierarchies" should be straightforward, the determination of the order of acquisition and the relationships of concurrent acquisition will require significant research. Existing studies to identify orders of language feature acquisition, such as (Bailey et al., 1974; Dulay and Burt, 1975; Larsen-Freeman, 1976), have focused primarily on the acquisition of morphology, and a few others have addressed negation and relative clause formation, but many of the hierarchies required by our system have not been addressed. Empirical analysis of the productions of our learner population is required. A preliminary look at a corpus of 101 writing samples from deaf students [The corpus analysis has been performed by undergraduate Litza Stark.]
has revealed groupings of syntactic errors with apparent correlation to a general (and currently subjective) estimate of language proficiency, indicating three ranges of errors which occur specific to proficiency levels. These findings are excerpted in Figure 6.1, which shows a subset of the errors identified in the texts. Additional analysis has yielded statistically significant relationships of co-occurrence between errors within the proficiency levels. These results are informal and only indicate to us a direction and a possible confirmation for the SLALOM design, both in terms of representing the order of acquisition and in terms of representing concurrent acquisition. In order to complete the SLALOM design, this analysis must be taken further, shifting the focus from the area of error committed to the area of variation in performance in order to pinpoint the ZPD.

Figure 6.1: Correlations between language ability and errors.

One possibility for a continuation of this work is a longitudinal study on a new corpus which contains multiple samples from individuals over a long period of time, showing the progression of a given individual's acquisition process. I am currently investigating means of obtaining samples for this purpose. Another source of data is existing studies on the orders of acquisition in certain categories of syntactic structure such as negation phrases and relative clauses. These studies are under investigation.

6.2 Implementation Goals

One part of my thesis goal is to fully specify the hierarchical organization of the SLALOM architecture as described above; the other part is to implement this architecture within the ICICLE system. In the next few sections, I will detail my implementation goals for ICICLE.

6.2.1 Error Identification Using the Model

One of my goals will be to revise the error identification process to incorporate the user knowledge model.
The existing system I described in Section 3.3 currently operates entirely without any modeling of the user. As a result, when the parser returns multiple possible parses which span a given utterance, the system picks the first "grammatical" parse it finds or, if there are no parses without errors, it picks the first overall. Tested on a small selection of sentences containing determiner and agreement errors, this parser can correctly identify the error in 63% of the cases (Schneider and McCoy, 1998). This figure is encouraging for preliminary results, but not nearly sufficient for a working system.

One of the ways in which this statistic may be improved is through the interaction of the parse selection mechanism with the user knowledge model. Because of the correspondence between the language features in the SLALOM structure and the augmented grammar used in the parser, the tags in SLALOM will permit the system to give preference to those parses containing rules from the grammar which accurately capture how we expect the user to perform: "correct" English rules capturing those morphosyntactic constructions the user has acquired, and mal-rules capturing those which are not acquired. Those which are in the ZPD should be occurring sometimes correctly and sometimes with error. As part of my thesis work, I will modify the parse selection mechanism so that it uses this information to prefer parses which are consistent with the system's expectations of the user.

As part of this work, I will need to address the "first parse" problem mentioned in Section 5.5.1. I will have to design a method for ICICLE to make selections between parses of user-inputted text when it is working with a new user who has not accessed the system before and about whom SLALOM holds no observations.
One idea I will be investigating is a two-pass approach, where the parser does an initial, shallow analysis of the text to determine some very basic qualities of the text such as words per sentence and clause complexity; these measurements may give us a very basic estimate of the user which can be compared against a very general "rating" of the potential parses in an approach similar to that of German Tutor (Heift and McFetridge, 1999) as described in Section 2.3.3. Although crude, it may give ICICLE an alternative to selecting parses arbitrarily.

6.2.2 Knowledge Model Updating after Text Analysis

I discussed in Section 5.5.3 that it will be the responsibility of the error identification module to provide the data for updating the user grammar model at the conclusion of error analysis, in order to reflect the new data that has been obtained about the user. Tags may be changed for one of two reasons: the old tag could require a correction, having been based on sparse data earlier and proved wrong now that there have been more performance events on which to base the judgment; or the old tag could require an update, meaning that it did represent the user's proficiency level in the past, but recent figures have shown the proficiency changing. One of my goals is to implement this update process at the conclusion of the error identification.

Issues I will need to explore in doing so include the statistics of proficiency and the "sliding window" concept. For the former, I will need to determine what ratio of features achieved to features attempted is necessary for the feature to be tagged "acquired" rather than "ZPD." Does the user need to be correct 100% of the time for it to be acquired? 90%? These questions will need to be answered through investigations of standards for writing evaluation and second language proficiency.
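To make the open question concrete, the tagging decision might be sketched as a function of the two tracked figures. Every number below is a placeholder: the proposal deliberately leaves the cutoffs to future investigation, so the thresholds and the minimum number of attempts are assumptions chosen only to make the sketch runnable.

```python
def tag_feature(attempted, correct,
                acquired_at=0.9, unacquired_at=0.1, min_attempts=3):
    """Return a SLALOM tag from performance counts, or None if data is too sparse.

    All three keyword parameters are hypothetical values, not ICICLE settings.
    """
    if attempted < min_attempts:
        return None  # too few observations; leave the feature untagged
    ratio = correct / attempted
    if ratio >= acquired_at:
        return "acquired"    # used "consistently correctly"
    if ratio <= unacquired_at:
        return "unacquired"  # used "consistently incorrectly"
    return "ZPD"             # used "with variation"


# Illustrative counts for three features and one sparsely observed feature.
tags = {
    "plural +s": tag_feature(attempted=20, correct=19),
    "3rd person +s": tag_feature(attempted=10, correct=5),
    "passive": tag_feature(attempted=10, correct=0),
    "relative clause": tag_feature(attempted=2, correct=2),
}
```

Keeping the cutoffs as parameters means that whatever standards the later investigation settles on can be substituted without changing the decision logic.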
In order to develop the sliding window, I will need to determine what size the window needs to be to capture robust statistics on the full range of features represented in SLALOM without also including events (writing samples) which are too old and which bog down the statistics. It has been brought to my attention that the sliding window should perhaps not cleanly cut out all of the old events it is leaving behind; on the contrary, it is only the old errors we want to ignore, when the more current work shows those features being used with consistency. Old successes should be remembered, particularly in the case of the more obscure constructs which may continue to suffer from sparse data. When determining size, another question which arises is what unit to use to measure that window. I do not believe that a certain number of writing samples would be a good measure, as the samples may vary greatly in size according to what purpose the piece is intended for. The number of utterances may not be a good measure, either, as utterance length varies between individuals of different proficiency levels and short utterances yield far less data. The solution may be to measure the size by the number of "features attempted," rounding up to the nearest whole sample, but this is also a subject of future investigation. A final issue regarding the sliding window which I will need to investigate is whether its size should be rigid. In a situation where long gaps of time may unexpectedly separate two writing samples, the older data is even less desirable than it would have been otherwise. In these situations, it may be advisable for the window to shrink in order to avoid being skewed by the particularly old data.

6.2.3 Pruning the Error List

When the error identification is complete, the error analysis module will pass the list of errors for tutorial instruction over to the response generation module.
It has already been established that this list should be trimmed to just those errors within the ZPD. The actual committed errors may range widely beyond the ZPD; most may be on constructs beyond the user's understanding, and some will be simple mistakes. In the example used in the Introduction, a user might have made an error in subject-verb agreement simply because he or she has mistyped. In this case, the error analysis component would still have parsed it as subject-verb disagreement, but because the user is well aware of agreement, this error would not be passed to the response generator since instruction is not necessary. Part of my thesis work will be to construct the mechanism to prune the errors that have been identified by the system down to the relevant ones in the user's "current" realm of knowledge. The errors will be "packaged" for use by the response generation component, and passed along with accompanying information relevant to the generation process, including the original sentence and the source of the error.

6.2.4 Response Planning

Although I will not be further developing the text-planning component within the scope of my dissertation, it is my goal to be able to show that the information contained within the user model will meet all of the needs of the system that has been outlined so far. In particular, since my focus is on the user knowledge models, I will complete a library of propositions and goals (an extension of that sketched in Section 4.2) which make reference to the models I have developed. I will attempt to show that the propositions I have provided will extract sufficient information from the model to meet the needs of our text planning mechanism.

6.3 Evaluation

In order to validate my design for the user grammar model and its implementation within a revised ICICLE system, I will undertake certain actions to evaluate its effectiveness.
These actions will focus on the interaction of the model with the error analysis component, since the text generation component will not be fully implemented at the time of the completion of my work. In Chapter 3 I mentioned the two tasks laid out for tutoring systems by Glaser et al. (1987): that of the diagnostician, who must discover the nature and extent of the student's knowledge, and that of the strategist, who must plan a response to this discovery. Hativa and Lesgold (1991) bemoaned the fact that individualized tutoring systems that were state-of-the-art at the time of their writing mostly failed to accomplish the second task effectively because they could not accomplish the first: they were unable to evaluate user performance accurately. Hativa and Lesgold listed a taxonomy of performance situations which a system must handle in order to accomplish this goal, and I would like to use this taxonomy to lay out specific goals for ICICLE to accomplish under evaluation. The following is a list of performance situations derived from their taxonomy in which ICICLE should be able to select the correct action with its revised error identification component. To follow their terminology, I have used the phrases "correct answer" and "wrong answer," even though in the domain of ICICLE it would be more appropriate to discuss them in terms of language features used correctly or incorrectly.

o When a correct answer does not mean a user understands: ICICLE is not going to generate tutorial feedback in response to correct English text from the user, even if the competent performance contradicts what the user grammar model currently says about the learner's mastery of the constructions involved. A worry in this situation, however, is whether or not "accidental" correct constructions would result in the user model reflecting knowledge the user does not have.
It is my belief that this is a possibility when the data on a given construction is still sparse, but that in the long run, if the user has not truly acquired a construction, it will be reflected in the production of errors and the subsequent revision of the data stored in the SLALOM model.

o When a correct answer means the user does understand: Correct English utterances should be parsed as such and the user's performance should be reflected in the SLALOM tags. If the system is tested on a corpus of utterances which human judges agree to be correct, the SLALOM model for that learner should display "acquired" status for the features involved in the utterances.

o When a wrong answer does not mean that a user does not understand: This has been addressed above. When a user makes a grammatical error which reflects a simple mistake rather than the status of his or her interlanguage, ICICLE should be able to discern the contradiction and mark the error on the screen interface but elect not to tutor the user on the underlying concept.

o When a wrong answer means that a user does not understand: ICICLE should be able to identify situations in which an ungrammatical parse is warranted, and when that parse should result in the generation of tutorial feedback, as well as the reflection of the user's knowledge in SLALOM.

My goal is to construct specialized corpora, consisting of utterances culled from our user population, to test each of these situations as part of the evaluation of the system. I would like to compare the reaction of ICICLE to these corpora against the reaction of a human judge; in other words, the system should agree with human judges' judgment concerning the user's competence. It should not be unduly affected by isolated variance ("accidentally" correct forms, or simple mistakes on well-known forms), and SLALOM should correctly reflect tags which represent the user's performance on the features it has seen.
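One simple way to quantify the comparison with a human judge is the proportion of features on which the system's tag and the judge's tag coincide. The sketch below assumes, hypothetically, that both sets of judgments are available as dictionaries from feature name to tag; it is a plain percent-agreement measure and not an evaluation procedure specified in this proposal.

```python
from typing import Dict


def tag_agreement(system_tags: Dict[str, str], judge_tags: Dict[str, str]) -> float:
    """Proportion of commonly tagged features on which both sources agree."""
    # Only features tagged by both the system and the judge are compared.
    shared = system_tags.keys() & judge_tags.keys()
    if not shared:
        raise ValueError("no features were tagged by both sources")
    matches = sum(1 for feature in shared
                  if system_tags[feature] == judge_tags[feature])
    return matches / len(shared)
```

Percent agreement does not correct for agreement expected by chance, so a chance-corrected statistic may be preferable once real judgments are available.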
I will supplement these evaluation criteria with an investigation of the parse selection process itself, to see if the parses chosen by the system are the same as those selected by human judges.

6.4 Conclusion

In this proposal, I have given an overview of the system design of ICICLE, an interactive writing tutoring system which will use both natural language analysis and natural language generation to achieve a cycle of tutorial evaluation and instruction in English. I have discussed the needs of the module which identifies errors in the user-written text and I have put forth a basic design for the module which will generate feedback to that text. My primary interest in ICICLE is the knowledge source which supplies both of these functions with vital information about the user: a complex user model containing information about the history of the user's interaction with the system and the user's knowledge about the domain, both in terms of interlanguage grammar and conscious terminological knowledge. Within the scope of my work, I will fully develop the architecture of the user interlanguage grammar model and implement its interaction with the text analysis component of the ICICLE system. By focusing on the user grammar model, I intend to demonstrate how a detailed model of a second language learner's interlanguage can be used in a CALL system to find parses for English sentences containing grammatical errors, and to supply information to a complex text generation component for generating explanations about the language errors. Finally, my hope is that my work will show that detailed modeling of user language performance is both possible and reliable, and furthermore that it is highly desirable for achieving individualized instruction in a system for Computer-Assisted Language Learning.
6.5 Acknowledgments

This work has been supported by NSF Grant #IRI-9416916, NSF Research Traineeship Grant #GER-9354869, and a Rehabilitation Engineering Research Center Grant from the National Institute on Disability and Rehabilitation Research of the U.S. Department of Education (#H133E30010). My personal thanks to my advisor Kathleen McCoy for her unending patience and helpful ideas.

References

Vincent Aleven and Kevin D. Ashley. 1992. Automated generation of examples for a tutorial in case-based argumentation. In C. Frasson, G. Gauthier, and G. McCalla, editors, Proceedings of the Second International Conference on Intelligent Tutoring Systems, pages 575--584, Berlin. Springer-Verlag.
James Allen. 1995. Natural Language Understanding. Benjamin/Cummings, California, 2nd edition.
John R. Anderson, Paul J. Kline, and Charles M. Beasley Jr. 1980. Complex learning processes. In Richard E. Snow, Pat-Anthony Federico, and William E. Montague, editors, Aptitude, Learning, and Instruction, volume 2: Cognitive Process Analyses of Learning and Problem Solving, chapter 21, pages 199--235. Lawrence Erlbaum Associates, Hillsdale, New Jersey.
John R. Anderson. 1982. Acquisition of cognitive skill. Psychological Review, 89(4):369--406.
Jacqueline J. Anderson. 1993. Deaf Student Mis-Writing, Teacher Mis-Reading: English Education and the Deaf College Student. Linstok Press, Burtonsville, MD.
N. Bailey, C. Madden, and S. D. Krashen. 1974. Is there a 'natural sequence' in adult second language learning? Language Learning, 24(2):235--243.
C. Baker and D. Cokely. 1980. American Sign Language: A Teacher's Resource Text on Grammar and Culture. TJ Publishers, Silver Spring, MD.
Maria Beck, Bonnie D. Schwartz, and Lynn Eubank. 1995. Data, evidence, and rules. In Lynn Eubank, Larry Selinker, and Michael Sharwood Smith, editors, The Current State of Interlanguage: Studies in Honor of William E. Rutherford, pages 177--195. John Benjamins Publishing Company, Amsterdam and Philadelphia.
Ellen Bialystok. 1978. A theoretical model of second language learning. Language Learning, 28(1):69--83, June.
Ellen Bialystok. 1981. The role of linguistic knowledge in second language use. Studies in Second Language Acquisition, 4(1):31--45, Fall.
Roger Brown and Camille Hanlon. 1970. Derivational complexity and order of acquisition in child speech. In John R. Hayes, editor, Cognition and the Development of Language, chapter 1, pages 11--54. John Wiley & Sons, Inc., New York.
H. Douglas Brown. 1994. Principles of Language Learning and Teaching. Prentice Hall Regents, Englewood Cliffs, NJ, 3rd edition.
Susan Bull. 1997. Promoting effective learning strategy use in CALL. Computer Assisted Language Learning, 10(1):3--39.
Charles B. Callaway and James C. Lester. 1997. Dynamically improving explanations: A revision-based approach to explanation generation. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, pages 952--958, Nagoya, Japan.
Giuseppe Carenini and Johanna D. Moore. 1993. Generating explanations in context. In Wayne D. Gray, William E. Hefley, and Dianne Murray, editors, Proceedings of the International Workshop on Intelligent User Interfaces, pages 175--182, Orlando, Florida, January 4-7. ACM Press.
Giuseppe Carenini, Vibhu O. Mittal, and Johanna D. Moore. 1994. Generating patient-specific interactive natural language explanations. In Proceedings of the 18th Symposium on Computer Applications in Medical Care (SCAMC '94). McGraw-Hill, Inc.
Susanne E. Carroll. 1995. The irrelevance of verbal feedback to language learning. In Lynn Eubank, Larry Selinker, and Michael Sharwood Smith, editors, The Current State of Interlanguage: Studies in Honor of William E. Rutherford, pages 73--88. John Benjamins Publishing Company, Amsterdam and Philadelphia.
Alison Cawsey. 1990. Generating explanatory discourse. In R. Dale, C. Mellish, and M. Zock, editors, Current Research in Natural Language Generation, London. Academic Press.
Alison Cawsey. 1993. Explanation and Interaction: The Computer Generation of Explanatory Dialogues. MIT Press, Cambridge, MA.
Michelene T. H. Chi, Robert Glaser, and Ernest Rees. 1981. Expertise in problem solving. In Advances in the Psychology of Human Intelligence, chapter 1, pages 7--76. Lawrence Erlbaum, Hillsdale, NJ.
Allan Collins and John Seely Brown. 1988. The computer as a tool for learning through reflection. In H. Mandl et al., editors, Learning Issues for Intelligent Tutoring Systems, chapter 1, pages 1--18. Springer-Verlag, NY.
Vivian Cook. 1991. Second Language Learning and Second Language Teaching. Edward Arnold, New York.
John H.A.L. de Jong and Ludo Verhoeven. 1992. Modeling and assessing language proficiency. In John H.A.L. de Jong and Ludo Verhoeven, editors, The Construct of Language Proficiency: Applications of Psychological Models to Language Assessment, chapter 1, pages 3--19. John Benjamins Publishing Company, Amsterdam and Philadelphia.
Heidi C. Dulay and Marina K. Burt. 1974. Errors and strategies in child second language acquisition. TESOL Quarterly, 8(2):129--136, June.
Heidi C. Dulay and Marina K. Burt. 1975. Natural sequences in child second language acquisition. Language Learning, 24(1).
Michael Elhadad. 1993. FUF: The Universal Unifier User Manual Version 5.2. Columbia University, Computer Science Department, June.
Rod Ellis. 1992. Second Language Acquisition and Language Pedagogy. Multilingual Matters, Philadelphia.
Rod Ellis. 1993. The structural syllabus and second language acquisition. TESOL Quarterly, 27(1):91--113, Spring.
Rod Ellis. 1994. The Study of Second Language Acquisition. Oxford University Press, New York.
Paul M. Fitts. 1964. Perceptual-motor skill learning. In Arthur W. Melton, editor, Categories of Human Learning, pages 243--285. Academic Press, New York and London.
Richard P. Gabriel. 1988. Deliberate writing. In D. McDonald and Leonard Bolc, editors, Natural Language Generation Systems, pages 1--46. Springer-Verlag.
Robert Glaser, Alan Lesgold, and Susanne Lajoie. 1987. Toward a cognitive theory for the measurement of achievement. In Royce R. Ronning, John A. Glover, Jane C. Conoley, and Joseph C. Witt, editors, The Influence of Cognitive Psychology on Testing, volume 3 of Buros-Nebraska Symposium on Measurement and Testing, chapter 3, pages 41--85. Lawrence Erlbaum Associates, New Jersey.
Ralph Grishman, Catherine Macleod, and Adam Meyers. 1994. Comlex syntax: Building a computational lexicon. In Proceedings of the 15th International Conference on Computational Linguistics, Kyoto, Japan, July. Coling94.
Nira Hativa and Alan Lesgold. 1991. The computer as a tutor - can it adapt to the individual learner? Instructional Science, 20:49--78.
Barbara Hayes-Roth and Perry W. Thorndike. 1979. Integration of knowledge from text. Journal of Verbal Learning and Verbal Behavior, 18:91--108.
Trude Heift and Paul McFetridge. 1999. Exploiting the student model to emphasize language teaching in natural language processing. In Mari Broman Olsen, editor, Proceedings of Computer-Mediated Language Assessment and Evaluation in Natural Language Processing, an ACL-IALL Symposium, pages 55--61, College Park, Maryland, June 22. Association for Computational Linguistics.
Jerry R. Hobbs. 1996. On the relation between the informational and intentional perspectives on discourse. In Eduard Hovy and Donia Scott, editors, Burning Issues in Discourse: An Inter-Disciplinary Account, volume 151 of NATO ASI Series, Series F: Computer and Systems Sciences, pages 139--157. Springer-Verlag, Berlin, Germany.
Stephen D. Krashen. 1981. Second Language Acquisition and Second Language Learning. Pergamon Press, New York.
Stephen D. Krashen. 1982. Principles and Practice in Second Language Acquisition. Pergamon Press, New York.
Stephen D. Krashen. 1983. Newmark's "ignorance hypothesis" and current second language theory. In Susan M.
Gass and Larry Selinker, editors, Language Transfer in Language Learning, Series on Issues in Second Language Research, chapter 9, pages 135--153. Newbury House Publishers, Inc., Rowley, Massachusetts.
Stephen D. Krashen. 1985. The Input Hypothesis: Issues and Implications. Longman, New York.
Diane E. Larsen-Freeman. 1976. An explanation for the morpheme acquisition order of second language learners. Language Learning, 25(1):125--135, June.
Benoit Lemaire and Johanna D. Moore. 1994. An improved interface for tutorial dialogues: Browsing a visual dialogue history. In Human Factors in Computing Systems (CHI '94), "Celebrating Interdependence", pages 16--22, Boston, MA, April 24-28. ACM.
Frank Linton, Brigham Bell, and Charles Bloom. 1996. The student model of the LEAP intelligent tutoring system. In Proceedings of the Fifth International Conference on User Modeling, pages 83--90, Kailua-Kona, Hawaii, January 2-5. UM96, User Modeling, Inc.
Diane J. Litman and James F. Allen. 1987. A plan recognition model for subdialogues in conversations. Cognitive Science, 11:163--200.
William C. Mann and Sandra A. Thompson. 1988. Rhetorical Structure Theory: Towards a functional theory of text organization. TEXT, 8(3):243--281.
M. Matz. 1982. Towards a process model for high school algebra errors. In D. Sleeman and J.S. Brown, editors, Intelligent Tutoring Systems, Computers and People Series, chapter 2, pages 25--50. Academic Press.
Kathleen F. McCoy and Lisa N. Masterman (Michaud). 1997. A tutor for teaching English as a second language for deaf users of American Sign Language. In Proceedings of Natural Language Processing for Communication Aids, an ACL/EACL97 Workshop, pages 160--164, Madrid, Spain, July.
Kathleen F. McCoy, Christopher A. Pennington, and Linda Z. Suri. 1996. English error correction: A syntactic user model based on principled malrule scoring. In Proceedings of the Fifth International Conference on User Modeling, pages 59--66, Kailua-Kona, Hawaii, January 2-5.
UM96, User Modeling, Inc.
Kathleen R. McKeown. 1985. Text generation. Studies in Natural Language Processing. Cambridge University Press, Cambridge, London, New York, New Rochelle, Melbourne, and Sydney.
Lisa N. Michaud and Kathleen F. McCoy. 1998. Planning text in a system for teaching English as a second language to deaf learners. In Proceedings of Integrating Artificial Intelligence and Assistive Technology, an AAAI '98 Workshop, Madison, Wisconsin, July.
Lisa N. Michaud and Kathleen F. McCoy. 1999. Modeling user language proficiency in a writing tutor for deaf learners of English. In Mari Broman Olsen, editor, Proceedings of Computer-Mediated Language Assessment and Evaluation in Natural Language Processing, an ACL-IALL Symposium, pages 47--54, College Park, Maryland, June 22. Association for Computational Linguistics.
Johanna D. Moore and Vibhu O. Mittal. 1996. Dynamically generated follow-up questions. IEEE Computer, Special Issue: Interactive Natural Language Processing, 29(7):75--86, July.
Johanna D. Moore and Cécile L. Paris. 1989. Planning text for advisory dialogues. In Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada.
Johanna D. Moore and Cécile L. Paris. 1992. Planning text for advisory dialogues: Capturing intentional and rhetorical information. Computational Linguistics, 19(4):651--695.
Johanna D. Moore and Martha E. Pollack. 1992. A problem for RST: The need for multi-level discourse analysis. Computational Linguistics, 18(4):537--544. Association for Computational Linguistics.
Johanna D. Moore. 1993. What makes human explanations so effective? In Proceedings of the 15th Annual Meeting of the Cognitive Science Society, Hillsdale, NJ. Lawrence Erlbaum Associates.
Johanna D. Moore. Unpublished. The role of plans in discourse generation. Prepared for Discourse: Linguistic, Computational, and Philosophical Perspectives.
Megan Moser and Johanna D. Moore. 1996.
Toward a synthesis of two accounts of discourse structure. Computational Linguistics, 22(3):409--419, September.
E. Owen and J. Sweller. 1985. What do students learn while solving mathematics problems? Journal of Educational Psychology, 77:272--284.
Cécile Laurence Paris. 1987. The Use of Explicit User Models in Text Generation. Ph.D. thesis, Columbia University.
Cécile L. Paris. 1988. Tailoring object descriptions to a user's level of expertise. Computational Linguistics, 14(3):64--78, September.
Martha E. Pollack, Julia Hirschberg, and Bonnie Webber. 1982. User participation in the reasoning processes of expert systems. In Proceedings of the AAAI, Pittsburgh, Pennsylvania. American Association of Artificial Intelligence.
Owen Rambow. 1990. Domain communication knowledge. In Proceedings of the Fifth International Workshop on Natural Language Generation, pages 87--94, Pittsburgh, Pennsylvania.
Martin H. Ringle and Bertram C. Bruce. 1981. Conversation failure. In W. G. Lehnert and M. H. Ringle, editors, Knowledge Representation and Natural Language Processing, chapter 7, pages 203--221. Lawrence Erlbaum Associates, Hillsdale, New Jersey.
Jacques Robin. 1993. A revision-based generation architecture for reporting facts in their historical context. In H. Horacek and M. Zock, editors, New Concepts in Natural Language Generation: Planning, Realization and Systems. Frances Pinter, London and New York.
Jacques Robin. 1994. Automatic generation and revision of natural language report summaries providing historical background. In Proceedings of the Eleventh Brazilian Symposium on Artificial Intelligence, Fortaleza, CE, Brazil. SBIA94.
James A. Rosenblum and Johanna D. Moore. 1993. Participating in instructional dialogues: Finding and exploiting relevant prior explanations. In Proceedings of the World Conference on Artificial Intelligence in Education.
Linda Schinke-Llano. 1994. Linguistic accommodation with LEP and LD children. In James P.
Lantolf and Gabriela Appel, editors, Vygotskian Approaches to Second Language Research, Second Language Learning, chapter 3, pages 57--68. Ablex Publishing Corporation, Norwood, New Jersey.
David Schneider and Kathleen F. McCoy. 1998. Recognizing syntactic errors in the writing of second language learners. In Proceedings of the Thirty-Sixth Annual Meeting of the Association for Computational Linguistics and the Seventeenth International Conference on Computational Linguistics, volume 2, pages 1198--1204, Université de Montréal, Montreal, Quebec, Canada, August 10-14. COLING-ACL, Morgan Kaufmann Publishers.
Ethel Schuster and Jennifer Burckett-Picker. 1996. Interlanguage errors becoming the Target Language through student modeling. In Proceedings of the Fifth International Conference on User Modeling, pages 99--103, Kailua-Kona, Hawaii, January 2-5. UM96, User Modeling, Inc.
Bonnie D. Schwartz. 1993. On explicit and negative data effecting and affecting competence and linguistic behavior. Studies in Second Language Acquisition, 15:147--163.
Larry Selinker. 1971. The psychologically relevant data of second-language learning. In Paul Pimsleur and Terence Quinn, editors, The Psychology of Second Language Learning: Papers from the Second International Congress of Applied Linguistics, chapter 4, pages 35--43. University Press, Cambridge.
D. Sleeman. 1982. Inferring (mal) rules from pupil's protocols. In Proceedings of ECAI-82, pages 160--164, Orsay, France. ECAI-82.
Catherine E. Snow and Marian Hoefnagel-Höhle. 1982. Second language learners' access to simplified linguistic input. Language Learning, 32(2):411--430, December.
Hans Spada. 1993. How the role of cognitive modeling for computerized instruction is changing. In Paul Brna, Stellan Ohlsson, and Helen Pain, editors, Proceedings of AIED'93, World Conference on Artificial Intelligence in Education, pages 21--25, Edinburgh, Scotland, August 23-27. Association for the Advancement of Computing in Education (AACE). Invited talk.
Karen Sparck Jones. 1991. Tailoring output to the user: What does user modelling in generation mean? In Cécile L. Paris, William R. Swartout, and William C. Mann, editors, Natural Language Generation in Artificial Intelligence and Computational Linguistics, The Kluwer International Series in Engineering and Computer Science, chapter 8, pages 201--225. Kluwer Academic Publishers, Boston, Dordrecht, and London.
Linda Z. Suri and Kathleen F. McCoy. 1993. A methodology for developing an error taxonomy for a computer assisted language learning tool for second language learners. Technical Report TR-93-16, Dept. of CIS, University of Delaware.
Linda Z. Suri. 1993. Extending Focusing Frameworks to Process Complex Sentences and to Correct the Written English of Proficient Signers of American Sign Language. Ph.D. thesis, University of Delaware. Available as Dept. of CIS Technical Report TR-94-21.
William R. Swartout. 1983. XPLAIN: A system for creating and explaining expert consulting systems. Artificial Intelligence, 21(3):285--325.
J. Sweller. 1988. Cognitive load during problem solving: Effects on learning. Cognitive Science, 12:257--285.
Lev Semenovich Vygotsky. 1986. Thought and Language. The MIT Press, Cambridge, Massachusetts. Translation revised and edited by Alex Kozulin; originally published in 1934.
Wolfgang Wahlster and Alfred Kobsa. 1986. Dialog-based user models. In Giacomo Ferrari, editor, Proceedings of the IEEE, Special Issue on Natural Language Processing, February.
Gay N. Washburn. 1994. Working in the ZPD: Fossilized and nonfossilized nonnative speakers. In James P. Lantolf and Gabriela Appel, editors, Vygotskian Approaches to Second Language Research, Second Language Learning, chapter 4, pages 69--81. Ablex Publishing Corporation, Norwood, New Jersey.
Ralph M. Weischedel, Wilfried M. Voge, and Mark James. 1978. An artificial intelligence approach to language instruction. Artificial Intelligence, 10:225--240.
Beverly Woolf and David D. McDonald. 1984.
Building a computer tutor: Design issues. IEEE Computer, 17(9):61--73, September.
Beverly Park Woolf. 1984. Context Dependent Planning in a Machine Tutor. Ph.D. thesis, Dept. of Computer and Information Science, University of Massachusetts at Amherst, May. COINS Technical Report 84-21.