Copyright © 1993-2001 by the Xerox Corporation and Copyright © 2002-2007 by the Palo Alto Research Center. All rights reserved.
XLE is a combination of linguistic tools developed at PARC and Grenoble XRCE, plus a Tcl user interface to them. This document currently only describes the Tcl user interface. Documentation on the LFG formalism is in a separate file.
If you are viewing this document from an HTML browser, you can get to a table of contents by clicking on any of the underlined headers.
Each user of XLE should enable XLE by adding the following to their .login file (replace $xledir with the name of the XLE directory):
This can be put in a script for convenience. (NB: Do not create a link to $xledir/bin/xle directly instead of using PATH, since this will cause problems for XLE.)
If you use bash, you should use something like:
The only thing that XLE depends on is an X server. All of the platforms come with their own X server except MacOS X. MacOS 10.3.5 (Panther) comes with an X server, but it is not installed by default. If it is not installed on your Mac computer, then run the install disks again after unchecking everything else and checking the X Server box. It can also be downloaded from the Apple web site.
Some X servers (such as Exceed) do not update window bar titles when they are changed programmatically. This means that you may need to make a window into an icon and open it again in order to have the window bar display the correct information.
If you are at PARC, the first thing that you will need to do is to enable XLE, either by typing "enable xle" in your shell or by putting "enable xle" in your .login file. If you are at another site, check with the person who installed XLE to find out how to enable it, or read about enabling XLE in the documentation on installing XLE.
Once you have enabled XLE, all that you need to do to start XLE is to type "xle" in your shell (note: do not fork via "xle &" because XLE uses the shell for command inputs). When XLE is loaded, the shell that XLE was started in will be converted into an interactive Tcl shell. The Tcl shell uses "%" as a prompt.
To load a grammar for parsing, type:
% create-parser "filename"
To parse a sentence, type:
% parse "sentence"
or
% parse {sentence}
For example:
% create-parser english.lfg
% parse {This is a sentence.}
The results of the parse will be automatically displayed in a Tree Window and an F-Structure Window described below.
There is a special emacs mode, lfg-mode, that makes some aspects of grammar
writing and interacting with XLE easier. To use lfg-mode, load the library
package lfg-mode.el with the command M-x load-library lfg-mode,
or put the command
(load-library "lfg-mode")
in your .emacs file. The file lfg-mode.el or the compiled version lfg-mode.elc should be placed in a directory on your Emacs load-path so that Emacs can find it. Alternatively, you can specify the complete file name when executing the load-library command.
For files with the extension .lfg, lfg-mode is invoked automatically when the library package lfg-mode is loaded. Otherwise, use the command M-x lfg-mode to invoke lfg-mode.
The lfg-mode supports the notion of coding system that is introduced in emacs-20. (You would need this version of emacs to work with non-roman alphabets.) The default coding system used in xle is iso-latin-1 (aka 8859-1) which is suitable for most European languages, but this can be overridden by the user by placing a command like
(setq xle-buffer-process-coding-system alternative-coding-system)
in the .emacs file before the call to load lfg-mode. For example, to use the coding system named junet, place the command
(setq xle-buffer-process-coding-system 'junet)
This will cause emacs to use the coding system junet instead of the default iso-latin-1. If you use emacs-20, a list of coding systems is available from the top-level Mule menu under the subheading "Describe coding systems."
If you have the package imenu.el, lfg-mode will give you a menu of lexical items, rules, and templates and options for starting an XLE process in another window (see documentation below). Also, buffers created with the command inferior-xle-mode or with the menu options in an lfg window will have a menu of options for starting and restarting xle and for creating a parser.
By default, lfg-mode uses font-lock, so that parts of expressions are displayed in different colors. To switch off font-lock mode, use the command M-x font-lock-mode. The variable lfg-color-level determines what parts of an expression appear in color. The default is for several parts of expressions to appear in color. If you want only comments to appear in color, set the variable lfg-color-level to 0 in your .emacs file. This must appear before you load lfg-mode. For example:
(setq lfg-color-level 0)
(load-library "lfg-mode")
The default color set uses bright colors. For more muted colors, using the default Emacs color values, set the variable lfg-more-colors to nil in your .emacs file before loading lfg-mode:
(setq lfg-more-colors nil)
(load-library "lfg-mode")
To customize colors for the mode, put something like the following in your Emacs .init file:
(add-hook 'lfg-mode-hook
          (function (lambda ()
                      (set-face-foreground font-lock-string-face "ForestGreen"))))
This will produce comments in ForestGreen. Use M-x list-colors-display to see what colors are available.
When the cursor is in a buffer that is in lfg-mode, an additional menu item "LFG" becomes available at the top of the screen. Clicking with the left mouse button on this menu item gives several choices:
You can search for rules, rule macros, templates, and lexical items in other buffers that have been loaded into Emacs by positioning your cursor on the word you are searching for and executing the command M-" (meta-quotation mark). To go back to the original buffer and position, execute the command C-" (control-quotation mark). Unless you have created a TAGS database for the files you wish to search (see below), this will only work for buffers for which menus have already been created. To create a menu for a buffer in lfg-mode, choose the command "Rules, templates, lexical items" in the LFG menu.
The extended searching option will also work for all files for which you have created a TAGS database. You can create a TAGS database for a file or files by executing the Unix command gtags with the filename(s) as the argument. For example, this command will create a TAGS database for all files with the extension .lfg in the current directory:
gtags *.lfg
The TAGS database will need to be recreated periodically as your files change. To update the database, execute the gtags command again. If you have created a TAGS database, then you can use the following commands:
Consult Emacs Info for more information on using tags.
There is another special emacs mode, xle-mode, that makes interactions
with the XLE buffer easier. As with lfg-mode, this mode is available by
loading the library package lfg-mode.el with the command M-x load-library
lfg-mode, or putting the command:
(load-library "lfg-mode")
in your .emacs file.
For XLE buffers created by using the menus in a buffer in lfg-mode, or with the commands M-x run-xle or M-x run-new-xle, xle-mode is invoked automatically.
When the cursor is in a buffer that is in xle-mode, an additional menu item "XLE" becomes available at the top of the screen. Clicking with the left mouse button on this menu item gives several choices:
(setq lfg-default-parser "mylanguage.lfg")
(load-library "lfg-mode")
COMMENT{TYPE NAME TEXT}
If the same comment is associated with more than one type and name, the following abbreviation can be used:
COMMENT{{TYPE1 NAME1}{TYPE2 NAME2} TEXT}
COMMENT can be abbreviated COM. For comments between definitions, COMMENT-FILE or COM-FILE should be used rather than COMMENT; otherwise, an incorrect NAME for the comment may result.
Any value of TYPE may be used. The following values and their abbreviations are standard:
| FEATURE | FEAT |
| TEMPLATE | TEMP |
| MACRO | |
| CATEGORY | CAT |
| LEX-ENTRY | LEX |
| OTMARK | OT |
| PARAMETER | PARAM |
| EXAMPLE | EX |
| MISC | |
NAME is the name of the feature/template/rule being commented on for comments with COMMENT, or the name of the file for comments with COMMENT-FILE.
TEXT is the text of the comment, which may not contain a closing curly bracket (}).
These comments are collected by the commands M-x lfg-display-comments (for the current file), M-x lfg-display-comments-all (for all .lfg files in the current directory), and M-x lfg-display-comments-config (for all .lfg files mentioned in the CONFIG). These commands are also accessible from the LFG menu. Executing these commands produces a new buffer containing comments organized by NAME and TYPE. If any EXAMPLE comments are found, these also appear in a separate buffer in test file format.
# -*- mode: lfg; coding: euc-jp -*-
or the following near the end of the file:
# Local Variables: #
# mode: lfg #
# coding: euc-jp #
# End: #
The # character is used because this is the comment character for XLE test files.
Similarly, if you wanted to make the file japanese.lfg always be displayed with Japanese characters, you could add the following to the first line of the file:
" -*- mode: lfg; coding: euc-jp -*- "
or the following near the end of the file:
" Local Variables: "
" coding: euc-jp "
" End: "
The quote character is used here because this is the comment character for XLE grammar and lexicon files. For more information on the possible variables that can be set using this method, please see the Emacs documentation on File Variables.
You can find information on how to enter accented characters in Emacs here. See the section on Emacs File Variables for information on how to specify the character set of a file.
If you use Emacs 22 or later, then you can use Mule for Unicode character sets. Here is an example of how you would use Mule to work on a Russian grammar encoded in UTF-8:
The Tcl Shell Interface is the main interface to the XLE system. You can use it to load grammars, parse sentences, and view documentation. Also, process messages such as "loading grammar ... " get printed in the shell. Here are examples of commands that you can type:
% create-parser "demo-eng.lfg"
% parse Hello
% set parser [create-parser "demo-eng.lfg"]
% parse {This is a test.} $parser
% help
The Tcl Shell uses Tcl syntax for commands. This means that the double-quote ("), left brace ({), right brace (}), left bracket ([), right bracket (]), and dollar sign ($) characters are treated specially. All of the Tcl commands and syntactic conventions are available to you in the Tcl Shell.
The dollar sign is used to signal that the following token should be replaced by its value, so that
% parse {This is a test.} $parser
means take the value of the variable parser and pass it as the parser.
Braces are used to group things into a single argument. In the examples above, Hello and {This is a test.} are both the first argument of parse. Putting braces around Hello is optional, but if the braces were missing from {This is a test.}, then Tcl would have complained that there were too many arguments.
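The parser variable and brace grouping can be combined to run several sentences through one loaded grammar. The following is only a sketch: it assumes the demo-eng.lfg grammar from the examples above is in the current directory, and the variable name sentence and the list of sentences are illustrative.

```tcl
# Load the grammar once and keep the resulting parser in a variable.
set parser [create-parser "demo-eng.lfg"]

# Parse several sentences with the same parser.
# Each inner pair of braces groups one sentence into a single list element.
foreach sentence {{This is a test.} {Hello}} {
    parse $sentence $parser
}
```

Inside the loop, $sentence is passed to parse as one argument even when it contains spaces, because Tcl substitutes the variable's value without splitting it into words.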
Double quotes are also used to group things into a single argument. The key difference between braces and double quotes is that substitutions, such as dollar sign substitution, are not performed inside of braces. So if you want to parse a sentence with dollar signs, braces, or double quotes in it, you need to use braces:
% parse {This cost $5.}
% parse {"Yikes!", he said.}
If you use braces around the input, the only special character that you will have to worry about is the backslash (\) character. In particular, a backslash followed by a brace (\}) is not treated as a close brace.
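The difference between the two grouping styles can be checked with plain Tcl, without any XLE commands. In this small sketch the variable name price is made up for illustration; puts is the standard Tcl output command.

```tcl
# Plain Tcl, no XLE commands involved.
set price 5

# Double quotes group words but still substitute $price.
puts "This cost $price."      ;# prints: This cost 5.

# Braces group words and suppress substitution.
puts {This cost $price.}      ;# prints: This cost $price.

# Inside braces, a backslash-escaped brace does not end the group,
# and the backslash itself stays in the string.
puts {a \} b}                 ;# prints: a \} b
```

The last line shows why the backslash is the one character to watch inside braces: it is kept verbatim and it changes how the brace after it is counted.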
XLE automatically looks in the home directory and the current directory for .xlerc files (i.e. files named xlerc or .xlerc). The home directory .xlerc file allows you to customize your Tcl Shell however you want. For instance, you can customize the fonts that xle uses in the display. You can also write your own Tcl functions and put their definitions in your home .xlerc file (see below). The current directory .xlerc file is useful for grammar-specific tasks. For instance, you can put the command create-parser in the .xlerc file of a grammar directory so that the grammar will be automatically loaded when you start xle from its directory.
When XLE is loading the .xlerc files, it first looks in the home directory for a .xlerc file and loads whatever it finds. Then, if the current directory is not the home directory, it looks in the current directory for a .xlerc file and loads whatever it finds. If there is both a xlerc file and a .xlerc file in the same directory, then XLE loads both and prints a warning. You can override the .xlerc file loaded from the current directory by using the following syntax: xle yourfile.tcl or xle -file yourfile.tcl. You can also use these in combination, e.g. xle firstfile.tcl -file secondfile.tcl or xle -file firstfile.tcl -file secondfile.tcl.
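As a rough illustration of how the two files might divide the work, the sketch below shows the contents of both in one listing, separated by comments. The helper procedure p is hypothetical, and english.lfg is simply the grammar name used in the earlier examples.

```tcl
# --- Contents of the home directory .xlerc (loaded first) ---
# Hypothetical personal shortcut: p is not an XLE command.
proc p {sentence} {
    parse $sentence
}

# --- Contents of the .xlerc in a grammar directory (loaded second) ---
# Load the grammar automatically when xle is started from this directory.
create-parser english.lfg
```

With files like these, starting xle in the grammar directory would make both the shortcut and the loaded grammar available at the first prompt.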
Tcl makes it easy to write short procedures to do common tasks. For instance, suppose that you are always parsing sentences from a test file by typing something like parse-testfile my-testfile.lfg N, for different values of N. If you are tired of typing all this each time, you can add the following to your .xlerc file:
proc test {N} {
    parse-testfile my-testfile.lfg $N
}
The $N means "substitute the value of N here". Once this is defined, typing test 7 will accomplish the same as typing parse-testfile my-testfile.lfg 7.
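Tcl procedures can also take optional arguments with default values, which may be convenient if you occasionally switch test files. The variant below is a sketch, not part of XLE; the argument name testfile is made up for illustration.

```tcl
# Hypothetical variant of test: the test file is an optional second argument
# that defaults to my-testfile.lfg.
proc test {N {testfile my-testfile.lfg}} {
    parse-testfile $testfile $N
}
```

With this definition, test 7 behaves as before, while test 7 other-testfile.lfg would run parse-testfile other-testfile.lfg 7 (other-testfile.lfg being a placeholder name).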
If you want to suppress loading of the default .xlerc files, you can use the command line argument -noxlerc. This option will not prevent the loading of explicitly specified files.
Another command line option is to specify a Tcl command that will run
immediately after all the files are loaded, and before the interactive session
begins. This is done using the -exec (or -e) option. You can use this facility
to start xle in a certain mode or perform any task that you wish. In particular,
you can include in your script files arbitrary procedures that can be invoked
this way. For example,
xle -e "create-parser grammarfile; parse {John sleeps.}"
will start xle, load a grammar and parse an initial sentence.
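A procedure defined in a script file can be invoked the same way. The sketch below assumes a script file with the hypothetical name startup.tcl; the procedure name demo is also made up, and grammarfile is the placeholder used in the example above.

```tcl
# startup.tcl (hypothetical file name)
# Defines a procedure; nothing runs until the procedure is called.
proc demo {} {
    create-parser grammarfile
    parse {John sleeps.}
}
```

Starting XLE as xle startup.tcl -e demo should then load the file and call demo before the interactive session begins.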
If you use the -noTk option, then XLE will suppress the initialization of Tk and X. This is useful for running XLE in a batch mode where there is no X server. If the script invokes Tk, then you will get a message about an application-specific error involving $DISPLAY or Tcl will complain that "winfo" is undefined.
If you use Perl or another scripting language to give Tcl commands to XLE, then you may run into problems where the output of Tcl commands implemented by XLE isn't synchronized with the output of standard Tcl commands. If this happens, add an xle_flush command after the XLE Tcl command.
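For instance, a driving script might send a sequence like the following to XLE. This is only a sketch: the marker text PARSE-DONE is a hypothetical convention for the driving script to wait on, not something XLE defines.

```tcl
# Commands as an external script might send them to XLE.
parse {This is a test.}
xle_flush                  ;# added after the XLE command, as described above
puts "PARSE-DONE"          ;# hypothetical marker line for the driving script
```

Placing xle_flush directly after the XLE command is intended to keep the output of parse ahead of the output of the standard Tcl puts that follows.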
We use Tcl to display trees, f-structures, and charts. The parts of these displays that are mouse-sensitive change their color when the mouse moves over them. You can find out what a mouse-sensitive part does by clicking on it with the right mouse button (<Button-3>). The documentation uses the Tcl convention for describing mouse actions. Typical mouse actions are <Button-3> (click the right mouse button), <Control-Button-1> (click the left mouse button while holding down the Control key), and <Control-Shift-Button-1> (click the left mouse button while holding down the Control and Shift keys). The documentation for pull-down menu items can be obtained by typing 'h' with the cursor over the menu item.
You can control the appearance of the graphical interface, by changing fonts and window sizes (useful when running XLE on a small screen). The window sizes of the four main windows can be controlled by setting the Tcl variables "window_width" and "window_height", which measure the window dimensions in pixels (the default values are 500 and 400 respectively).
There are now four different variables to control the use of fonts in the various XLE displays: one for the user interface (i.e. buttons), one for displaying feature structures, one for displaying texts and one for displaying trees. The terminals (surface forms) in the tree displayer use the text font, which makes it possible to separate them visually from the rest of the nodes. To change the fonts, simply set the variables "xleuifont", "xletextfont", "xletreefont" and "xlefsfont". Font names can be given using the naming convention of either X windows (e.g. -adobe-courier-medium-r-normal--12-120*) or Tk (e.g. {Times 14 bold}). Note that it is now possible to use fonts with proportional spacing, although the default font is still Courier.
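A home directory .xlerc file is a natural place for these settings. The fragment below is a sketch for a small screen; the sizes and font choices are arbitrary examples, and it assumes the variables are read when the windows are created.

```tcl
# Illustrative .xlerc settings; the values are examples only.
set window_width 400      ;# pixels (default 500)
set window_height 300     ;# pixels (default 400)

# Fonts in Tk notation; X font names could be used instead.
set xleuifont   {Helvetica 10}
set xletextfont {Times 14 bold}
set xletreefont {Courier 12}
set xlefsfont   {Courier 12}
```

Giving xletextfont a visibly different font from xletreefont makes the terminals stand out from the other tree nodes.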
Another aspect of the graphical interface that can be modified is the vertical and horizontal spacing in the tree layout algorithm. Following a facility that exists in the Medley system, this is done by changing the variables "CSTRUCTUREMOTHERD" and "CSTRUCTUREPERSONALD". Note, however, that the spacing is also proportional to the font used for displaying trees and therefore modifying these variables should not be necessary in general.
The tree window is used to display c-structure trees. There can only be one tree display window active at a time. There is a row of buttons and menus at the top that provide some control, plus the nodes of the trees are sensitive to mouse commands. You can get more documentation for the buttons and the mouse-sensitive nodes by clicking on them with the right mouse button (<Button-3>).
The items in the Views menu control how the tree is displayed. Each item has a one-letter accelerator after it. Typing this letter in the tree window has the same effect as clicking on its menu item. The "node numbers" menu item determines whether node numbers are displayed on the nodes of the tree. The "partials" menu item determines whether or not the partial constituents used internally by XLE to deal with multiple daughters are displayed. The "tokens" menu item determines whether or not the terminals of the tree correspond to the tokens produced by the tokenizer or correspond to the surface forms of the input sentence.
Trees are displayed in a standard inverted format, with the root at the top and the leaves (lexical items) at the bottom. Only one tree is displayed at a time in the tree window. If you want another tree, then click on the "next" button. This will give the next good tree unless there are none, in which case it will give you the next bad tree. The "prev" button will let you back up. Trees are numbered according to the order that they are displayed. This means that if you parse the same sentence twice, there is no guarantee that the trees will get the same numbers in both parses, unless you only use "prev" and "next" to visit the trees (using "most probable" or clicking on a choice in one of the fschart windows can change the display order). If there is a file loaded for choosing the most probable parse, then that parse will be displayed first.
You can look for a particular tree using the tree window or the bracketing window. To look for a particular tree using the tree window you construct the tree by starting from the top node and recursively choosing the correct sub-tree at every level. Only the nodes that have dotted lines under them have more than one sub-tree. You can get the next sub-tree of a node by clicking on it with the left button (<Button-1>). You can get the previous sub-tree via <Shift-Button-1>. If there are no more trees in the direction indicated, the button will flash. If you can't find the desired sub-tree, it may be that you are looking at the wrong constituent. For processing reasons, xle has a different node for each place that a rule might stop (i.e. for each final state in the underlying finite-state machine that corresponds to the rule). So if you can't find the sub-tree that you want, see if the node above it has another sub-tree with the same category name but a different node number.
If a node in a tree is boxed, then that is a node where the f-structure
information went bad (i.e. there are no valid f-structures above this
node). If you click on a boxed node with the middle button (<Button-2>)
you should get to see the first bad f-structure. Sometimes, however, no f-structure
shows up. This may be because the node that is actually bad is a partial
node that is not shown on the display; this frequently occurs when the node
dominates a sublexical rule. You should
click on the "partials" menu item in order to see it. Or, you can click on
each daughter node with
If you click on a tree node with
The bracketing window allows you to select contiguous material in the parsed sentence and insist that it be a constituent, possibly of some particular category. The XLE graphical interface constrains the trees it displays to be consistent with the choices you've made in the bracketing window. You may also insist that some material not be a constituent.
The bracketing window is displayed by clicking on the "Show Bracket Window" menu item in the Commands menu of the tree window. In the bracketing window, the sentence is displayed with alternate tokenizations shown above one another. Buttons (labelled with "#") appear between the tokens.
Clicking <Button-1> on two of these "#" buttons brackets the material in between. The brackets are shown in the window. Only trees and solutions in which the bracketed material forms a constituent are displayed in XLE's tree and f-structure windows. Alternatively, clicking <Shift-Button-1> on two of these buttons "debrackets" the enclosed material; only trees and solutions in which the debracketed material does not form a constituent are displayed. Debrackets are displayed using "![" and "!]". A pair of brackets or debrackets can be removed by clicking <Control-Shift-Button-1> on one element of the pair. Clicking <Button-1> on a bracket pops up a menu of the categories that span the bracketed material. You can specify that a category be excluded, in which case no tree containing that category spanning the bracketed material will be displayed. You can specify that a category be included, in which case only trees containing that category spanning the bracketed material will be displayed. If you click on the "show" button for a category, then XLE will display the category's edge in the tree display window.
The f-structure window is used to display feature structures. Feature structures are displayed as an attribute-value structure. Links between different parts of a feature structure (from, for instance, (^ SUBJ) = (^ XCOMP SUBJ)) are displayed by giving a path name. The f-structure is not mouse sensitive.
At the top of the f-structure window there is a row of buttons and menus that affect the f-structure display window. In general, you can get the documentation for each button by clicking on the button with the right-most mouse button (<Button-3>). Documentation for the menu items can be obtained by typing 'h' while holding the cursor over the menu item.
The "prev" and "next" buttons allow you to enumerate the feature structures (f-structures) in a manner similar to the "prev" and "next" buttons in the tree window. The valid feature structures are displayed first, and then any invalid feature structures. The reasons that a feature structure is invalid are highlighted in black. This may involve highlighting a relation that would not otherwise be visible, such as negated constraints or the arglist relation. When relations need to be displayed, they are put after the attributes. Their "value" is actually a set of values enclosed by parentheses. Sometimes the values are preceded by a "~" or a "c", which indicates that there is a negative or sub-c constraint on that value. Attributes which are equated to a constant value usually just display that value, but if there are additional constraints, then an attribute-value structure is displayed with a single equality relation with multiple values.
The items in the Views menu control how the f-structure is displayed. Each item has a one-letter accelerator after it. Typing this letter in the f-structure window has the same effect as clicking on its menu item. The "abbreviate attributes" menu item suppresses all of the attributes except those that appear in the abbrevAttributes Tcl variable. The "constraints" menu item determines whether or not negated and sub-c constraints are included in the display. The "node numbers" menu item determines whether node numbers for each f-structure are displayed in a column along the left side of the f-structure. The "subc constraints" menu item determines whether or not sub-c constraints are included in the display.
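The abbrevAttributes variable can be set from the Tcl shell or from a .xlerc file. The sketch below assumes that the variable holds a Tcl list of attribute names; that format is an assumption, and the names listed are only examples to be replaced by attributes from your own grammar.

```tcl
# Assumption: abbrevAttributes holds a Tcl list of the attribute names to keep.
# PRED, SUBJ, XCOMP and ADJUNCT are example names only.
set abbrevAttributes {PRED SUBJ XCOMP ADJUNCT}
```

With a setting like this, enabling "abbreviate attributes" would be expected to hide every attribute not named in the list.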
Above each f-structure there is a label and buttons for each projection from that f-structure. Clicking with the left mouse button on a projection button causes that projection to be displayed. Projections that have a * in them (like o::*) are actually projections off of the c-structure that are displayed in the f-structure window for convenience.
Whenever you click on a node in a tree with the middle button while the Control key is held down, the constraints associated with that node will be printed in the Constraints Window (if you also hold down the Shift key, then the constraints associated with the partial above the node will be printed). The constraints will come from the lexicon if the node is a pre-terminal node, and otherwise they will come from a rule. The constraints are the base constraints that are obtained when all of the templates have been expanded. Constraints that are filtered from the grammar before instantiation are printed with a comment after them. For instance, if an =c constraint is globally incomplete, it will be printed with a "GLOBALLY INCOMPLETE" comment following it. Similarly, if an optimality constraint has a NOGOOD mark, then it will be printed with a "NOGOOD OPTIMALITY MARK" after it. Since these constraints aren't instantiated, they won't appear in the f-structure window (even among the invalid f-structures), and so the only way to see them is to use the Constraints Window.
The f-structure chart windows are used to display two different views of a packed representation of all of the valid solutions. One window indexes the solutions by constraints. The result is an f-structure that is annotated with choices to show where alternatives are possible. The other window indexes the solutions by choices. The result is a tree of choices with their corresponding constraints. The choices in both windows are active. When you click on a choice, then a solution corresponding to that choice is displayed in the tree window and the f-structure window.
The f-structure chart window indexes the packed solutions by their constraints, so that each constraint appears once in an f-structure annotated by all of the choices where that constraint holds. By default, this window appears at the upper right of the display. There are three menu items under the Views menu that control how the f-structure is displayed. Each item has a one-letter accelerator after it. Typing this letter in the f-structure chart window has the same effect as clicking on its menu item. The "abbreviate attributes" menu item suppresses all of the attributes except those that appear in the abbrevAttributes Tcl variable. The "constraints" menu item determines whether or not negated and sub-c constraints are included in the display. Finally, the "linear" menu item changes the display into a line of tokens with corresponding f-structures.
The f-structure chart choices window indexes the packed solutions by the alternative choices. By default, this window appears at the lower right of the display. Choices are labeled a:1, a:2, a:3, ... b:1, b:2, b:3, etc. The choices that belong to the same disjunction have the same alphabetic string as a prefix. The disjunctions are laid out vertically in the window, with the "a" disjunction shown first, then the "b" disjunction shown second, and so on. At the left of each disjunction is its context. Top level disjunctions are given the True context. Embedded disjunctions are given the choice that they are embedded under. Sometimes disjunctions are embedded under more than one choice (because of the way that the chart works). When this happens, then the context is itself disjunctive. Alternatives within a disjunction are laid out vertically, with the constraints that they select displayed on their right. If a constraint has a conjunctive context, then the constraint will show up under both choices contextualized by the remaining choice. Thus a:1 & b:2 -> foo will appear under a:1 as b:2 -> foo and under b:2 as a:1 -> foo. When an f-structure only has one predicate name, then its name will be printed after the f-structure variable for easy identification (e.g. f12:WITH $ (f15:TELESCOPE ADJUNCT)). If there are no f-structure constraints, it is sometimes useful to display the subtree constraints (by using the "subtrees" option in the Views menu), for finding solutions that only differ in the c-structure.
The choices in the f-structure chart window and the f-structure chart choices window are active. When you click on a choice with the left button, then a solution corresponding to that choice is displayed in the tree window and the f-structure window. A selection is a fully specified choice of exactly one solution. One way to select more than one solution is to use the narrowing facility (which is somewhat related to underspecification). You can mark a choice as being nogood by clicking on a choice button in the f-structure chart choices window with the middle mouse button. This will effectively mask out all the solutions that include the choice. XLE will grey the button and report the number of remaining solutions in the title of the window. Clicking the choice button again will toggle the nogood property back to its previous setting. If you want to specify that only one choice in a disjunction is good, you can press the shift key and the middle button at the same time. This will turn every other choice in the disjunction to nogood.
To enumerate the solutions according to the choices they include, you can use the "next solution" and "prev solution" buttons in the chart choices window. This enumeration honors the nogoods so it only goes over the narrowed set of solutions. If you want to include the nogood solutions, press the shift key when clicking on "next solution" and "prev solution" buttons.
Also, if you enumerate the solutions using the next/prev buttons on the tree or f-structure window, the selections corresponding to the currently displayed solution will be highlighted in the fschart window. If you want to clear the selection, there is a "Clear Selection" command in the command menu of this window. This can be useful when you want to print the choices in the Prolog format and you don't want the selection to be recorded there as well.
The menu items in the Views menu control what is shown and how it is shown. Each item has a one-letter accelerator after it. Typing this letter in the f-structure chart choices window has the same effect as clicking on its menu item. The "abbreviate attributes" menu item suppresses all of the attributes except those that appear in the abbrevAttributes Tcl variable. The "constraints" item causes sub-c and negated constraints to be displayed when it is enabled. The "disjunctions only" item only shows the disjunction structure (no constraints). The "OT marks only" item only shows the optimality mark constraints. The "subtrees" item causes c-structure subtrees to be displayed as binary rules. The "unoptimal" button causes XLE to display unoptimal solutions by disabling the OPTIMALITYORDER defined in the grammar config. The next three items mentioned are mutually exclusive: only one can be enabled at a time. They have to do with how the disjunctions are laid out. The "flat choices" item is a flat list of disjunctions. The "nested choices" item causes each disjunction to be nested within the choice that defines its context under the constraints that are particular to that context. If a choice has several disjunctions embedded underneath it, then the disjunctions will be separated by a black line for readability. The "re-entrant choices" item is like the "nested choices" item, except that disjunctions that are defined in a complex context (such as "{a:1 | b:1}") are displayed once at the level of the disjunctions in the TRUE context instead of being duplicated under each context (e.g. once under "a:1" and once under "b:1").
If the disjunctions are sufficiently complicated, then XLE will not be able to display the disjunctions nested within the window size limits allowed by Tcl. In this case, XLE will turn off nesting and print the following message:
A nested disjunction had to be truncated.
Nesting has been turned off so that
the 'more' button can do the right thing.
Just below the menu items is a list of optimality marks that are present in any of the solutions to the current input. The list is prefixed with "OTOrder:". The optimality marks are in the order given by the current OPTIMALITYORDER. If an optimality mark is prefixed with a "*", then it is an ungrammatical mark. If it is prefixed with a "+", then it is a preference mark. Clicking on an optimality mark in this list temporarily removes it from the ranking so that it has no effect on the relative ranking of analyses. Clicking again restores the optimality mark.
Both the f-structure chart window and the f-structure chart choices window can be very long. If either window exceeds a certain limit, then the displayer will add a "more" button at the end. Clicking on the "more" button will cause a new window to be created that displays another chunk of the data.
KNOWN PROBLEMS
The packed representation used for the display is the same one used by "print-chart-graph" and "print-prolog-chart-graph". The structure of the disjunctions that appears in these displays is an artifact of the computation, and won't always match one's intuitions about how the disjunctions should be factored. At some future time we may try to add code for re-factoring disjunctions. Also, the code for extracting a packed representation will produce an incorrect representation when you are extracting from the generator and there is a solution with discontinuous heads that have the same category. There isn't even code to detect this situation, so you should use this on the generator with caution.
The code for producing a packed representation attempts to normalize the packed representation by rewriting equalities and eliminating redundant constraints. Unfortunately, normalizing the packed representation can cause XLE to time out. In this case, XLE will print out the message:
extract_chart_graph aborted because of timeout.
You might try setting normalize_chart_graphs to 0 and try again.
Setting "normalize_chart_graphs" to 0 will turn off normalization, which may allow XLE to produce the packed representation within the time allotted.
The chart window displays all of the edges in the chart. The edges are stacked according to depth: the lexical items are at the bottom, the edges that build on lexical items are immediately above them, those that build on pre-terminals are above them, and so on.
The morphology window displays all of the morphological edges. The edges are displayed with the tokens on the left, descending vertically in the order they appear in the sentence. To the right of each token come any preterminals for that token (e.g. matching lexical entries that have a * morph code). Below the token preterminals are lexical forms for the token, and to the right of each lexical form are preterminals for the lexical forms. If there is a ?? at the end of the edge name for a lexical edge, then the lexical edge wasn't found in the lexicon. If there is a ? at the end of the edge name for a pre-terminal edge, then the pre-terminal came from the -unknown entry. If there is a * at the end of an edge, then the edge didn't have any valid solutions or the solutions weren't computed.
Tcl has a built-in facility for traversing menu items within a window without using a mouse. It is invoked by typing the F10 key. After the F10 key is typed, the first menu of the window that currently has the input focus will be displayed. You can choose an element within this menu by using the up and down arrows. After the desired menu item is selected, type a carriage return to invoke it. You can cycle through the different menus associated with the current window by using the left and right arrow keys. Typing ESC aborts the menu traversal mode initiated by the F10 key.
In addition to Tcl's built-in facility, XLE provides a means for cycling the input focus through the XLE windows. Typing the F9 key will cause XLE to move the input focus from an XLE window that has the input focus to the next XLE window. This means that repeatedly typing the F9 key will cycle the input focus through all of the XLE windows including the Tcl shell window. However, typing F9 will not move the input focus from the Tcl shell window to an XLE window unless the Tcl shell window is in an Emacs buffer and lfg-mode is loaded.
A generator is the inverse of a parser. A parser takes a string as input and produces f-structures as output. A generator takes an f-structure as input and produces all of the strings that, when parsed, could have that f-structure as output. The generator can be useful as a component of translation, summarization, or a natural language interface. It can also be used to test whether a grammar overgenerates.
Sometimes it is not desirable for the generator to be the exact inverse of the parser. For instance, although the parser eliminates extra spaces between tokens, you may not want the generator to insert arbitrary spaces between tokens. To handle this, you can make the generation grammar be a little different from the parsing grammar by changing the set of optimality marks used (through the GENOPTIMALITYORDER configuration field) and by changing the set of transducers used (through the "G!" prefix in the morphology configuration file). These mechanisms allow you to vary the generation grammar as needed while still being able to share as much as possible with the parsing grammar.
The generator in XLE is associated with the following commands:
The sections below describe the use of these commands.
create-generator "grammarfile"
where "grammarfile" is the root file of a grammar. This will create a generator that uses the given grammar file, except that the GENOPTIMALITYORDER optimality marks will be used instead of the OPTIMALITYORDER optimality marks and the G! morphological transducers will be used instead of the P! morphological transducers.
Creating a generator may take a considerable amount of time, because it may require indexing the lexicons on their content, if previously-created indexes are no longer valid (because of changes to the lexicon, templates, or initial grammar file). This also has the side effect of checking the syntax of the content of any files that have to be re-indexed. This can be handy for debugging and catching typographical errors. If all of the files have to be re-indexed, then XLE will report the names of any features in the feature declaration section that are never used in the grammar. You can force all of the files to be re-indexed if you want by making a minor change to the root file of the grammar.
The generator takes f-structures as input. The only format that is currently supported is the Prolog format that the parser produces as output. To generate from a Prolog file, use the following command:
generate-from-file "filename" ("count")
The "filename" argument specifies the name of the Prolog file that contains the f-structure to be used as input. The "count" argument specifies how many f-structures to generate from if the Prolog file is a packed representation of multiple f-structures. The default value for "count" is 9999.
There is also a way to generate from a set of files in a single directory:
generate-from-directory "directoryname"
This command will enumerate the files in the given directory and call generate-from-file on each one. You can create a directory full of Prolog files using
parse-testfile "testfile" -outputPrefix "dirname/"
(the trailing slash is required and the directory named "dirname" must already exist).
You can also generate from the output of the parser using the "Generate from this FS" command in the f-structure window of the parser.
The generator produces a packed representation of all of its output strings using a regular expression notation. For instance:
You
{ make copies of books magazines or other bound {and|&} large documents or
can also detach the scanner if you want to scan.
|can also detach the scanner if you want to scan or make copies of books
magazines or other bound {and|&} large documents.}
The generator output is printed on $gen_out_strings, which defaults to stdout. Error messages are printed on $gen_out_msgs. These variables should be set using set-gen-outputs (e.g. "set-gen-outputs output.txt stderr").
If you want all of the generations listed separately instead of as part of a packed representation, set the Tcl variable gen_selector to allstrings, e.g.:
setx gen_selector allstrings
You can set this variable in the Tcl shell or in the performance variables file. This variable must be set before the generator starts generating unless you use the following notation in the Tcl shell:
setx gen_selector allstrings $myGenerator
You can revert to the standard regular expression output using:
setx gen_selector ""
You can force the generator to only produce one output by setting the Tcl variable gen_selector to one of shortest or longest.
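The relationship between the packed notation and the allstrings, shortest and longest settings can be illustrated outside XLE. The following Python sketch is not XLE code: the function name expand_packed is hypothetical, and it assumes only what the example output shows, namely that braces delimit a choice, that | separates the alternatives, and that an alternative may be empty. It also assumes that length is measured in characters, which the text above does not specify.

```python
import re

def expand_packed(packed):
    """Return every string encoded by a {a|b} style packed expression."""
    # Split into the structural characters and the literal text between them.
    tokens = re.findall(r"[{}|]|[^{}|]+", packed)
    pos = 0

    def sequence():
        # A sequence concatenates literals and {...} groups.
        nonlocal pos
        results = [""]
        while pos < len(tokens) and tokens[pos] not in ("|", "}"):
            if tokens[pos] == "{":
                pos += 1                      # consume "{"
                options = alternatives()
                pos += 1                      # consume "}"
            else:
                options = [tokens[pos]]
                pos += 1
            results = [r + o for r in results for o in options]
        return results

    def alternatives():
        # One or more sequences separated by "|".
        nonlocal pos
        options = sequence()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1                          # consume "|"
            options += sequence()
        return options

    # Collapse the extra spaces left behind by empty alternatives.
    return [" ".join(s.split()) for s in alternatives()]

forms = expand_packed("John {{will have|has|had}|} slept")
print(forms)                  # what a listing of all strings would contain
print(min(forms, key=len))    # the shortest string
print(max(forms, key=len))    # the longest string
```

For this input the sketch lists four strings, of which "John slept" is the shortest and "John will have slept" is the longest.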
In principle the generator should produce strings that, when parsed, have an f-structure that exactly matches the input f-structure. However, sometimes one wants to allow the generator to produce strings that have an f-structure that matches the input except for a few features. This is particularly true if the grammar has some features that are only used internally to the grammar. It may be difficult for the user of the generator to know how these features should be set. To allow underspecified input, XLE allows the user to specify a set of features that should be removed from the input and a set of features that can be added by the generator. These can be set in the following way:
set-gen-adds add "FEAT1 FEAT2 FEAT3=foo @INTERNALATTRIBUTES"
set-gen-adds remove "FEAT4 FEAT5 @INTERNALATTRIBUTES"
@INTERNALATTRIBUTES expands to all of the attributes that are not listed in EXTERNALATTRIBUTES in the grammar configuration.
The notation FEAT3=foo means that only this particular attribute-value pair is addable. To make more than one value addable, use set-gen-adds add "FEAT3=foo FEAT3=fum".
The generator will freely add any attributes or attribute-value pairs that are declared to be addable if they are consistent with the input. However, all of the nodes in the c-structure must map to feature structures that exist in the input. This means that if you have a rule like NP --> (DET: (^ SPEC)=!) ..., then you cannot make SPEC be underspecified. However, if the rule is NP --> (DET: ^=!) ... and SPEC is defined in the lexicon, then you can allow SPEC to be underspecified.
You can also associate an optimality mark with addable attributes. Whenever an addable attribute is added to the f-structure of a generation string, then its optimality mark will also be added. The effect of this depends on the optimality mark's position in GENOPTIMALITYORDER. Here is an example:
set-gen-adds add @ALL AddedFact
set-gen-adds add @INTERNALATTRIBUTES NEUTRAL
set-gen-adds add @GOVERNABLERELATIONS NOGOOD
set-gen-adds add @SEMANTICFUNCTIONS NOGOOD
In this example, all of the attributes are first assigned the user-specified AddedFact OT mark. Then the internal attributes are assigned the NEUTRAL OT mark, which makes them freely addable. Then the governable relations and semantic functions are assigned the NOGOOD OT mark, which means that they cannot be added. The net effect is that all of the attributes other than the internal attributes, the governable relations and the semantic functions are assigned the AddedFact OT mark. These attributes can be added to the f-structure of a generation string at a cost.
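The net effect of these four calls can be checked with a small simulation. The Python sketch below is not XLE code. The attribute names and their grouping are invented for illustration, and the function assign_marks is hypothetical. It assumes only what the paragraph above states: each call assigns its OT mark to the attributes it names, replacing any mark that an earlier call assigned to the same attribute.

```python
# Hypothetical attribute classes; a real grammar defines its own.
ALL = {"TENSE", "NUM", "CHECK", "SUBJ", "OBJ", "ADJUNCT"}
INTERNAL = {"CHECK"}
GOVERNABLE = {"SUBJ", "OBJ"}
SEMANTIC = {"ADJUNCT"}

def assign_marks(calls):
    """Apply (attributes, mark) calls in order; later calls replace earlier marks."""
    marks = {}
    for attrs, mark in calls:
        for attr in attrs:
            marks[attr] = mark
    return marks

marks = assign_marks([
    (ALL, "AddedFact"),
    (INTERNAL, "NEUTRAL"),
    (GOVERNABLE, "NOGOOD"),
    (SEMANTIC, "NOGOOD"),
])

# Attributes that can still be added at a cost.
costly = sorted(a for a, m in marks.items() if m == "AddedFact")
print(costly)
```

With this invented inventory, only NUM and TENSE keep the AddedFact mark.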
Note that it is possible to have more than one call to set-gen-adds, and that the calls are additive. For backward compatibility, calling set-gen-adds with no OT mark removes any existing addable attributes.
create-generator defaults to "set-gen-adds add @INTERNALATTRIBUTES".
Sometimes the input to the generator is underspecified in that the relative order of internal f-structures is not given. For instance, the adjuncts of an f-structure may not have been ordered using the scope relation. This can produce an exponential number of different outputs. You can control this by adding the BADSEMFORMIDORDER OT mark to GENOPTIMALITYORDER. This mark is added to the f-structure of a generation string whenever two semantic form ids are not in numeric order. This means that the generator will try to generate so that the generation string preserves the order given by the semantic form ids in the input f-structure. This is just a preference, though: if the grammar doesn't allow strings to preserve the semantic form id order, then the generator will pick the string that has the fewest semantic form ids out of order. Since the parser guarantees that the semantic form ids are ordered by string position, this means that the generator will tend to pick a generation string that preserves the order of the parse string. Since the transfer component tends to preserve semantic form ids, translations also tend to preserve the order of the source string.
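The preference expressed by BADSEMFORMIDORDER can be sketched as a count of out-of-order pairs. The Python code below is only an illustration, not XLE's implementation: the candidate strings and their semantic form ids are invented, and it assumes that every pair of ids counts (not just adjacent ones), which the text above does not specify.

```python
from itertools import combinations

def out_of_order_pairs(ids):
    """Count pairs of ids whose order in the string reverses their numeric order."""
    return sum(1 for a, b in combinations(ids, 2) if a > b)

# Hypothetical candidates, each with the semantic form ids of its words
# listed in string order.
candidates = {
    "quickly John ran": [3, 1, 2],
    "John quickly ran": [1, 3, 2],
    "John ran quickly": [1, 2, 3],
}

for string, ids in candidates.items():
    print(string, out_of_order_pairs(ids))

# The preferred candidate is the one with the fewest violations.
best = min(candidates, key=lambda s: out_of_order_pairs(candidates[s]))
print(best)
```

If the grammar ruled out "John ran quickly", the same comparison would pick "John quickly ran", which has one pair out of order rather than two.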
In some circumstances you can use the generator to produce an exhaustive list of all possible variations of a grammatical phenomenon. For instance, suppose that you wanted to see all the ways that a verb could be inflected in English. You can accomplish this by removing the tense and aspect features from the input to the generator and also making them addable so that the generator can freely add them back in:
set-gen-adds remove "TNS-ASP @INTERNALATTRIBUTES"
set-gen-adds add "TNS-ASP MOOD PERF PROG TENSE @INTERNALATTRIBUTES"
Note that although you only need to remove TNS-ASP from the input since the tense and aspect features are embedded under it, you need to make all of the tense and aspect features addable. If you left one of the tense and aspect features out, then the generator would not generate anything.
Now, parse a simple sentence like John sleeps and use the "Generate from this FS" command to generate from its f-structure. The tense and aspect features will be stripped from the f-structure before it is given to the generator, and the generator will produce all possible forms:
John
{ { will be
|was
|is
|{has|had} been}
sleeping
|{{will have|has|had}|} slept
|sleeps
|will sleep}
This technique can only be used for grammatical phenomena that only vary in the values of a few features or feature complexes. This technique won't work if any of the underspecified features map to a c-structure node. For instance, if the grammar had (^ TNS-ASP)=!, where ! contained the values of the TNS-ASP feature cluster, then the generator would have refused to generate any analysis that used this constraint since it only introduces c-structure nodes that map to an f-structure in the input. On the other hand, any phenomenon that obeys these restrictions can be enumerated using this technique. For instance, you might be able to generate all of the specifiers by ignoring the SPEC feature. So this technique can be used for things beyond inflectional paradigms.
Sometimes you want the generator to produce an output even if the input f-structure is ill-formed (i.e. it is not a possible output of the parser). For instance, you may want a translation system to produce an output even if some of the input f-structure didn't get translated. Also, for debugging purposes it is easier to see why an output was invalid than to see why the generator produced no output at all. XLE has two techniques to help with this. The first is to allow the generator to relax the relationship between the input f-structure and what is generated through some special optimality theory marks (OT marks). The second is to allow for a fragment grammar for generation, similar in spirit to the fragment grammar for parsing.
XLE defines some special OT marks that are useful for robust generation: MISSINGFACT, DUPLICATESEMFORMID, and BADSCOPE. They have the following interpretation:
The generator uses these OT marks to choose the generation string that minimizes the mismatch between the input and the generation string's f-structure. If one of these special OT marks is not listed in an OPTIMALITYORDER (or GENOPTIMALITYORDER), then it is implicitly NOGOOD (i.e. the OT mark and its behavior are disabled).
The second technique that XLE has for robust generation involves a fragment grammar for generation. This is similar to the fragment grammar for parsing, except that it is designed to match any input f-structure rather than being designed to match any input string. The idea is to stitch together well-formed generation strings by using a cover grammar that can match any input f-structure. Here is a sample fragment grammar for generation:
GENFRAGMENTS --> {"fragments"
NP | VP | S | ADV | ADJ
|"f-structures"
GENFRAGMENTS*: GenFragment $ o::*
{ (^ %ATTR1)=!
|! $ (^ %MOD1)}
PRED
GENFRAGMENTS*: GenFragment $ o::*
{ (^ %ATTR2)=!
|! $ (^ %MOD2)}
|"sets"
[GENFRAGMENTS: GenFragment $ o::*
! $ ^;
COMMA]+
(CONJ)
GENFRAGMENTS: GenFragment $ o::*
! $ ^
(^ COORD)=+_
}.
-token CONJ * GenFragment $ o::*
(^ COORD-FORM) = %stem;
PRED * GenFragment $ o::*
{ (^ PRED)='%stem'
|(^ PRED)='%stem<%ARG1>'
|(^ PRED)='%stem<%ARG1 %ARG2>'
|(^ PRED)='%stem<%ARG1 %ARG2 %ARG3>'
|(^ PRED)='%stem<%ARG1 %ARG2 %ARG3 %ARG4>'
}.
This GENFRAGMENTS rule produces a tree that has well-formed generation strings based on NP, VP, S, etc. at the bottom of the tree. These well-formed generation fragments are stitched together using the GENFRAGMENTS rule. The constraints (^ %ATTR1)=! and ! $ (^ %MOD1) are designed to match any attribute in the input that has the corresponding structure (NB: any variable name can be used for %ATTR1, %ATTR2, %MOD1, and %MOD2). This means that the generator can generate even if the input attribute is unknown to the grammar. The attribute name variables like %ATTR1 can be constrained just like any other variable. For instance, adding %ATTR2 ~= SUBJ to the rule near (^ %ATTR2)=! means that the SUBJ attribute cannot follow the head. Adding %ATTR1 ~$ {OBJ OBJ2 OBL} near (^ %ATTR1)=! means that the OBJ, OBJ2, and OBL attributes cannot precede the head. These sorts of constraints can be useful for reducing the number of different orders that the generation fragments can appear in.
The PRED and CONJ categories in this -token entry are designed to match a PRED or COORD-FORM attribute and generate its value as a string. They work because, by convention, the -token head word is replaced by the value of %stem, which matches the value of the attribute in the input f-structure.
The GenFragment OT mark is a user-specified OT mark (i.e. XLE doesn't assign a special interpretation to GenFragment). It is used in this fragment grammar to minimize the number of generation fragments produced. It should be put in the NOGOOD section of the parser's OPTIMALITYORDER, so that the parser doesn't make use of the GenFragment constructions. The GENFRAGMENTS rule is added to the grammar when the generator fails if it is included in the grammar config as the value of the REGENCAT field. To keep generation tractable, the generator eliminates any REGENCAT edges that are not headed.
The OT marks described above can be added to the GENOPTIMALITYORDER field of the grammar config along with other optimality marks. Here is an example of how they might be used:
GENOPTIMALITYORDER
ParseOnly NOGOOD
*GenFragment
MISSINGFACT AddedFact BADSCOPE DUPLICATESEMFORMID
STOPPOINT
GenMisc.
In this example, MISSINGFACT is dispreferred more than AddedFact, which is dispreferred more than BADSCOPE, which is dispreferred more than DUPLICATESEMFORMID (if you want them all equally dispreferred, then you can put them in parentheses). AddedFact is an example of a user-specified OT mark that is associated with addable attributes using set-gen-adds:
set-gen-adds add "NUM PERS NTYPE" AddedFact
Given this GENOPTIMALITYORDER, the generator first tries to generate without relaxing any constraints (i.e. only using the hypothetical GenMisc OT mark). If that fails, it then tries to relax the relationship between the input f-structure and the f-structure of a generation string using the MISSINGFACT, BADSCOPE and DUPLICATESEMFORMID OT marks, picking the generation string(s) that most closely match the input f-structure. At this point, generation strings are grammatical, although they might not correspond to the input f-structure. Finally, the generator allows the GenFragment OT mark, which enables the GENFRAGMENTS rule. The GenFragment OT mark is marked with a * to indicate that the output strings will be ungrammatical. The generator will choose the string(s) that minimize the number of generation fragments.
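The relative ranking of the four relaxation marks can be pictured as a lexicographic comparison of mark counts. The Python sketch below is a simplified illustration under stated assumptions, not XLE's algorithm: it covers only the strict ranking of MISSINGFACT, AddedFact, BADSCOPE and DUPLICATESEMFORMID described above, and it ignores STOPPOINT, NOGOOD, ungrammatical (*) marks and parenthesized groups of equally ranked marks. The candidate strings and their marks are invented, and the names profile and best_candidates are hypothetical.

```python
# Marks from most dispreferred to least dispreferred, as in the example order.
ORDER = ["MISSINGFACT", "AddedFact", "BADSCOPE", "DUPLICATESEMFORMID"]

def profile(marks):
    """Count each mark in ranking order; smaller tuples are better."""
    return tuple(marks.count(m) for m in ORDER)

def best_candidates(candidates):
    """Return the strings whose mark profile is lexicographically smallest."""
    best = min(profile(marks) for marks in candidates.values())
    return [s for s, marks in candidates.items() if profile(marks) == best]

# Hypothetical generation strings with the marks their analyses incurred.
candidates = {
    "string A": ["AddedFact", "AddedFact"],
    "string B": ["MISSINGFACT"],
    "string C": ["BADSCOPE", "DUPLICATESEMFORMID"],
}

print(best_candidates(candidates))
```

Here "string C" wins: it has no MISSINGFACT and no AddedFact marks, so its lower-ranked marks are never compared against the others.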
Most of the time, when the generator fails to generate a string it is because of one of the following:
One approach to debugging with the generator is to look for the desired tree in the generation chart and see why it failed. You can look for a tree by typing show-solutions $defaultgenerator in the Tcl shell and then enumerating subtrees until you get to the desired tree. Unfortunately, this can be slow and error-prone, since there can be many edges in the generation chart with the same generation vertex and category, and it is hard to tell which is the desired one. (Edges in the chart are indexed by the f-structure in the input that maps to ^. There can be several edges with the same index and category because edges also encode positions in the grammatical rule, positions in the morphology, and some state about resources that have been consumed.) To make this process easier, you can filter edges from the chart using a facility similar to the parser's bracketing tool. This is described in the next paragraph.
If you click on "Show Bracket Window" in the tree window or type show-chart-nav in the Tcl shell, then XLE will display a window of vertices in the generation chart. Each vertex will have an index and a predicate name after it (or ? if there is no predicate). The index corresponds to the GOAL relation of the input f-structure, which can be found by typing show-input and then clicking on the "constraints" item under the Views menu. (The GOAL relation is an internal relation used by XLE to index the input to the generator.)
If you click on a "show" button for a vertex in the vertex window, then XLE will show a category menu for that vertex. This is a list of c-structure categories that the generator attempted to build for this vertex. You can exclude categories by clicking on "out" and then clicking on "Apply". This will filter these categories from the chart, reducing the number of trees that you have to look through to find the right tree. (It is not easy to make edges required as in the parser because a generation chart doesn't have some of the properties that a parse chart has.) If you click on "cancel", any changes that you have made on the menu will be discarded.
If you click on a "show" button for a category in a category menu for a particular vertex, then XLE will show a menu of edges that have that category and vertex. Edges can be excluded or not, just like in the category menu described above. If you click on a "show" button for an edge, then XLE will display the edge in the tree window using "show-subtree".
If you click on "restrictions" in the vertices window, then XLE will show a menu of all of the categories and edges that have been excluded so far. This is useful for turning the exclusions off if you find that the exclusions have eliminated the tree that you are interested in. If you click on "clear" in the vertices window, this will turn off all of the current exclusions.
The techniques described above work if the generator produced the desired tree but the feature structure was ill-formed for some reason. However, sometimes the generator prunes a lexical entry or subtree before it builds a tree. Usually this happens because the lexical entry has a feature that is not addable or conflicts with the input. The only way to debug this sort of problem is to try smaller and smaller generations until you identify the lexical item or grammatical rule that is causing the problem. An easy way to do this is to parse the sentence that you wanted the generator to produce, and then use "Generate from this FS" for complete f-structures that correspond to subtrees of the parse tree that you want. (You cannot generate from incomplete f-structures because they will be incomplete in the generator, too. This means that you cannot generate from a VP, for instance.)
The best way to test a generation grammar is to parse a string, pick one of the resulting f-structures, generate from it, and see whether any of the outputs match the input. We call this process regeneration. This section lists some useful commands for doing regeneration. Please see the online "help" command for more information on how to use these commands.
The regenerate command takes a string as input and parses it using $defaultparser (the parser created by create-parser). Then it picks the first f-structure and generates from it using a generator created with the same grammar file that $defaultparser was created with. The "regenerate" command automatically creates a generator if one hasn't already been created.
The regenerate-testfile command is just like the parse-testfile command except that it regenerates each test item in the testfile instead of just parsing it. It also produces a testfile.regen file, which is a file of just the regenerations. Finally, it checks the output of the generator against the original string, ignoring minor differences in white space. Errors are written in testfile.errors.
The regenerate-morphemes command is useful for checking the generation morphology. Sometimes the generator fails because the morphemes that it needs to produce aren't accepted by the generator's morphology. regenerate-morphemes applies the generator morphology to the morphemes that are in the valid trees of the parse chart. If this doesn't produce anything, then the generator's morphology is not the inverse of the parser's morphology in a non-trivial way.
If the grammar is context-free equivalent and the input to the generator is fully-specified, then the generator should generate in time that is quadratic in the number of f-structure variables in the input in the worst case (usually linear in the typical case). A grammar is context-free equivalent if XLE can parse with the grammar in time that is cubic in the length of the input sentence. The input is fully-specified if the output of the generator is a single string.
If the grammar is not context-free equivalent or the input to the generator is not fully-specified, then the generator can generate in time that is exponential in the number of f-structure variables in the input in the worst case. In particular, if the grammar doesn't distinguish between different orders of constituents within the tree, then the number of solutions produced and the amount of time taken to generate can be proportional to N!, where N is the number of free constituents and "!" represents the factorial function. One way to reduce non-determinism in the grammar is to record the order of adjuncts using $<h<s or $<h>s instead of just $. Unfortunately, this can slow down the parser, since it forces more information to be copied up within each adjunct. Another way to reduce non-determinism is to mark adjuncts with features such as FOCUS that indicate what function the position of the adjunct plays in the sentence.
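The N! growth is easy to underestimate. The following Python sketch simply enumerates the orders of a few freely ordered constituents and prints N! for some larger values of N. The adjuncts are invented and the function constituent_orders is hypothetical; the code says nothing about how XLE itself enumerates solutions.

```python
from itertools import permutations
from math import factorial

def constituent_orders(constituents):
    """Return every linear order of constituents that the grammar leaves unordered."""
    return [" ".join(order) for order in permutations(constituents)]

adjuncts = ["yesterday", "in the park", "quietly"]   # hypothetical free adjuncts
for order in constituent_orders(adjuncts):
    print(order)

# The number of orders grows as N!.
for n in (3, 5, 8, 10):
    print(n, factorial(n))
```

Three free constituents give 6 orders, while ten give 3628800, which is why constraints that fix the order of adjuncts pay off quickly.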
The generator uses the PREDs in the input to build a generation chart. If a word is missing a PRED (such as "it" in "it is raining"), then the generator builds generation trees that have that word optionally in every possible position, leaving it to the unifier to eliminate duplicates. This is very inefficient, especially in a free word order language that has a functional uncertainty associated with the word. It is better if you can avoid such analyses.
Whenever a grammar gets loaded by create-parser or create-generator, XLE prints out some statistics about the size of the grammar. These statistics look something like:
grammar has 286 rules with 699 states, 1528 arcs, and 2987 disjuncts
This says that the grammar has 286 finite-state rules, and that the finite-state rules have 699 states and 1528 arcs in them. The number of arcs is a good indication of the size of the grammar. If you were to convert the grammar into an equivalent grammar consisting of only unary and binary-branching rules, then it would have about this number of rules in it. The last number, the number of disjuncts, indicates about how many different rules you would have if you further required that the equivalent grammar couldn't have disjunctive constraints. For instance,
VP --> VP PP: { (^ OBL)=! | ! $ (^ MODS) }.
would have to be converted to:
VP --> VP PP: (^ OBL)=!.
VP --> VP PP: ! $ (^ MODS).
These numbers can be useful for giving someone a rough idea of how big your grammar is.
This section describes a number of tricks for debugging grammars in XLE in both parsing and generation (see the generation section for further hints on debugging the generator).
SEARCHING FOR TREES
It's quite common that there are a large number of trees for a given sentence. Locating trees of interest can be nontrivial. The bracketing window can help you restrict the display to trees containing (or not containing) constituents of interest.
DEBUGGING THE NOTATION
Whenever you load a grammar using create-parser, XLE parses the formalism and reports warnings and errors. Warnings are given for things that are acceptable but suspicious; errors are given for unacceptable notation. If XLE reports an error, then the grammar that it has is in a funny state and no guarantees can be made about what it will do if you try to use it. Usually XLE will give a line number in the file where the error occurred.
If you have separate lexicon files, then the lexicon entries that you need are parsed on demand when you use a word in the sentence for the first time. This means that you may get errors reported when you parse a sentence. If XLE reports an error, it won't give an absolute line number for the error, but rather a line number that is relative to the beginning of the entry. If you want to check all the lexical entries, you can load the generator ("create-generator filename") which will index all the lexicon files and in the process find any errors.
LOCKING SOLUTIONS
Whenever a solution is locally bad, it shows up in the f-structure display with a reason why it is bad and with the constraints highlighted that caused it to be bad, if possible. A solution on a subtree may be marked as "EVENTUALLY BAD" if it is eventually bad in all of the trees that incorporate the subtree. Similarly a solution on a subtree may be marked as "INCOMPLETE" if it is incomplete in all of the trees that incorporate the subtree. These solutions are marked as bad low down in the tree for performance reasons, so that the number of solutions that occur on intermediate subtrees remains manageable. However, it can make a grammar difficult to debug. To solve this problem, solutions that are marked as "EVENTUALLY BAD", "INCOMPLETE", or "UNOPTIMAL" can be locked using the "lock" button to the left of the f-structure label on the f-structure display window. When the "lock" button is clicked, then this solution becomes the only solution available from its subtree. This information gets propagated up the tree as if the solution had been made the only good solution of the subtree.
The "lock" button can also be used to lock a good solution. Locking a good solution will filter out all of the other good solutions in the subtree. This information will be propagated up the tree, reducing the number of solutions in the subtrees above the locked solution.
Locking only applies within a single tree; if you click "next" or "prev" in the tree displayer, then the locked solutions get unlocked automatically.
FINDING DUPLICATE SOLUTIONS
A common problem in grammar debugging is determining whether the reason that you are getting multiple solutions is that there are duplicates. One way to detect such a situation is to look in the choices window and see if there are any pairs of mutually exclusive choices that don't have any constraints in them. Another way is to use the "Print" commands in the tree window and the f-structure window to print out different structures and then use diff to see whether and how the structures differ.
One common way that you can get duplicate c-structures is if you have two entries under a word for the same category. XLE doesn't collapse these, and so you will get two identical trees with (possibly) different f-structures. A diagnostic for this case is that the lexical entry will have a dotted line between it and its duplicated category in the tree.
Another common way of getting duplicate f-structures is if you have disjunctions that are not mutually exclusive. For instance, y'all are ... will get two solutions if y'all is second person plural and are constrains its subject to be second person or plural. One way to eliminate the spurious ambiguity is to make the disjunction mutually exclusive. In this case, are could constrain its subject to be second person or (plural and not second person). If there is a possibility that a feature may be unspecified, then you will need to make the positive constraint (in this case, second person) a sub-c constraint.
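The effect of overlapping disjuncts can be illustrated outside XLE. The following is a minimal Python sketch, not XLE code: the subject is modelled as a plain dictionary, and the function name satisfied_disjuncts and the feature encoding are assumptions made for illustration. It counts how many disjuncts a given subject satisfies; more than one satisfied disjunct corresponds to a spurious ambiguity.

```python
# Illustrative only: features are plain dicts, not XLE f-structures.
def satisfied_disjuncts(subject, disjuncts):
    """Return the names of the disjuncts whose test accepts the subject."""
    return [name for name, test in disjuncts if test(subject)]

# "y'all" is second person plural.
yall = {"PERS": 2, "NUM": "PL"}

# Non-exclusive: "second person" or "plural".
overlapping = [
    ("second person", lambda s: s["PERS"] == 2),
    ("plural", lambda s: s["NUM"] == "PL"),
]

# Mutually exclusive: "second person" or "plural and not second person".
exclusive = [
    ("second person", lambda s: s["PERS"] == 2),
    ("plural, not second person",
     lambda s: s["NUM"] == "PL" and s["PERS"] != 2),
]
```

With the overlapping version, y'all satisfies both disjuncts and so would yield two solutions; with the exclusive version it satisfies only one.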
This particular problem can show up even when there is not an obvious disjunction. For instance, a common way to deal with base form verbs is to say that when they are present tense, the subject is not third person singular (~[(^ SUBJ PERS)=3 (^ SUBJ NUM)=SG]). However, XLE uses DeMorgan's law to convert this into a disjunction ((^ SUBJ PERS)~=3 | (^ SUBJ NUM)~=SG). If the resulting disjunction is not mutually exclusive, you may get spurious ambiguity. The best solution in this case is to make the disjunction explicit and make it mutually exclusive.
A simple way to find disjunctions that are not mutually exclusive is to use the "Check Disjunctions" command on the tree window. This command will look through the current grammar for disjuncts that are not mutually exclusive and print them out, along with the file and line number where they occur. It also looks through the lexical items of the current chart, if there is one. You can increase the usefulness of the "Check Disjunctions" command by adding constraints to the non-exclusive disjuncts so that they become mutually exclusive. This will reduce the number of disjunctions printed by "Check Disjunctions", and so make it easier to see when new non-exclusive disjunctions appear.
Disjunctions that are not mutually exclusive do not always lead to a spurious ambiguity. For instance, the rule fragment
...PP*:{ (^ OBL)=! | ! $ (^ MODS) }; ...
is not likely to lead to a spurious ambiguity. To reduce the number of disjunctions that need to be checked, "Check Disjunctions" uses a heuristic to filter out disjunctions like these. If you want to see all of the disjunctions that are not mutually exclusive, you can use the "Check All Disjunctions" command.
DEBUGGING THE MORPHOLOGY
You can print out the results of the tokenizer or the morphology when applied to a particular string using the tokens and morphemes commands. For example, the command:
tokens {John laughs.}
might produce:
{"^ " john|John} "TB"
{ laughs. "TB" [ "_," "TB" ]*.
|laughs "TB" { "_," "TB" [ "_," "TB" ]*.|.}
|laughs.}
"TB"
While the command:
morphemes {John laughs.}
might produce:
{ john {"+Token"|"+Noun" "+Sg"}
|John
{ "+Token"
|"+Prop" {"+Misc"|"+Giv"
"+Masc" "+Sg"}}}
{ { { {laughs|laughs.} "+Token"
|laugh {"+Noun"
"+Pl"|"+Verb" "+Pres" "+3sg"}}
"_," "+Token"[
"_," "+Token"]*
|{laughs|laughs.} "+Token"
|laugh {"+Noun" "+Pl"|"+Verb"
"+Pres" "+3sg"}}
. {"+Token"|"+Punct" "+Sent"}
|laughs. "+Token"}
These commands use the morphology of the default parser ($defaultparser).
This section describes a set of tools that have been developed to build and run regression test suites, which are useful for checking progress and detecting bugs during the development of grammars, semantic lexicons, rules mapping semantic representations to knowledge representations, or transfer rules. The form of regression testing supported by these tools does not just record whether one obtains an f-structure, semantic analysis, KR analysis, or transfer for a sentence. It also matches the analyses against gold (benchmark) standards, and gives a detailed report of any points of difference.
You are strongly encouraged to store your grammar in a version control system (such as CVS or Subversion) to make it easier to find out when a particular change ended up breaking the grammar. For instance, suppose that after a week of working on your grammar, you discover that your test suite takes twice as long to process as it did the last time you ran your regression tests. Storing your grammar in a version control system allows you to back up to earlier versions to see when the slowdown began. If you are lucky, you will be able to use a divide-and-conquer strategy to narrow the problem down to a particular change that caused the problem.
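The divide-and-conquer strategy mentioned above amounts to a binary search over the version history. Here is a minimal Python sketch under stated assumptions: the function name first_bad_version and the is_bad callback are hypothetical, and in practice is_bad would check out a version and run the regression tests on it. The sketch also assumes that the oldest version is good, the newest is bad, and that once the problem appears it persists in all later versions.

```python
def first_bad_version(versions, is_bad):
    """Binary search for the first version where is_bad() is true.

    Assumes versions are ordered oldest to newest, the oldest is good,
    the newest is bad, and once a version is bad all later ones are too.
    """
    lo, hi = 0, len(versions) - 1  # versions[lo] is good, versions[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid   # the problem was introduced at or before mid
        else:
            lo = mid   # the problem was introduced after mid
    return versions[hi]
```

Each step halves the range of candidate versions, so a history of a hundred changes needs about seven test runs rather than a hundred.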
A test suite can be created automatically using the command create-testsuite-dir, which takes a single optional argument:
create-testsuite-dir (<DIRECTORY>)
The optional argument specifies the path to the directory where the test suite directory ts will be created. If the command is called without this argument, then the test suite directory will be created in the current working directory (i.e., ./ts).
Note that the test suite directory and its subdirectories will also be created automatically if a test suite is run with one of the commands described in the next section, Running a Test Suite (e.g., run-syn-testsuite).
To construct a test suite by hand (e.g. for a gold standard test suite recording ground truths, rather than a regression suite recording current best analyses), you need to know something about the directory structure required by a test suite. The test suite directory must contain the following files and subdirectories.
sentences.lfg    % file with example sentences
fs/              % directory with benchmark f-structures
sem/             % directory with benchmark semantic reps
kr/              % directory with benchmark KR
xfs/             % directory with benchmark transferred f-structures
xfr/             % directory with benchmark transfer structures
reports/         % directory with reports of previous test runs
tmp/             % directory with structures produced by most recent test run
    fs/
    sem/
    kr/
    xfs/
    xfr/
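If you prefer to script the hand construction, the layout above can be created with a few lines of Python. This is a sketch only, not part of XLE: the function name make_testsuite_skeleton is hypothetical, and the sketch creates sentences.lfg as an empty file that you must then fill in using the format described below.

```python
import os

# Benchmark subdirectories, which are mirrored under tmp/.
BENCHMARK_DIRS = ["fs", "sem", "kr", "xfs", "xfr"]

def make_testsuite_skeleton(parent="."):
    """Create <parent>/ts with the layout listed above; return its path."""
    ts = os.path.join(parent, "ts")
    for name in BENCHMARK_DIRS + ["reports"]:
        os.makedirs(os.path.join(ts, name), exist_ok=True)
    for name in BENCHMARK_DIRS:
        os.makedirs(os.path.join(ts, "tmp", name), exist_ok=True)
    sentences = os.path.join(ts, "sentences.lfg")
    if not os.path.exists(sentences):
        # Left empty on purpose: the sentences must be added by hand.
        open(sentences, "w").close()
    return ts
```

The command create-testsuite-dir remains the supported way to do this; the sketch is only useful when the directory has to be prepared without starting XLE.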
Note that many of these directories will remain empty if you are not using the relevant level (e.g., no sem or kr). The file sentences.lfg takes the form of a regular input file for the parse-testfile command, but with two crucial additions. First, each sentence must be numbered. This is necessary to keep track of which f-structure, semantics, KR, and xfr files belong to which sentences, by means of a file numbering convention. To number a sentence, precede it with a comment line containing just the number of the sentence, e.g.
# 23
This is sentence number 23.
It is recommended that the number be surrounded by blank lines to make the test suites compatible with the format used by parse-testfile.
Second, the first line of the file should be another comment giving the highest sentence number in the file, e.g.
# 253
#1
Sentence 1.
#2
Sentence 2.
...
#253
Sentence 253.
It is recommended that sentences be numbered consecutively and without gaps, although this is not in fact enforced by XLE.
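The numbering convention is easy to generate mechanically. The following Python sketch (the function name format_sentences_file is hypothetical) produces the text of a sentences.lfg file from a list of sentences, numbering them consecutively from 1 and surrounding each number comment with blank lines as recommended above. Placing a blank line after the header comment is an assumption of this sketch.

```python
def format_sentences_file(sentences):
    """Return the text of a sentences.lfg file for the given sentences."""
    # First line: a comment giving the highest sentence number in the file.
    lines = ["# %d" % len(sentences), ""]
    for number, sentence in enumerate(sentences, start=1):
        # Number comment, blank line, sentence, blank line.
        lines += ["# %d" % number, "", sentence, ""]
    return "\n".join(lines)
```

For example, format_sentences_file(["Sentence 1.", "Sentence 2."]) starts with the header comment # 2, followed by the two numbered sentences.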
The fs, sem and kr directories contain files of gold standard structures fs<N>.pl, sem<N>.pl and kr<N>.pl, where N is the number of the sentence in sentences.lfg. It is possible for the semantics and kr benchmark directories to be empty if semantic and/or KR results are not being stored for the test suite.
The fs<N>.pl files contain normal prolog f-structures or f-structure charts. The sem<N>.pl files contain a prolog term of the form:
sem(N, 'Sentence N.', Choices, Equivs, ContextedFacts)
where N is the number of the sentence, 'Sentence N.' is the text string for the sentence, Choices is a choice structure (in the same format as for prolog fs-charts), Equivs is a list of variable definitions and/or selections (in the same format as for prolog fs-charts), and ContextedFacts is a list of contexted facts of the form cf(C, Fact).
The kr<N>.pl files are similar:
kr(N, 'Sentence N.', Choices, Equivs, ContextedFacts)
If transfer has been run before adding items to the test suite, then the xfs<N>.pl files will contain transferred f-structures, and the xfr<N>.pl files will contain transfer structures of the form
xfr(Choices,Equivs,Equalities,ContextedFacts,Documentation)
The reports directory contains reports from previous test runs.
The tmp directory contains the structures obtained when running a test suite, which are compared against the benchmark structures. Its fs, sem and kr subdirectories parallel the benchmark directories. In addition, the tmp directory itself will contain files final_N.pl for the final structures produced by the last test run.
Permissions: It is important that the tmp directory and its subdirectories and files be readable and writable by anyone who might run the test suite. The other directories should be readable by anyone who might run the test suite, and writable by anyone who might add further examples. By default, the add-to-testsuite commands make these directories group writable and readable by everyone.
Test suites can be run over any range of example numbers, and over any sub-sequence of the processing levels (text, f-structure, semantics, KR, transferred f-structure, transfer structure).
The following XLE commands are available:
run-syn-testsuite             % text => f-structure
run-sem-testsuite             % text => semantics
run-kr-testsuite              % text => KR
run-synsem-testsuite          % f-structure => semantics
run-semkr-testsuite           % semantics => KR
run-xfs-testsuite             % text => transferred f-structure
run-xfr-testsuite             % text => transfer-structure
run-synxfs-testsuite          % f-structure => transferred f-structure
run-synxfr-testsuite          % f-structure => transfer-structure
run-multi-xfr-testsuite       % text => transfer-structure
run-syn-multi-xfr-testsuite   % f-structure => transfer-structure
These commands can also be called specifying a range of example numbers, as follows:
run-syn-testsuite <FROM> <TO>
run-sem-testsuite <FROM> <TO>
run-kr-testsuite <FROM> <TO>
run-synsem-testsuite <FROM> <TO>
run-semkr-testsuite <FROM> <TO>
run-xfs-testsuite <FROM> <TO>
run-xfr-testsuite <FROM> <TO>
run-synxfs-testsuite <FROM> <TO>
run-synxfr-testsuite <FROM> <TO>
run-multi-xfr-testsuite <FROM> <TO>
run-syn-multi-xfr-testsuite <FROM> <TO>
These commands will pick up whatever the current test suite directory is, which is either the directory last specified by the command set-testsuite or the default directory ./ts. The input to the test suite run is taken from sentences.lfg in the case of runs starting from text, or from the appropriate benchmark files otherwise.
When parsing text, the currently loaded parse grammar will be used. (Note: Sometimes running the test suite may reload the grammar, due to a technical irritation.) If an example number range is not specified, all of the test suite examples will be run. When running transfer (xfr) versions of the test suite, the current active transfer grammar will be used. For multi-xfr runs, where a sequence of transfer rules is applied, the sequence is whichever one was last used, e.g. by the transfer-seq command.
For each level of analysis, the analysis results will be compared to the benchmark structures, and the differences between the best matching analysis structure and the benchmark structure will be printed out. When running across multiple levels (e.g. text => KR, which passes through syntax and semantics), the analysis result most closely matching the benchmark will be selected for subsequent processing.
The command
set-testsuite-most-probable 1
will set an environment flag that causes subsequent test runs to only compare the most probable f-structure (i.e. the one that would initially be shown in the unpacked f-structure window) to the benchmark. The default, where all f-structures are compared to find the one best matching the benchmark, can be restored by running the command
set-testsuite-most-probable 0
The command
set-testsuite-partial-benchmark 1
will set an environment flag indicating that benchmark structures are only partial specifications of the desired results. When this flag is set, matching maximizes recall rather than the f-score of precision and recall. To restore the default, run the command
set-testsuite-partial-benchmark 0
Comparison of analysis and benchmark results uses the triples matching mechanism. Thus transfer rules convert f-structures, semantics, KR, and xfr structures to sets of triples (more accurately, tuples, since not all semantic and KR relations are 2-place), and these are compared to find the best match. A default set of rules for converting f-structures, semantics and KR to triples is automatically loaded. It is possible to redefine these mapping rules by loading your own set of structure=>triples transfer rules. These rules must have the identifiers
grammar = fs_triples.    % redefine fs=>triples
grammar = sem_triples.   % redefine sem=>triples
grammar = kr_triples.    % redefine kr=>triples
in order for them to be recognized by the test suite comparison. When running testsuites on transfer structures, you can also load a set of transfer rules with the grammar name xfr_triples to determine the mapping of transfer structures to triples. If no such rule-set has been loaded, then fs_triples will be used instead.
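To see why a partial benchmark calls for recall rather than f-score, it helps to compute both on a small example. The Python sketch below is a generic illustration of precision, recall, and f-score over sets of triples; it is not XLE's matching code, and the function name match_scores and the tuple encoding of triples are assumptions.

```python
def match_scores(analysis, benchmark):
    """Return (precision, recall, f-score) for two sets of triples."""
    analysis, benchmark = set(analysis), set(benchmark)
    shared = len(analysis & benchmark)
    # Precision: share of analysis triples that are also in the benchmark.
    precision = shared / len(analysis) if analysis else 0.0
    # Recall: share of benchmark triples that the analysis reproduces.
    recall = shared / len(benchmark) if benchmark else 0.0
    total = precision + recall
    fscore = 2 * precision * recall / total if total else 0.0
    return precision, recall, fscore
```

If the benchmark lists only some of the desired triples, a correct analysis will contain extra triples and so lose precision, even though it recovers every benchmark triple. Recall is unaffected by the extra triples, which is why it is the more suitable measure in that setting.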
XLE also provides tools for grammar testing. The parse-testfile command can be used to test a suite of sentences against a grammar in batch mode. To use it, first load a grammar with the command create-parser and then call parse-testfile using the following syntax:
parse-testfile <TESTFILE> (<START> <STOP>) (-parser <GRAMMAR>)
The arguments <START> and <STOP> are sentence indices, which can be either numbers or strings. If no indices are given, then parse-testfile will parse the entire testfile. If indices are given and they are numbers, then the indices refer to the number of the sentence from the beginning of the file (1 for the first sentence, 2 for the second, etc.). If, however, the indices provided are strings, parse-testfile will parse the first sentence that matches the <START> string (i.e., has the string in it or follows a comment that does) and continue parsing sentences until it reaches the first sentence that matches the <STOP> string. For example, given a testfile sample-test.lfg such as the following
# 1

Philip K. Dick brought the anomic world of California to many of his works.

# 2

Dick spent most of his career as a writer in near-poverty.

# 3

Alternate universes and simulacra were common plot devices.

# 4

"There are no heroics in Dick's books, but there are heroes."
the command
parse-testfile sample-test.lfg anomic simulacra
will parse from the first sentence in sample-test.lfg that has anomic in it (or follows a comment with anomic in it) up to and including the sentence that has simulacra in it (or follows a comment with simulacra in it). In other words, it will parse sentences 1 through 3.
A testfile must consist of test sentences separated by blank lines. If any of the lines begin with #, they are treated as comments. Be sure to put a blank line between a comment and the following sentence. Otherwise, the sentence will be considered part of the comment.
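The testfile format and the string-index selection described above can be approximated in a few lines of Python. This is a sketch of the behavior as documented here, not XLE's implementation; the function names read_testfile_items and select_range are hypothetical, and details such as how XLE treats several comments in a row are not covered.

```python
import re

def read_testfile_items(text):
    """Split testfile text into (comment, sentence) pairs.

    Blocks are separated by blank lines. A block whose first line starts
    with '#' is a comment and is attached to the next sentence.
    """
    items, comment = [], ""
    for block in re.split(r"\n\s*\n", text):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#"):
            comment = block
        else:
            items.append((comment, block))
            comment = ""
    return items

def select_range(items, start, stop):
    """Return sentences from the first match of start through the first match of stop."""
    def matches(item, key):
        # A sentence matches if the key is in it or in its preceding comment.
        return key in item[0] or key in item[1]
    selected, active = [], False
    for item in items:
        if not active and matches(item, start):
            active = True
        if active:
            selected.append(item[1])
            if matches(item, stop):
                break
    return selected
```

Note that a sentence placed directly under a comment line, with no blank line in between, ends up inside the comment block, which mirrors the warning above.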
There are a number of command-line options for parse-testfile that follow the indices arguments and change the command's normal behavior. By default, parse-testfile uses the value of the variable defaultparser for the grammar used in parsing. The command create-parser normally sets this variable. But if multiple grammars are being tested, each can be assigned to a different variable. The option -parser can then be used to instruct parse-testfile to use one of these alternative variables (rather than defaultparser). For example,
set foo [create-parser /home/foo/grammar.lfg]
parse-testfile sample-test.lfg -parser $foo
The option -outputPrefix directs parse-testfile to create a packed prolog file for every sentence parsed, which will be named according to the following convention
<PREFIX>S<INDEX>.pl
where PREFIX is the prefix given and INDEX is the sentence number for the sentence parsed. For instance, if the prefix is /tilde/smith/gold, the output of the first sentence will be stored in /tilde/smith/goldS1.pl. If you want to store in a sub-directory, then you need an explicit slash at the end of the prefix (e.g. gold/). XLE will not create the directory for you.
The option -writeFailures directs XLE to print an empty f-structure file whenever the parser fails. This empty f-structure file will include parse statistics but will not have any f-structure content in it.
When called with the option -parseProc followed by a procedure name, parse-testfile will call the procedure named on every sentence processed by parse-testfile with four arguments:
This facility can be used to process testfiles in ways not anticipated by XLE. The default value for -parseProc is defaultParserProc.
If parse-testfile is called with different start and stop indices, then it will produce a set of files that give the results. If the filename of testfile is testfile.lfg, then testfile.lfg.new will contain a copy of the testfile with performance information appended to the end of each sentence. This file should become your new testfile. After this has been done for a testfile, calling parse-testfile will also produce testfile.lfg.stats with performance information about the sentences. If there are any errors or mismatches in the number of solutions for a sentence, then the sentence will be printed on testfile.lfg.errors.
The format of the performance information is (solutions time subtrees), where "solutions" is the number of valid solutions for the sentence, "time" is the time it took to parse the sentence, and "subtrees" is the number of subtrees that XLE had to process in order to parse the sentence (in general, the time will be proportional to the number of subtrees).
If a grammar distinguishes between optimal and unoptimal solutions, then XLE will report the number of solutions as x+y (e.g. 7+3) where x is the number of optimal solutions and y is the number of unoptimal solutions. parse-testfile checks both numbers when deciding whether there is an error or a mismatch. This means that 7+2 won't match 7 or even 9. You can tell parse-testfile to ignore the unoptimal number by adding set ignoreUnoptimal 1 to your .xlerc file.
For instance, the following result for the sample testfile above
((1) (2+12 0.43 349) (14 words))
would indicate that the first sentence had 2 optimal solutions and 12 unoptimal solutions and took 0.43 CPU seconds to process 349 subtrees.
The value for the number of solutions will be a positive integer if a sentence successfully parses. Negative values indicate an unsuccessful parse:
If you want to specify how many solutions a sentence should have, then put a number at the beginning with an exclamation point:
(3! 5 1.03 200).
Whenever parse-testfile is run, it will compare the number of solutions a sentence should have with the number it actually got and report an error if they are different. It also reports a mismatch whenever the number of actual solutions changes.
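If you want to post-process testfiles yourself, the performance information can be read with a regular expression. The Python sketch below handles the forms shown above, such as (2+12 0.43 349) and (3! 5 1.03 200); it is an illustration rather than XLE's own parser, and the names parse_performance and same_count are hypothetical. The comparison function reflects the rule that 7+2 matches neither 7 nor 9 unless the unoptimal count is ignored.

```python
import re

PERF = re.compile(
    r"\(\s*(?:(?P<expected>\d+)!\s+)?"                   # optional expected count, e.g. 3!
    r"(?P<optimal>-?\d+)(?:\+(?P<unoptimal>\d+))?\s+"   # solutions, e.g. 5 or 2+12
    r"(?P<time>\d+(?:\.\d+)?)\s+"                        # parse time
    r"(?P<subtrees>\d+)\s*\)"                            # number of subtrees
)

def parse_performance(text):
    """Return the first performance record found in text, or None."""
    m = PERF.search(text)
    if m is None:
        return None
    def to_int(name):
        return int(m.group(name)) if m.group(name) is not None else None
    return {
        "expected": to_int("expected"),
        "optimal": to_int("optimal"),
        "unoptimal": to_int("unoptimal"),   # None when no +y part is present
        "time": float(m.group("time")),
        "subtrees": to_int("subtrees"),
    }

def same_count(a, b, ignore_unoptimal=False):
    """Compare the solution counts of two records."""
    if ignore_unoptimal:
        return a["optimal"] == b["optimal"]
    return (a["optimal"], a["unoptimal"]) == (b["optimal"], b["unoptimal"])
```

Keeping the unoptimal count as None when it is absent, rather than 0, is what lets 7 and 7+0 be told apart if that distinction matters for your grammar.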
Whenever the .new, .stats, or .errors files get remade, backup copies are made. The type of backup copy made depends on the Tcl variable "version-control". This variable is modelled after Emacs's version-control variable. If the value of the variable is t, then numbered backups of the form foo.~1~ will be made. If the value of the variable is never, then a single backup of the form foo~ will be made. If the value of the variable is nil, then numbered backups will be made if a file already has numbered backups, and otherwise a single backup file will be made. You can set version-control in your .xlerc file using something like set version-control t. The default is nil.
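The three settings of version-control can be summarized as a small decision function. The following Python sketch is illustrative only: the function name backup_name is hypothetical, and the choice of the next backup number (one more than the highest existing number) is an assumption based on the Emacs behavior that the variable is modelled after.

```python
import re

def backup_name(filename, existing, version_control="nil"):
    """Choose a backup name for filename.

    existing: names already present in the directory.
    version_control: "t", "never", or "nil", as described above.
    """
    pattern = re.compile(re.escape(filename) + r"\.~(\d+)~$")
    numbers = [int(m.group(1)) for m in map(pattern.match, existing) if m]
    # Numbered backups: always for "t", and for "nil" only if some exist.
    numbered = version_control == "t" or (version_control == "nil" and numbers)
    if numbered:
        return "%s.~%d~" % (filename, max(numbers, default=0) + 1)
    return filename + "~"
```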
If the grammar has a BREAKTEXT transducer in the morph config file of a parser, then you can create a testfile from a text file with:
make-testfile <TEXT-FILE> (<TESTFILE>) (<PARSER>)
The command make-testfile breaks the text in <TEXT-FILE> up into text segments using the BREAKTEXT transducer and writes the results to <TESTFILE>. It inserts a comment at the beginning of the testfile to indicate that the testfile should be parsed literally (e.g. preserving whitespace and giving no special treatment to comments, performance data, or prefixed categories). It also inserts a blank line after every text segment in order to ensure that the result is in testfile format. Blank lines in each text segment have a vertical bar (|) appended at the end so that parse-testfile and also the user can distinguish blank lines used as separators from blank lines used as text. This is designed so that
parse-testfile(make-testfile(x))
is equivalent to
parse-file(x)
The argument <TESTFILE> is optional. If it is not provided, then the results are written to <TEXT-FILE>.new. If the argument <PARSER> is not provided, the parser defaults to the variable defaultparser.
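The blank-line convention used by make-testfile can be sketched as follows. This Python fragment is an illustration, not a replacement for make-testfile: it takes segments that have already been broken up (the BREAKTEXT transducer is not reproduced), it omits the leading comment that marks the file as literal because its exact text is not given here, and the function name format_segments is hypothetical.

```python
def format_segments(segments):
    """Join text segments in the layout described above (header comment omitted)."""
    out = []
    for segment in segments:
        for line in segment.split("\n"):
            # A blank line inside a segment gets a bar appended so that it
            # is not mistaken for a separator between segments.
            out.append(line + "|" if line.strip() == "" else line)
        out.append("")  # blank separator line after every segment
    return "\n".join(out) + "\n"
```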
You can use diff-testfiles to find mismatches in the number of solutions for sentences in two different versions of a testfile. diff-testfiles will work even when some sentences or comments have been added or deleted from the testfile, although it will miss some sentences if the testfile gets rearranged. diff-testfiles reports mismatches between sentences in the two testfiles and also errors in the second testfile (e.g. differences between the expected number of solutions (notated as 7!, for instance) and the actual number of solutions). It also reports the sentences that it skipped because it couldn't find a corresponding sentence in the other testfile.
If you call parse-testfile with just a start index or the same start and stop index, then it will parse just that sentence and display the results on the screen (otherwise, parse-testfile won't display results). This facility can be used to parse a single sentence from a testfile. To make things even more convenient, you can create your own Tcl procedure for parsing a sentence from a particular testfile:
proc sent {n} { parse-testfile verbmobil.testfile.lfg $n $n }
If you add this to your .xlerc file, then typing the following into the Tcl shell will cause the seventh sentence to be parsed from the testfile verbmobil.testfile.lfg:
sent 7
You can use parse-testfile to construct an annotated tree bank. First, load a grammar using create-parser. Then parse the first sentence in your testfile using parse-testfile testfile.lfg 1 1. This will automatically add the buttons "next sentence" and "prev sentence" to the fschart window. Look at the packed representation and choose the correct analysis. Then, click on the print button while holding the CONTROL key down on either the fschart, fstructure, or tree window. The fschart print button will produce a packed representation of all of the solutions along with the choices that you made as a prolog term. The fstructure window will produce the current fstructure as a .lfg file. The tree window will produce the current tree as a .tree file. If the grammar assigns a value to the SENTENCE_ID attribute on the top-level fstructure or if there is a comment that begins with # SENTENCE_ID: before the sentence, then the print files will include the SENTENCE_ID value in their names. Otherwise, the next available name will be used. Finally, click on "next sentence" to get the parse of the next sentence. Continue until you are done.
You can test how well your test files cover a rule by parsing a test file and then calling
print-unused-grammar-choices $defaultparser (<CAT>)
For example, the command
print-unused-grammar-choices $defaultparser {NP[std]}
will print the unused choices in NP[std]. The command print-unused-grammar-choices lists the constraint disjuncts in the given rule that were not part of any valid analysis in the test files. It will also list the constrained daughter categories in the rules that were never used. It will try to report the unconstrained daughter categories as well (e.g. the VP in S --> NP: (^ SUBJ); VP), but it won't know the line number for the unconstrained daughter. If no category (CAT) is given, then print-unused-grammar-choices will print unused grammar choices for all of the rules.
If you are having trouble figuring out why print-unused-grammar-choices is reporting that a constraint disjunct or daughter category is unused, it may be that the grammar choice in question is part of a macro or template that is expanded in more than one place with different epsilon constraints in front of it (e.g. e: (^ FOO)=+; @MACRO). The process of shifting the epsilon constraints to the following category can sometimes obscure the original source of some choices.
The command generate-test-sentences can be used to generate new test sentences for a given category. Its syntax is the following:
generate-test-sentences $defaultgenerator -lexemes <LEXEMEFILE> -length <N> -rootcat <ROOTCAT>
Each test sentence will include at least one disjunct or daughter category that hasn't been covered yet. LEXEMEFILE is a list of lexemes that generate-test-sentences can use. The list is a plain text file where each lexeme is on a new line. You can specify the category of the lexeme by putting it in front of the lexeme followed by a colon (e.g. N_BASE: dog). You do not need to quote spaces or other special characters.
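When preparing a lexeme file, it can be useful to check how its lines will be read. The Python sketch below splits each line into an optional category and a lexeme, following the format described above. It is a guess at a reasonable reading, not XLE's parser: the function name parse_lexeme_lines is hypothetical, and how XLE treats a lexeme that itself contains a colon is not specified here.

```python
def parse_lexeme_lines(text):
    """Return (category, lexeme) pairs; category is None when not given."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        category, sep, lexeme = line.partition(": ")
        # Treat the prefix as a category only if it is a single word.
        if sep and " " not in category:
            entries.append((category, lexeme.strip()))
        else:
            entries.append((None, line))
    return entries
```

Lexemes with spaces, such as New York, are kept whole, since spaces do not need to be quoted.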
generate-test-sentences works by generating all possible sentences up to length N and then choosing sentences that have unused grammar choices in them. For this reason, you should choose N to be as small as possible. For instance, an N of 6 may take several minutes. In order to make generate-test-sentences practical at all, it places severe restrictions on the possible sentences:
To get around the first restriction, you should define some pseudo lexical entries like "PP" and "CPREL" that represent multi-word categories. If you put a dispreference mark in them, then generate-test-sentences will only use them when they are absolutely necessary.
There are several ways of increasing the robustness of your grammar without sacrificing performance. One way is to put a STOPPOINT mark in the OT field of the configuration, and mark rules used for robustness with marks that are stronger than the STOPPOINT mark (e.g. to the left of the STOPPOINT in OPTIMALITYORDER). These rules will only be used if the core grammar fails to find a valid analysis.
Another way to increase the robustness of the grammar is to create a special rule for collecting fragments and put the rule name in the REPARSECAT field of the configuration. This rule will only be invoked if XLE fails to find a valid analysis after all of the STOPPOINT rules have been tried. XLE will then retry with the new category and the first STOPPOINT. If written correctly, the fragments rule should always get a valid analysis (unless it runs out of resources). Here is an example of a rule for fragments:
FRAGMENTS --> { S
| CPint
| NP
| PARENP
| VP: (! SUBJ PRED)='DUMMY'
| PP
| TOKEN: Fragment $ o::*
} e: (^ FIRST)=! Fragment $ o::*;
(FRAGMENTS: (^ REST)=!).
This fragments rule produces a right-branching list of major categories. The TOKEN category represents any token. It uses the special -token lexical entry which matches any token (including those that already have a lexical entry). It appears in the lexicon something like this:
-token TOKEN * (^ TOKEN)=%stem.
In the worst case, XLE will only produce a list of tokens as the valid analysis.
Because of the epsilon constraints (e.g. e: (^ FIRST)=! Fragment $ o::*), each major category receives a dispreference mark named Fragment. This means that XLE will prefer analyses that have fewer major categories. The TOKEN category gets two Fragment marks, and so XLE will prefer a major category consisting of a single word over a TOKEN. Finally, categories that are missing information are completed. In this case, the VP is given a dummy subject.
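The preference induced by the Fragment marks can be illustrated by simply counting them. The Python sketch below is a simplification under stated assumptions: it considers only the Fragment mark and ignores every other optimality mark and the ranking machinery, and the function names fragment_marks and preferred are hypothetical. Each analysis is represented as the list of categories that the FRAGMENTS rule strings together.

```python
def fragment_marks(categories):
    """Count Fragment marks: two for each TOKEN, one for any other category."""
    return sum(2 if cat == "TOKEN" else 1 for cat in categories)

def preferred(analyses):
    """Return the analyses with the fewest Fragment marks."""
    best = min(fragment_marks(a) for a in analyses)
    return [a for a in analyses if fragment_marks(a) == best]
```

For instance, an analysis consisting of one TOKEN and one S carries three marks, whereas the same string analyzed as four TOKENs carries eight, so the former is preferred.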
The TOKEN category is necessary to guarantee that you can always get some analysis, but it doesn't have much useful information in it. In particular, it doesn't have any information about what part of speech the token might be. You can get this information by adding lexical categories to the FRAGMENTS rule, and filling them in with appropriate defaults. For instance:
FRAGMENTS --> { ...
| V: {(! SUBJ PRED)='DUMMY'}
{(! OBJ PRED)='DUMMY'}
{(! OBJ2 PRED)='DUMMY'}
{(! OBL PRED)='DUMMY'}
...
| ...
It is important for performance reasons that the reparse category build fragments using a list structure (e.g. using the FIRST and REST attributes as above). XLE cannot efficiently handle the large sets that would be generated if you used a set instead of a list structure.
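The shape of the FIRST/REST list can be pictured with nested dictionaries. The Python sketch below builds a right-branching structure from a sequence of fragments; it only mirrors the shape produced by the FRAGMENTS rule and is not XLE code, and the function name fragment_list is hypothetical.

```python
def fragment_list(fragments):
    """Build a right-branching FIRST/REST structure from a list of fragments."""
    structure = None
    for fragment in reversed(fragments):
        node = {"FIRST": fragment}
        # REST is optional in the rule, so the last node has no REST.
        if structure is not None:
            node["REST"] = structure
        structure = node
    return structure
```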
A sentence like "the the boy appeared" might have a fragmented f-structure like:
[ FIRST [ TOKEN the]
REST [ FIRST [ PRED 'appear<SUBJ>'