Prolog Transfer Rule Notation

This section describes the original prolog syntax for transfer rules, which has now been superseded by a more forgiving notation.  To use the old prolog syntax, just ensure that the first non-blank line of the rule file is NOT of the form
" PRS (1.0) "
which is a comment line used by the new syntax to indicate which version of the notation is being used.  The following is taken directly from the original transfer documentation.

Transfer Facts

The first thing to consider is the nature of transfer facts, and then how to convert f-structures to transfer facts.  For anyone familiar with the intricacies of prolog syntax, it is sufficient to say that a transfer fact is just a ground prolog term.  But for most people this is not an entirely helpful characterization.  So to be more concrete, transfer facts typically comprise a predicate, an opening parenthesis, a comma separated list of arguments, and a closing parenthesis, e.g.
predicate(argument1, argument2, argument3)
Prolog imposes some restrictions on the way that predicates and arguments can be written:
  1. The predicate must be an atom.  That is, start with a lower case letter, and (to a first approximation) consist of only letters, numbers and underscores.  Spaces are not allowed in predicate names, and nor are hyphens.  Anything beginning with an uppercase letter or an underscore is taken to be a variable, and variables (i) are not allowed anywhere in transfer facts, and (ii) are not under any circumstances allowed to stand in the place of predicates. (We will see later that variables can occur in transfer rules, where they allow you to express patterns over facts; they are just not allowed in the facts to which the rules apply).
    Examples of valid predicates are:  subj, obj_theta, myPredicate3, x
    Examples of invalid predicates are: SUBJ,  2obj,  obj-theta,  contains space, complex(pred), _x
  2. There can be no space between a predicate name and the open-parenthesis that precedes its first argument.
  3. Arguments can be atoms, they can be integers, and they can also be compound, fact-like expressions.  An example of a fact with compound arguments is:  subj(var(0), var(3)).  The arguments must be separated by commas, and parentheses must balance.
  4. A special form of compound argument is a list.   This comprises a sequence of comma separated arguments between square brackets, e.g. [1, 2, 3]
  5. If it is really necessary to have atoms that start with upper case letters or underscores, or contain symbols besides letters numbers and underscores, then it is possible to do so provided that the atom is enclosed in single quotes.  Thus 'This is a valid predicate!' is a valid predicate.
    Any sequence of characters that is enclosed in single quotation marks is a literal. Such a sequence can even contain a single quotation mark, by writing two adjacent such marks for each one that the string should contain.  The same convention must be used if it is desired to include a back-slash in the name of a literal. For example:
      '''' (4 characters) is the literal whose name consists of a single quotation mark.
      '"' (3 characters) is the literal whose name consists of a double quotation mark.
      'let''s' is the literal consisting of the word "let" followed by an apostrophe-s.
      'stop\\go' is the literal consisting of the words "stop" and "go" separated by a back-slash.
  6. Something that can be very confusing is that the number 3 and the atom '3' are distinct and non-equal.  The latter can be the predicate of a fact, the former cannot.   To make matters worse, quoted atoms that conform to the restrictions in point (1) are identical to the unquoted atoms.  Thus 'atom' is identical to atom. This does, however, mean that, apart from quoting bare numbers, you are generally safe if you err on the side of caution and use quotes widely.
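The restrictions in points (1) and (5) can be sketched in a few lines of Python. This is only an illustration of the "first approximation" given above, not the test that Prolog itself performs; the function names is_unquoted_atom and quote_atom are invented for the example.

```python
import re

# Point (1), to a first approximation: a lower case letter followed by
# letters, numbers and underscores only.
_UNQUOTED_ATOM = re.compile(r"[a-z][A-Za-z0-9_]*\Z")


def is_unquoted_atom(name: str) -> bool:
    """True if name can be written as a predicate without quotes."""
    return _UNQUOTED_ATOM.match(name) is not None


def quote_atom(name: str) -> str:
    """Enclose name in single quotes, doubling quotes and back-slashes."""
    escaped = name.replace("\\", "\\\\").replace("'", "''")
    return "'" + escaped + "'"


for candidate in ["subj", "obj_theta", "SUBJ", "2obj", "obj-theta", "_x"]:
    print(candidate, is_unquoted_atom(candidate))
print(quote_atom("let's"))
```

Anything for which is_unquoted_atom returns False can still be used as a predicate once it has been passed through quote_atom.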

Converting F-Structures to Transfer Facts

Let us now consider how the following f-structure can be represented as a set of transfer facts.

The following is the corresponding list of transfer facts:
pred(var(19),sleep),            The outermost structure
anim(var(2),'+'),               <SUBJ>
mood(var(3),indicative),        <TNS-ASP>
proper(var(4),name)             <SUBJ, NTYPE>

First note how the facts conform to the notational restrictions stated above.  In particular, troublesome atomic values like 'Mary', '+' and '-' are all enclosed in single quotes, and all the predicates start with lower case letters and contain no hyphens. Note also that the value of the pers attribute of var(2) is the atom '3' and not the number 3. Throughout, it can also be seen that attribute names have been systematically lower cased and have had hyphens replaced by underscores.  Thus TNS-ASP becomes tns_asp.

Node Indices and Attribute-Value Pairs

To see how these facts correspond to the f-structure, it is first necessary to grasp the convention lying behind the use of var(n) arguments.  These are to be interpreted as standing for f-structure nodes / indices.  Thus the outermost node/index labeled 19 in the f-structure is represented by var(19),  and the value of the SUBJ attribute labeled as 2 is represented by var(2).  There are two other f-structure nodes that do not receive explicit labels in the graphical representation, namely the values of the TNS-ASP and NTYPE attributes.  These are assigned the indices var(3) and var(4) in the transfer facts. (Unfortunately, in practice the numeric labels assigned to the graphical f-structure representations rarely if ever correspond so directly to the numeric values assigned to the indices in the transfer facts.  Thus label 19 might map onto a transfer index var(0) and label 2 might map onto an index var(1).  This mismatch is the result of a similar labeling mismatch between graphical and prolog representations of f-structures. For ease of exposition, in this example we assume that 19 maps onto var(19), and so on.)

Looking at all the facts with var(19) as their first argument, we can see that most of them correspond directly to attribute value pairs in the f-structure.   The fact that f-structure 2 is the value of the SUBJ attribute of 19 in the f-structure is represented as
 subj(var(19), var(2))
The fact that the value of the PASSIVE attribute of 19 is - is represented as
passive(var(19), '-')
The fact that the value of the TNS-ASP attribute of 19 is a complex structure (indexed var(3)) is represented as
tns_asp(var(19), var(3))
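The correspondence between attribute-value pairs and binary facts can be sketched as follows. This is a minimal Python illustration of the conventions described above (lower casing, hyphens to underscores, var(n) for nodes, quoting of atomic values); the helper names are invented for the example and are not part of the transfer system.

```python
def attribute_to_predicate(attribute: str) -> str:
    """Lower case an attribute name and replace hyphens by underscores."""
    return attribute.lower().replace("-", "_")


def node(index: int) -> str:
    """Write an f-structure node index as a var(n) term."""
    return f"var({index})"


def atomic(value: str) -> str:
    """Quote an atomic value; quoting is safe for anything but bare numbers."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def fact(attribute: str, owner: int, value: str) -> str:
    """Build the binary transfer fact for one attribute-value pair."""
    return f"{attribute_to_predicate(attribute)}({node(owner)}, {value})"


print(fact("SUBJ", 19, node(2)))
print(fact("PASSIVE", 19, atomic("-")))
print(fact("TNS-ASP", 19, node(3)))
```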

Representing Semantic Forms

The pred, arg and sf_id facts do not each correspond to an attribute-value pair.  Instead, they decompose the single PRED-SemanticForm attribute-value pair into individual components.  A semantic form consists of four elements: the predicate (e.g. sleep); a unique semform identifier (not explicitly shown in the graphical f-structure representation, but present nonetheless); an ordered sequence of thematic arguments; and a (possibly empty) sequence of non-thematic arguments.  The attribute-value pair for 19 is
PRED      sleep'<[2:Mary]>'
In the prolog f-structure file that the parser writes and which provides input to the transfer system, this would be written as
eq(attr('PRED', var(19)), semform(sleep, 3, [var(2)], []))
where the four arguments to the semform are the predicate, the semform id, the list of thematic arguments, and the (empty) list of non-thematic arguments.  This gets broken down into three transfer facts,
pred(var(19), sleep)
says that the PRED FN of 19 is sleep.
sf_id(var(19), 3)
says that the semform id of 19's semform is 3
arg(var(19), 1, var(2))
says that the first thematic argument of 19's semform is 2.  Note that this fact has three arguments, rather than the usual two.
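The decomposition just described can be sketched in Python. Facts are modelled here as plain tuples, which is purely a convenience for the illustration; the function name decompose_semform is invented.

```python
def decompose_semform(node, pred, sf_id, args, non_args):
    """Break semform(pred, sf_id, args, non_args) on node into transfer facts."""
    facts = [("pred", node, pred), ("sf_id", node, sf_id)]
    # One ternary arg fact per thematic argument, numbered from 1.
    for position, argument in enumerate(args, start=1):
        facts.append(("arg", node, position, argument))
    # Likewise for the non-thematic arguments.
    for position, argument in enumerate(non_args, start=1):
        facts.append(("non_arg", node, position, argument))
    return facts


for fact in decompose_semform("var(19)", "sleep", 3, ["var(2)"], []):
    print(fact)
```

An empty argument list simply contributes no facts, which is why the example yields exactly three.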

Two more complicated examples of semantic forms and their corresponding transfer facts follow. The f-structure of the sentence "Place the scanner on the table" contains the semantic form

In the prolog f-structure file that the parser writes, this appears as (assuming no renumbering of indices):

In the transfer system, this is represented as follows:

The three thematic arguments each give rise to a separate arg fact, with the numbers 1, 2 and 3 indicating which are the first, second and third argument.

The main clause of the sentence "To replace the print head, you need to perform maintenance tasks." has one argument and one non-argument, as follows:

Notice that the second argument appears outside the angle brackets.  The relevant transfer facts are the following:

Here, non_arg is used to indicate the (first and only) non-thematic argument.  Semantic forms without any thematic and/or non-thematic arguments are indicated by the absence of any arg and/or non_arg facts.


The simple example of Mary sleeps does not illustrate the representation of set-valued attributes. Consider the sentence Mary saw the big black dog which is assigned the following structure:
Notice that the ADJUNCT of the OBJ of the sentence is a structure enclosed in curly brackets, the standard representation for sets.  In the transfer system, sets are represented by the in_set predicate. For the above example, it would be like this:

We can see that the value of 49's ADJUNCT attribute is a set value, that we have indexed as var(5). There are two items in the set, namely var(59) and var(68).

LFG contains a special device for representing scope relations that would otherwise be lost, or reconstructable only as the result of a more or less complex computation based on the c-structure of the sentence.  The f-structure displayed above contains an example.  The f-structures for "big" and "black" occur together as members of a set and are unordered except for an attribute shown as ">s" in the structure for "big" which has the structure of "black" as its value. In this case, we say that "big" outscopes "black" because "big" precedes "black" in the English string.  If we continue to assume that var(59) and var(68) are the indices of "big" and "black" in the representation used in the transfer component, the representation for the scope relation is scopes(var(59), var(68)).

Basic Transfer Rules

We have seen the transfer facts that are derived from the f-structure for the sentence Mary sleeps. For reference in the following example, once again these facts are

We now consider rules for rewriting these facts.  In this section we only discuss the basic constructs of transfer rules.  More advanced topics are discussed later.

Let us jump in, and look at the following (somewhat contrived) transfer grammar:

% Give the transfer grammar a name:
grammar =  simple_example.

% Rule 1: Rewrite the verbal pred "sleep" to "dormir"

pred(X, sleep), +vtype(X, main) ==> pred(X, dormir).

% Rule 2: Delete the progressive attribute if present tense

+tense(X, pres), prog(X,_) ==> 0.

% Rule 3: Remove the indicative mood attribute for declarative
%        statements, and replace "declarative" by "decl"

stmt_type(X,declarative), +tns_asp(X,TA), mood(TA,indicative) ==> stmt_type(X,decl).

% Rules 4 and 5: Rewrite "Mary" to either "Marie" or "Maria"

pred(X,'Mary') ?=> pred(X, 'Marie').

pred(X,'Mary') ==> pred(X,'Maria').

Note that % is the comment character; anything on a line that follows a % will be treated as a comment.

The first thing that a grammar must do is give itself a name.  This is done at the top of the file by the statement
grammar = <Atomic Name>.
where <Atomic Name> must be a valid atom.  Note also that this statement must be terminated by a period, as must the rules that follow it. Further note that the rule file does not need to be ended with any special symbol.

Variables in Patterns

The first rule rewrites the English semantic predicate "sleep" as the French predicate "dormir".  There are a number of things to notice about this rule.  The first is that it contains variables.  A variable is written as an (unquoted) name that starts with either an uppercase letter or an underscore. Thus X, TA and _ in the rules above are all variables.  Variables are used to match arguments, or parts of arguments, in transfer facts.  Thus the rule pattern pred(X,sleep) matches the transfer fact pred(var(19), sleep), setting the variable X to the value var(19).   All other occurrences of the variable X in the same rule will get instantiated to the value var(19) as a result of the match. (The scope of a variable is limited to a single rule: occurrences of the same variable in different rules are not linked, and instantiating a variable in one rule will not affect any of the variables with the same name in other rules.)

The rule itself consists of a left hand side, a rewrite arrow, and a right hand side.  The left hand side is a comma separated sequence of patterns that are intended to be matched against individual transfer facts.  The rewrite arrow can be either ==> (obligatory rewrite) or ?=>  (optional rewrite). For an obligatory rule, if all the patterns on the left hand side can be matched against facts, then the (instantiated) patterns on the right hand side must be added to the set of transfer facts.  For an optional rule, there is a choice: either apply the rule or not.  This choice has the effect of forking the transfer rewriting process along two separate and independent paths.   But we will only consider the effects of optional rules when we come to consider rules 4 and 5 in the example.
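As a rough preview of the forking behaviour, the following Python sketch models the two rules that rewrite 'Mary' in the example grammar. It is a deliberate simplification: each path is held as a separate set of facts, only pred-to-pred rewrites are handled, and an optional rule is applied either to all matching facts on a path or to none. None of this is claimed to reflect how the transfer system stores or explores its alternatives.

```python
def rewrite_pred(facts, source, target):
    """Replace each pred(X, source) fact by pred(X, target)."""
    return frozenset(
        ("pred", fact[1], target)
        if fact[0] == "pred" and fact[2] == source else fact
        for fact in facts
    )


def apply_rule(paths, source, target, optional):
    """Apply one pred-rewriting rule to every path, forking if optional."""
    result = []
    for facts in paths:
        rewritten = rewrite_pred(facts, source, target)
        if rewritten == facts:
            result.append(facts)               # no match: path unchanged
        elif optional:
            result.extend([rewritten, facts])  # fork: applied / not applied
        else:
            result.append(rewritten)           # obligatory: must apply
    return result


start = [frozenset({("pred", "var(2)", "Mary"), ("pred", "var(19)", "dormir")})]
after_optional = apply_rule(start, "Mary", "Marie", optional=True)
after_obligatory = apply_rule(after_optional, "Mary", "Maria", optional=False)
for facts in after_obligatory:
    print(sorted(facts))
```

On the path where the optional rule applied, the obligatory rule no longer finds a 'Mary' fact to match; on the other path it must apply. The two paths therefore end with 'Marie' and 'Maria' respectively.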

To see how rule 1 operates, let us match the first pattern on the left hand side with the transfer fact pred(var(19), sleep). This has the effect of instantiating the rest of the rule as shown (the matched fact is shown beneath the rule):
pred(X, sleep), +vtype(X, main) ==> pred(X,dormir)
pred(var(19), sleep)
to give
pred(var(19), sleep), +vtype(var(19), main) ==> pred(var(19), dormir)
Note how all occurrences of X are instantiated to var(19).  
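The matching and instantiation step can be sketched in Python as below. Compound terms are modelled as tuples whose first element is the predicate, so var(19) becomes ("var", 19). As an assumption of the sketch, any string starting with an uppercase letter or an underscore is treated as a variable, which means that quoted atoms such as 'Mary' are not catered for.

```python
def is_variable(term):
    """Variables start with an uppercase letter or an underscore."""
    return isinstance(term, str) and (term[0].isupper() or term[0] == "_")


def match(pattern, fact, bindings):
    """Return the extended bindings if pattern matches fact, else None."""
    if is_variable(pattern):
        if pattern == "_":                    # anonymous: never instantiated
            return bindings
        if pattern in bindings:               # already set: values must agree
            return bindings if bindings[pattern] == fact else None
        return {**bindings, pattern: fact}
    if isinstance(pattern, tuple):
        if not isinstance(fact, tuple) or len(pattern) != len(fact):
            return None
        for sub_pattern, sub_fact in zip(pattern, fact):
            bindings = match(sub_pattern, sub_fact, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == fact else None


def substitute(term, bindings):
    """Instantiate every bound variable occurring in term."""
    if is_variable(term):
        return bindings.get(term, term)
    if isinstance(term, tuple):
        return tuple(substitute(part, bindings) for part in term)
    return term


found = match(("pred", "X", "sleep"), ("pred", ("var", 19), "sleep"), {})
print(found)
print(substitute(("vtype", "X", "main"), found))
```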

Consumption of Transfer Facts

The second pattern on the left hand side of rule 1 is now an exact match with the input fact vtype(var(19), main).  The significance of the + sign preceding the pattern is as follows.  Normally, when a fact matches a pattern on the left hand side of a rule, that fact gets used up and is removed from the set of transfer facts. (Or rather, it is removed once all the other patterns on the left hand side have been successfully matched and the rule is applied). Thus by matching the fact pred(var(19), sleep) against the pattern in the rule, this fact is taken out of the input set of transfer facts.  The + sign preceding a pattern means match against a fact, but don't use the fact up.  That is, retain it in the set of transfer facts. So, after matching the second pattern, we have a complete match of the left hand side, and know that application of the rule will remove pred(var(19), sleep) but keep vtype(var(19), main) in the set of facts.

The right hand side of the matched rule now serves as an instruction to add a new transfer fact to the set of facts, pred(var(19), dormir).   Thus, if we look at the input facts affected by this rule
pred(var(19), sleep)
vtype(var(19), main)
we can see that after applying the rule the relevant output facts are
pred(var(19), dormir)
vtype(var(19), main)
That is, we have removed pred(var(19), sleep) and replaced it by pred(var(19), dormir).

Anonymous variables and empty RHSs

This modified set of transfer facts now serves as input to the second rule.
+tense(X, pres), prog(X,_) ==> 0.
The first pattern matches the fact tense(var(3), pres).  The second pattern, now instantiated to prog(var(3), _), matches the fact prog(var(3), '-').    The single underscore is an anonymous variable.   This matches in the same way as an ordinary variable, but does not lead to any instantiation of the variable.  Thus multiple occurrences of _ in a rule can match with different items.  In effect, the pattern in this rule says: find the progressive attribute for var(3), but we don't care about the value of the attribute, which is represented as _.

The use of anonymous variables to represent "don't care" values is strongly encouraged.  When the rule compiler finds an instance of a normal variable that has just a single occurrence in a rule, it issues a warning message naming the offending singleton variable.  If you are consistent about the use of anonymous variables to represent "don't care" values, these singleton variable warning messages are a useful way of detecting possible typos in your rules.   A very common mistake is mistyping variable names, e.g. SUBJ and Subj when they are both intended to refer to the same item.  Such typos usually result in at least one of the variables being singleton, and the warning message will alert you to this.  However, if you consistently use singleton non-anonymous variables for "don't care" values, messages that might alert you to the presence of typos will get drowned out in a flood of innocuous warnings.   A variable whose name begins with an underscore, e.g. _temp, is a non-anonymous variable for which singleton occurrences are tolerated: the rule compiler will not complain about singleton occurrences of such variables.  But multiple occurrences of these variables within a rule are treated as linked.

The right hand side of the second rule, 0, is how we say that no new facts are to be added as the result of applying the rule. Thus, applying the rule means that we match and keep the fact tense(var(3), pres) and match and discard the fact prog(var(3), '-').  In other words, the rule removes the single fact prog(var(3), '-') from the list of transfer facts, and passes the updated set on as input to the next rule.
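The bookkeeping behind consumption, the + sign and an empty right hand side can be sketched in Python. The sketch starts from rules that have already been fully instantiated, so no matching is involved, and facts are held as plain strings; the function name and its three-way split of a rule are conveniences of the illustration only.

```python
def apply_instantiated_rule(facts, consumed, kept, added):
    """Apply a fully instantiated rule to a set of ground facts.

    consumed: left hand side facts without +, removed on application
    kept:     left hand side facts with +, tested but retained
    added:    right hand side facts (empty for a right hand side of 0)
    """
    facts = set(facts)
    if not (set(consumed) | set(kept)) <= facts:
        return facts                  # left hand side does not match
    return (facts - set(consumed)) | set(added)


facts = {
    "pred(var(19), sleep)",
    "vtype(var(19), main)",
    "tense(var(3), pres)",
    "prog(var(3), '-')",
}
# Rule 1: pred(X, sleep), +vtype(X, main) ==> pred(X, dormir).
after_rule_1 = apply_instantiated_rule(
    facts,
    consumed=["pred(var(19), sleep)"],
    kept=["vtype(var(19), main)"],
    added=["pred(var(19), dormir)"],
)
# Rule 2: +tense(X, pres), prog(X,_) ==> 0.
after_rule_2 = apply_instantiated_rule(
    after_rule_1,
    consumed=["prog(var(3), '-')"],
    kept=["tense(var(3), pres)"],
    added=[],
)
print(sorted(after_rule_2))
```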

Rule 3 is an example of using an intermediate variable, TA, to link structures.
stmt_type(X,declarative), +tns_asp(X,TA), mood(TA,indicative) ==> stmt_type(X,decl).
The rule should apply if X has indicative mood, as manifested by X's tns_asp attribute having a mood attribute whose value is indicative.  We use TA to link the mood to the statement type via the intermediate tns_asp structure.   The effect of the rule is to remove the MOOD attribute (if indicative) from the TNS-ASP of any declarative statement while keeping the rest of TNS-ASP in place (thanks to the + preceding the pattern), and also reformatting "declarative" as "decl".

This is one of those cases where misspelling TA would be a problem.  If instead we had accidentally broken the links by writing
... +tns_asp(X,TA), mood(T_A,indicative)...
the rule would instead say (i) remove all indicative mood attributes, whether or not they belong to declarative statements, just so long as there is a declarative clause somewhere in the sentence, and (ii) so long as there is an indicative clause somewhere in the sentence, rewrite all statement types of "declarative" to "decl".   However, if we had written this, a warning about TA and T_A being singleton variables would have alerted us to the broken link.
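The singleton check can be approximated in Python as follows. This is only a sketch of the idea, not the rule compiler's own implementation: it finds variables with a regular expression rather than by parsing the rule, and it assumes the rule text contains no comments. Quoted atoms are stripped first so that values like 'Mary' are not mistaken for variables.

```python
import re
from collections import Counter

QUOTED_ATOM = re.compile(r"'(?:[^']|'')*'")
VARIABLE = re.compile(r"\b[A-Z_][A-Za-z0-9_]*")


def singleton_variables(rule: str):
    """Names of variables that occur exactly once and deserve a warning."""
    counts = Counter(VARIABLE.findall(QUOTED_ATOM.sub("", rule)))
    # Names starting with an underscore (including _ itself) are exempt.
    return sorted(
        name for name, count in counts.items()
        if count == 1 and not name.startswith("_")
    )


good = ("stmt_type(X,declarative), +tns_asp(X,TA), mood(TA,indicative) "
        "==> stmt_type(X,decl).")
broken = ("stmt_type(X,declarative), +tns_asp(X,TA), mood(T_A,indicative) "
          "==> stmt_type(X,decl).")
print(singleton_variables(good))
print(singleton_variables(broken))
```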

Converting Transfer Facts to F-Structures

When converting f-structures to transfer facts, nearly all f-structure attribute-value pairs correspond to binary transfer facts, e.g.
19:[SUBJ    2:[...]]
subj(var(19), var(2))
The same is true when converting from transfer facts back to f-structures: most binary facts correspond directly to attribute-value pairs.  The exception, in both directions, is when it comes to dealing with semantic forms.  As we noted before, semantic form values of the PRED attribute are decomposed into pred, sf_id, arg and non_arg facts.  Thus when converting back to f-structures, these facts must be re-assembled to construct a PRED-SemanticForm attribute-value pair.  In most cases, this is just a simple inverse of the conversion from f-structures to transfer facts.   But sometimes transfer rules alter the basic argument structure of semantic forms, and in these cases the conversion back can be more involved.  We have just seen one example of how a transfer rule can alter argument structure, when adding a pronominal object in translating intransitive "know" into transitive "savoir".  Arguments can also be removed, with passivization being a common case.

Added Semantic Form Arguments

The most common mistake in rules adding a new argument to a semantic form is to neglect the arg and non_arg transfer facts.  The rule we previously gave for intransitive know -> transitive savoir (repeated below) commits this error.
pred(X, know), subj(X, Subj) ==>
   pred(X, savoir), subj(X, Subj), obj(X, Obj),
   pred(Obj,pro), pers(Obj, 3), number(Obj, sing), case(Obj, acc).
The rule creates a new object for the verb, but it does not include the new object in the list of the verb's thematic arguments. Grammar writers may have an implicit obliqueness hierarchy in mind, so that e.g. objects always correspond to the second thematic argument in semantic forms.  But no such hierarchy is hard-wired into the conversion from transfer facts back to f-structures; it needs to be made explicit.   Thus the correct rule should be
pred(X, know), subj(X, Subj) ==>
   pred(X, savoir), subj(X, Subj), obj(X, Obj),
   arg(X, 2, Obj),
   pred(Obj,pro), pers(Obj, 3), number(Obj, sing), case(Obj, acc).
where the new object has been explicitly included as the second thematic argument.  It is not necessary to say anything about the subject being the first thematic argument.  The input arg(X, 1, Subj) fact will, if nothing is done to consume it, be passed through transfer to preserve the required output fact.  The same is true of the input sf_id fact; provided nothing consumes or alters the fact, the French semantic form will have the same semform id as the English semantic form.

If the additional arg pattern is not included in the rule's right hand side, a semantic form will still be created for the French output.  However, the result will be a semantic form like
PRED      savoir'<[2:je]>'
where the presence of the object grammatical function is not reflected in the semantic form.  Strictly speaking, there is nothing wrong with semantic forms like this except that the XLE will not be able to generate from it because the OBJ is ungoverned.

Added Semantic Forms

The know-savoir rule, as well as introducing an additional thematic argument, in fact creates a whole new node and semantic form for the pronominal object.  What is the name of this new f-structure node, what is the semform id of its semantic form, and what are its thematic and non-thematic argument lists?  The rule does not appear to say anything about this.

First, the thematic and non-thematic arguments.  Because the rule does not add any arg or non_arg facts for the object, these lists are taken to be empty when converting back to f-structure.  To create non-empty argument lists we would have to explicitly add arg and non_arg facts.

Second, new node names.  Note how the rule contains a variable, Obj, on the right hand side that does not occur on the left hand side.  Whenever a new variable is introduced on the right hand side of a rule, it will be instantiated to a brand new constant of the form var(N), where N is an integer that does not clash with any previously encountered f-structure node number.  This instantiation, unlike the next, is performed by the transfer system when the rule is applied.

Third, new semform ids.  When composing PRED-SemanticForm attribute-value pairs, the transfer fact to f-structure conversion first looks for any facts of the form pred(X, P) and collects all the sf_id, arg and non_arg facts pertaining to X. If there is no sf_id fact for X, then the conversion process will create one, using a brand new numerical identifier that does not clash with any previously encountered semform ids.

Deleted Semantic Form Arguments

Deleting an argument generally requires removing all facts pertaining to the argument.  Given the way that semantic forms are recomposed, it is not strictly necessary to remove the sf_id, arg and non_arg facts that constitute the removed argument's semantic form, provided that its pred fact is removed. However, you should take care to remove arg and non_arg facts from any higher level semantic forms that included the deleted item as an argument.   This removal can be problematic if the deleted item was not the last argument in the list.  This is because the numbers of all the arguments following the deleted item should be reduced by 1 to take account of the deletion.  This can be very cumbersome to express with the current transfer rule formalism.

One special case has been taken care of, however. Passive verb phrases without an agentive by-phrase usually give rise to a semantic form where NULL is the first thematic argument, and the subject is the second argument.  A passivization transfer rule might thus delete all facts to do with the active subject, including the fact that it was the first thematic argument to the active verb.  Rather than renumbering subsequent arg facts, or including an explicit arg(X, 1, 'NULL') fact, it is sufficient just to delete the arg fact for the active subject.

When recomposing semantic forms, whenever there is an arg(X, N+1, Arg) fact but no arg fact for position N of X, an arg(X, N, 'NULL') fact will automatically be created; the same holds for non_arg facts. So, by just deleting an active subject, we will get the required NULL first argument in the semantic form.  However, NULL values will also be given to all arguments missing from the middle of a list, which may not be what you want.
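The recomposition of semantic forms, including the creation of missing semform ids and the NULL filling just described, can be sketched in Python. Facts are modelled as tuples. Taking the largest existing semform id plus one as the "brand new" identifier is an assumption of the sketch, and the predicate see in the usage example is hypothetical.

```python
def argument_list(facts, kind, node):
    """Collect the arg or non_arg values of node, padding gaps with NULL."""
    by_position = {f[2]: f[3] for f in facts if f[0] == kind and f[1] == node}
    last = max(by_position, default=0)
    return [by_position.get(n, "NULL") for n in range(1, last + 1)]


def recompose_semforms(facts):
    """Map each node with a pred fact to (pred, sf_id, args, non_args)."""
    used_ids = {f[2] for f in facts if f[0] == "sf_id"}
    next_id = max(used_ids, default=0) + 1    # assumed numbering scheme
    semforms = {}
    for fact in facts:
        if fact[0] != "pred":
            continue
        node, pred = fact[1], fact[2]
        ids = [f[2] for f in facts if f[0] == "sf_id" and f[1] == node]
        if ids:
            sf_id = ids[0]
        else:                                 # no sf_id fact: create one
            sf_id, next_id = next_id, next_id + 1
        semforms[node] = (pred, sf_id,
                          argument_list(facts, "arg", node),
                          argument_list(facts, "non_arg", node))
    return semforms


# A passive verb after the arg fact for the active subject has been deleted,
# plus a newly added node that has a pred fact but no sf_id fact.
passive = [
    ("pred", "var(0)", "see"),
    ("sf_id", "var(0)", 4),
    ("arg", "var(0)", 2, "var(1)"),
    ("pred", "var(1)", "pro"),
]
for node, semform in recompose_semforms(passive).items():
    print(node, semform)
```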

Unconvertible Facts

It is possible to write rules that produce transfer output that cannot be converted back to f-structures.  Apart from arg and non_arg facts, only binary facts can be converted to attribute value pairs.   For unary and n-ary facts (n>2), rather than have conversion fail, dummy attribute-value pairs of the form
eq(attr(null, '$unconvertible_attribute'), <Fact>)
will be included in the f-structure, where <Fact> is just the offending transfer fact.   To generate from such f-structures, you will need to remove or otherwise manipulate these dummy pairs.
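The arity test described here can be sketched in Python. Facts are modelled as tuples and rendered back to text; the dummy pair follows the form given above, while the function names and the unary fact flag(var(19)) are hypothetical.

```python
def render(fact):
    """Write a fact tuple back out as predicate(arg1, arg2, ...)."""
    name, *arguments = fact
    return f"{name}({', '.join(str(a) for a in arguments)})"


def classify(fact):
    """Say whether a fact can be converted, wrapping it in a dummy pair if not."""
    name, *arguments = fact
    if name in ("arg", "non_arg") or len(arguments) == 2:
        return ("convertible", render(fact))
    dummy = f"eq(attr(null, '$unconvertible_attribute'), {render(fact)})"
    return ("unconvertible", dummy)


print(classify(("subj", "var(19)", "var(2)")))
print(classify(("arg", "var(19)", 1, "var(2)")))
print(classify(("flag", "var(19)")))
```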

Redefining Notation

The Prolog programming language allows for certain predicates to be written in infix, prefix, or postfix notation, and this can be exploited in the design of templates.  Armed with the knowledge that '->' is an infix operator in Prolog, one might make the following template definition to make ->  into a symbol that we can use to express simple word-for-word substitutions:

Having defined this template, we can now write rules like
man -> homme.
woman -> femme.
These will expand to the rules
pred(X,man) ==> pred(X,homme).
pred(X,woman) ==> pred(X,femme).
Thus we have effectively extended the core transfer notation to include a new, user-defined rule construct, ->.   While this can be a very powerful and flexible tool, it can also be very confusing to people reading the rules for the first time.   If they miss the template definition, they are liable to start combing this documentation in vain to try and work out what the symbol is supposed to mean.  Moreover, to define such new infix symbols, you must have enough knowledge of prolog to either know which operators are already declared as infix or postfix, or to know how to declare them as such.  All in all, defining new symbols via templates is best avoided, though the systematic use of -> for obligatory pred to pred translation might be defensible as a commonly known rule writing idiom.
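The effect of the expansion can be imitated in Python with simple text substitution. This is only meant to make the correspondence between the two notations concrete; the transfer system expands templates over prolog terms rather than strings, and the function name expand_arrow is invented for the example.

```python
def expand_arrow(rule: str) -> str:
    """Expand 'source -> target.' into the corresponding pred rewrite rule."""
    body = rule.rstrip(". \n")                  # drop the terminating period
    source, target = (part.strip() for part in body.split("->"))
    return f"pred(X,{source}) ==> pred(X,{target})."


for short_rule in ["man -> homme.", "woman -> femme."]:
    print(expand_arrow(short_rule))
```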

And if you really want to confuse people who are trying to read your rules, the following is a possibility. A template or macro call is taken to invoke the most recent corresponding definition in the grammar. Templates and macros can be redefined at any point in the grammar, and the new definition takes effect as soon as it is encountered. When a macro or a template is redefined, this causes an implicit redefinition of any other template or macro in whose definition it partakes, directly or indirectly. Suppose that a grammar contains the following, after having defined -> as above:

This might allow "easy to see" and "ready to carry" to be translated as "facile à voir" and "prête à porter", respectively, but "sure to fall" as "sûr de tomber", by virtue of the fact that the redefinition of the pred(_) macro effectively redefines the '->' template.  The model that one needs of this process is simply that all definitions are explored completely, from the top down, every time they are used.

The use of such tricks to redefine notation is discouraged.  Defining the -> rule format for simple word-to-word substitutions is probably worthwhile.   But defining and then re-defining templates and macros tends to make rule sets obscure to anyone but the person who wrote them. Introducing a whole variety of new rule symbols to accompany -> also tends to obscurity, and generally requires a prolog expert to do it.