Chapter 6. Choosing among alternatives

Where it’s practical to write a grammar that is unambiguous, that’s best. But it isn’t always practical. Sometimes it’s difficult, and sometimes it’s impossible. If the data is actually ambiguous, you may have to reflect that in your grammar.

Invisible XML doesn’t consider ambiguity an error, but it also doesn’t provide any mechanism for controlling it. All parses are considered equal and the processor’s only obligation is to provide one of them.

Consider this simple, ambiguous grammar:

   number-list = (number, -#a)+, number? .
        number = hex | decimal .
           hex = hex-digit+ .
       decimal = decimal-digit+ .
    -hex-digit = ["0"-"9" | "a"-"f" | "A"-"F" ] .
-decimal-digit = ["0"-"9" ] .

If we parse the following input,

bad
cafe
42

we might get this result:

<number-list xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
  <hex>bad</hex>
  <hex>cafe</hex>
  <hex>42</hex>
</number-list>

Of course, we might equally get this result:

<number-list xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
  <hex>bad</hex>
  <hex>cafe</hex>
  <decimal>42</decimal>
</number-list>

The ambiguity here is between “decimal” and “hexadecimal”: the input 42 matches both. You may wish to control which one is selected. CoffeePot provides two ways to examine the alternatives and select one: you can provide a function library that defines a function that chooses between them, or you can provide a list of XPath expressions to select an alternative. Using a function library requires Saxon PE or Saxon EE.

If both a function library and XPath expressions are provided, the function library is used first and the expressions are only used if the function library does not select an alternative.
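The command lines in the sections that follow refer to the grammar and input above as numbers.ixml and numbers.txt. As a minimal sketch of one way to create those files, assuming a POSIX shell and the current directory as the working location (the trailing newline in the input is an assumption; the grammar accepts the input with or without it):

```shell
# Save the example grammar under the name used by the coffeepot commands.
# The quoted 'EOF' delimiter prevents the shell from expanding anything inside.
cat > numbers.ixml <<'EOF'
number-list = (number, -#a)+, number? .
number = hex | decimal .
hex = hex-digit+ .
decimal = decimal-digit+ .
-hex-digit = ["0"-"9" | "a"-"f" | "A"-"F" ] .
-decimal-digit = ["0"-"9" ] .
EOF

# Save the three-line example input, one number per line.
printf 'bad\ncafe\n42\n' > numbers.txt
```

Only the last line of the input is ambiguous: bad and cafe contain letters, so they can only be hex, while 42 matches both hex and decimal.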

6.1. Using a function library

To use a function library, you must provide an XSLT stylesheet or XQuery module that defines a function named choose-alternative in the namespace https://coffeepot.nineml.org/ns/functions. The function must take two parameters: an element and a map. It must return a map.

Here is an example of an XSLT function library that will always select the decimal alternative:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:cp="https://coffeepot.nineml.org/ns/functions"
                exclude-result-prefixes="#all"
                version="3.0">

<xsl:function name="cp:choose-alternative" as="map(*)">
  <xsl:param name="context" as="element()"/>
  <xsl:param name="options" as="map(*)"/>

  <xsl:variable name="choice"
                select="$context/children[symbol[@name='decimal']]/@id"/>

  <xsl:sequence select="map { 'selection': $choice }"/>
</xsl:function>

</xsl:stylesheet>

For example:

$ coffeepot -g:numbers.ixml -i:numbers.txt \
            --pretty-print --function-library:numbers.xsl
Found 2 possible parses.
<number-list xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
   <hex>bad</hex>
   <hex>cafe</hex>
   <decimal>42</decimal>
</number-list>

An alternative function library that always selects hexadecimal could be written in XQuery this way:

module namespace f = "https://coffeepot.nineml.org/ns/functions";

declare function f:choose-alternative(
  $context as element(),
  $options as map(*)
) as map(*) {
  map {
    'selection': $context/children[symbol[@name='hex']]/@id
  }
};

For example:

$ coffeepot -g:numbers.ixml -i:numbers.txt \
            --pretty-print --function-library:numbers.xqy
Found 2 possible parses.
<number-list xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
   <hex>bad</hex>
   <hex>cafe</hex>
   <hex>42</hex>
</number-list>

For a more complete discussion of how such functions can be written and what is available in $context and $options, see Chapter 7. Choosing among alternatives in the CoffeeSacks documentation.

6.2. Using XPath expressions

An alternative to using a function library is to specify one or more XPath expressions. For each place where ambiguity occurs, each expression is evaluated with each alternative as the context item. The first expression that has an effective boolean value of “true” selects the alternative to which it was applied.

For example:

$ coffeepot -g:numbers.ixml -i:numbers.txt \
            --pretty-print --choose "symbol[@name='decimal']"
Found 2 possible parses.
<number-list>
   <hex>bad</hex>
   <hex>cafe</hex>
   <decimal>42</decimal>
</number-list>

Observe that because a unique choice was made, the result is not marked as ambiguous. You can override that with the --strict-ambiguity option:

$ coffeepot -g:numbers.ixml -i:numbers.txt \
            --pretty-print --strict-ambiguity --choose "symbol[@name='decimal']"
Found 2 possible parses.
<number-list xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
   <hex>bad</hex>
   <hex>cafe</hex>
   <decimal>42</decimal>
</number-list>