Chapter 3. Modularity

Modularity is an experimental feature introduced in version 3.3.0. It’s inspired by Modular ixml presented by Steven Pemberton at MarkupUK in 2025, but some of the details are different. It’s likely, if the Community Group continues to work on modularity as a feature of iXML, that the design will change further. Places where this design differs from the design presented in Modular ixml are marked with “👀” to draw your attention.

The idea behind modularity is to enable reuse. As the number and complexity of iXML grammars grow, good software design practice encourages reusability with a mechanism more reliable than “cut-and-paste” between grammars.

Conceptually, modularity constructs a new (monolithic) grammar that contains all of the necessary productions. It isn’t a textual inclusion of any kind. Grammars can use each other more-or-less arbitrarily.

Modularity extends the iXML grammar by adding uses and shares elements. These extensions are introduced by a “+” as shown in this example:

Example 3.1. A modular grammar for numbers, numerics.ixml
+uses digit, hexdigit from "digits.ixml" .
+shares decimal, hexadecimal .
 
number = decimal | hexadecimal .
 
decimal = digit+ .
hexadecimal = hexdigit+ .

The uses clause says that this grammar uses the rules for digit and hexdigit from the digits.ixml grammar. It shares the rules for decimal and hexadecimal.

A grammar with an explicit shares clause is publishing its interface. Only the explicitly shared symbols may be used by another grammar. If there’s no shares clause, 👀 any symbol may be used by another grammar.
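As a small illustration of an interface, consider the sketch below. The file name dates.ixml and every rule name in it are hypothetical, invented for this example. Only date is shared, so another grammar may use date but not year, month, day, or d.

```ixml
{ dates.ixml: a hypothetical module that publishes a single symbol }
+shares date .

date = year, -"-", month, -"-", day .

{ the rules below are not shared, so other grammars cannot use them }
year = d, d, d, d .
month = d, d .
day = d, d .
-d = ["0"-"9"] .
```

If the shares clause were removed, any of these symbols could be used by another grammar.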

The effect of requesting modularity is to 👀 parse the input grammar against a variant iXML grammar that defines a module. The relevant productions are:

       module: s, (prolog, RS)?, (( ((uses; shares), RS)+, ixml) | ixml) .
         uses: -"+uses", RS, mfrom++(-";", s), s, -'.' .
       shares: -"+shares", RS, entries, s, -'.' .
   mfrom>from: entries, RS, -"from", RS, source.
     -entries: share++(-",", s).
        share: @name.
      @source: string .

As with a normal iXML grammar, the prolog comes first. In a module, that’s followed by optional uses and shares clauses that may be repeated, followed by the iXML itself. The 👀 grammar URIs are quoted and 👀 each clause ends with a full stop.
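The following sketch exercises the optional parts of these productions. The serial rule and the grammar as a whole are hypothetical; the two source files are the ones used in the examples in this chapter. A single uses clause names two sources separated by a semicolon, as the mfrom++(-";", s) production allows, and it is followed by a shares clause.

```ixml
ixml version "1.1-nineml" .

{ one uses clause, two sources; each clause ends with a full stop }
+uses digit, hexdigit from "digits.ixml"; decimal from "numerics.ixml" .
+shares serial .

serial = decimal, -"-", hexdigit+, -"-", digit .
```

The same symbols could presumably also be requested with two separate uses clauses, since the module production allows the clauses to repeat.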

It is an error to define a rule for a symbol that is used from another grammar.
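For example, assuming the digits.ixml grammar from this chapter, a hypothetical grammar like the following would be rejected, because it both uses digit and defines its own rule for digit:

```ixml
ixml version "1.1-nineml" .

+uses digit from "digits.ixml" .

number = digit+ .

{ error: digit is used from digits.ixml, so it must not be defined here }
digit = ["0"-"9"] .
```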

It is an error to attempt to use a symbol from another grammar if that grammar does not share the symbol (or, if no symbols are explicitly shared, if the grammar does not contain a rule for the symbol).
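As an illustration, the digits.ixml grammar in this chapter shares only digit and hexdigit. A hypothetical grammar that asks for its hexadecimal rule would therefore be in error, even though digits.ixml does contain a rule with that name:

```ixml
ixml version "1.1-nineml" .

+uses hexadecimal from "digits.ixml" .
{ error: digits.ixml has a shares clause, and hexadecimal is not in it }

letters = hexadecimal+ .
```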

To complete the example above, digits.ixml is shown in Example 3.2, “A modular grammar for digits, digits.ixml”, and partno.ixml is shown in Example 3.3, “A modular grammar for part numbers, partno.ixml”.

Example 3.2. A modular grammar for digits, digits.ixml
+shares digit, hexdigit .
 
-digit = ["0"-"9"] .
-hexdigit = digit | hexadecimal .
-hexadecimal = ["A"-"F" | "a"-"f" ] .

Example 3.3. A modular grammar for part numbers, partno.ixml
ixml version "1.1-nineml" .
 
+uses decimal from "numerics.ixml" .
 
partno: -decimal .

A 👀 modular grammar uses the renaming feature and must be in “version 1.1”. (This is an arbitrary limitation but reinforces the experimental nature of the feature.)

Observe that the nonterminal hexadecimal is used in two grammars and has different definitions. The processor is responsible for making sure that a correct grammar is constructed. The renaming feature, introduced recently in Invisible XML, allows the processor to rename nonterminals while preserving the correct serialization.

The composed grammar is included in the debugging output:

<module>
   <prolog>
      <version string="1.1-nineml"/>
   </prolog>
   <ixml>
      <rule name="partno">
         <alt>
            <nonterminal name="decimal" mark="-"/>
         </alt>
      </rule>
      <rule name="number">
         <alt>
            <nonterminal name="decimal"/>
         </alt>
         <alt>
            <nonterminal name="hexadecimal"/>
         </alt>
      </rule>
      <rule name="decimal">
         <alt>
            <repeat1>
               <nonterminal name="digit"/>
            </repeat1>
         </alt>
      </rule>
      <rule name="hexadecimal">
         <alt>
            <repeat1>
               <nonterminal name="hexdigit"/>
            </repeat1>
         </alt>
      </rule>
      <rule name="digit" mark="-">
         <alt>
            <inclusion>
               <member from="0" to="9"/>
            </inclusion>
         </alt>
      </rule>
      <rule name="hexdigit" mark="-">
         <alt>
            <nonterminal name="digit"/>
         </alt>
         <alt>
            <nonterminal name="_1_hexadecimal"/>
         </alt>
      </rule>
      <rule name="_1_hexadecimal" alias="hexadecimal" mark="-">
         <alt>
            <inclusion>
               <member from="A" to="F"/>
               <member from="a" to="f"/>
            </inclusion>
         </alt>
      </rule>
   </ixml>
</module>

Note how one of the hexadecimal nonterminals was renamed. Nonterminals are renamed only if there are collisions, and the nonterminals in the “top level” grammar are never renamed.
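For readers who find iXML easier to scan than its XML form, the composed grammar might be written by hand roughly as follows. This is a transcription of the debugging output above, not something the processor emits, and it assumes that the alias attribute corresponds to the name>alias renaming syntax used in the module productions earlier in this chapter.

```ixml
ixml version "1.1-nineml" .

partno: -decimal .
number: decimal | hexadecimal .
decimal: digit+ .
hexadecimal: hexdigit+ .
-digit: ["0"-"9"] .
-hexdigit: digit | _1_hexadecimal .

{ the colliding rule from digits.ixml, renamed but aliased to its original name }
-_1_hexadecimal>hexadecimal: ["A"-"F"; "a"-"f"] .
```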

Note

All of the rules from the three grammars are included in the composed grammar. This is a bug in the current implementation. It’s harmless, but nonterminals that are not reachable from shared symbols could easily be ignored during construction.