Appendix A. Unified change log
This appendix is a unified change log for all of the NineML components in this repository.
- 3.2.2, 24 Aug 2023
This release is mostly a large refactor of how the documentation is organized, managed, and produced. There are also significant new sections about the CoffeeGrinder and CoffeeFilter APIs.
- CoffeeGrinder
Javadoc clarifications and improvements. Made parser options more consistent in parsers and results.
- CoffeeFilter
Javadoc clarifications and improvements.
- CoffeePot
Javadoc clarifications and improvements.
- CoffeeSacks
No significant changes; dependencies updated to latest version.
- 3.2.1, 06 Aug 2023
- CoffeeGrinder
Fixes a bug in prefix parsing.
- CoffeeFilter, CoffeeSacks, CoffeePot
No significant changes; dependencies updated to latest version.
- 3.2.0, 04 Aug 2023
- CoffeeGrinder
Reworked how the `Arborist` classes track and provide descriptions of how ambiguity was resolved.
- CoffeeSacks
  - Fixed a bug in the construction of the graph that an `XPathAxe` gets to inspect. Nodes were inadvertently being processed many (many!) times in ambiguous forests, making the construction very slow.
  - Changed the `XmlForest` used for resolving ambiguities in the `XPathAxe` so that it includes nodes for intermediate states.
- CoffeePot
  - The `--describe-ambiguity` option was broken in the previous 3.x releases. It was a fair bit of effort to restore it. The descriptions have changed a bit and the `api-xml` option has been removed (the “API” flavor is returned by the `xml` option).
  - Added an `--axe` option and support for a `random` axe.
  - Added a `--trim` option to trim leading and trailing whitespace off the input. (This can be handy if you have input in a file, your editor automatically adds a newline at the end of the file, and your grammar doesn’t support trailing whitespace.)
- CoffeeFilter
No significant changes; dependencies updated to latest version.
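The situation that motivates the `--trim` option in CoffeePot can be illustrated with a minimal Java sketch. It is not CoffeePot's implementation: the class and method names are hypothetical, and it assumes that "whitespace" means whatever `String.strip()` removes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class TrimSketch {
    // Assumption: "whitespace" is whatever String.strip() removes;
    // CoffeePot's actual definition may differ.
    static String trimInput(String input) {
        return input.strip();
    }

    public static void main(String[] args) throws IOException {
        // Simulate an editor that appends a newline when saving the file.
        Path file = Files.createTempFile("input", ".txt");
        Files.writeString(file, "2023-08-04\n", StandardCharsets.UTF_8);

        String raw = Files.readString(file, StandardCharsets.UTF_8);
        String trimmed = trimInput(raw);

        System.out.println(raw.length());     // 11: the newline is part of the input
        System.out.println(trimmed.length()); // 10: only the date remains
        Files.delete(file);
    }
}
```

A grammar that matches only the date would reject the 11-character input but accept the trimmed one.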
- 3.1.0, 01 Aug 2023
- CoffeeGrinder
Added support for marking horizontal ambiguities with processing instructions.
- CoffeeFilter
  - Added support for specifying an alternate start symbol for parsing. This is not conformant behavior, but is useful for converted grammars that may have otherwise unreachable states.
  - Added more checks for grammars provided in XML form. Removed some unused rules from the internal pragmas grammar.
  - Improved the way ambiguity is detected and reported.
- CoffeeSacks
  - Refactored the way parser options are constructed. There’s now a separate options object for each grammar and parser.
  - Cleaned up the names of the parser options that can be passed in the XPath options map.
  - Fixed a bug where the default Axe failed to report ambiguous choices if there were no selectors (XPath expressions or functions) provided at all.
- CoffeePot
  - Hygiene issues with the grammar are now logged at the debug level.
  - Added a `--start-symbol` option to select an alternate start symbol.
- 3.0.0, 29 Jul 2023
- CoffeeGrinder
  - Many improvements and bug fixes in the GLL parser. It is now often (slightly) faster than the Earley parser and should be just as reliable.
  - Refactored how trees are returned from the forest. In principle, it is possible to retrieve all of the trees.
  - When prefix parsing is enabled, it is now possible to continue the parse with a different parser.
  - Refactored tests to use JUnit 5 exclusively; added many new and improved tests.
  - Refactored and improved how and where terminals can be matched with regular expressions. Regular expression matches work with both parsers.
  - Many improvements to the `forest2dot.xsl` stylesheet that produces forest diagrams. Added an option to analyze the grammar for ambiguity with Anders Møller’s ambiguity analyzer. (Note that the analyzer jar file is included in the repository, but it isn’t bundled with CoffeeGrinder; you need to add it to your class path. It is bundled with CoffeePot.)
  - Parser attributes (used internally to track marks and other features) have been renamed so that they’re all URIs. Added an attribute to track nonterminal priority, to guide selection in ambiguous forests.
  - Added an option to normalize line endings on input.
  - Added an option to mark ambiguities. When enabled, this option identifies places in the tree where an ambiguous choice was made. (It currently only works for “vertical” ambiguities between nonterminals.)
- CoffeeFilter
  - All of the NineML core components have been updated to version 3.0.0; CoffeeFilter incorporates the changes in CoffeeGrinder 3.0.0.
  - Added support for the renaming proposal provided that the grammar identifies itself as version 1.1 (or 1.1-nineml).
  - Refactored how pragmas are processed; removed the unused “token” pragma.
  - Removed the “rewrite” pragma; it can be replaced with standard features: deletion and insertion. Removed the default priority pragma because it didn’t actually seem useful.
  - Removed the notion of compiled grammars and the cache of compiled grammars.
  - Added an option to omit headers when generating CSV output.
  - Added support for the “mark ambiguities” option. When enabled, `n:ambiguous="true"` attributes are added to the XML trees.
- CoffeeSacks
  - All of the NineML core components have been updated to version 3.0.0; CoffeeSacks incorporates the changes in CoffeeGrinder and CoffeeFilter 3.0.0.
  - The API for resolving ambiguities has been completely refactored to align with CoffeeGrinder.
- CoffeePot
  - All of the NineML core components have been updated to version 3.0.0; CoffeePot incorporates the changes in CoffeeGrinder, CoffeeFilter, and CoffeeSacks 3.0.0.
  - Many new command line options and configuration properties have been added; a few have been renamed or removed.
  - CoffeePot will now work with either Saxon 11 or Saxon 12.
  - Undefined symbols are no longer allowed, even when not in pedantic mode. Undefined symbols are almost always an error and lead to very confusing error messages.
  - Added an option to analyze the grammar for ambiguity with Anders Møller’s ambiguity analyzer.
  - Support for compiled grammars and grammar compilation has been removed.
  - Fixed a number of small serialization bugs.
- 2.2.3, 15 Jun 2023
- CoffeeFilter
Fixed a typo: `setStrictAmbiguity` was setting the wrong internal flag.
- CoffeeSacks
  - If a user-supplied function is called to choose an alternative, assume an unambiguous choice has always been made. Support the `strictAmbiguity` option to override this behavior.
  - Support the `disablePragmas` and `enablePragmas` options to selectively disable or enable individual pragma types.
- CoffeePot
  - Support `disable-pragmas` and `enable-pragmas` in the options property file and `--disable-pragma` and `--enable-pragma` on the command line.
  - Support `--graph-svg-option` on the command line to set options for the SVG graph output.
  - Improved support for reporting infinitely ambiguous grammars. A loop that extends through several nonterminals will now be detected and reported if the generated parse tree attempts to traverse that loop. The processor doesn’t enumerate every possible parse, so it may fail to detect loops that are on branches it does not explore.
- 2.2.2, 15 Jun 2023
- CoffeeFilter
  - Added options to enable and disable pragmas. Individual pragmas can be enabled or disabled by name. Specify the name `#all` to enable or disable them all.
  - Added a `strictAmbiguity` option. If `strictAmbiguity` is true, then a grammar will be marked ambiguous even if user-supplied priorities uniquely determined every outcome. Added an event builder option to track whether or not any ambiguous choices were made.
  - Added a warning message if a priority pragma is applied to a literal. Parsing doesn’t distinguish between different occurrences of literals so the priority is unavailable when resolving ambiguities.
- CoffeeSacks
Never released.
- CoffeePot
Fixed bug where CoffeePot had a runtime dependency on Saxon EE. (It can use Saxon EE, and EE is required for loading dynamic function libraries, but EE isn’t required for basic functionality.)
- 2.2.1, 17 May 2023
- CoffeeGrinder
Fixed a small bug in the presentation of alternative parses when there is ambiguity. Nonterminal symbols in the alternative choices did not necessarily have the correct attributes.
- CoffeeSacks
Further refined the way XML is constructed for choosing alternatives. The “left” and “right” sides of each alternative are now sorted so that the “left” alternative always matches input tokens that precede the “right” alternative. The CoffeeGrinder update also assures that the attributes associated with nonterminal symbols are correct.
- CoffeePot
Fixed bug where choices were not always presented in a logical order (earlier matches before later matches) when displaying alternatives. Updated CoffeeGrinder and fixed stylesheet so that marks on nonterminals are included when describing ambiguity.
- CoffeeFilter
No significant changes; dependencies updated to latest version.
- 2.2.0, 06 May 2023
- CoffeeGrinder
  - CoffeeGrinder now carefully distinguishes between nonterminals with different attributes. This introduces new nonterminals into the grammar. These can be examined by calling `resolveDuplicates` on the `SourceGrammar`.
    If, for example, you have two instances of a nonterminal “B” in the grammar, where one has a mark attribute of “^” and the other a mark attribute of “@”, after resolving duplicates there will be two nonterminals in the grammar, “B” and “B₁”. They will match the same inputs, but it is now possible to distinguish between them in the parse forest.
    One particular use for this feature is the priority attribute supported by CoffeeFilter. This attribute allows the grammar author to associate priorities with nonterminals in ambiguous grammars to guide the parse.
  - All of the infrastructure associated with “pruning” nonterminals that lead to ε has been removed. (It hasn’t actually been used for at least a couple of releases.)
- CoffeeFilter
  - Leveraging changes to the CoffeeGrinder implementation, the priority pragma now works correctly.
  - References to “pruning” nonterminals have been removed (because it’s been removed from CoffeeGrinder).
- CoffeeSacks
Updated the way the choose alternative function is called to assure that the first element in the list is always the current “best” choice.
- CoffeePot
Changed the `--show-grammar` option to display the grammar after duplicates have been resolved. Also changed the display to include marks. Documented the `priority` pragma. Removed the `combine` and `regex` pragmas from the documentation; I’m not convinced they work correctly.
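The duplicate resolution described for CoffeeGrinder 2.2.0 can be pictured with a small Java sketch. This is not the `resolveDuplicates` implementation or its API: the class, the `resolve` helper, and the convention of writing each occurrence as mark followed by name (for example `^B`) are all hypothetical, chosen only to show how two differently marked instances of “B” end up as “B” and “B₁”.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DuplicateSketch {
    // Hypothetical helper, not the CoffeeGrinder API. Each occurrence is written
    // as mark + name (for example "^B" or "@B"). The first variant of a name
    // keeps the plain name; later variants receive a subscripted name.
    static Map<String, String> resolve(List<String> occurrences) {
        Map<String, Integer> variants = new HashMap<>();
        Map<String, String> resolved = new LinkedHashMap<>();
        for (String occurrence : occurrences) {
            if (resolved.containsKey(occurrence)) {
                continue; // same name and same mark: same nonterminal
            }
            String name = occurrence.substring(1);
            int index = variants.merge(name, 1, Integer::sum) - 1;
            resolved.put(occurrence, index == 0 ? name : name + subscript(index));
        }
        return resolved;
    }

    // Render a positive number with Unicode subscript digits (U+2080 to U+2089).
    static String subscript(int number) {
        StringBuilder sb = new StringBuilder();
        for (char digit : Integer.toString(number).toCharArray()) {
            sb.append((char) ('\u2080' + (digit - '0')));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "^B" and "@B" differ only in their mark, so they become B and B₁.
        System.out.println(resolve(List.of("^B", "@B", "^B", "^A")));
    }
}
```

Both resolved symbols would match the same input; the distinct names only make it possible to tell them apart afterwards, which is what a per-nonterminal priority needs.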
- 2.1.0, 23 Apr 2023
- CoffeeGrinder
Updated the API for choosing among alternatives to supply the immediate context.
- CoffeeSacks
Added support for providing a function to choose among alternative parses.
- CoffeePot
Added support for choosing between ambiguous parses using either an extension function or XPath expressions.
- CoffeeFilter
No significant changes; dependencies updated to latest version.
- 2.0.3, 15 Apr 2023
- CoffeeFilter
Corrected the encoding of the C0 and C1 control characters. Changed the encoding of `>` so that it is always encoded as `&gt;`. (This assures that the output will never accidentally include `]]>`, which marks the end of a CDATA section.)
- CoffeeSacks
  - If a UTF-8 grammar or input file begins with a byte-order-mark (BOM), the BOM is ignored. Set the “ignore BOM” option to `false` to disable this behavior.
  - Updated to use CoffeeFilter version 2.0.3 which supports ignoring the BOM and fixes errors in the serialization of control characters (that’s not actually relevant to CoffeeSacks which doesn’t serialize the results).
  - The build system has been updated to use Gradle version 8.0.2.
- CoffeePot
  - Updated to use CoffeeFilter version 2.0.3 which corrects errors in the serialization of control characters in attributes and text content (and changes the serialization of `>` to always be `&gt;`).
  - If a UTF-8 grammar or input file begins with a byte-order-mark (BOM), the BOM is ignored. A new configuration property `ignore-bom` can be set to `false` to disable this behavior.
  - The build system has been updated to use Gradle version 8.0.2.
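The reason for always escaping the greater-than sign in 2.0.3 can be seen in a minimal Java sketch of text-content escaping. It is not CoffeeFilter's serializer: the class and method names are hypothetical, and it handles only three characters, whereas a real serializer also deals with control characters and attribute values.

```java
final class TextEscapeSketch {
    // Minimal text-content escaping, assuming only these three characters
    // need attention. Not CoffeeFilter's actual serializer.
    static String escapeText(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            switch (ch) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break; // always, not only after "]]"
                default: sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeText("a[b[c]]>d")); // prints a[b[c]]&gt;d
    }
}
```

Escaping unconditionally is simpler than looking back for a preceding `]]`, and it guarantees that the three-character sequence that closes a CDATA section never appears in the output.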
- 2.0.2, 14 Apr 2023
- CoffeeFilter
Fix encoding of characters per the XML Output Method of XSLT and XQuery Serialization 3.1.
- 2.0.1, 13 Apr 2023
- CoffeeFilter
  - Fixed #98. Added an option to ignore the Unicode BOM on UTF-8 files. It’s enabled by default.
  - Resolved #97. Improved the encoding of carriage returns in the output. If a carriage return in the output is not followed by a line feed, it will be encoded as `&#xD;` so that an XML parse of the output won’t normalize it away.
  - Fixed #96. Improved error reporting. In the case where a parse fails with an unexpected character, if the character is not a visible ASCII character, the output includes the codepoint of the character.
  - Tinkered with the GitHub branch workflow so that it won’t attempt to use the SSH private key if it isn’t configured as a secret.
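The BOM and carriage-return changes in 2.0.1 can be sketched in a few lines of Java. These helpers are hypothetical illustrations, not CoffeeFilter's code. The sketch assumes the input has already been decoded to a string, so a UTF-8 BOM appears as a leading U+FEFF character, and it writes a lone carriage return as the hexadecimal character reference `&#xD;`.

```java
final class BomAndCarriageReturnSketch {
    // Drop a leading byte-order mark (U+FEFF) if one is present.
    static String stripBom(String text) {
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    // Replace each carriage return that is not followed by a line feed
    // with a character reference; CR LF pairs are left untouched.
    static String encodeLoneCarriageReturns(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            boolean followedByLf = i + 1 < text.length() && text.charAt(i + 1) == '\n';
            if (ch == '\r' && !followedByLf) {
                sb.append("&#xD;");
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripBom("\uFEFFdate: 2023"));       // prints date: 2023
        System.out.println(encodeLoneCarriageReturns("a\rb"));   // prints a&#xD;b
    }
}
```

An XML parser normalizes a literal carriage return to a line feed when it reads the document, but it leaves a character reference alone, which is why the reference form survives a round trip.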
- 2.0.0, 10 Apr 2023
Making the 2.x code base the current release.
- 1.99.1, 17 Jun 2022
- CoffeeGrinder
Second pre-release with GLL support; substantial refactoring of the internals and a new API for getting trees from the parse forest.
- 1.99.0, 08 Jun 2022
- CoffeeGrinder
First pre-release that included GLL parser support. Updated to the Invisible XML 1.0 grammar.
- 1.1.0, 16 Apr 2022
- CoffeeGrinder
Internal changes to support the 15 April 2022 Invisible XML “insertions” feature, version 1.1.0
- CoffeeFilter
Support for the 15 April 2022 specification, version 1.1.0
The most significant changes are:
  - The “repeat0” and “repeat1” separator characters are now doubled: “`item*','`” becomes “`item**','`”, and “`item+','`” becomes “`item++','`”.
  - The semantics of “^” before a literal have changed. It now signals an insertion. The grammar fragment “`'a', ^'-', 'b'`” matches an “a” followed immediately by a “b”, but the XML output will be “a-b”. The text marked “^” matches nothing but is inserted in the output. The insertion character may change.
  - At least one whitespace character or comment is required between rules. (This is very unlikely to have any practical consequence since most grammar authors start a new rule on a new line. But where “`a:'1'.b:'2'.`” used to be allowed, you must now write “`a:'1'. b:'2'.`”. This avoids an ambiguity in the grammar.)
- CoffeeSacks
Support for the 15 April 2022 specification, version 1.1.0
- CoffeePot
Support for the 15 April 2022 specification, version 1.1.0
The most significant changes are:
  - The “repeat0” and “repeat1” separator characters are now doubled: “`item*','`” becomes “`item**','`”, and “`item+','`” becomes “`item++','`”.
  - The semantics of “^” before a literal have changed. It now signals an insertion. The grammar fragment “`'a', ^'-', 'b'`” matches an “a” followed immediately by a “b”, but the XML output will be “a-b”. The text marked “^” matches nothing but is inserted in the output. The insertion character may change.
  - At least one whitespace character or comment is required between rules. (This is very unlikely to have any practical consequence since most grammar authors start a new rule on a new line. But where “`a:'1'.b:'2'.`” used to be allowed, you must now write “`a:'1'. b:'2'.`”. This avoids an ambiguity in the grammar.)
- 1.0.0, 20 Mar 2022
Initial release, version 1.0.0