Chapter 2. Running CoffeePot

CoffeePot can be run directly from the jar file with a command like:

java -jar coffeepot-.jar options

It may be more convenient, however, to write a small batch file or shell script to run it with an explicit classpath; see Chapter 7, Configuration.

In the examples that follow, we assume that “coffeepot” will run CoffeePot. Depending on how you’ve installed it, you may have to substitute java -jar /path/to/coffeepot.jar or some other variation.
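As a rough illustration of such a launcher, here is a minimal Python sketch (a batch file or shell script would do the same job). The jar location is a placeholder and the function names are hypothetical; only the java -jar invocation comes from the text above.

```python
import subprocess

# Placeholder location: adjust to wherever the CoffeePot jar is installed.
COFFEEPOT_JAR = "/path/to/coffeepot.jar"


def coffeepot_command(options, jar=COFFEEPOT_JAR):
    """Build the argument list for running CoffeePot from its jar file."""
    return ["java", "-jar", jar, *options]


def run_coffeepot(options, jar=COFFEEPOT_JAR):
    """Run CoffeePot with the given options and return its exit status."""
    return subprocess.run(coffeepot_command(options, jar)).returncode
```

A script named coffeepot that calls run_coffeepot(sys.argv[1:]) would give the short command assumed in the examples that follow.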

2.1 The command line

Typical usage is:

coffeepot [--pretty-print] [--parse:number] [--grammar:file] [--output:file] [ | [--input:file] | input ]

(For a quick recap of all the possible options, run CoffeePot with the --help option.)



--pretty-print

Specifies that XML output should be formatted with line breaks and indentation. Note: the built-in pretty printer is a little bit crude, especially in the face of mixed content.


--parse:number

Selects a specific parse. If the parse was ambiguous, use this option to inspect alternate parses. (If the parse wasn’t ambiguous, there’s only one.) You can also use --parse-count to display more than one parse.


--grammar:file

Specifies the input grammar. If unspecified, the Invisible XML specification grammar is used. The grammar specified may be an Invisible XML grammar in either the text or the XML format.


--output:file

Specifies the output location. If unspecified, output is written to standard output.


--input:file

Specifies the input file. If unspecified, the remaining command line arguments are taken as the input, separated by single spaces.
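To make the shape of a typical command line concrete, the following Python sketch assembles arguments in the option:value form shown in the usage line, and shows how trailing arguments are described as becoming the input. The function names and the grammar file name date.ixml are hypothetical; this is an illustration of the usage line, not part of CoffeePot.

```python
def typical_arguments(grammar=None, output=None, input_file=None,
                      parse=None, pretty_print=False, input_words=()):
    """Assemble arguments in the shape shown in the usage line."""
    args = []
    if pretty_print:
        args.append("--pretty-print")
    if parse is not None:
        args.append(f"--parse:{parse}")
    if grammar is not None:
        args.append(f"--grammar:{grammar}")
    if output is not None:
        args.append(f"--output:{output}")
    if input_file is not None:
        args.append(f"--input:{input_file}")
    else:
        # With no input file, the remaining arguments are the input.
        args.extend(input_words)
    return args


def literal_input(input_words):
    """The input as described above: remaining arguments joined by single spaces."""
    return " ".join(input_words)


# Hypothetical example: parse a date with a grammar in date.ixml.
args = typical_arguments(grammar="date.ixml", pretty_print=True,
                         input_words=["15", "February", "2022"])
```

Here args is the list --pretty-print, --grammar:date.ixml, 15, February, 2022, and literal_input(["15", "February", "2022"]) is the single string "15 February 2022".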

The following additional options are also available:


Applies Anders Møller’s ambiguity analyzer to the grammar. This can help to identify where ambiguity occurs and what inputs will trigger it. The analyzer operates on the transformed grammar.

Note that the analyzer can be quite slow if the grammar uses Unicode character classes. This is especially true if it uses classes that contain a large number of characters such as [L] or [LC].


Checks if the grammar is a “plain BNF” grammar. This doesn’t change the parser, it simply reports if the grammar is BNF. (Why? Well, if you’re interested in exploring a tool like the ambiguity analyzer, the results can be easier to understand if the input grammar hasn’t been transformed in any way. This option lets you check if that’s the case.)


Use the XPath expression to select among ambiguous alternatives. See Chapter 6, Choosing among alternatives.


Load a specific configuration file.


--debug

If the --debug option is specified and CoffeePot terminates because an exception was raised, the stack trace will be printed.


--describe-ambiguity

Requests a description of where in the parse forest ambiguity arose. This option only applies to ambiguous parses. See also Section 5.1, “Describing ambiguity”.


Selects how ambiguity will be described. Implies --describe-ambiguity.

The options are text, xml, and api-xml. See Section 5.1, “Describing ambiguity”.


Disables the specified pragma for this parse.


--earley

Use the Earley parser. Alternatively, --gll.


--encoding:encoding-name

Use encoding-name when parsing the input. This must be an encoding name recognized by Java. See also --grammar-encoding.


Requests a specific format type. Available types are xml (the default), json or json-data, json-tree or json-text, and csv.

The json-data format is only possible for result trees that contain no mixed content.

The csv format is only possible for result trees that have a specific “shape”. The grand-children of the root element must all be atomic values and no mixed content is allowed.
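The “shape” constraint can be pictured with a small Python sketch that tests an XML tree against one reading of the rule: the root contains row elements, each row contains cell elements, and cells hold only text. This is an approximation for illustration, not CoffeePot’s actual check. The function names are hypothetical, and treating whitespace-only text as insignificant and rejecting rows without cells are assumptions of the sketch.

```python
import xml.etree.ElementTree as ET


def has_text(value):
    """True if value contains non-whitespace characters."""
    return value is not None and value.strip() != ""


def is_csv_shaped(root):
    """Check that every grandchild of root is atomic and nothing is mixed content."""
    if has_text(root.text):
        return False  # text directly inside the root: mixed content
    for row in root:
        if len(row) == 0:
            return False  # assumption: a row must contain at least one cell
        if has_text(row.text) or has_text(row.tail):
            return False  # text beside elements: mixed content
        for cell in row:
            if len(cell) > 0 or has_text(cell.tail):
                return False  # nested element, or text between cells
    return True


good = ET.fromstring(
    "<dates><date><day>15</day><month>February</month></date>"
    "<date><day>1</day><month>March</month></date></dates>")
nested = ET.fromstring("<dates><date><day><n>15</n></day></date></dates>")
mixed = ET.fromstring("<dates><date>on <day>15</day></date></dates>")
```

With these inputs, is_csv_shaped accepts good and rejects both nested (a grandchild with element content) and mixed (text alongside an element).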


Load a function library to select among ambiguous alternatives. See Chapter 6, Choosing among alternatives.


--gll

Use the GLL parser. Alternatively, --earley.


--grammar-encoding:encoding-name

Use encoding-name when parsing the input grammar. This must be an encoding name recognized by Java. See also --encoding.


Output an XML representation of the structure of the parse forest. This is intended as the input to visualization processes, like the SVG output.


--graph:file

Output a diagram of the parse forest to the file. This option will only work if GraphViz has been configured. See also --graph-format.


--graph-format

Specifies the graph file format. This must be a format recognized by GraphViz. If no format is provided, the default is taken from the extension of the --graph file. (This option has no effect if --graph is not specified.)
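The defaulting rule can be sketched in a few lines of Python. The function name is hypothetical, and what CoffeePot does when the graph file has no extension is not stated here, so the sketch simply returns None in that case.

```python
import os.path


def effective_graph_format(graph_file, graph_format=None):
    """Pick the graph format: explicit if given, else the file extension."""
    if graph_file is None:
        return None  # no graph requested, so the format has no effect
    if graph_format:
        return graph_format
    extension = os.path.splitext(graph_file)[1]  # e.g. ".svg"
    return extension[1:] if extension else None
```

For example, effective_graph_format("forest.svg") yields "svg", while effective_graph_format("forest.out", "png") yields "png".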


Passes the specified $option with value to the stylesheet that generates the SVG. See Chapter 11, Diagrams.


Sets the log level, one of silent, error, warning, info, debug, or trace. It can also set logging at finer granularities by specifying a list of logger:level pairs, for example: CoffeePot:trace,CoffeeGrinder:info,CoffeeFilter:warning.
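To show how the two forms of this value relate, here is a minimal Python sketch that reads either a single level or a list of logger:level pairs. It is an illustration only: the function name is hypothetical, and accepting a bare level mixed into a list of pairs is an assumption of the sketch rather than documented behavior.

```python
LEVELS = ("silent", "error", "warning", "info", "debug", "trace")


def parse_log_setting(setting):
    """Return (default_level, {logger_name: level}) for a log setting string."""
    default_level = None
    per_logger = {}
    for item in setting.split(","):
        item = item.strip()
        if ":" in item:
            logger, level = item.split(":", 1)
        else:
            logger, level = None, item
        if level not in LEVELS:
            raise ValueError(f"unknown log level: {level!r}")
        if logger is None:
            default_level = level
        else:
            per_logger[logger] = level
    return default_level, per_logger
```

With this sketch, "debug" gives a default level of debug and no per-logger settings, while the example list above gives no default level and three per-logger settings.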


Enables marking ambiguities in the output. This option will cause additional attributes to be added to the result trees marking where ambiguous choices were made. (This only works for vertical ambiguities on nonterminals that aren’t suppressed.)


Normalize line endings in the input file. This option will translate all occurrences of carriage return (#D), carriage return followed immediately by a line feed (#D#A), next line (#85), and line separator (#2028) into a single line feed (#A). Multiple line endings are not combined. In other words, #85#2028 becomes #A#A, not #A.
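The translation described above can be expressed as a single regular expression substitution. The following Python sketch mirrors the rule as stated; it is not CoffeePot’s own code, and the function name is hypothetical.

```python
import re

# Carriage return + line feed is listed first so that the pair is replaced
# as a unit rather than as two separate line endings.
_LINE_ENDINGS = re.compile("\r\n|\r|\u0085|\u2028")


def normalize_line_endings(text):
    """Replace each #D#A, #D, #85, or #2028 with a single #A."""
    return _LINE_ENDINGS.sub("\n", text)
```

Because each line ending is replaced independently, normalize_line_endings("\u0085\u2028") produces two line feeds, matching the #85#2028 example.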


Suppress output of the parse tree.


When generating CSV output, the nonterminal names are used as column headers by default. This option causes them to be omitted.


--parse-count

For an ambiguous parse, you can specify a parse count greater than one to get several (or all) of the possible parses. It is not an error if you get multiple parses that have the same XML structure; this simply indicates that the ambiguity was in some part of the result that wasn’t serialized.

No attempt is made to enumerate the infinitely many possible parses that arise if the resulting parse forest contains a loop. Consequently, it’s possible to get an “infinitely ambiguous” result that has only a single parse.


By default, CoffeePot accepts certain grammar extensions, such as pragmas. With this option, only grammars strictly conforming to the Invisible XML specification may be used.


There are two styles for managing priorities: max and sum. If the style is max (the default), the priority of a node in the graph is the same as the highest priority among its descendants. If the style is sum, the priority of a node in the graph is the sum of the priorities of its descendants. In either case, if a node has an explicit priority, that priority takes precedence.
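The difference between the two styles is easiest to see on a small tree. The Python sketch below is one reading of the rule, not CoffeePot’s implementation: it combines the computed priorities of a node’s children recursively, and it assumes a priority of 0 for a leaf with no explicit priority. The Node class and function name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    priority: Optional[int] = None  # explicit priority, if any
    children: List["Node"] = field(default_factory=list)


def node_priority(node, style="max"):
    """Compute a node's priority under the max or sum style."""
    if node.priority is not None:
        return node.priority  # an explicit priority takes precedence
    if not node.children:
        return 0  # assumption: unprioritized leaves count as 0
    values = [node_priority(child, style) for child in node.children]
    if style == "max":
        return max(values)
    if style == "sum":
        return sum(values)
    raise ValueError(f"unknown priority style: {style!r}")


tree = Node("A", children=[
    Node("B", 2),
    Node("C", children=[Node("D", 3), Node("E", 4)]),
])
```

For this tree, node A has priority 4 under max and 9 under sum. Giving C an explicit priority of 1 hides its descendants, so A becomes 2 under max and 3 under sum.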


Allows you to specify that the progress bar should be on, off, or only on if the output is going to a TTY (tty). The default for this option can be set in the configuration file.


If provenance is requested, a comment is generated at the top of XML outputs that identifies the version of NineML used and details about the input and the grammar. (This only applies to XML outputs as neither JSON nor CSV have a standard mechanism for comments.)


--record-end

Specifies that the input should be parsed as a set of records delimited by the regular expression provided at the end of each record. (Implies --records.)


--record-start

Specifies that the input should be parsed as a set of records delimited by the regular expression provided at the start of each record. (Implies --records.)


--records

Specifies that the input should be parsed as a set of records. If neither --record-start nor --record-end is provided, the default separator is the record-ending regular expression “\n”.
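As a rough picture of record-oriented input, the following Python sketch splits text on either a record-ending or a record-starting regular expression, defaulting to the record-ending expression “\n”. It is a sketch under stated assumptions, not CoffeePot’s algorithm: the function name is hypothetical, the delimiter text is dropped from each record, and supplying both expressions is rejected purely to keep the sketch simple.

```python
import re


def split_records(text, record_start=None, record_end=None):
    """Split text into records using a start or end delimiter regex."""
    if record_start is not None and record_end is not None:
        raise ValueError("this sketch accepts only one delimiter")
    if record_start is None and record_end is None:
        record_end = r"\n"  # the documented default separator
    pattern = re.compile(record_start if record_start is not None else record_end)

    pieces = []
    position = 0
    for match in pattern.finditer(text):
        pieces.append(text[position:match.start()])
        position = match.end()  # assumption: delimiter text is dropped
    pieces.append(text[position:])

    if record_start is not None:
        # Nothing precedes the first start delimiter: drop the empty piece.
        if pieces and pieces[0] == "":
            pieces = pieces[1:]
    else:
        # Nothing follows the last end delimiter: drop the empty piece.
        if pieces and pieces[-1] == "":
            pieces = pieces[:-1]
    return pieces
```

Under these assumptions, the input "a\nb\n" yields the two records "a" and "b" with the default separator, and "#one#two" with a record-start expression of "#" yields "one" and "two".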


--repeat:number

Repeat the parse number times. This option only really exists for performance testing. If both --repeat and --time are specified, a summary of total parse time over all repeats will be printed.


Show the state chart used by the parser.


Show the grammar used by the parser.


Include hidden nonterminals in the output. This will include nonterminals from the original grammar marked with “-” and also nonterminals generated by the grammar transformation.


Include marks in the output. These are added as ixml:mark attributes on the elements.


Show (and log) the parse options specified in the configuration.


--strict-ambiguity

When priorities or other mechanisms are used to select a parse from an ambiguous forest, if those mechanisms successfully choose a unique parse, the result will not be marked as ambiguous. Using the --strict-ambiguity flag will always mark ambiguous parses as ambiguous.


Suppress ixml:state values. The ambiguous and prefix states can be suppressed.


--time

Enables output of parse timings.


When using record-oriented input, this option enables output of timings for each record.


Force a newline at the end of the output.


If this option is enabled, multiple parses will be output as soon as they’re available. The resulting output will not have a wrapper to assure well-formedness.


Display the CoffeePot version number.

2.2 In build tools

The HOWTO repository contains examples of using CoffeePot in build tools such as Gradle.