Chapter 2. Running CoffeePot

CoffeePot can be run directly from the jar file with a command like:

java -jar coffeepot-.jar options

It may be more convenient, however, to write a small batch file or shell script to run it with an explicit classpath; see Chapter 7, Configuration.

In the examples that follow, we assume that “coffeepot” will run CoffeePot. Depending on how you’ve installed it, you may have to substitute java -jar /path/to/coffeepot.jar or some other variation.

2.1 The command line

Typical usage is:

coffeepot [--pretty-print] [--parse:number] [--grammar:file] [--output:file] [ [--input:file] | input ]

(For a quick recap of all the possible options, run CoffeePot with the --help option.)

Where:

--pretty-print

Specifies that XML output should be formatted with line breaks and indentation. Note: the built-in pretty printer is a little bit crude, especially in the face of mixed content.

--parse:number

Selects a specific parse. If the parse was ambiguous, use this option to inspect alternate parses. (If the parse wasn’t ambiguous, there’s only one.) You can also use --parse-count to display more than one parse.

--grammar:file

Specifies the input grammar. If unspecified, the Invisible XML specification grammar is used. The grammar specified may be an Invisible XML grammar in either the text or XML formats.

--output:file

Specifies the output location. Defaults to standard output.

--input:file

Specifies the input file. If unspecified, the remaining command line arguments are taken as the input, separated by single spaces.
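
When CoffeePot is driven from another program, the typical usage above maps onto an argument list. The following is a minimal Python sketch, not part of CoffeePot. The helper names build_command and run are hypothetical, and the sketch assumes that a “coffeepot” launcher is available on the path, as discussed earlier in this chapter.

```python
import subprocess
from typing import List, Optional


def build_command(grammar: Optional[str] = None,
                  input_file: Optional[str] = None,
                  output_file: Optional[str] = None,
                  parse: Optional[int] = None,
                  pretty_print: bool = False,
                  launcher: str = "coffeepot") -> List[str]:
    """Assemble an argument list from the options described above.

    The launcher name is an assumption; substitute whatever command
    runs CoffeePot on your system.
    """
    args = [launcher]
    if pretty_print:
        args.append("--pretty-print")
    if parse is not None:
        args.append(f"--parse:{parse}")
    if grammar is not None:
        args.append(f"--grammar:{grammar}")
    if output_file is not None:
        args.append(f"--output:{output_file}")
    if input_file is not None:
        args.append(f"--input:{input_file}")
    return args


def run(args: List[str]) -> str:
    """Run the command and return its standard output.

    This only works where CoffeePot is actually installed.
    """
    result = subprocess.run(args, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, build_command(grammar="dates.ixml", input_file="dates.txt", pretty_print=True) produces the arguments for a pretty-printed parse of a hypothetical dates.txt against a hypothetical dates.ixml grammar.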

The following additional options are also available:

--analyze-ambiguity

Applies Anders Møller’s ambiguity analyzer to the grammar. This can help to identify where ambiguity occurs and what inputs will trigger it. The analyzer operates on the transformed grammar.

Note that the analyzer can be quite slow if the grammar uses Unicode character classes. This is especially true if it uses classes that contain a large number of characters such as [L] or [LC].

--bnf

Checks if the grammar is a “plain BNF” grammar. This doesn’t change the parser; it simply reports whether the grammar is BNF. (Why? Well, if you’re interested in exploring a tool like the ambiguity analyzer, the results can be easier to understand if the input grammar hasn’t been transformed in any way. This option lets you check if that’s the case.)

--choose:XPath-expression

Use the XPath expression to select among ambiguous alternatives. See Chapter 6, Choosing among alternatives.

--config:file

Load a specific configuration file.

--debug

If the --debug option is specified and CoffeePot terminates because an exception was raised, the stack trace will be printed.

--describe-ambiguity

Requests a description of where in the parse forest ambiguity arose. This option only applies to ambiguous parses. See also Section 5.1, “Describing ambiguity”.

--describe-ambiguity-with:format

Selects how ambiguity will be described. Implies --describe-ambiguity.

The options are text, xml, and api-xml. See Section 5.1, “Describing ambiguity”.

--disable-pragma:pragma

Disables the specified pragma for this parse.

--earley

Use the Earley parser. Alternatively, --gll.

--encoding:encoding-name

Use encoding-name when parsing the input. This must be an encoding name recognized by Java. See also --grammar-encoding.

--format:type

Requests a specific format type. Available types are xml (the default), json or json-data, json-tree or json-text, and csv.

The json-data format is only possible for result trees that contain no mixed content.

The csv format is only possible for result trees that have a specific “shape”. The grand-children of the root element must all be atomic values and no mixed content is allowed.
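
To make the required “shape” concrete, here is a minimal Python sketch that checks it and produces CSV from an XML result tree. It is an illustration only, not CoffeePot’s implementation. It assumes that whitespace-only text between elements does not count as mixed content and that every row has the same columns as the first row; the function name xml_to_csv is hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def _blank(text):
    """True if the text is absent or whitespace only."""
    return text is None or text.strip() == ""


def xml_to_csv(xml_text, omit_headers=False):
    root = ET.fromstring(xml_text)
    rows = list(root)
    # No text may sit directly between the row elements.
    if not _blank(root.text) or any(not _blank(r.tail) for r in rows):
        raise ValueError("mixed content under the root element")

    headers = None
    records = []
    for row in rows:
        cells = list(row)
        if not cells:
            raise ValueError("each row needs child elements")
        if not _blank(row.text) or any(not _blank(c.tail) for c in cells):
            raise ValueError("mixed content in a row")
        # Grandchildren of the root must be atomic: no child elements.
        if any(len(c) > 0 for c in cells):
            raise ValueError("grandchildren must be atomic values")
        names = [c.tag for c in cells]
        if headers is None:
            headers = names
        elif names != headers:
            # Assumption of this sketch: all rows share the same columns.
            raise ValueError("rows have different columns")
        records.append([c.text or "" for c in cells])

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    if headers is not None and not omit_headers:
        writer.writerow(headers)  # element names as column headers
    writer.writerows(records)
    return out.getvalue()
```

The omit_headers parameter mirrors the effect described for --omit-csv-headers below.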

--function-library:library

Load a function library to select among ambiguous alternatives. See Chapter 6, Choosing among alternatives.

--gll

Use the GLL parser. Alternatively, --earley.

--grammar-encoding:encoding-name

Use encoding-name when parsing the input grammar. This must be an encoding name recognized by Java. See also --encoding.

--forest:file

Output an XML representation of the structure of the parse forest. This is intended as the input to visualization processes, like the SVG output.

--graph:file

Output a diagram of the parse forest to the file. This option will only work if GraphViz has been configured. See also --graph-format.

--graph-format:format

Specifies the graph file format. This must be a format recognized by GraphViz. If no format is provided, the default is taken from the extension of the --graph file. (This option has no effect if --graph is not specified.)

--graph-option:option=value

Passes the specified $option with value to the stylesheet that generates the SVG. See Chapter 11, Diagrams.

--log:levels

Sets the log level, one of silent, error, warning, info, debug, or trace. It can also set logging at finer granularities by specifying a list of logger:level pairs, for example: CoffeePot:trace,CoffeeGrinder:info,CoffeeFilter:warning.
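
The two forms of the value can be illustrated with a small Python sketch that parses them. This is not CoffeePot’s own parser. The function name parse_log_levels is hypothetical, and the sketch assumes that a bare level and logger:level pairs may appear in the same comma-separated list.

```python
LEVELS = ("silent", "error", "warning", "info", "debug", "trace")


def parse_log_levels(spec):
    """Return (default_level, per_logger_levels) for a --log value."""
    default = None
    per_logger = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        # rpartition yields an empty separator when there is no colon,
        # which marks a bare level such as "debug".
        name, sep, level = item.rpartition(":")
        if level not in LEVELS:
            raise ValueError(f"unknown log level: {level}")
        if sep:
            per_logger[name] = level
        else:
            default = level
    return default, per_logger
```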

--mark-ambiguities

Enables marking ambiguities in the output. This option will cause additional attributes to be added to the result trees marking where ambiguous choices were made. (This only works for vertical ambiguities on nonterminals that aren’t suppressed.)

--normalize

Normalize line endings in the input file. This option will translate all occurrences of carriage return (#D), carriage return followed immediately by a line feed (#D#A), next line (#85), and line separator (#2028) into a single line feed (#A). Multiple line endings are not combined. In other words, #85#2028 becomes #A#A, not #A.
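
The translation described above can be sketched in a few lines of Python. This illustrates the rule only; it is not the code CoffeePot uses, and the function name is hypothetical.

```python
import re

# The two-character sequence CR LF is listed first so that it is
# replaced as a unit rather than as two separate line endings.
_LINE_END = re.compile("\r\n|\r|\u0085|\u2028")


def normalize_line_endings(text):
    """Replace each recognized line ending with a single line feed."""
    return _LINE_END.sub("\n", text)
```

Because each match is replaced independently, adjacent line endings are not merged: the two-character input #85#2028 yields two line feeds.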

--no-output

Suppress output of the parse tree.

--omit-csv-headers

When generating CSV output, the nonterminal names are used as column headers by default. This option causes them to be omitted.

--parse-count:number|all

For an ambiguous parse, you can specify a parse count greater than one to get several (or all) of the possible parses. It is not an error if you get multiple parses that have the same XML structure; this simply indicates that the ambiguity was in some part of the result that wasn’t serialized.

No attempt is made to enumerate the infinitely many possible parses that arise if the resulting parse forest contains a loop. Consequently, it’s possible to get an “infinitely ambiguous” result that has only a single parse.

--pedantic

By default, CoffeePot accepts certain grammar extensions, such as pragmas. With this option, only grammars strictly conforming to the Invisible XML specification may be used.

--priority-style:style

There are two styles for managing priorities: max and sum. If the style is max (the default), the priority of a node in the graph is the same as the highest priority among its descendants. If the style is sum, the priority of a node in the graph is the sum of the priorities of its descendants. In either case, if a node has an explicit priority, that priority takes precedence.
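
The difference between the two styles can be shown with a small Python sketch. It is one reading of the rule above, not CoffeePot’s implementation: it works on a simple tree rather than a real parse forest, it computes a node’s priority recursively from its children, and it assumes that a leaf with no explicit priority counts as 0. The Node class and priority function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    explicit: Optional[int] = None            # priority set on the node itself
    children: List["Node"] = field(default_factory=list)


def priority(node, style="max"):
    # An explicit priority always takes precedence.
    if node.explicit is not None:
        return node.explicit
    # Assumption of this sketch: an unmarked leaf has priority 0.
    if not node.children:
        return 0
    values = [priority(child, style) for child in node.children]
    if style == "max":
        return max(values)
    if style == "sum":
        return sum(values)
    raise ValueError(f"unknown priority style: {style}")
```

For a node whose children have priorities 2 and an unmarked node containing 3 and 1, this sketch gives 3 under max and 6 under sum.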

--progress-bar:value

Allows you to specify that the progress bar should be on, off, or only on if the output is going to a TTY (tty). The default for this option can be set in the configuration file.

--provenance

If provenance is requested, a comment is generated at the top of XML outputs that identifies the version of NineML used and details about the input and the grammar. (This only applies to XML outputs as neither JSON nor CSV have a standard mechanism for comments.)

--record-end:regular-expression

Specifies that the input should be parsed as a set of records delimited by the regular expression provided at the end of each record. (Implies --records.)

--record-start:regular-expression

Specifies that the input should be parsed as a set of records delimited by the regular expression provided at the start of each record. (Implies --records.)

--records

Specifies that the input should be parsed as a set of records. If neither --record-start nor --record-end is provided, the default separator is the record-ending regular expression “\n”.
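
As a rough illustration of record-oriented input, the following Python sketch splits text into records with either kind of delimiter. It is not CoffeePot’s implementation and uses Python’s regular expression syntax rather than Java’s. It assumes that text matched by a record-ending expression is dropped, that text matched by a record-starting expression stays at the front of its record, and that a start expression wins if both are given. The function name split_records is hypothetical.

```python
import re


def split_records(text, record_end=None, record_start=None):
    if record_start is not None:
        # Each match marks where a new record begins; the matched text
        # is kept as part of that record (an assumption of this sketch).
        starts = [m.start() for m in re.finditer(record_start, text)]
        bounds = sorted(set([0] + starts + [len(text)]))
        return [text[a:b] for a, b in zip(bounds, bounds[1:])]

    # Default: a record-ending expression that matches a newline.
    pattern = r"\n" if record_end is None else record_end
    pieces = []
    pos = 0
    for m in re.finditer(pattern, text):
        pieces.append(text[pos:m.start()])  # delimiter itself is dropped
        pos = m.end()
    # Text after the last delimiter forms a final record only if non-empty.
    if pos < len(text):
        pieces.append(text[pos:])
    return pieces
```

With the default separator, split_records("a\nb\nc\n") yields three records, one per line.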

--repeat:number

Repeat the parse number times. This option only really exists for performance testing. If both --repeat and --time are specified, a summary of total parse time over all repeats will be printed.

--show-chart

Show the state chart used by the parser.

--show-grammar

Show the grammar used by the parser.

--show-hidden-nonterminals

Include hidden nonterminals in the output. This will include nonterminals from the original grammar marked with “-” and also nonterminals generated by the grammar transformation.

--show-marks

Include marks in the output. These are added as ixml:mark attributes on the elements.

--show-options

Show (and log) the parse options specified in the configuration.

--strict-ambiguity

When priorities or other mechanisms are used to select a parse from an ambiguous forest, if those mechanisms successfully choose a unique parse, the result will not be marked as ambiguous. Using the --strict-ambiguity flag will always mark ambiguous parses as ambiguous.

--suppress:state:state:…

Suppress ixml:state values. The ambiguous and prefix states can be suppressed.

--time

Enables output of parse timings.

--time-records

When using record-oriented input, this option enables output of timings for each record.

--trailing-newline

Forces a newline at the end of the output.

--unbuffered

If this option is enabled, multiple parses will be output as soon as they’re available. The resulting output will not have a wrapper to assure well-formedness.

--version

Display the CoffeePot version number.

2.2 In build tools

The HOWTO repository contains examples of using CoffeePot in build tools such as Gradle.