Chapter 2. Running CoffeePot
CoffeePot can be run directly from the jar file with a command like:
java -jar coffeepot-.jar options
It may be more convenient, however, to write a small batch file or shell script to run it with an explicit classpath; see Chapter 7, Configuration.
In the examples that follow, we assume that “coffeepot” will run CoffeePot. Depending on how you’ve installed it, you may have to substitute java -jar /path/to/coffeepot.jar or some other variation.
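One way to make “coffeepot” available is a small shell function, sketched below. This is a minimal sketch, not part of the CoffeePot distribution: the variable name COFFEEPOT_JAR is hypothetical, and /path/to/coffeepot.jar stands for wherever you installed the jar file.

```shell
#!/bin/sh
# Minimal wrapper sketch. COFFEEPOT_JAR is a hypothetical variable name;
# point it at wherever the CoffeePot jar file is installed.
COFFEEPOT_JAR="${COFFEEPOT_JAR:-/path/to/coffeepot.jar}"

# Pass every command line argument through to CoffeePot unchanged.
coffeepot() {
    java -jar "$COFFEEPOT_JAR" "$@"
}
```

With this function loaded into your shell, a command such as coffeepot --version runs java -jar on the configured jar file with the same arguments.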
2.1. The command line
Typical usage is:
coffeepot [--pretty-print] [--parse:number] [--grammar:file] [--output:file] [--input:file | input…]
(For a quick recap of all the possible options, run CoffeePot with the --help option.)
Where:
--pretty-print
-
Specifies that XML output should be formatted with line breaks and indentation. Note: the built-in pretty printer is a little bit crude, especially in the face of mixed content.
--parse:number
-
Selects a specific parse. If the parse was ambiguous, use this option to inspect alternate parses. (If the parse wasn’t ambiguous, there’s only one.) You can also use --parse-count to display more than one parse.
--grammar:file
-
Specifies the input grammar. If unspecified, the Invisible XML specification grammar is used. The grammar specified may be an Invisible XML grammar in either the text or XML formats.
--output:file
-
Specifies the output location. The default is standard output.
--input:file
-
Specifies the input file. If unspecified, the remaining command line arguments are taken as the input, separated by single spaces.
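The rule for input given on the command line can be illustrated with a short shell sketch. The function name joined_input is hypothetical and only mimics the joining rule described for --input; it is not CoffeePot code. The grammar file date.ixml in the comments is likewise a made-up example.

```shell
# Illustration only: how words given after the options form the input.
# With the default IFS, "$*" joins the arguments with single spaces,
# which mirrors the rule described for the remaining arguments.
joined_input() {
    printf '%s\n' "$*"
}

# Under that rule, these two invocations would present the same input
# to the parser (date.ixml is a hypothetical grammar file):
#   coffeepot --grammar:date.ixml 15 February 2022
#   coffeepot --grammar:date.ixml "15 February 2022"
joined_input 15 February 2022
```

If the input contains runs of spaces, tabs, or line breaks that matter to the grammar, quoting it as a single argument or using --input:file avoids relying on how the shell splits words.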
The following additional options are also available:
--analyze-ambiguity
-
Applies Anders Møller’s ambiguity analyzer to the grammar. This can help to identify where ambiguity occurs and what inputs will trigger it. The analyzer operates on the transformed grammar.
Note that the analyzer can be quite slow if the grammar uses Unicode character classes. This is especially true if it uses classes that contain a large number of characters such as [L] or [LC].
--bnf
-
Checks if the grammar is a “plain BNF” grammar. This doesn’t change the parser, it simply reports if the grammar is BNF. (Why? Well, if you’re interested in exploring a tool like the ambiguity analyzer, the results can be easier to understand if the input grammar hasn’t been transformed in any way. This option lets you check if that’s the case.)
--choose:XPath-expression
-
Use the XPath expression to select among ambiguous alternatives. See Chapter 6, Choosing among alternatives.
--config:file
-
Load a specific configuration file.
--debug
-
If the --debug option is specified and CoffeePot terminates because an exception was raised, the stack trace will be printed.
--describe-ambiguity
-
Requests a description of where in the parse forest ambiguity arose. This option only applies to ambiguous parses. See also Section 5.1, “Describing ambiguity”.
--describe-ambiguity-with:format
-
Selects how ambiguity will be described. Implies --describe-ambiguity. The options are text, xml, and api-xml. See Section 5.1, “Describing ambiguity”.
--disable-pragma:pragma
-
Disables the specified pragma for this parse.
--earley
-
Use the Earley parser. Alternatively, --gll.
--encoding:encoding-name
-
Use encoding-name when parsing the input. This must be an encoding name recognized by Java. See also --grammar-encoding.
--format:type
-
Requests a specific format type. Available types are xml, the default, json or json-data, json-tree or json-text, or csv.
The json-data format is only possible for result trees that contain no mixed content.
The csv format is only possible for result trees that have a specific “shape”. The grand-children of the root element must all be atomic values and no mixed content is allowed.
--function-library:library
-
Load a function library to select among ambiguous alternatives. See Chapter 6, Choosing among alternatives.
--gll
-
Use the GLL parser. Alternatively, --earley.
--grammar-encoding:encoding-name
-
Use encoding-name when parsing the input grammar. This must be an encoding name recognized by Java. See also --encoding.
--forest:file
-
Output an XML representation of the structure of the parse forest. This is intended as the input to visualization processes, like the SVG output.
--graph:file
-
Output a diagram of the parse forest to the file. This option will only work if GraphViz has been configured. See also --graph-format.
--graph-format:format
-
Specifies the graph file format. This must be a format recognized by GraphViz. If no format is provided, the default is taken from the extension of the --graph file. (This option has no effect if --graph is not specified.)
--graph-option:option=value
-
Passes the specified $option with value to the stylesheet that generates the SVG. See Chapter 11, Diagrams.
--log:levels
-
Sets the log level, one of silent, error, warning, info, debug, or trace. It can also set logging at finer granularities by specifying a list of logger:level pairs, for example: CoffeePot:trace,CoffeeGrinder:info,CoffeeFilter:warning.
--mark-ambiguities
-
Enables marking ambiguities in the output. This option will cause additional attributes to be added to the result trees marking where ambiguous choices were made. (This only works for vertical ambiguities on nonterminals that aren’t suppressed.)
--normalize
-
Normalize line endings in the input file. This option will translate all occurrences of carriage return (#D), carriage return followed immediately by a line feed (#D#A), next line (#85), and line separator (#2028) into a single line feed (#A). Multiple line endings are not combined. In other words, #85#2028 becomes #A#A, not #A.
--no-output
-
Suppress output of the parse tree.
--omit-csv-headers
-
When generating CSV output, the nonterminal names are used as column headers by default. This option causes them to be omitted.
--parse-count:number|all
-
For an ambiguous parse, you can specify a parse count greater than one to get several (or all) of the possible parses. It is not an error if you get multiple parses that have the same XML structure; this simply indicates that the ambiguity was in some part of the result that wasn’t serialized.
No attempt is made to enumerate the infinitely many possible parses that arise if the resulting parse forest contains a loop. Consequently, it’s possible to get an “infinitely ambiguous” result that has only a single parse.
--pedantic
-
By default, CoffeePot accepts certain grammar extensions, such as pragmas. With this option, only grammars strictly conforming to the Invisible XML specification may be used.
--priority-style:style
-
There are two styles for managing priorities: max and sum. If the style is max (the default), the priority of a node in the graph is the same as the highest priority among its descendants. If the style is sum, the priority of a node in the graph is the sum of the priorities of its descendants. In either case, if a node has an explicit priority, that priority takes precedence.
--progress-bar:value
-
Allows you to specify that the progress bar should be on, off, or only on if the output is going to a TTY (tty). The default for this option can be set in the configuration file.
--provenance
-
If provenance is requested, a comment is generated at the top of XML outputs that identifies the version of NineML used and details about the input and the grammar. (This only applies to XML outputs as neither JSON nor CSV have a standard mechanism for comments.)
--record-end:regular-expression
-
Specifies that the input should be parsed as a set of records delimited by the regular expression provided at the end of each record. (Implies --records.)
--record-start:regular-expression
-
Specifies that the input should be parsed as a set of records delimited by the regular expression provided at the start of each record. (Implies --records.)
--records
-
Specifies that the input should be parsed as a set of records. If neither --record-start nor --record-end is provided, the default separator is the record-ending regular expression “\n”.
--repeat:number
-
Repeat the parse number times. This option only really exists for performance testing. If both --repeat and --time are specified, a summary of total parse time over all repeats will be printed.
--show-chart
-
Show the state chart used by the parser.
--show-grammar
-
Show the grammar used by the parser.
--show-marks
-
Include marks in the output. These are added as ixml:mark attributes on the elements.
--show-options
-
Show (and log) the parse options specified in the configuration.
--strict-ambiguity
-
When priorities or other mechanisms are used to select a parse from an ambiguous forest, if those mechanisms successfully choose a unique parse, the result will not be marked as ambiguous. Using the --strict-ambiguity flag will always mark ambiguous parses as ambiguous.
--suppress:state:state:…
-
Suppress ixml:state values. The ambiguous and prefix states can be suppressed.
--time
-
Enables output of parse timings.
--time-records
-
When using record-oriented input, this option enables output of timings for each record.
--trailing-newline
-
Forces a newline at the end of the output.
--unbuffered
-
If this option is enabled, multiple parses will be output as soon as they’re available. The resulting output will not have a wrapper to assure well-formedness.
--version
-
Display the CoffeePot version number.
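The difference between the two --priority-style settings described in the list above can be seen with a little arithmetic. The sketch below is an illustration only, not CoffeePot’s implementation: the function names priority_max and priority_sum are hypothetical, each takes the priorities of a node’s descendants as arguments, and neither models the rule that an explicit priority on the node itself takes precedence.

```shell
# Illustration only; this is not CoffeePot's implementation.
# Each function takes the priorities of a node's descendants as arguments.

# max style: the node's priority is the highest descendant priority.
priority_max() {
    max=$1
    for p in "$@"; do
        if [ "$p" -gt "$max" ]; then
            max=$p
        fi
    done
    echo "$max"
}

# sum style: the node's priority is the sum of the descendant priorities.
priority_sum() {
    sum=0
    for p in "$@"; do
        sum=$((sum + p))
    done
    echo "$sum"
}

priority_max 1 3 2   # prints 3
priority_sum 1 3 2   # prints 6
```

For a node whose descendants have priorities 1, 3, and 2, the max style gives the node priority 3 and the sum style gives it priority 6, so the choice of style can change which of two competing nodes wins.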
2.2. In build tools
The HOWTO repository contains examples of using CoffeePot in build tools such as Gradle.