Appendix A. Property files
A worked example
Introduction…
A.1. A first attempt
Suppose that we wanted to read Java-style property files, like this one:
# This is a comment.
name1 : value1
name2 = value2
Where:
- Lines consist of name/value pairs, separated by “:” or “=”; whitespace around the separator is irrelevant.
- If the first non-whitespace character in a line is “!” or “#”, it is a comment.
Our first attempt at a grammar for this format might look something like prop1.ixml:
FIXME:
Judicious use of “-” characters before terminals and non-terminals keeps the output clean. If you run this through coffeepot, you’ll get:
$ coffeepot -v -g:examples/prop1.ixml -i:examples/example1.properties -pp
Loading ixml grammar: examples/prop1.ixml
Loading input from examples/example1.properties
There are 2 possible parses.
<property-file xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
<name-value># This is a comment.</name-value>
<name-value>name1 : value1</name-value>
<name-value>name2 = value2</name-value>
</property-file>
That looks reasonable for our initial grammar, but you might wonder where the ambiguity arises. Let’s find out with the --describe-ambiguity option:
$ coffeepot -v -g:examples/prop1.ixml -i:examples/example1.properties --describe-ambiguity -pp
Loading ixml grammar: examples/prop1.ixml
Loading input from examples/example1.properties
There are 2 possible parses.
Ambiguity:
$2, 0, 51
line, 0, 21 / $3ⁿ, 21, 51
line, 0, 21 / $3ⁿ, 21, 51
<property-file xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
<name-value># This is a comment.</name-value>
<name-value>name1 : value1</name-value>
<name-value>name2 = value2</name-value>
</property-file>
When you ask coffeepot to describe ambiguity, or when it fails to parse your document and attempts to report errors, it has little choice at the moment except to expose some of the inner workings of the parser. This is described more thoroughly in Chapter 3, How it works.
This output indicates that the nonterminal “$2”, covering the range of characters 0-51, has two different derivations. Sometimes it’s useful to look at the graph. You can get an SVG version of it with the --graph-svg option.
In the graph, you can see that the culprit is that a line can be either a comment or a name-value. Does that seem strange?
Well, look back at our proto-grammar:
comment: s, -["#!"], char*, NL .
name-value: char*, NL .
It says that a comment has to begin with a “#” or “!”, so line 1 could be a comment, but all that name-value says at the moment is that it doesn’t include newlines. So it could also match the first line!
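To make the overlap concrete, here is a rough Python sketch that approximates the two rules with regular expressions. The patterns are our own stand-ins, not a translation of prop1.ixml: we assume that `s` is optional spaces or tabs and that `char` is any non-newline character. The helper name `matching_rules` is hypothetical.

```python
import re

# Hypothetical regex approximations of the two ixml rules above.
# Assumption: `s` is optional spaces/tabs; `char` is any non-newline character.
COMMENT = re.compile(r"[ \t]*[#!][^\n]*\n")
NAME_VALUE = re.compile(r"[^\n]*\n")

def matching_rules(line):
    """Return the names of the rules whose pattern matches the whole line."""
    rules = {"comment": COMMENT, "name-value": NAME_VALUE}
    return [name for name, pattern in rules.items() if pattern.fullmatch(line)]

# The comment line satisfies both rules; an ordinary line satisfies only one.
print(matching_rules("# This is a comment.\n"))
print(matching_rules("name1 : value1\n"))
```

Any line that both patterns accept corresponds to a place where the parser has two derivations to choose from, which is what the ambiguity report is describing.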
A.2. Refining name-value
It’s reasonably straightforward to improve on name-value in prop2.ixml:
name-value: s, name, s, -[":="], s, value, NL.
name: namestart, namefollower* .
value: ~[Zs; #9; #d], char* .
-namestart: ["_"; L] .
-namefollower: namestart; ["-.·‿⁀"; Nd; Mn] .
Here we’re saying that a name-value is a name, followed by a “:” or “=” separator, followed by a value; a name is a name start character followed by zero or more name follower characters, and a value is something that isn’t whitespace followed by any characters.
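If the Unicode character classes in namestart and namefollower are unfamiliar, the following Python sketch may help. It approximates the two rules with the standard `unicodedata` module, reading `L` as any letter category and `Nd` and `Mn` as decimal digits and nonspacing marks. The function names are hypothetical, and Python’s Unicode tables may not match the version your ixml processor uses, so treat the results as indicative only.

```python
import unicodedata

def is_namestart(ch):
    """Approximates namestart: "_" or any character in a letter category (L*)."""
    return ch == "_" or unicodedata.category(ch).startswith("L")

def is_namefollower(ch):
    """Approximates namefollower: a namestart, one of the listed punctuation
    characters, a decimal digit (Nd), or a nonspacing mark (Mn)."""
    return (is_namestart(ch)
            or ch in "-.·‿⁀"
            or unicodedata.category(ch) in ("Nd", "Mn"))

def is_name(text):
    """True if text is one namestart followed by zero or more namefollowers."""
    return (bool(text)
            and is_namestart(text[0])
            and all(is_namefollower(c) for c in text[1:]))

print(is_name("name1"))   # a letter followed by letters and a digit
print(is_name("1name"))   # starts with a digit, so not a name
```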
This does a good job on our sample file:
$ coffeepot -v -g:examples/prop2.ixml -i:examples/example1.properties -pp
Loading ixml grammar: examples/prop2.ixml
Loading input from examples/example1.properties
<property-file>
<comment> This is a comment.</comment>
<name-value>
<name>name1</name>
<value>value1</value>
</name-value>
<name-value>
<name>name2</name>
<value>value2</value>
</name-value>
</property-file>
A.3. More line options
The format for property files is actually a bit more complicated. They allow blank lines, continuation lines, and several flavors of escaped characters:
# This is a comment.
name1 : value1
name2 = value2
name3 = apple,\
banana,\
pear
name4 = a\tb
name5 = a\u2192b
name6 = c:\\path\\to\\thing
In fact, the format as described by Java allows even more escaping, and allows names without values, which we’re not going to try to cover now. The Java description is a fine example of a messy, procedural description of a file format. Their parsing description is explicitly two-pass, though it’s unclear if that’s necessary or if the author was just describing what their code does.
Before looking at the solution, have a go at extending the grammar to support blank lines and continuations. Blank lines are easy; continuations are a little more complicated.
Here’s one solution, in prop3.ixml:
FIXME:
Now we get:
$ coffeepot -v -g:examples/prop3.ixml -i:examples/example.properties --describe-ambiguity -pp
Loading ixml grammar: examples/prop3.ixml
Loading input from examples/example.properties
<property-file>
<comment> This is a comment.</comment>
<name-value>
<name>name1</name>
<value>value1</value>
</name-value>
<name-value>
<name>name2</name>
<value>value2</value>
</name-value>
<name-value>
<name>name3</name>
<value>apple,banana,pear</value>
</name-value>
<blank/>
<name-value>
<name>name4</name>
<value>a\tb</value>
</name-value>
<name-value>
<name>name5</name>
<value>a\u2192b </value>
</name-value>
<name-value>
<name>name6</name>
<value>c:\\path\\to\\thing</value>
</name-value>
</property-file>
Note that “apple”, “banana”, and “pear” have been correctly combined into a single value. The blank line is explicit, but we could suppress it by putting “-” before its name.
A.4. Character escapes
The last thing we’ll look at is character escapes. The property file format says that tab, carriage return, and newline can be escaped as “\t”, “\r”, and “\n”, respectively. This also requires introducing an escape for “\”, namely “\\”. In addition, Java-style Unicode references are allowed: “\uHHHH” where “HHHH” is any four hexadecimal digits.
As before, you might want to think about this before you look at the solution.
The solution in prop4.ixml is:
-char: ~["\";#a] ; tab; cr ; nl ; bs ; uref .
tab: -"\t" .
cr: -"\r" .
nl: -"\n" .
-bs: "\", -"\" .
uref: -"\u", digit, digit, digit, digit .
-digit: ["0"-"9"; "a"-"f"; "A"-"F"] .
We augment char so that it’s a character other than a backslash or newline, or a backslash followed by one of “t”, “r”, “n”, or “\”. Or it’s a “\u” followed by four hexadecimal digits.
Here we encounter an interesting consequence of the design of Invisible XML version 1.0. Although for the “\\” case, we can suppress one backslash and output the other, there’s nothing we can do, for example, to replace “\t” with a literal tab character. Instead, we leave <tab/>, etc. in the output where they can be cleaned up later.
$ coffeepot -v -g:examples/prop4.ixml -i:examples/example.properties --describe-ambiguity -pp
Loading ixml grammar: examples/prop4.ixml
Loading input from examples/example.properties
<property-file>
<comment> This is a comment.</comment>
<name-value>
<name>name1</name>
<value>value1</value>
</name-value>
<name-value>
<name>name2</name>
<value>value2</value>
</name-value>
<name-value>
<name>name3</name>
<value>apple,banana,pear</value>
</name-value>
<blank/>
<name-value>
<name>name4</name>
<value>a<tab/>b</value>
</name-value>
<name-value>
<name>name5</name>
<value>a<uref>2192</uref>b </value>
</name-value>
<name-value>
<name>name6</name>
<value>c:\path\to\thing</value>
</name-value>
</property-file>
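The “cleaned up later” step could be done in many ways. As one possibility, here is a minimal Python sketch using the standard library’s `xml.etree.ElementTree`. It assumes the output has the shape shown above, with only <tab/>, <cr/>, <nl/>, and <uref> appearing inside <value>. The names `ESCAPES`, `expand_escapes`, and `clean` are our own and are not part of coffeepot.

```python
import xml.etree.ElementTree as ET

# Literal characters for the marker elements the grammar leaves in the output.
ESCAPES = {"tab": "\t", "cr": "\r", "nl": "\n"}

def expand_escapes(value):
    """Replace <tab/>, <cr/>, <nl/>, and <uref>HHHH</uref> children of a
    <value> element with the characters they stand for."""
    parts = [value.text or ""]
    for child in list(value):          # copy, because we remove as we go
        if child.tag == "uref":
            parts.append(chr(int(child.text, 16)))
        elif child.tag in ESCAPES:
            parts.append(ESCAPES[child.tag])
        else:
            raise ValueError(f"unexpected element <{child.tag}>")
        parts.append(child.tail or "")  # text that followed the marker
        value.remove(child)
    value.text = "".join(parts)

def clean(xml_text):
    """Parse the parser output and expand the escapes in every <value>."""
    root = ET.fromstring(xml_text)
    for value in list(root.iter("value")):
        expand_escapes(value)
    return root

sample = "<property-file><name-value><name>name4</name><value>a<tab/>b</value></name-value></property-file>"
print(repr(clean(sample).find("name-value/value").text))
```

The same transformation could equally be written in XSLT or any other XML toolchain; the point is only that the marker elements carry enough information to reconstruct the intended characters.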
A.5. Challenges for the reader
The example grammar in this appendix doesn’t cover all of the features of property files. If you’re looking for a challenge, consider these improvements:
- The property file format also specifies that unnecessarily escaped characters are allowed, but the escaping is ignored. An occurrence of \" is the same as ".
- The property file format allows “=” and “:” to occur in property names if they are escaped as \= and \:, respectively.
- In a property file, the “end of file” marks the end of a value. In the grammar presented in this appendix, a terminating newline is required. Can this be fixed?