Class CharacterSet

java.lang.Object
org.nineml.coffeegrinder.tokens.CharacterSet

public class CharacterSet extends Object
A class that represents a set of Unicode characters.

Character sets can be constructed from a literal string, from a range of Unicode code points, or from a Unicode character class.

  • Method Details

    • isRange

      public boolean isRange()
      Does this character set represent a range of Unicode code points?
      Returns:
      true, if this character set is a range.
    • getRangeFrom

      public int getRangeFrom()
      Where does the range begin?

      Ranges are inclusive. The result of this method is undefined if isRange() returns false.

      Returns:
      the first character in the range
    • getRangeTo

      public int getRangeTo()
      Where does the range end?

      Ranges are inclusive. The result of this method is undefined if isRange() returns false.

      Returns:
      the last character in the range
    • isSetOfCharacters

      public boolean isSetOfCharacters()
      Does this character set represent a specific set of characters?
      Returns:
      true, if this character set is a specific set of characters.
    • getCharacters

      public String getCharacters()
      What characters are in the set?

      If this character set represents a set of characters, this method returns them as a string. Otherwise, it returns null.

      Returns:
      the set of characters as a string
    • isUnicodeCharacterClass

      public boolean isUnicodeCharacterClass()
      Does this character set represent a Unicode character class?
      Returns:
      true, if this character set is a Unicode character class
    • getUnicodeCharacterClass

      public String getUnicodeCharacterClass()
      What is the Unicode character class?

      Returns the one- or two-character string that defines the character class. Returns null if this character set does not represent a Unicode character class.

      Returns:
      the character class
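
The accessors above come in pairs: an "is" test followed by a getter that is only meaningful when the test succeeds. A minimal sketch of dispatching on them is shown below. The CharacterSetView interface is hypothetical; it only copies the accessor signatures documented here so that the example compiles without the library, and the final fallback branch is an assumption, since this page does not say whether the three kinds are exhaustive.

```java
// Hypothetical stand-in that mirrors the documented accessor signatures.
interface CharacterSetView {
    boolean isRange();
    int getRangeFrom();
    int getRangeTo();
    boolean isSetOfCharacters();
    String getCharacters();
    boolean isUnicodeCharacterClass();
    String getUnicodeCharacterClass();
}

final class DescribeSketch {
    // Check each "is" method before calling the matching getter:
    // getRangeFrom()/getRangeTo() are undefined unless isRange() is true,
    // and the String getters return null for the other kinds.
    static String describe(CharacterSetView cs) {
        if (cs.isRange()) {
            return String.format("range U+%04X to U+%04X",
                    cs.getRangeFrom(), cs.getRangeTo());
        }
        if (cs.isSetOfCharacters()) {
            return "characters \"" + cs.getCharacters() + "\"";
        }
        if (cs.isUnicodeCharacterClass()) {
            return "Unicode class " + cs.getUnicodeCharacterClass();
        }
        return "unknown"; // assumed fallback, not documented behavior
    }
}
```

With a real CharacterSet, the same checks would presumably be made directly on the object rather than through a stand-in interface.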
    • literal

      public static CharacterSet literal(String literal)
      Construct a character set containing each of the characters in the literal string.
      Parameters:
      literal - The string of characters.
      Returns:
      A character set that will match any of those characters.
      Throws:
      NullPointerException - if the literal is null.
      IllegalArgumentException - if the literal is the empty string.
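
To make the documented contract of literal concrete, here is a small stand-alone sketch. LiteralSketch is a hypothetical class, not the library's implementation; it reproduces only the behavior stated above (null and empty-string rejection, matching any character in the string) and assumes that membership is tested per code point rather than per UTF-16 char.

```java
import java.util.Objects;

// Hypothetical stand-in for a character set built from a literal string.
final class LiteralSketch {
    private final String characters;

    private LiteralSketch(String characters) {
        this.characters = characters;
    }

    static LiteralSketch literal(String literal) {
        // Documented: NullPointerException for null input.
        Objects.requireNonNull(literal, "literal");
        // Documented: IllegalArgumentException for the empty string.
        if (literal.isEmpty()) {
            throw new IllegalArgumentException("literal must not be empty");
        }
        return new LiteralSketch(literal);
    }

    // Assumption: compare whole code points, so characters outside the
    // Basic Multilingual Plane are treated as single members of the set.
    boolean matches(int codepoint) {
        return characters.codePoints().anyMatch(cp -> cp == codepoint);
    }

    String getCharacters() {
        return characters;
    }
}
```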
    • range

      public static CharacterSet range(int first, int last)
      Construct a character set containing each of the characters in the specified range, inclusive.
      Parameters:
      first - The first codepoint.
      last - The last codepoint.
      Returns:
      A character set that will match any of those characters.
      Throws:
      IllegalArgumentException - if the range is invalid.
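
The inclusive semantics of range can be illustrated with the stand-alone sketch below. RangeSketch is hypothetical and is not the library's code. In particular, this page does not say what makes a range "invalid"; the sketch assumes it means an endpoint that is not a valid Unicode code point, or a first value greater than last.

```java
// Hypothetical stand-in for a character set built from a code point range.
final class RangeSketch {
    private final int first;
    private final int last;

    private RangeSketch(int first, int last) {
        this.first = first;
        this.last = last;
    }

    static RangeSketch range(int first, int last) {
        // Assumed meaning of "invalid": out-of-range endpoints or first > last.
        if (!Character.isValidCodePoint(first)
                || !Character.isValidCodePoint(last)
                || first > last) {
            throw new IllegalArgumentException(
                    "Invalid range: " + first + " to " + last);
        }
        return new RangeSketch(first, last);
    }

    // Both endpoints are members of the set, as the documentation states.
    boolean matches(int codepoint) {
        return codepoint >= first && codepoint <= last;
    }
}
```

Because the parameters are int code points rather than char values, a call such as range('0', '9') relies on Java's implicit widening from char to int.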
    • unicodeClass

      public static CharacterSet unicodeClass(String charClass)
      Construct a character set representing the specified Unicode character class.
      Parameters:
      charClass - The character class, for example "L", or "Nd".
      Returns:
      A character set that will match characters in that class.
      Throws:
      NullPointerException - if the charClass is null.
      IllegalArgumentException - if the charClass is empty or more than 2 characters long.
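
One plausible way to test membership in a Unicode character class such as "L" or "Nd" is to delegate to java.util.regex, which accepts general category names in \p{...}. The sketch below takes that approach; UnicodeClassSketch is a hypothetical class, and nothing on this page confirms that the library works this way. Only the null and length checks are taken from the documentation above.

```java
import java.util.Objects;
import java.util.regex.Pattern;

// Hypothetical stand-in for a character set built from a Unicode class.
final class UnicodeClassSketch {
    private final Pattern pattern;

    private UnicodeClassSketch(Pattern pattern) {
        this.pattern = pattern;
    }

    static UnicodeClassSketch unicodeClass(String charClass) {
        // Documented: NullPointerException for null input.
        Objects.requireNonNull(charClass, "charClass");
        // Documented: the class name must be one or two characters long.
        if (charClass.length() < 1 || charClass.length() > 2) {
            throw new IllegalArgumentException(
                    "charClass must be 1 or 2 characters: " + charClass);
        }
        // Assumption: rely on the regex engine's \p{...} category support.
        return new UnicodeClassSketch(Pattern.compile("\\p{" + charClass + "}"));
    }

    boolean matches(int codepoint) {
        if (!Character.isValidCodePoint(codepoint)) {
            return false;
        }
        // Convert the code point to a String so supplementary characters
        // (two UTF-16 chars) are matched as a single unit.
        String s = new String(Character.toChars(codepoint));
        return pattern.matcher(s).matches();
    }
}
```

In this sketch a one-letter class such as "L" covers all of its two-letter subcategories ("Lu", "Ll", and so on), because that is how the regex engine interprets it. A name the regex engine does not recognize raises PatternSyntaxException, which is a subclass of IllegalArgumentException.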
    • equals

      public boolean equals(Object obj)
      Tests for the equality of two CharacterSet objects.

      Two CharacterSet objects are equal only if they identify the same characters expressed in the same way. A set created from the literal "0123456789" is not equal to a set created from the range '0' to '9'.

      Overrides:
      equals in class Object
      Parameters:
      obj - A CharacterSet to test for equality against.
      Returns:
      true if and only if obj is a CharacterSet that identifies the same characters expressed in the same way.
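
The distinction between "matches the same characters" and "is equal" can be seen in the sketch below. EqualitySketch is a hypothetical class written for this illustration only; it compares the kind of set and its defining data, which is one way to obtain the behavior described above. It makes no claim about how the library stores its sets, or about finer points such as whether the order of characters in a literal matters.

```java
import java.util.Objects;

// Hypothetical stand-in showing representation-sensitive equality.
final class EqualitySketch {
    enum Kind { LITERAL, RANGE }

    private final Kind kind;
    private final String characters; // non-null only for LITERAL
    private final int first;         // meaningful only for RANGE
    private final int last;          // meaningful only for RANGE

    private EqualitySketch(Kind kind, String characters, int first, int last) {
        this.kind = kind;
        this.characters = characters;
        this.first = first;
        this.last = last;
    }

    static EqualitySketch literal(String s) {
        return new EqualitySketch(Kind.LITERAL, s, 0, 0);
    }

    static EqualitySketch range(int first, int last) {
        return new EqualitySketch(Kind.RANGE, null, first, last);
    }

    boolean matches(int codepoint) {
        if (kind == Kind.RANGE) {
            return codepoint >= first && codepoint <= last;
        }
        return characters.codePoints().anyMatch(cp -> cp == codepoint);
    }

    // Equal only when the kind and the defining data are both the same.
    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof EqualitySketch)) return false;
        EqualitySketch other = (EqualitySketch) obj;
        return kind == other.kind
                && Objects.equals(characters, other.characters)
                && first == other.first
                && last == other.last;
    }

    @Override
    public int hashCode() {
        return Objects.hash(kind, characters, first, last);
    }
}
```

Here literal("0123456789") and range('0', '9') accept exactly the same code points, yet equals returns false because one is a LITERAL and the other a RANGE.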
    • matches

      public boolean matches(int codepoint)
      Test if a code point occurs in the set.
      Parameters:
      codepoint - The Unicode codepoint to test.
      Returns:
      true if and only if the codepoint is in the set.
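
Because matches takes an int code point rather than a char, callers scanning text would normally iterate over code points, not UTF-16 units. The sketch below shows that pattern with a plain IntPredicate standing in for a method reference such as someSet::matches; MatchesSketch and countMatches are hypothetical names used only for this illustration.

```java
import java.util.function.IntPredicate;

final class MatchesSketch {
    // Counts the code points in the input that the predicate accepts.
    // String.codePoints() yields one int per Unicode code point, so a
    // supplementary character (a surrogate pair) is tested exactly once.
    static long countMatches(String input, IntPredicate matcher) {
        return input.codePoints().filter(matcher).count();
    }
}
```

For instance, a predicate accepting U+1D7CE through U+1D7FF (the mathematical digits) would see a character such as U+1D7D8 as one code point, even though it occupies two char positions in the string.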
    • toString

      public String toString()
      Overrides:
      toString in class Object