java.lang.Object
org.apache.lucene.util.AttributeImpl
com.github.oeuvres.alix.lucene.analysis.tokenattributes.CharsAttImpl
All Implemented Interfaces:
Appendable, CharSequence, Cloneable, Comparable<CharSequence>, org.apache.lucene.analysis.tokenattributes.CharTermAttribute, org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute, org.apache.lucene.util.Attribute
Direct Known Subclasses:
LemAttImpl, OrthAttImpl

public class CharsAttImpl extends org.apache.lucene.util.AttributeImpl implements org.apache.lucene.analysis.tokenattributes.CharTermAttribute, org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute, Appendable, Cloneable, CharSequence, Comparable<CharSequence>
An implementation of Lucene's CharTermAttribute designed to be an efficient key in a HashMap, with tools for char manipulation (e.g. capitalize).
  • Field Details

    • builder

      protected org.apache.lucene.util.BytesRefBuilder builder
      May be used by subclasses to convert to different charsets / encodings for implementing getBytesRef().
  • Constructor Details

    • CharsAttImpl

      public CharsAttImpl()
      Initialize this attribute with empty term text.
    • CharsAttImpl

      public CharsAttImpl(String s)
      Initialize the chars with a String.
      Parameters:
      s - initial term text.
    • CharsAttImpl

      public CharsAttImpl(Chain chain)
      Copy chars from a Chain (a mutable string). Use this to build an optimized key for a HashMap. Do not use it in a token stream: getBytesRef() will not be available.
      Parameters:
      chain - source of the chars to copy.
    • CharsAttImpl

      public CharsAttImpl(CharsAttImpl token)
      Copy chars from another attribute. Use this to build an optimized key for a HashMap. Do not use it in a token stream: getBytesRef() will not be available.
      Parameters:
      token - another char attribute.
    • CharsAttImpl

      public CharsAttImpl(char[] buffer, int offset, int length)
      Copy chars from a char array. Use this to build an optimized key for a HashMap. Do not use it in a token stream: getBytesRef() will not be available.
      Parameters:
      buffer - source char array.
      offset - position in buffer where the copy starts.
      length - amount of chars to copy.
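
The copy constructors above exist so that a slice of chars can serve as a HashMap key. A minimal plain-JDK sketch of that idea follows. CharKeySketch and countBigrams are hypothetical names used for illustration only; this is not the Alix class, and the hash used here (Arrays.hashCode) is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a char-array map key; not the Alix implementation. */
final class CharKeySketch {
    private final char[] chars;

    /** Copies length chars from buffer, starting at offset, so the key owns its data. */
    CharKeySketch(char[] buffer, int offset, int length) {
        this.chars = Arrays.copyOfRange(buffer, offset, offset + length);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(chars);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CharKeySketch && Arrays.equals(chars, ((CharKeySketch) o).chars);
    }

    /** Counts every two-char slice of text, keyed by a copy of the slice. */
    static Map<CharKeySketch, Integer> countBigrams(char[] text) {
        Map<CharKeySketch, Integer> counts = new HashMap<>();
        for (int i = 0; i + 2 <= text.length; i++) {
            counts.merge(new CharKeySketch(text, i, 2), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // "abab" contains the slices ab, ba, ab: two distinct keys.
        System.out.println(countBigrams("abab".toCharArray()).size());
    }
}
```

Copying on construction matters because a key whose chars change after insertion can no longer be found in the map.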
  • Method Details

    • append

      public final CharsAttImpl append(CharSequence csq)
      Specified by:
      append in interface Appendable
      Specified by:
      append in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
    • append

      public final CharsAttImpl append(CharSequence csq, int start, int end)
      Specified by:
      append in interface Appendable
      Specified by:
      append in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
    • append

      public final CharsAttImpl append(char c)
      Specified by:
      append in interface Appendable
      Specified by:
      append in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
    • append

      public final CharsAttImpl append(String s)
      Specified by:
      append in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
    • append

      public final CharsAttImpl append(StringBuilder s)
      Specified by:
      append in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
    • append

      public final CharsAttImpl append(org.apache.lucene.analysis.tokenattributes.CharTermAttribute ta)
      Specified by:
      append in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
    • buffer

      public final char[] buffer()
      Specified by:
      buffer in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
    • capitalize

      public CharsAttImpl capitalize()
      Capitalize the term (initial capitals only), following rules suited to languages written in Latin script, e.g. états-unis -> États-Unis.
      Returns:
      this, for chaining.
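
A minimal sketch of the kind of rule described above, using only the JDK. The actual rules applied by capitalize() are not listed on this page, so everything here is an assumption: parts are taken to be separated by '-' or a space, and "initial capital only" is taken to mean that the remaining letters are lower-cased. CapitalizeSketch is a hypothetical name.

```java
/** Hypothetical illustration of initial-capital rules; not the Alix implementation. */
final class CapitalizeSketch {
    /** Upper-cases the first letter of each part and lower-cases the rest, in place. */
    static StringBuilder capitalize(StringBuilder sb) {
        boolean partStart = true; // true at the start and right after a separator
        for (int i = 0; i < sb.length(); i++) {
            char c = sb.charAt(i);
            if (Character.isLetter(c)) {
                sb.setCharAt(i, partStart ? Character.toUpperCase(c) : Character.toLowerCase(c));
            }
            // Assumed separators: hyphen and space.
            partStart = (c == '-' || c == ' ');
        }
        return sb;
    }

    public static void main(String[] args) {
        // \u00e9 is e with acute accent, so the input is the "états-unis" example.
        System.out.println(capitalize(new StringBuilder("\u00e9tats-unis")));
    }
}
```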
    • charAt

      public final char charAt(int index)
      Specified by:
      charAt in interface CharSequence
    • clear

      public void clear()
      Specified by:
      clear in class org.apache.lucene.util.AttributeImpl
    • clone

      public CharsAttImpl clone()
      Overrides:
      clone in class org.apache.lucene.util.AttributeImpl
    • compareTo

      public int compareTo(CharSequence other)
      Specified by:
      compareTo in interface Comparable<CharSequence>
    • copy

      public final CharsAttImpl copy(org.apache.lucene.analysis.tokenattributes.CharTermAttribute ta)
      Copy a CharTermAttribute into the buffer.
      Parameters:
      ta - attribute.
      Returns:
      this.
    • copy

      public final CharsAttImpl copy(org.apache.lucene.analysis.tokenattributes.CharTermAttribute ta, int start, int len)
      Copy a substring of a CharTermAttribute into the buffer.
      Parameters:
      ta - source term attribute.
      start - offset in the char[] of the source term.
      len - number of chars to copy.
      Returns:
      this.
    • copy

      public final CharsAttImpl copy(org.apache.lucene.util.BytesRef bytes)
      Copy the UTF-8 bytes of a BytesRef into the char[] buffer. Used by Alix to test UTF-8 bytes against the chars stored in the HashMaps of FrDics.
      Parameters:
      bytes - UTF-8 encoded bytes.
      Returns:
      this.
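
To illustrate the conversion involved, here is a plain-JDK sketch that decodes a slice of UTF-8 bytes into chars. It takes the three values a BytesRef exposes (bytes, offset, length) as separate arguments so that it runs without Lucene. Utf8ToCharsSketch is a hypothetical name, and the decoding route (Charset.decode) is an assumption of this sketch, not necessarily what copy(BytesRef) does internally.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of UTF-8 bytes to chars; not the Alix implementation. */
final class Utf8ToCharsSketch {
    /** Decodes length UTF-8 bytes, starting at offset, into a new char[]. */
    static char[] decode(byte[] bytes, int offset, int length) {
        CharBuffer cb = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes, offset, length));
        char[] out = new char[cb.remaining()];
        cb.get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[] utf8 = "lucene".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(decode(utf8, 0, utf8.length)));
    }
}
```

Once the bytes are chars, they can be compared or hashed exactly like any other term text.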
    • copyTo

      public void copyTo(org.apache.lucene.util.AttributeImpl target)
      Specified by:
      copyTo in class org.apache.lucene.util.AttributeImpl
    • copyTo

      public void copyTo(org.apache.lucene.analysis.tokenattributes.CharTermAttribute target)
      Copy chars from this attribute to another.
      Parameters:
      target - destination attribute.
    • copyBuffer

      public final void copyBuffer(char[] buffer, int offset, int length)
      Specified by:
      copyBuffer in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
    • endsWith

      public boolean endsWith(char c)
      Test an ending char.
      Parameters:
      c - char to test.
      Returns:
      true if last char == c, false otherwise.
    • endsWith

      public boolean endsWith(String suffix)
      Test a suffix, char by char.
      Parameters:
      suffix - to test.
      Returns:
      true if the attribute ends with suffix, false otherwise.
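
A char-by-char suffix test over a buffer that may be longer than the term can be sketched as follows. EndsWithSketch is a hypothetical name, and the (chars, len) pair stands in for the attribute's internal buffer and length; this is an illustration, not the Alix source.

```java
/** Hypothetical illustration of a char-by-char suffix test; not the Alix implementation. */
final class EndsWithSketch {
    /** Tests whether the first len chars of chars end with suffix. */
    static boolean endsWith(char[] chars, int len, String suffix) {
        int n = suffix.length();
        if (n > len) return false; // suffix longer than the term
        for (int i = 0; i < n; i++) {
            if (chars[len - n + i] != suffix.charAt(i)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // The buffer holds 10 chars but only the first 8 ("chantons") are the term.
        System.out.println(endsWith("chantons__".toCharArray(), 8, "ons"));
    }
}
```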
    • equals

      public boolean equals(Object other)
      Overrides:
      equals in class Object
    • getBytesRef

      public org.apache.lucene.util.BytesRef getBytesRef()
      Specified by:
      getBytesRef in interface org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute
    • hashCode

      public int hashCode()
      Same hashCode() as a String, computed as
       s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
       
      using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
      Overrides:
      hashCode in class Object
      Returns:
      a hash code value for this object.
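
The polynomial above is usually evaluated with Horner's rule, one multiplication per char. The sketch below shows that a loop over a char[] yields the same value as String.hashCode() for the same chars. HashSketch is a hypothetical name, and the (chars, len) pair stands in for the attribute's buffer and length.

```java
/** Hypothetical illustration of the String-compatible hash; not the Alix implementation. */
final class HashSketch {
    /** Computes s[0]*31^(n-1) + ... + s[n-1] over the first len chars, in int arithmetic. */
    static int hash(char[] chars, int len) {
        int h = 0;
        for (int i = 0; i < len; i++) {
            h = 31 * h + chars[i]; // Horner's rule; overflow wraps as in String.hashCode()
        }
        return h;
    }

    public static void main(String[] args) {
        char[] buf = "lucene".toCharArray();
        System.out.println(hash(buf, buf.length) == "lucene".hashCode()); // prints true
    }
}
```

Sharing the formula with String means both kinds of text hash to the same int; whether a lookup with one finds a key of the other additionally depends on equals.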
    • indexOf

      public int indexOf(char c)
      First position of a char.
      Parameters:
      c - char to search.
      Returns:
      the index of the char (0 or greater) if found, -1 otherwise.
    • isEmpty

      public final boolean isEmpty()
      Test if no chars are registered.
      Specified by:
      isEmpty in interface CharSequence
      Returns:
      true if empty, false otherwise.
    • lastChar

      public char lastChar() throws ArrayIndexOutOfBoundsException
      Get the last char.
      Returns:
      last char.
      Throws:
      ArrayIndexOutOfBoundsException - if the attribute is empty.
    • lastIndexOf

      public int lastIndexOf(char c)
      Find index of last occurrence of a char.
      Parameters:
      c - char to search.
      Returns:
      the index of the char (0 or greater) if found, -1 otherwise.
    • length

      public final int length()
      Specified by:
      length in interface CharSequence
    • mark

      public final CharsAttImpl mark()
      Record the current length of the string so that rewind() can return to this state, like java.io.Reader.mark(int).
      Returns:
      this, for chaining.
    • resizeBuffer

      public final char[] resizeBuffer(int newSize)
      Specified by:
      resizeBuffer in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
    • reflectWith

      public void reflectWith(org.apache.lucene.util.AttributeReflector reflector)
      Specified by:
      reflectWith in class org.apache.lucene.util.AttributeImpl
    • rewind

      public final CharsAttImpl rewind()
      Restore the string length recorded by the last mark(). If no mark has been set, nothing is done. The mark is consumed: call mark() again explicitly to record a new state. Works a bit like java.io.Reader.reset(), with a less confusing name.
      Returns:
      this, for chaining.
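
The mark / rewind / unmark contract described on this page can be sketched over a StringBuilder. MarkRewindSketch is a hypothetical name and the -1 sentinel for "no mark" is an assumption of the sketch; only the documented behavior (rewind restores the marked length, consumes the mark, and does nothing without one) is taken from the descriptions.

```java
/** Hypothetical illustration of mark/rewind/unmark; not the Alix implementation. */
final class MarkRewindSketch {
    private final StringBuilder sb = new StringBuilder();
    private int mark = -1; // assumed sentinel: -1 means no mark is set

    MarkRewindSketch append(CharSequence s) { sb.append(s); return this; }

    /** Records the current length. */
    MarkRewindSketch mark() { mark = sb.length(); return this; }

    /** Truncates back to the marked length and consumes the mark; no-op without a mark. */
    MarkRewindSketch rewind() {
        if (mark >= 0) {
            sb.setLength(mark);
            mark = -1;
        }
        return this;
    }

    /** Forgets any mark without changing the text. */
    MarkRewindSketch unmark() { mark = -1; return this; }

    @Override
    public String toString() { return sb.toString(); }

    public static void main(String[] args) {
        MarkRewindSketch t = new MarkRewindSketch().append("chante").mark().append("-t-il");
        System.out.println(t.rewind()); // prints chante
    }
}
```

This pattern suits trying a tentative suffix on a term and dropping it cheaply if the lookup fails.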
    • rtrim

      public final CharsAttImpl rtrim()
      Delete spaces at the end (right trim).
      Returns:
      this
    • rtrim

      public final CharsAttImpl rtrim(String spaces)
      Delete any of the given characters from the end (right trim).
      Parameters:
      spaces - the chars to delete.
      Returns:
      this
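
A right trim against a set of chars amounts to moving the length back while the last char belongs to the set. A minimal sketch over a StringBuilder, with RtrimSketch as a hypothetical name (not the Alix source):

```java
/** Hypothetical illustration of a right trim against a set of chars; not the Alix implementation. */
final class RtrimSketch {
    /** Removes trailing chars of sb that occur in chars, in place. */
    static StringBuilder rtrim(StringBuilder sb, String chars) {
        int len = sb.length();
        while (len > 0 && chars.indexOf(sb.charAt(len - 1)) >= 0) {
            len--;
        }
        sb.setLength(len); // only the length changes; no chars are moved
        return sb;
    }

    public static void main(String[] args) {
        System.out.println(rtrim(new StringBuilder("mot,. "), " ,.")); // prints mot
    }
}
```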
    • setCharAt

      public void setCharAt(int pos, char c)
      Change a char at a specific position.
      Parameters:
      pos - position to change.
      c - new char value.
    • setEmpty

      public final CharsAttImpl setEmpty()
      Specified by:
      setEmpty in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
    • setLength

      public final CharsAttImpl setLength(int length)
      Specified by:
      setLength in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
    • startsWith

      public boolean startsWith(char c)
      Test a starting char.
      Parameters:
      c - char to test.
      Returns:
      true if starting char == c, false otherwise.
    • startsWith

      public boolean startsWith(String prefix)
      Test a prefix, char by char.
      Parameters:
      prefix - to test.
      Returns:
      true if the attribute starts with prefix, false otherwise.
    • subSequence

      public final CharSequence subSequence(int start, int end)
      Specified by:
      subSequence in interface CharSequence
    • toLower

      public CharsAttImpl toLower()
      Convert all chars in the buffer to lower case. For efficiency, chars are first tested with Char to avoid the default JDK conversion where possible.
      Returns:
      this, for chaining.
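
One way to read the description above is a cheap per-char test that skips the JDK conversion for most chars. The sketch below illustrates that idea with plain JDK calls only. It is an assumption, not the Alix code: the Char class is not documented on this page, so an ASCII range check and Character.isUpperCase stand in for it, and ToLowerSketch is a hypothetical name.

```java
/** Hypothetical illustration of an in-place lower-casing with a fast path; not the Alix implementation. */
final class ToLowerSketch {
    /** Lower-cases the first len chars of chars in place and returns the same array. */
    static char[] toLower(char[] chars, int len) {
        for (int i = 0; i < len; i++) {
            char c = chars[i];
            if (c >= 'A' && c <= 'Z') {
                chars[i] = (char) (c + 32); // ASCII fast path, no library call
            } else if (c > 127 && Character.isUpperCase(c)) {
                chars[i] = Character.toLowerCase(c); // JDK conversion only when needed
            }
        }
        return chars;
    }

    public static void main(String[] args) {
        char[] buf = "ABCD".toCharArray();
        System.out.println(new String(toLower(buf, 2))); // prints abCD
    }
}
```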
    • toString

      public String toString()
      Returns solely the term text as specified by the CharSequence interface.
      Specified by:
      toString in interface CharSequence
      Overrides:
      toString in class Object
    • unmark

      public final CharsAttImpl unmark()
      Clear any mark set by mark(). Useful when it is not known whether a mark has been set.
      Returns:
      this, for chaining.