Class CharsAttImpl
java.lang.Object
org.apache.lucene.util.AttributeImpl
com.github.oeuvres.alix.lucene.analysis.tokenattributes.CharsAttImpl
- All Implemented Interfaces:
Appendable, CharSequence, Cloneable, Comparable<CharSequence>, org.apache.lucene.analysis.tokenattributes.CharTermAttribute, org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute, org.apache.lucene.util.Attribute
- Direct Known Subclasses:
LemAttImpl, OrthAttImpl
public class CharsAttImpl
extends org.apache.lucene.util.AttributeImpl
implements org.apache.lucene.analysis.tokenattributes.CharTermAttribute, org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute, Appendable, Cloneable, CharSequence, Comparable<CharSequence>
An implementation of Lucene CharTermAttribute designed to be an efficient key in a HashMap, with tools for char manipulation (e.g. capitalize).
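To illustrate what "efficient key in a HashMap" can mean in practice, here is a minimal sketch, not the Alix implementation. It assumes a mutable char buffer whose hashCode() follows the String formula and whose equals() compares content, so one reusable probe object can serve every lookup. The names CharsKey and CharsKeyDemo are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical mutable key; a sketch of the idea, not the Alix class. */
final class CharsKey {
    private char[] chars = new char[16];
    private int len = 0;

    CharsKey() {
    }

    /** Copy constructor: use it for keys that are stored in a map. */
    CharsKey(CharSequence s) {
        set(s);
    }

    /** Replace the content, allocating a new buffer only when it is too small. */
    CharsKey set(CharSequence s) {
        if (s.length() > chars.length) chars = new char[s.length()];
        len = s.length();
        for (int i = 0; i < len; i++) chars[i] = s.charAt(i);
        return this;
    }

    @Override
    public int hashCode() {
        int h = 0;
        for (int i = 0; i < len; i++) h = 31 * h + chars[i];
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CharsKey)) return false;
        CharsKey other = (CharsKey) o;
        if (len != other.len) return false;
        for (int i = 0; i < len; i++) {
            if (chars[i] != other.chars[i]) return false;
        }
        return true;
    }
}

final class CharsKeyDemo {
    /** Count tokens, allocating a key only the first time a term is seen. */
    static Map<CharsKey, int[]> count(String[] tokens) {
        Map<CharsKey, int[]> counts = new HashMap<>();
        CharsKey probe = new CharsKey(); // reused for every lookup, never stored
        for (String token : tokens) {
            int[] n = counts.get(probe.set(token));
            if (n == null) {
                // store an independent copy: a key inside a map must not be mutated
                counts.put(new CharsKey(token), new int[] {1});
            } else {
                n[0]++;
            }
        }
        return counts;
    }
}
```

The design point is that the probe is mutated between lookups but never stored, while stored keys are copies that are left untouched.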
Field Summary
protected org.apache.lucene.util.BytesRefBuilder builder
May be used by subclasses to convert to different charsets / encodings for implementing getBytesRef().
-
Constructor Summary
CharsAttImpl() Initialize this attribute with empty term text.
CharsAttImpl(char[] buffer, int offset, int length) Copy chars from a char array.
CharsAttImpl(CharsAttImpl token) Copy chars from another attribute.
CharsAttImpl(Chain chain) Copy chars from a mutable String Chain.
CharsAttImpl(String s) Initialize the chars with a String.
-
Method Summary
final CharsAttImpl append(char c)
final CharsAttImpl append(CharSequence csq)
final CharsAttImpl append(CharSequence csq, int start, int end)
final CharsAttImpl append(String)
final CharsAttImpl append(StringBuilder)
final CharsAttImpl append(org.apache.lucene.analysis.tokenattributes.CharTermAttribute ta)
final char[] buffer()
CharsAttImpl capitalize() Try to capitalize (initial capital only) sensibly, according to some rules of Latin languages, e.g. états-unis -> États-Unis.
final char charAt(int index)
void clear()
CharsAttImpl clone()
int compareTo(CharSequence other)
final CharsAttImpl copy(org.apache.lucene.analysis.tokenattributes.CharTermAttribute ta) Copy a CharTermAttribute into the buffer.
final CharsAttImpl copy(org.apache.lucene.analysis.tokenattributes.CharTermAttribute ta, int start, int len) Copy a substring of a CharTermAttribute into the buffer.
final CharsAttImpl copy(org.apache.lucene.util.BytesRef bytes) Copy the UTF-8 bytes of a BytesRef into the char[] buffer.
final void copyBuffer(char[] buffer, int offset, int length)
void copyTo(org.apache.lucene.analysis.tokenattributes.CharTermAttribute target) Copy chars from this attribute to another.
void copyTo(org.apache.lucene.util.AttributeImpl target)
boolean endsWith(char c) Test an ending char.
boolean endsWith(suffix) Test a suffix, char by char.
boolean equals(Object)
org.apache.lucene.util.BytesRef getBytesRef()
int hashCode() Same hashCode() as a String.
int indexOf(char c) First position of a char.
final boolean isEmpty() Test whether no chars are registered.
char lastChar() Get the last char.
int lastIndexOf(char c) Find index of last occurrence of a char.
final int length()
final CharsAttImpl mark() Record the current size of the string, to go back to this state with rewind(), like java.io.Reader#mark(int).
void reflectWith(org.apache.lucene.util.AttributeReflector reflector)
final char[] resizeBuffer(int newSize)
final CharsAttImpl rewind() Restore the string size recorded by the last mark().
final CharsAttImpl rtrim() Delete spaces at the end (right trim).
final CharsAttImpl rtrim(spaces) Delete the given characters at the end (right trim).
void setCharAt(int pos, char c) Change a char at a specific position.
final CharsAttImpl setEmpty()
final CharsAttImpl setLength(int length)
boolean startsWith(char c) Test a starting char.
boolean startsWith(String prefix) Test a prefix, char by char.
final CharSequence subSequence(int start, int end)
CharsAttImpl toLower() Convert all chars from the buffer to lower case.
String toString() Returns solely the term text as specified by the CharSequence interface.
final CharsAttImpl unmark() If it is not known whether mark() has been set, reset it to be sure.
Methods inherited from class org.apache.lucene.util.AttributeImpl
end, reflectAsString
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.CharSequence
chars, codePoints
-
Field Details
-
builder
protected org.apache.lucene.util.BytesRefBuilder builder
May be used by subclasses to convert to different charsets / encodings for implementing getBytesRef().
-
-
Constructor Details
-
CharsAttImpl
public CharsAttImpl()
Initialize this attribute with empty term text.
-
CharsAttImpl
public CharsAttImpl(String s)
Initialize the chars with a String.
- Parameters:
s - value.
-
CharsAttImpl
public CharsAttImpl(Chain chain)
Copy chars from a mutable String Chain. Use it to build an optimized key in a HashMap. Do not use it in a token stream: getBytesRef() will not be available.
- Parameters:
chain - value.
-
CharsAttImpl
public CharsAttImpl(CharsAttImpl token)
Copy chars from another attribute. Use it to build an optimized key in a HashMap. Do not use it in a token stream: getBytesRef() will not be available.
- Parameters:
token - another char attribute.
-
CharsAttImpl
public CharsAttImpl(char[] buffer, int offset, int length)
Copy chars from a char array. Use it to build an optimized key in a HashMap. Do not use it in a token stream: getBytesRef() will not be available.
- Parameters:
buffer - source char array.
offset - position in buffer where to start the copy.
length - amount of chars to copy.
-
-
Method Details
-
append
- Specified by:
append in interface Appendable
- Specified by:
append in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
-
append
- Specified by:
append in interface Appendable
- Specified by:
append in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
-
append
- Specified by:
append in interface Appendable
- Specified by:
append in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
-
append
- Specified by:
append in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
-
append
- Specified by:
append in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
-
append
- Specified by:
append in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
-
buffer
public final char[] buffer()
- Specified by:
buffer in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
-
capitalize
public CharsAttImpl capitalize()
Try to capitalize (initial capital only) sensibly, according to some rules of Latin languages, e.g. états-unis -> États-Unis.
- Returns:
this, for chaining.
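The exact rules applied by capitalize() are not listed here. As a rough sketch of the documented example only, the code below assumes a single rule: upper-case the first letter and any letter that directly follows a hyphen. The class name CapitalizeSketch is hypothetical and the real method may apply more rules.

```java
/** Hypothetical sketch of an in-place "initial capital" rule; not the Alix code. */
final class CapitalizeSketch {
    /**
     * Upper-case the first char if it is a letter, and each letter
     * that directly follows a hyphen. Works in place on the buffer.
     */
    static char[] capitalize(char[] chars, int length) {
        boolean start = true; // true when the next char begins a word part
        for (int i = 0; i < length; i++) {
            char c = chars[i];
            if (start && Character.isLetter(c)) {
                chars[i] = Character.toUpperCase(c);
            }
            start = (c == '-'); // assumption: only a hyphen opens a new part
        }
        return chars;
    }
}
```

Working in place on the char[] avoids creating intermediate String objects, which fits the buffer-oriented design of the class.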
-
charAt
public final char charAt(int index)
- Specified by:
charAt in interface CharSequence
-
clear
public void clear()
- Specified by:
clear in class org.apache.lucene.util.AttributeImpl
-
clone
public CharsAttImpl clone()
- Overrides:
clone in class org.apache.lucene.util.AttributeImpl
-
compareTo
public int compareTo(CharSequence other)
- Specified by:
compareTo in interface Comparable<CharSequence>
-
copy
public final CharsAttImpl copy(org.apache.lucene.analysis.tokenattributes.CharTermAttribute ta)
Copy a CharTermAttribute into the buffer.
- Parameters:
ta - attribute.
- Returns:
this.
-
copy
public final CharsAttImpl copy(org.apache.lucene.analysis.tokenattributes.CharTermAttribute ta, int start, int len)
Copy a substring of a CharTermAttribute into the buffer.
- Parameters:
ta - a term AttributeImpl.
start - offset in char[] of the term.
len - amount of chars to copy.
- Returns:
this.
-
copy
public final CharsAttImpl copy(org.apache.lucene.util.BytesRef bytes)
Copy the UTF-8 bytes of a BytesRef into the char[] buffer. Used by Alix to test UTF-8 bytes against chars[] stored in HashMap FrDics.
- Parameters:
bytes - UTF-8 as bytes.
- Returns:
this.
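Copying UTF-8 bytes into a char[] means decoding them to UTF-16. The sketch below shows one way to do that with the JDK only; it is an assumption about the technique, not the code used by this class. The name Utf8ToChars is hypothetical, and a plain byte[] with offset and length stands in for a BytesRef.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: decode UTF-8 bytes into an existing char[]. */
final class Utf8ToChars {
    /**
     * Decode bytes[offset, offset + length) into dest and return the number
     * of chars written. A dest of at least `length` chars is always enough,
     * because UTF-8 never uses fewer bytes than UTF-16 uses chars.
     */
    static int decode(byte[] bytes, int offset, int length, char[] dest) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        CharBuffer out = CharBuffer.wrap(dest);
        CoderResult result = decoder.decode(ByteBuffer.wrap(bytes, offset, length), out, true);
        if (result.isUnderflow()) {
            result = decoder.flush(out);
        }
        if (!result.isUnderflow()) {
            // malformed input, or dest too small
            throw new IllegalArgumentException("Cannot decode: " + result);
        }
        return out.position();
    }
}
```

Decoding straight into a reusable buffer makes it possible to compare indexed terms with char-based dictionary keys without building a String for each term.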
-
copyTo
public void copyTo(org.apache.lucene.util.AttributeImpl target)
- Specified by:
copyTo in class org.apache.lucene.util.AttributeImpl
-
copyTo
public void copyTo(org.apache.lucene.analysis.tokenattributes.CharTermAttribute target)
Copy chars from this attribute to another.
- Parameters:
target - destination attribute.
-
copyBuffer
public final void copyBuffer(char[] buffer, int offset, int length)
- Specified by:
copyBuffer in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
-
endsWith
public boolean endsWith(char c)
Test an ending char.
- Parameters:
c - char to test.
- Returns:
true if last char == c, false otherwise.
-
endsWith
Test a suffix, char by char.
- Parameters:
suffix - to test.
- Returns:
true if attribute ends with suffix, false otherwise.
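A char-by-char suffix test can be sketched as follows. This is an illustration of the technique under assumed names (SuffixSketch, a raw char[] plus a length), not the method body of this class.

```java
/** Hypothetical sketch of a char-by-char suffix test on a char buffer. */
final class SuffixSketch {
    /** True if the first `length` chars of `chars` end with `suffix`. */
    static boolean endsWith(char[] chars, int length, CharSequence suffix) {
        int start = length - suffix.length();
        if (start < 0) return false; // suffix longer than the content
        for (int i = 0; i < suffix.length(); i++) {
            if (chars[start + i] != suffix.charAt(i)) return false;
        }
        return true;
    }
}
```

The comparison reads the buffer directly, so no substring or String copy is created.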
-
equals
-
getBytesRef
public org.apache.lucene.util.BytesRef getBytesRef()
- Specified by:
getBytesRef in interface org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute
-
hashCode
public int hashCode()
Same hashCode() as a String, computed as
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
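The polynomial above is usually evaluated with Horner's rule, one multiplication per char. The sketch below (hypothetical class HashSketch, raw char[] plus length) shows that evaluation; for "abc" it gives 97*31^2 + 98*31 + 99 = 96354, the same value as "abc".hashCode().

```java
/** Hypothetical sketch: String-compatible hash over a char buffer. */
final class HashSketch {
    /** Horner evaluation of s[0]*31^(n-1) + ... + s[n-1] in int arithmetic. */
    static int hash(char[] chars, int length) {
        int h = 0; // hash of the empty sequence is zero
        for (int i = 0; i < length; i++) {
            h = 31 * h + chars[i]; // int overflow wraps, as in String.hashCode()
        }
        return h;
    }
}
```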
indexOf
public int indexOf(char c)
First position of a char.
- Parameters:
c - char to search.
- Returns:
-1 if not found, or a non-negative index if found.
-
isEmpty
public final boolean isEmpty()
Test whether no chars are registered.
- Specified by:
isEmpty in interface CharSequence
- Returns:
true if empty, false otherwise.
-
lastChar
public char lastChar()
Get the last char.
- Returns:
last char.
- Throws:
ArrayIndexOutOfBoundsException - if there is no char.
-
lastIndexOf
public int lastIndexOf(char c)
Find index of last occurrence of a char.
- Parameters:
c - char to search.
- Returns:
-1 if not found, or a non-negative index if found.
-
length
public final int length()
- Specified by:
length in interface CharSequence
-
mark
public final CharsAttImpl mark()
Record the current size of the string, to go back to this state with rewind(), like java.io.Reader#mark(int).
- Returns:
this, for chaining.
-
resizeBuffer
public final char[] resizeBuffer(int newSize)
- Specified by:
resizeBuffer in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
-
reflectWith
public void reflectWith(org.apache.lucene.util.AttributeReflector reflector)
- Specified by:
reflectWith in class org.apache.lucene.util.AttributeImpl
-
rewind
public final CharsAttImpl rewind()
Restore the string size recorded by the last mark(). If no mark has been set, nothing is done. The mark is consumed: an explicit mark() is needed to record the state again. Works a bit like java.io.Reader#reset(), with a less confusing name.
- Returns:
this, for chaining.
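The mark / rewind / unmark contract described in this page can be modelled in a few lines. The sketch below is an assumption about the behaviour, built on a StringBuilder for brevity; MarkableBuffer is a hypothetical name and the real class works on its own char[] buffer.

```java
/** Hypothetical model of the mark / rewind / unmark contract. */
final class MarkableBuffer {
    private final StringBuilder chars = new StringBuilder();
    private int mark = -1; // -1 means no mark is set

    MarkableBuffer append(CharSequence s) {
        chars.append(s);
        return this;
    }

    /** Remember the current length. */
    MarkableBuffer mark() {
        mark = chars.length();
        return this;
    }

    /** Truncate to the marked length, then forget the mark; no-op without a mark. */
    MarkableBuffer rewind() {
        if (mark >= 0) {
            chars.setLength(mark);
            mark = -1;
        }
        return this;
    }

    /** Forget any mark without touching the content. */
    MarkableBuffer unmark() {
        mark = -1;
        return this;
    }

    @Override
    public String toString() {
        return chars.toString();
    }
}
```

A typical use is to try a suffix, test the result (for instance against a dictionary), and call rewind() to drop the suffix when the test fails.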
-
rtrim
public final CharsAttImpl rtrim()
Delete spaces at the end (right trim).
- Returns:
this
-
rtrim
Delete the given characters at the end (right trim).
- Parameters:
spaces - char codes to delete.
- Returns:
this
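On a buffer with an explicit length, a right trim only needs to lower that length; no chars are moved. The sketch below assumes the chars to delete are passed as a String, which is a guess since the parameter type is not shown here. RtrimSketch is a hypothetical name.

```java
/** Hypothetical sketch: right trim by shrinking the logical length. */
final class RtrimSketch {
    /** Return the new length after dropping trailing chars found in `spaces`. */
    static int rtrim(char[] chars, int length, String spaces) {
        while (length > 0 && spaces.indexOf(chars[length - 1]) >= 0) {
            length--;
        }
        return length;
    }
}
```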
-
setCharAt
public void setCharAt(int pos, char c)
Change a char at a specific position.
- Parameters:
pos - position to change.
c - new char value.
-
setEmpty
public final CharsAttImpl setEmpty()
- Specified by:
setEmpty in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
-
setLength
public final CharsAttImpl setLength(int length)
- Specified by:
setLength in interface org.apache.lucene.analysis.tokenattributes.CharTermAttribute
-
startsWith
public boolean startsWith(char c)
Test a starting char.
- Parameters:
c - char to test.
- Returns:
true if starting char == c, false otherwise.
-
startsWith
public boolean startsWith(String prefix)
Test a prefix, char by char.
- Parameters:
prefix - to test.
- Returns:
true if attribute starts with prefix, false otherwise.
-
subSequence
public final CharSequence subSequence(int start, int end)
- Specified by:
subSequence in interface CharSequence
-
toLower
public CharsAttImpl toLower()
Convert all chars from the buffer to lower case. To avoid the default JDK conversion, some efficiency comes from tests with the Char.
- Returns:
this, for chaining.
-
toString
public String toString()
Returns solely the term text as specified by the CharSequence interface.
- Specified by:
toString in interface CharSequence
- Overrides:
toString in class Object
-
unmark
public final CharsAttImpl unmark()
If it is not known whether mark() has been set, reset it to be sure.
- Returns:
this, for chaining.
-