All Classes and Interfaces
Class
Description
An Alix object is a wrapper around a Lucene index with lexical tools, to be
shared across a complex application (e.g. a web servlet).
Ways to open a Lucene index.
From HTML data, populates a Lucene/Alix document ready to index, with the
right field names and types.
An XML parser allowing to index XML or HTML.
A txt indexer.
Analysis scenario for French in Alix.
Analysis scenario for French in Alix.
Analysis scenario for French in Alix.
An Analyzer for metadata.
Analysis scenario for French in Alix.
Analysis scenario for French in Alix.
A fixed size collection of Lucene AttributeSource, allowing insertion and
removal at both ends, a bit like a Deque.
A kind of LinkedList of reusable attributes, without AttributeImpl.clone()
and other creation of objects, for efficiency.
An attribute factory to use a CharsAttImpl for a CharTermAttribute.
Adhoc tool to extract Names.
Data structure to write and read ints in a binary form suited for Lucene
stored fields (StoredField(String, BytesRef),
Document.getBinaryValue(String)) or binary fields
(BinaryDocValuesField, BinaryDocValues).
Data structure to write and read unsigned bytes (0-255) in a binary form
suited for Lucene stored fields (StoredField(String, BytesRef),
Document.getBinaryValue(String)) or binary fields
(BinaryDocValuesField, BinaryDocValues).
Collect found documents as a set of docids in a bitSet.
Not yet tested
A search giving results as bits.
https://nic.schraudolph.org/pubs/Schraudolph99.pdf
https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/math/SloppyMath.java
A mutable string implementation that grows on the right (Appendable),
but also on the left (Chain.prepend(char)).
Efficient character categorizer, faster than Character.is*(), optimized for
tokenizers in Latin scripts.
An implementation of Lucene CharTermAttribute designed to be an
efficient key in a HashMap, and with tools for char manipulation (e.g.
capitalize).
Machine-dependent arithmetic constants.
A light, fast CSV parser without Strings, especially to load jar resources.
Static tools to deal with directories (and files).
Distribution laws, which could be used as scorers to sort FormEnum.
Tools to display a document.
Get stats from a FieldQuery in the context of the package, to get access
to protected methods.
List of native double numbers.
An edge between 2 int nodes with a score, optimized to be sorted by score in
an array, or to be a value in a HashMap.
Iterator of edges backed by a sorted array.
An object to record Edges events between int id nodes.
A matrix to record edges between predefined node ids.
An object to record edge events (a pair of ints), recorded as a list of longs.
Error messages for Alix.
Defines methods for factorials.
A dedicated dictionary for stats on facets.
Retrieve all values of an int field, store them in docId order, calculate some
stats.
Persistent storage of full sequence of all document search for a field.
An object recording stats for an indexed and tokenized Lucene field
TextField.
A filter that decomposes words on a list of suffixes and prefixes, mainly to
handle hyphenation and apostrophe elision in French.
A final token filter before indexation, to plug after a lemmatizer filter,
providing most significant tokens for word cloud.
A token Filter to plug after a Lemmatizer.
Plug behind a linguistic tagger, will concat unknown names from dictionaries
like Victor Hugo, V.
A final token filter before indexation, to plug after a lemmatizer filter,
providing most significant tokens for word cloud.
Plug behind TokenLem, take a Trie dictionary, and try to compound locutions.
A final token filter before indexation, to plug after a lemmatizer filter,
providing most significant tokens for word cloud.
A token filter counting tokens.
Calculate Fisher's exact test for a 2x2 frequency table.
This implementation of the FormIterator contract allows building custom lists
of forms with freq (occurrence count) and score (a double) calculated
elsewhere.
Some stats about a form.
This object is output by an Alix field FieldCharsAbstract, to provide a
list of terms with stats, for example for queries. according to filters or queries; calculated by th
A contract to loop on a list of forms, giving access to different properties.
Possible sort order.
Preloaded word list for Lucene indexation in HashMap.
An entry for a dictionary to get a lemma from
an inflected form.
Gamma distribution functions.
Taken from jdk7, in case there is one day a problem with Windows globs in
Dir.ls(String).
Creates a formatted snippet from the top passages.
Index the words and lemmas of a set of HTML files in an SQLite database.
A mutable list of ints.
A mutable pair of ints.
An efficient rolling array of int without creation of elements.
A mutable list of ints with useful metadata, for example to calculate
an average.
An immutable list of ints, designed to be a good key in a HashMap.
Resolve an XSL URL from a jar.
Jsp toolbox.
Lucene token attribute for lemma event.
Custom CharTermAttribute used to normalize a lemma form of a token.
Automates tab links in a navigation bar.
Load an XML/TEI corpus in a custom Lucene index for Alix.
A fixed list of longs, useful in arrays to be sorted.
A light highlighter using a Lucene analyzer and a compiled automaton, designed
for short texts (e.g. show found words when searching in titles).
Some algorithms to score co-occurrence.
List of mime types.
Some useful tools to deal with “Markup Languages” (xml, but also html tag
soup)
Murmur3A (murmurhash3_x86_32)
Source: https://github.com/greenrobot/essentials/blob/master/java-essentials/src/main/java/org/greenrobot/essentials/hash/Murmur3A.java
List of static names for Alix.
Data structure to write and read the “offsets” of a document.
An html <option>.
A recast of CharsAtt
Custom CharTermAttribute used to normalize an orthographic form of a token.
A fixed size queue, rolling if full.
Efficient Object to handle a sliding window, on different types, works like a
circular array.
An object to record coordinates, a pair of ints (row, col).
Handle data to display results as a chronology, according to subset of an
index, given as a bitset.
A row of data for a crossing axis
Implementation of a Chi2 Scoring with negative scores to get the most
repulsed doc from a search.
Implementation of a Chi2 Scoring with negative scores to get the most
repulsed doc from a search.
Very simple scorer by freq, interesting for testing.
Implementation of a G-test Scoring with negative scores to get the most
repulsed doc from a search.
Implementation of a G-test Scoring with negative scores to get the most
repulsed doc from a search.
Build an efficient suggestion of words.
Content of a suggested form.
Morphosyntactic tag set for French.
A vector of 256 boolean positions to record different flags.
A tokenizer for Latin-script languages and possible XML-like tags.
A queue to select the top elements according to a float score.
A mutable pair (rank, Object), used in the data array of the top queue.
A queue to select the top elements from a score array where index is a kind
of id, and value is a score.
A mutable pair (id, score), sortable on score only, used as cells in arrays.
Because Lucene DaciukMihovAutomatonBuilder is final and can't be extended...
A worker for parallel lucene indexing.
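
Several entries above describe rolling structures (a fixed size queue "rolling if full", a sliding window that "works like a circular array"). As a rough illustration of that idea only, here is a minimal sketch of a circular buffer that overwrites its oldest element once capacity is reached. The class name RollingQueueSketch and its methods are hypothetical, invented for this example; they are not the Alix API, and the actual Alix classes may work differently.

```java
/**
 * Hypothetical illustration (not an Alix class): a fixed-capacity queue
 * backed by a circular array. When full, adding a value evicts the oldest.
 */
final class RollingQueueSketch<E> {
    private final Object[] data; // circular storage, allocated once
    private int start;           // index of the oldest element
    private int size;            // number of elements currently stored

    RollingQueueSketch(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        data = new Object[capacity];
    }

    /** Appends a value; returns the evicted oldest element, or null if none. */
    E add(E value) {
        if (size < data.length) {
            // Still room: write after the newest element.
            data[(start + size) % data.length] = value;
            size++;
            return null;
        }
        // Full: overwrite the oldest slot and advance the start pointer.
        @SuppressWarnings("unchecked")
        E evicted = (E) data[start];
        data[start] = value;
        start = (start + 1) % data.length;
        return evicted;
    }

    /** Returns the i-th element, counting from the oldest (0). */
    @SuppressWarnings("unchecked")
    E get(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + size);
        }
        return (E) data[(start + i) % data.length];
    }

    int size() {
        return size;
    }
}
```

The backing array is allocated once and slots are reused, which is the property the descriptions above emphasize ("without creation of elements"); only the start index moves when the queue rolls.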