All Classes and Interfaces (alix 0.9.2-SNAPSHOT API)

Class

Description

An Alix object is a wrapper around a Lucene index with lexical tools, to be shared across a complex application (e.g. a web servlet).

Alix.FSDirectoryType

Ways to open a lucene index

From HTML data, populate a Lucene/Alix document ready to index, with the right field names and types.

An XML parser for indexing XML or HTML.

A txt indexer

Analysis scenario for French in Alix.

Analysis scenario for French in Alix.

Analysis scenario for French in Alix.

An Analyzer for metadata.

Analysis scenario for French in Alix.

Analysis scenario for French in Alix.

A fixed size collection of lucene AttributeSource, allowing insertion and removal at both ends, a bit like a Deque.

A kind of LinkedList of reusable attributes, without AttributeImpl.clone() and other creation of objects, for efficiency.

AttributeFactoryAlix

An attribute factory to use a CharsAttImpl for a CharTermAttribute.

Adhoc tool to extract Names.

Data structure to write and read ints in a binary form suited for lucene stored field StoredField(String, BytesRef), Document.getBinaryValue(String) or binary fields BinaryDocValuesField, BinaryDocValues.
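
As an illustration of the idea (not the Alix implementation, whose byte layout is not described here), a minimal sketch that packs ints as 4 big-endian bytes each and reads them back. The class name IntBytesSketch and its methods are hypothetical.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper, not the Alix class: 4 big-endian bytes per int. */
final class IntBytesSketch {
    /** Encode an int array into a byte array, 4 bytes per int. */
    static byte[] encode(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int v : values) {
            buf.putInt(v); // ByteBuffer is big-endian by default
        }
        return buf.array();
    }

    /** Decode a byte array produced by encode() back into ints. */
    static int[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int[] values = new int[bytes.length / Integer.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getInt();
        }
        return values;
    }
}
```

With Lucene on the classpath, the resulting byte array could be wrapped in a BytesRef and passed to StoredField(String, BytesRef).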

Data structure to write and read unsigned bytes (0-255) in a binary form suited for lucene stored field StoredField(String, BytesRef), Document.getBinaryValue(String) or binary fields BinaryDocValuesField, BinaryDocValues.

Collect found documents as a set of docids in a bitSet.

BitsCollectorManager

Not yet tested

A search giving results as bits.

https://nic.schraudolph.org/pubs/Schraudolph99.pdf https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/math/SloppyMath.java

A mutable string implementation that grows on the right (Appendable), but also on the left (Chain.prepend(char)).

Efficient character categorizer, faster than Character.is*(), optimized for tokenizer in latin scripts.

An implementation of Lucene CharTermAttribute designed to be an efficient key in a HashMap, and with tools for char manipulation (e.g. capitalize).

Machine-dependent arithmetic constants.

A light fast csv parser without Strings, especially to load jar resources.

Static tools to deal with directories (and files).

Distribution laws, could be used as scorer to sort FormEnum.

Tools to display a document

Get stats from a FieldQuery in the context of the package, to access protected methods.

List of native double numbers.

An edge between 2 int nodes with a score, optimized to be sorted by score in an Array, or to be a value in HashMap.

Iterator of edges backed on a sorted array.

An object to record Edges events between int id nodes.

A matrix to record edges between predefined node Ids.

An object to record edge events (a pair of ints), recorded as a list of longs.

Error messages for Alix.

Defines methods for factorials.

A dedicated dictionary for stats on facets.

Retrieve all values of an int field, store them in docId order, and calculate some stats.

Persistent storage of full sequence of all document search for a field.

An object recording stats for an indexed and tokenized lucene field TextField.

FilterAposHyphenFr

A filter that decomposes words on a list of suffixes and prefixes, mainly to handle hyphenation and apostrophe elision in French.

A final token filter before indexation, to plug after a lemmatizer filter, providing most significant tokens for word cloud.

A token Filter to plug after a Lemmatizer.

FilterFrPersname

Plug behind a linguistic tagger, will concat unknown names from dictionaries like Victor Hugo, V.

A final token filter before indexation, to plug after a lemmatizer filter, providing most significant tokens for word cloud.

FilterLemmatize

A Lucene token filter adding other channels to the token stream: an orthographic form (normalized) in an OrthAtt; a lemma in a LemAtt; a POS as a Lucene int flag FlagsAttribute (according to the semantics of Tag)

Plug behind TokenLem, take a Trie dictionary, and try to compound locutions.

A final token filter before indexation, to plug after a lemmatizer filter, providing most significant tokens for word cloud.

A token filter counting tokens.

Calculate Fisher's exact test for a 2x2 frequency table.
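
For readers unfamiliar with the test, here is a minimal sketch of Fisher's exact test on a 2x2 table [[a, b], [c, d]], using log-factorials to avoid overflow. It is not the Alix class: the name FisherSketch, its methods, and the choice of a two-sided p-value (sum of all tables no more probable than the observed one) are assumptions made for illustration.

```java
/** Hypothetical sketch of Fisher's exact test, not the Alix implementation. */
final class FisherSketch {
    /** ln(n!) by direct summation; adequate for small counts. */
    static double logFactorial(int n) {
        double sum = 0;
        for (int i = 2; i <= n; i++) {
            sum += Math.log(i);
        }
        return sum;
    }

    /** Hypergeometric probability of the table [[a, b], [c, d]] with fixed margins. */
    static double tableProbability(int a, int b, int c, int d) {
        int n = a + b + c + d;
        double logP = logFactorial(a + b) + logFactorial(c + d)
                + logFactorial(a + c) + logFactorial(b + d)
                - logFactorial(n) - logFactorial(a) - logFactorial(b)
                - logFactorial(c) - logFactorial(d);
        return Math.exp(logP);
    }

    /** Two-sided p-value: sum over tables at most as probable as the observed one. */
    static double twoSided(int a, int b, int c, int d) {
        int row1 = a + b;
        int col1 = a + c;
        int n = a + b + c + d;
        double observed = tableProbability(a, b, c, d);
        int min = Math.max(0, row1 + col1 - n); // smallest possible top-left cell
        int max = Math.min(row1, col1);         // largest possible top-left cell
        double p = 0;
        for (int x = min; x <= max; x++) {
            double px = tableProbability(x, row1 - x, col1 - x, n - row1 - col1 + x);
            // small relative tolerance so ties with the observed table are included
            if (px <= observed * (1 + 1e-7)) {
                p += px;
            }
        }
        return Math.min(1.0, p);
    }
}
```

For the table [[3, 1], [1, 3]], the observed table has probability 16/70 and the two-sided p-value is 34/70, about 0.486.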

This implementation of the FormIterator contract allows building custom lists of forms with freq (occurrence count) and score (a double) calculated elsewhere.

FormCollector.FormStats

Some stats about a form.

This object is output by an Alix field FieldCharsAbstract, to provide a list of terms with stats, for example for queries. according to filters or queries; calculated by th

A contract to loop on a list of forms, accessing different properties.

FormIterator.Order

Possible sort order.

Preloaded word List for lucene indexation in HashMap.

FrDics.LexEntry

An entry for a dictionary to get a lemma from an inflected form.

Gamma distribution functions.

Taken from JDK 7, in case Windows globs ever cause a problem in Dir.ls(String).

HiliteFormatter

Creates a formatted snippet from the top passages.

Index the words and lemmas of a set of HTML files in an SQLite database.

A mutable list of ints.

A mutable pair of ints.

A mutable IntPair, for example as a testing key in a HashSet.

An efficient rolling array of int without creation of elements.
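
To make "rolling without creation of elements" concrete, a minimal sketch of a fixed-capacity circular buffer of primitive ints, where pushing onto a full buffer overwrites the oldest value. The class IntRingSketch and its API are hypothetical and do not reproduce the Alix class.

```java
/** Hypothetical circular buffer of ints, not the Alix implementation. */
final class IntRingSketch {
    private final int[] data;
    private int head; // index of the oldest retained value
    private int size; // number of values currently retained

    IntRingSketch(int capacity) {
        data = new int[capacity];
    }

    /** Append a value; when full, the oldest value is overwritten. */
    void push(int value) {
        int tail = (head + size) % data.length;
        data[tail] = value;
        if (size < data.length) {
            size++;
        } else {
            head = (head + 1) % data.length; // oldest slot was just reused
        }
    }

    /** Value at position pos, where 0 is the oldest retained value. */
    int get(int pos) {
        if (pos < 0 || pos >= size) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        return data[(head + pos) % data.length];
    }

    int size() {
        return size;
    }
}
```

The backing array is allocated once, so no object is created while values roll through it.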

A mutable list of ints with useful metadata, for example to calculate average.

An immutable list of ints, designed to be a good key in a HashMap.

Resolve an XSL URL from a jar.

Jsp toolbox.

Lucene token attribute for lemma event.

Custom CharTermAttribute used to normalize a lemma form of a token.

Automates tabs link in a navigation bar.

Load an XML/TEI corpus in a custom Lucene index for Alix.

A fixed list of longs, useful in arrays to be sorted.

A light highlighter using a Lucene analyzer and a compiled automaton, designed for short texts (e.g. to show found words when searching in titles).

Some algorithms to score co-occurrence.

List of mime types.

Some useful tools to deal with “Markup Languages” (xml, but also html tag soup)

Murmur3A (murmurhash3_x86_32) Source: https://github.com/greenrobot/essentials/blob/master/java-essentials/src/main/java/org/greenrobot/essentials/hash/Murmur3A.java

List of static names for Alix.

Data structure to write and read the “offsets” of a document.

An html <option>.

A recast of CharsAtt

Custom CharTermAttribute used to normalize an orthographic form of a token.

A fixed size queue, rolling if full.

Efficient Object to handle a sliding window, on different types, works like a circular array.

An object to record coordinates, a pair of ints (row, col).

Handle data to display results as a chronology, according to subset of an index, given as a bitset.

A row of data for a crossing axis

Implementation of a Chi2 Scoring with negative scores to get the most repulsed doc from a search.

SimilarityChi2inv

Implementation of a Chi2 Scoring with negative scores to get the most repulsed doc from a search.

Very simple scorer by freq, interesting for testing.

Implementation of a G-test Scoring with negative scores to get the most repulsed doc from a search.

SimilarityGsimple

Implementation of a G-test Scoring with negative scores to get the most repulsed doc from a search.

Build an efficient suggestion of words.

SuggestForm.Suggestion

Content of a suggested form.

Morphosyntactic tag set for French.

A vector of 256 boolean positions to record different flags.
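
One plausible way to hold 256 boolean positions compactly is four 64-bit words. The sketch below assumes that layout; the class Flags256Sketch and its methods are hypothetical, and the Alix class may be organized differently.

```java
/** Hypothetical 256-flag vector backed by 4 longs, not the Alix implementation. */
final class Flags256Sketch {
    private final long[] words = new long[4]; // 4 x 64 bits = 256 flags

    /** Raise the flag at position 0..255. */
    void set(int position) {
        words[position >>> 6] |= 1L << (position & 63);
    }

    /** Lower the flag at position 0..255. */
    void clear(int position) {
        words[position >>> 6] &= ~(1L << (position & 63));
    }

    /** True if the flag at position 0..255 is raised. */
    boolean get(int position) {
        return (words[position >>> 6] & (1L << (position & 63))) != 0;
    }
}
```

Positions outside 0..255 fall outside the 4-word array and raise an ArrayIndexOutOfBoundsException.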

A tokenizer for latin script languages and possible XML like tags.

A queue to select the top elements according to a float score.

A mutable pair (rank, Object), used in the data array of the top queue.

A queue to select the top elements from a score array where index is a kind of id, and value is a score.
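
The selection described here can be sketched with a bounded min-heap from the JDK: the array index serves as the id, and only the k best ids are retained. This is an illustration under assumptions (the class TopScoresSketch and the method top are hypothetical), not the Alix implementation.

```java
import java.util.PriorityQueue;

/** Hypothetical top-k selection over a score array, not the Alix implementation. */
final class TopScoresSketch {
    /** Return the ids (array indices) of the k highest scores, best first. */
    static int[] top(double[] scores, int k) {
        if (k <= 0) {
            return new int[0];
        }
        // Min-heap on score: the root is the weakest of the current top k.
        PriorityQueue<Integer> heap =
                new PriorityQueue<>((x, y) -> Double.compare(scores[x], scores[y]));
        for (int id = 0; id < scores.length; id++) {
            if (heap.size() < k) {
                heap.add(id);
            } else if (scores[id] > scores[heap.peek()]) {
                heap.poll(); // evict the weakest
                heap.add(id);
            }
        }
        // Polling yields ascending scores, so fill the result from the end.
        int[] ids = new int[heap.size()];
        for (int i = ids.length - 1; i >= 0; i--) {
            ids[i] = heap.poll();
        }
        return ids;
    }
}
```

The heap never holds more than k entries, so memory stays bounded by k however long the score array is.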

TopArray.IdScore

A mutable pair (id, score), sortable on score only, used as cells in arrays.

WordsAutomatonBuilder

Because Lucene DaciukMihovAutomatonBuilder is final and can't be extended...

A worker for parallel lucene indexing.