public final class FuzzyTermsEnum extends BaseTermsEnum
Term enumerations are always ordered by
BytesRef.compareTo(org.apache.lucene.util.BytesRef). Each term in the enumeration is
greater than all that precede it.
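The ordering contract can be illustrated without Lucene. BytesRef.compareTo compares bytes as unsigned values, so for UTF-8 encoded terms the resulting order matches Unicode code point order. The following is a minimal sketch using only the JDK (Java 9 or later); the class name TermOrderSketch and the method compareTerms are hypothetical and not part of Lucene.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: mimics unsigned byte-wise term ordering.
class TermOrderSketch {
    // Compares two terms by their UTF-8 bytes, treating each byte as unsigned (0..255).
    static int compareTerms(String a, String b) {
        byte[] left = a.getBytes(StandardCharsets.UTF_8);
        byte[] right = b.getBytes(StandardCharsets.UTF_8);
        return Arrays.compareUnsigned(left, right);
    }

    public static void main(String[] args) {
        // "z\u00e9bra" contains a non-ASCII character whose UTF-8 bytes are >= 0x80.
        String[] terms = {"z\u00e9bra", "zebra", "zoo", "ze"};
        Arrays.sort(terms, TermOrderSketch::compareTerms);
        // After sorting, each element is greater than all that precede it.
        System.out.println(Arrays.toString(terms));
    }
}
```

A signed byte comparison would place the non-ASCII term before the ASCII ones, which is why the unsigned comparison matters here.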

| Modifier and Type | Class and Description |
|---|---|
| static interface | FuzzyTermsEnum.LevenshteinAutomataAttribute - Reuses compiled automata across different segments, because they are independent of the index. |
| static class | FuzzyTermsEnum.LevenshteinAutomataAttributeImpl - Stores compiled automata as a list (indexed by edit distance). |

Nested classes inherited from TermsEnum: TermsEnum.SeekStatus

| Modifier and Type | Field and Description |
|---|---|
| private TermsEnum | actualEnum |
| private CompiledAutomaton[] | automata |
| private BoostAttribute | boostAtt |
| private float | bottom |
| private BytesRef | bottomTerm |
| private FuzzyTermsEnum.LevenshteinAutomataAttribute | dfaAtt |
| private MaxNonCompetitiveBoostAttribute | maxBoostAtt |
| private int | maxEdits |
| private BytesRef | queuedBottom |
| (package private) int | realPrefixLength |
| (package private) Term | term |
| (package private) int | termLength |
| (package private) Terms | terms |
| (package private) int[] | termText |
| (package private) boolean | transpositions |

| Constructor and Description |
|---|
| FuzzyTermsEnum(Terms terms, AttributeSource atts, Term term, int maxEdits, int prefixLength, boolean transpositions) - Constructor for enumeration of all terms from specified reader which share a prefix of length prefixLength with term and which have at most maxEdits edits. |

| Modifier and Type | Method and Description |
|---|---|
| private void | bottomChanged(BytesRef lastTerm) - Fired when the max non-competitive boost has changed. |
| private static Automaton[] | buildAutomata(int[] termText, int prefixLength, boolean transpositions, int maxEdits) |
| static Automaton | buildAutomaton(java.lang.String text, int prefixLength, boolean transpositions, int maxEdits) - Builds a binary Automaton to match a fuzzy term. |
| int | docFreq() - Returns the number of documents containing the current term. |
| private TermsEnum | getAutomatonEnum(int editDistance, BytesRef lastTerm) - Returns an automata-based enum for matching up to editDistance from lastTerm, if possible. |
| ImpactsEnum | impacts(int flags) - Returns an ImpactsEnum. |
| private boolean | matches(BytesRef termIn, int k) - Returns true if term is within k edits of the query term. |
| BytesRef | next() - Increments the iteration to the next BytesRef in the iterator. |
| long | ord() - Returns ordinal position for current term. |
| PostingsEnum | postings(PostingsEnum reuse, int flags) - Get PostingsEnum for the current term, with control over whether freqs, positions, offsets or payloads are required. |
| TermsEnum.SeekStatus | seekCeil(BytesRef text) - Seeks to the specified term, if it exists, or to the next (ceiling) term. |
| boolean | seekExact(BytesRef text) - Attempts to seek to the exact term, returning true if the term is found. |
| void | seekExact(BytesRef term, TermState state) - Expert: Seeks a specific position by TermState previously obtained from TermsEnum.termState(). |
| void | seekExact(long ord) - Seeks to the specified term by ordinal (position) as previously returned by TermsEnum.ord(). |
| private static int[] | stringToUTF32(java.lang.String text) |
| BytesRef | term() - Returns current term. |
| TermState | termState() - Expert: Returns the TermsEnum's internal state to position the TermsEnum without re-seeking the term dictionary. |
| long | totalTermFreq() - Returns the total number of occurrences of this term across all documents (the sum of the freq() for each doc that has this term). |

Inherited methods: attributes

private TermsEnum actualEnum
private final BoostAttribute boostAtt
private final MaxNonCompetitiveBoostAttribute maxBoostAtt
private final FuzzyTermsEnum.LevenshteinAutomataAttribute dfaAtt
private float bottom
private BytesRef bottomTerm
private final CompiledAutomaton[] automata
private BytesRef queuedBottom
final int termLength
private int maxEdits
final Terms terms
final Term term
final int[] termText
final int realPrefixLength
final boolean transpositions

public FuzzyTermsEnum(Terms terms, AttributeSource atts, Term term, int maxEdits, int prefixLength, boolean transpositions) throws java.io.IOException
Constructor for enumeration of all terms from specified reader which share a prefix of
length prefixLength with term and which have at most maxEdits edits.
After calling the constructor the enumeration is already pointing to the first valid term if such a term exists.
Parameters:
terms - Delivers terms.
atts - AttributeSource created by the rewrite method of MultiTermQuery that contains information about competitive boosts during rewrite. It is also used to cache DFAs between segment transitions.
term - Pattern term.
maxEdits - Maximum edit distance.
prefixLength - Length of required common prefix. Default value is 0.
Throws:
java.io.IOException - if there is a low-level IO error

public static Automaton buildAutomaton(java.lang.String text, int prefixLength, boolean transpositions, int maxEdits)
Builds a binary Automaton to match a fuzzy term.
Parameters:
text - the term to match
prefixLength - length of a required common prefix
transpositions - true if transpositions should count as a single edit
maxEdits - the maximum edit distance of matching terms

private static int[] stringToUTF32(java.lang.String text)
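The private helper stringToUTF32 carries no description on this page. Judging by its name and the int[] termText field, it presumably converts the query text into Unicode code points (UTF-32), so that one edit corresponds to one character and not to one UTF-16 code unit. Below is a minimal sketch of such a conversion under that assumption; Utf32Sketch and toCodePoints are hypothetical names, and Lucene's actual implementation may differ.

```java
// Hypothetical stand-in for a String -> UTF-32 conversion; not Lucene's code.
class Utf32Sketch {
    // Returns one int per Unicode code point in the given text.
    static int[] toCodePoints(String text) {
        int[] codePoints = new int[text.codePointCount(0, text.length())];
        int out = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            codePoints[out++] = cp;
            i += Character.charCount(cp); // advance past both halves of a surrogate pair
        }
        return codePoints;
    }

    public static void main(String[] args) {
        // "a" followed by U+1F600, which UTF-16 stores as a surrogate pair.
        String text = "a\uD83D\uDE00";
        System.out.println(text.length() + " chars, " + toCodePoints(text).length + " code points");
    }
}
```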

private static Automaton[] buildAutomata(int[] termText, int prefixLength, boolean transpositions, int maxEdits)

private TermsEnum getAutomatonEnum(int editDistance, BytesRef lastTerm) throws java.io.IOException
Throws:
java.io.IOException

private void bottomChanged(BytesRef lastTerm) throws java.io.IOException
Throws:
java.io.IOException

public BytesRef next() throws java.io.IOException
Description copied from interface: BytesRefIterator
Increments the iteration to the next BytesRef in the iterator.
Returns the resulting BytesRef or null if the end of
the iterator is reached. The returned BytesRef may be re-used across calls
to next. After this method returns null, do not call it again: the results
are undefined.
Returns:
the next BytesRef in the iterator or null if the end of the iterator is reached.
Throws:
java.io.IOException - If there is a low-level I/O error.

private boolean matches(BytesRef termIn, int k)
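To make the "within k edits" contract of matches concrete, here is a minimal sketch of such a check using plain dynamic programming. When transpositions are enabled it uses the optimal string alignment variant, where swapping two adjacent characters costs one edit. This is only an illustration of the idea: the class itself relies on compiled Levenshtein automata, and the names EditDistanceSketch, distance and withinEdits are hypothetical.

```java
// Hypothetical illustration of "within k edits"; Lucene itself uses Levenshtein automata.
class EditDistanceSketch {
    // Edit distance over Unicode code points. With transpositions enabled, swapping two
    // adjacent characters costs one edit; otherwise it costs two.
    static int distance(String a, String b, boolean transpositions) {
        int[] s = a.codePoints().toArray();
        int[] t = b.codePoints().toArray();
        int[][] d = new int[s.length + 1][t.length + 1];
        for (int i = 0; i <= s.length; i++) d[i][0] = i; // delete every character of s
        for (int j = 0; j <= t.length; j++) d[0][j] = j; // insert every character of t
        for (int i = 1; i <= s.length; i++) {
            for (int j = 1; j <= t.length; j++) {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                int best = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                    d[i - 1][j - 1] + cost);
                if (transpositions && i > 1 && j > 1
                        && s[i - 1] == t[j - 2] && s[i - 2] == t[j - 1]) {
                    best = Math.min(best, d[i - 2][j - 2] + 1); // adjacent swap
                }
                d[i][j] = best;
            }
        }
        return d[s.length][t.length];
    }

    // True if candidate is at most k edits away from query.
    static boolean withinEdits(String query, String candidate, int k, boolean transpositions) {
        return distance(query, candidate, transpositions) <= k;
    }

    public static void main(String[] args) {
        System.out.println(distance("lucene", "lucnee", true));
        System.out.println(distance("lucene", "lucnee", false));
    }
}
```

The quadratic table is fine for a single pair of terms, but it would be far too slow to run against every term in a dictionary, which is the motivation for matching with precompiled automata instead.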

public int docFreq() throws java.io.IOException
Description copied from class: TermsEnum
Returns the number of documents containing the current term. See also TermsEnum.SeekStatus.END.

public long totalTermFreq() throws java.io.IOException
Description copied from class: TermsEnum
Returns the total number of occurrences of this term across all documents (the sum of the freq() for each doc that has this term).
Specified by:
totalTermFreq in class TermsEnum
Throws:
java.io.IOException

public PostingsEnum postings(PostingsEnum reuse, int flags) throws java.io.IOException
Description copied from class: TermsEnum
Get PostingsEnum for the current term, with
control over whether freqs, positions, offsets or payloads
are required. Do not call this when the enum is
unpositioned. This method will not return null.
NOTE: the returned iterator may return deleted documents, so
deleted documents have to be checked on top of the PostingsEnum.
Specified by:
postings in class TermsEnum
Parameters:
reuse - pass a prior PostingsEnum for possible reuse
flags - specifies which optional per-document values you require; see PostingsEnum.FREQS
Throws:
java.io.IOException

public ImpactsEnum impacts(int flags) throws java.io.IOException
Description copied from class: TermsEnum
Returns an ImpactsEnum.
Specified by:
impacts in class TermsEnum
Throws:
java.io.IOException
See Also:
TermsEnum.postings(PostingsEnum, int)

public void seekExact(BytesRef term, TermState state) throws java.io.IOException
Description copied from class: TermsEnum
Expert: Seeks a specific position by TermState previously obtained
from TermsEnum.termState(). Callers should maintain the TermState to
use this method. Low-level implementations may position the TermsEnum
without re-seeking the term dictionary.
Seeking by TermState should only be used iff the state was obtained
from the same TermsEnum instance.
NOTE: Using this method with an incompatible TermState might leave
this TermsEnum in undefined state. On a segment level
TermState instances are compatible only iff the source and the
target TermsEnum operate on the same field. If operating on segment
level, TermState instances must not be used across segments.
NOTE: A seek by TermState might not restore the
AttributeSource's state. AttributeSource states must be
maintained separately if this method is used.
Overrides:
seekExact in class BaseTermsEnum
Parameters:
term - the term the TermState corresponds to
state - the TermState
Throws:
java.io.IOException

public TermState termState() throws java.io.IOException
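Description copied from class: TermsEnum
Expert: Returns the TermsEnum's internal state to position the TermsEnum
without re-seeking the term dictionary.
NOTE: A seek by TermState might not capture the
AttributeSource's state. Callers must maintain the
AttributeSource states separately.
Overrides:
termState in class BaseTermsEnum
Throws:
java.io.IOException
See Also:
TermState, TermsEnum.seekExact(BytesRef, TermState)

public long ord()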
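throws java.io.IOException
Description copied from class: TermsEnum
Returns ordinal position for current term. This is an optional method (the codec may throw UnsupportedOperationException). Do not call this
when the enum is unpositioned.

public boolean seekExact(BytesRef text) throws java.io.IOException
Description copied from class: TermsEnum
Attempts to seek to the exact term, returning true if the term is found. See also TermsEnum.seekCeil(org.apache.lucene.util.BytesRef).
Overrides:
seekExact in class BaseTermsEnum
Throws:
java.io.IOException

public TermsEnum.SeekStatus seekCeil(BytesRef text) throws java.io.IOException
Description copied from class: TermsEnum
Seeks to the specified term, if it exists, or to the next (ceiling) term.

public void seekExact(long ord) throws java.io.IOException
Description copied from class: TermsEnum
Seeks to the specified term by ordinal (position) as previously returned by TermsEnum.ord(). The target ord
may be before or after the current ord, and must be
within bounds.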