final class IndexedDISI extends DocIdSetIterator
A DocIdSetIterator which can return
the index of the current document, i.e. the ordinal of the current document
among the list of documents that this iterator can return. This is useful
to implement sparse doc values by only having to encode values for documents
that actually have a value.
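As a rough illustration of why the ordinal is useful, the sketch below pairs a sorted list of doc IDs with a dense array holding one value per listed document. The class `SparseValuesSketch` and its members are hypothetical names made up for this example; it does not use or reproduce the IndexedDISI implementation, and the `Integer.MAX_VALUE` sentinel is an assumption standing in for `NO_MORE_DOCS`.

```java
/** Illustrative only: pairs a sorted list of doc IDs with a dense value array. */
final class SparseValuesSketch {
  private final int[] docsWithValue; // sorted doc IDs that actually have a value
  private final long[] values;       // values[i] belongs to docsWithValue[i]
  private int index = -1;            // ordinal of the current document, -1 before iteration

  SparseValuesSketch(int[] docsWithValue, long[] values) {
    if (docsWithValue.length != values.length) {
      throw new IllegalArgumentException("one value per document is required");
    }
    this.docsWithValue = docsWithValue;
    this.values = values;
  }

  /** Moves to the next document that has a value; returns Integer.MAX_VALUE when exhausted. */
  int nextDoc() {
    if (index < docsWithValue.length) {
      index++;
    }
    return index < docsWithValue.length ? docsWithValue[index] : Integer.MAX_VALUE;
  }

  /** Ordinal of the current document among the documents that have a value. */
  int index() {
    return index;
  }

  /** Value of the current document, found through its ordinal rather than its doc ID. */
  long value() {
    return values[index];
  }

  public static void main(String[] args) {
    SparseValuesSketch it =
        new SparseValuesSketch(new int[] {2, 9, 40}, new long[] {10L, 20L, 30L});
    int doc;
    while ((doc = it.nextDoc()) != Integer.MAX_VALUE) {
      System.out.println("doc=" + doc + " index=" + it.index() + " value=" + it.value());
    }
  }
}
```

Because values are addressed by ordinal, the value array needs one slot per document that has a value rather than one slot per document in the index.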
Implementation-wise, this DocIdSetIterator is inspired by
roaring bitmaps and encodes ranges of 65536
documents independently and picks between 3 encodings depending on the
density of the range:
- ALL if the range contains exactly 65536 documents,
- DENSE if the range contains 4096 documents or more; in that case documents are stored in a bit set,
- SPARSE otherwise; in that case the lower 16 bits of each doc ID are stored in a short.
Only ranges that contain at least one value are encoded.
This implementation uses 6 bytes per document in the worst case, which occurs when every encoded range contains exactly one document.
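The selection rule above can be sketched in a few lines. This is a minimal illustration, not the class's actual code: `BlockEncodingSketch` and its helper names are hypothetical, and the 4-byte per-range header is an assumption chosen so that the single-document case matches the 6-byte figure quoted above.

```java
/** Illustrative sketch only; names and the header size are assumptions. */
final class BlockEncodingSketch {
  enum Encoding { ALL, DENSE, SPARSE }

  static final int BLOCK_SIZE = 1 << 16;   // 65536 documents per range
  static final int DENSE_THRESHOLD = 4096; // at or above this count, use a bit set
  static final int HEADER_BYTES = 4;       // assumed per-range header size

  /** Range that a doc ID falls into (its upper 16 bits). */
  static int blockOf(int doc) {
    return doc >>> 16;
  }

  /** Position of a doc ID inside its range (its lower 16 bits). */
  static int lower16(int doc) {
    return doc & 0xFFFF;
  }

  /** Picks the encoding for a range holding the given number of documents. */
  static Encoding choose(int cardinality) {
    if (cardinality < 1 || cardinality > BLOCK_SIZE) {
      throw new IllegalArgumentException("cardinality out of range: " + cardinality);
    }
    if (cardinality == BLOCK_SIZE) return Encoding.ALL;
    if (cardinality >= DENSE_THRESHOLD) return Encoding.DENSE;
    return Encoding.SPARSE;
  }

  /** Bytes needed for the documents of one range, excluding the assumed header. */
  static int payloadBytes(int cardinality) {
    switch (choose(cardinality)) {
      case ALL:
        return 0; // assumption: a full range needs nothing beyond its header
      case DENSE:
        return BLOCK_SIZE / Byte.SIZE; // one bit per possible document: 8192 bytes
      default:
        return cardinality * Short.BYTES; // one 16-bit short per document
    }
  }

  /** Total bytes for one range under the assumed header size. */
  static int encodedBytes(int cardinality) {
    return HEADER_BYTES + payloadBytes(cardinality);
  }

  public static void main(String[] args) {
    for (int c : new int[] {1, 4095, 4096, 65536}) {
      System.out.println(c + " docs -> " + choose(c) + ", " + encodedBytes(c) + " bytes");
    }
  }
}
```

The arithmetic also suggests why 4096 is a natural cut-over point: 4096 shorts occupy 8192 bytes, the same as a 65536-bit bit set, so below that count the short list is smaller.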
| Modifier and Type | Class and Description |
|---|---|
| (package private) static class | IndexedDISI.Method |
| Modifier and Type | Field and Description |
|---|---|
| private int | block |
| private long | blockEnd |
| private long | cost |
| private int | doc |
| (package private) boolean | exists |
| private int | gap |
| private int | index |
| (package private) static int | MAX_ARRAY_LENGTH |
| (package private) IndexedDISI.Method | method |
| private int | nextBlockIndex |
| private int | numberOfOnes |
| private IndexInput | slice: The slice that stores the DocIdSetIterator. |
| private long | word |
| private int | wordIndex |

Fields inherited from class DocIdSetIterator: NO_MORE_DOCS

| Constructor and Description |
|---|
| IndexedDISI(IndexInput slice, long cost) |
| IndexedDISI(IndexInput in, long offset, long length, long cost) |

| Modifier and Type | Method and Description |
|---|---|
| int | advance(int target): Advances to the first document beyond the current one whose document number is greater than or equal to target, and returns that document number. |
| private void | advanceBlock(int targetBlock) |
| boolean | advanceExact(int target) |
| long | cost(): Returns the estimated cost of this DocIdSetIterator. |
| int | docID(): Returns the following: -1 if DocIdSetIterator.nextDoc() or DocIdSetIterator.advance(int) has not been called yet. |
| private static void | flush(int block, FixedBitSet buffer, int cardinality, IndexOutput out) |
| int | index() |
| int | nextDoc(): Advances to the next document in the set and returns the doc it is currently on, or DocIdSetIterator.NO_MORE_DOCS if there are no more docs in the set. NOTE: after the iterator has exhausted you should not call this method, as it may result in unpredictable behavior. |
| private void | readBlockHeader() |
| (package private) static void | writeBitSet(DocIdSetIterator it, IndexOutput out) |

Methods inherited from class DocIdSetIterator: all, empty, range, slowAdvance

static final int MAX_ARRAY_LENGTH
private final IndexInput slice
The slice that stores the DocIdSetIterator.
private final long cost
private int block
private long blockEnd
private int nextBlockIndex
IndexedDISI.Method method
private int doc
private int index
boolean exists
private long word
private int wordIndex
private int numberOfOnes
private int gap

IndexedDISI(IndexInput in, long offset, long length, long cost) throws java.io.IOException
Throws: java.io.IOException

IndexedDISI(IndexInput slice, long cost) throws java.io.IOException
Throws: java.io.IOException

private static void flush(int block, FixedBitSet buffer, int cardinality, IndexOutput out) throws java.io.IOException
Throws: java.io.IOException

static void writeBitSet(DocIdSetIterator it, IndexOutput out) throws java.io.IOException
Throws: java.io.IOException

public int docID()
Description copied from class DocIdSetIterator. Returns the following:
-1 if DocIdSetIterator.nextDoc() or DocIdSetIterator.advance(int) has not been called yet.
DocIdSetIterator.NO_MORE_DOCS if the iterator has exhausted.
Specified by: docID in class DocIdSetIterator

public int advance(int target) throws java.io.IOException
Description copied from class DocIdSetIterator. Advances to the first document beyond the current one whose document number is greater than or equal to target, and returns that document number. Returns DocIdSetIterator.NO_MORE_DOCS if target
is greater than the highest document number in the set.
The behavior of this method is undefined when called with
target ≤ current, or after the iterator has exhausted.
Both cases may result in unpredictable behavior.
When target > current it behaves as if written:
int advance(int target) {
int doc;
while ((doc = nextDoc()) < target) {
}
return doc;
}
Some implementations are considerably more efficient than that.
NOTE: this method may be called with DocIdSetIterator.NO_MORE_DOCS for
efficiency by some Scorers. If your implementation cannot efficiently
determine that it should exhaust, it is recommended that you check for that
value in each call to this method.
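To make the contract above concrete, here is a small sketch of a toy iterator over a sorted array that offers both the reference loop and a faster variant with the same results. `SortedArrayIterator` is a hypothetical class written for this example; it is unrelated to how IndexedDISI itself advances, and its `NO_MORE_DOCS` constant simply mirrors the sentinel of the same name as `Integer.MAX_VALUE`.

```java
import java.util.Arrays;

/** Toy iterator over a sorted int array; illustrative only. */
final class SortedArrayIterator {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE; // stand-in for DocIdSetIterator.NO_MORE_DOCS

  private final int[] docs; // sorted, distinct doc IDs
  private int pos = -1;     // position of the current doc, -1 before iteration starts

  SortedArrayIterator(int[] sortedDocs) {
    this.docs = sortedDocs;
  }

  int docID() {
    if (pos < 0) return -1;
    return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
  }

  int nextDoc() {
    if (pos < docs.length) pos++;
    return docID();
  }

  /** Reference behavior: the linear loop from the description above. */
  int slowAdvance(int target) {
    int doc;
    while ((doc = nextDoc()) < target) {
      // keep stepping until the target is reached or the iterator is exhausted
    }
    return doc;
  }

  /** Same contract, using a binary search over the docs after the current one. */
  int advance(int target) {
    int from = pos + 1;
    if (from >= docs.length) {
      pos = docs.length; // nothing left: exhaust the iterator
      return NO_MORE_DOCS;
    }
    int i = Arrays.binarySearch(docs, from, docs.length, target);
    pos = i >= 0 ? i : -i - 1; // exact hit, or the insertion point of target
    return docID();
  }

  public static void main(String[] args) {
    int[] docs = {3, 7, 70000, 131072};
    SortedArrayIterator slow = new SortedArrayIterator(docs);
    SortedArrayIterator fast = new SortedArrayIterator(docs);
    for (int target : new int[] {5, 70000, 200000}) {
      System.out.println(target + ": " + slow.slowAdvance(target) + " / " + fast.advance(target));
    }
  }
}
```

Passing `NO_MORE_DOCS` as the target works in both variants here because no doc ID in the array can be greater than or equal to it, so the iterator simply exhausts.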
Specified by: advance in class DocIdSetIterator
Throws: java.io.IOException

public boolean advanceExact(int target) throws java.io.IOException
Throws: java.io.IOException

private void advanceBlock(int targetBlock) throws java.io.IOException
Throws: java.io.IOException

private void readBlockHeader() throws java.io.IOException
Throws: java.io.IOException

public int nextDoc() throws java.io.IOException
Description copied from class DocIdSetIterator. Advances to the next document in the set and returns the doc it is currently on, or DocIdSetIterator.NO_MORE_DOCS if there are no more docs in the set.
Specified by: nextDoc in class DocIdSetIterator
Throws: java.io.IOException

public int index()

public long cost()
Description copied from class DocIdSetIterator. Returns the estimated cost of this DocIdSetIterator.
This is generally an upper bound of the number of documents this iterator might match, but may be a rough heuristic, hardcoded value, or otherwise completely inaccurate.
Specified by: cost in class DocIdSetIterator