public class PostingsHighlighter
extends java.lang.Object
Simple highlighter that does not analyze fields nor use term vectors. Instead it requires IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.
PostingsHighlighter treats the single original document as the whole corpus, and then scores individual
passages as if they were documents in this corpus. It uses a BreakIterator to find
passages in the text; by default it breaks using getSentenceInstance(Locale.ROOT). It then iterates in parallel (merge sorting by offset) through
the positions of all terms from the query, coalescing those hits that occur in a single passage
into a Passage, and then scores each Passage using a separate PassageScorer.
Passages are finally formatted into highlighted snippets with a PassageFormatter.
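
The passage-finding step described above can be sketched with the JDK alone. This is a minimal illustration, not the highlighter's actual code: the class PassageSketch, its methods, and the sample text are hypothetical, hit offsets are found by scanning the text rather than read from postings, and scoring and formatting are left out.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.PriorityQueue;

final class PassageSketch {

    /**
     * Merges per-term offset lists (each sorted ascending) by offset and groups
     * the hits by the sentence that contains them.
     * Each result is {passageStart, passageEnd, hitCount}.
     */
    static List<int[]> coalesce(String text, int[][] offsetsPerTerm) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ROOT);
        bi.setText(text);

        // Queue entries are {termIndex, indexIntoThatTermsOffsets}, ordered by offset.
        PriorityQueue<int[]> queue = new PriorityQueue<>(
                (a, b) -> Integer.compare(offsetsPerTerm[a[0]][a[1]], offsetsPerTerm[b[0]][b[1]]));
        for (int t = 0; t < offsetsPerTerm.length; t++) {
            if (offsetsPerTerm[t].length > 0) {
                queue.add(new int[] {t, 0});
            }
        }

        List<int[]> passages = new ArrayList<>();
        int[] current = null;
        while (!queue.isEmpty()) {
            int[] head = queue.poll();
            int offset = offsetsPerTerm[head[0]][head[1]];
            if (current == null || offset >= current[1]) {
                // The hit lies beyond the current passage: open a new one.
                int end = bi.following(offset); // first boundary after the hit
                int start = bi.previous();      // boundary at or before the hit
                current = new int[] {start, end, 0};
                passages.add(current);
            }
            current[2]++; // one more hit inside this passage
            if (head[1] + 1 < offsetsPerTerm[head[0]].length) {
                queue.add(new int[] {head[0], head[1] + 1}); // advance this term
            }
        }
        return passages;
    }

    /** Stand-in for postings offsets: start offsets of a lower-case term in the text. */
    static int[] findAll(String text, String term) {
        String haystack = text.toLowerCase(Locale.ROOT);
        List<Integer> found = new ArrayList<>();
        for (int i = haystack.indexOf(term); i >= 0; i = haystack.indexOf(term, i + 1)) {
            found.add(i);
        }
        return found.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        String text = "Lucene indexes text. Highlighting shows passages. "
                + "Offsets make highlighting fast.";
        int[][] offsets = {findAll(text, "highlighting"), findAll(text, "offsets")};
        for (int[] p : coalesce(text, offsets)) {
            System.out.println(p[2] + " hit(s): " + text.substring(p[0], p[1]).trim());
        }
    }
}
```

A priority queue keeps the merge simple when several query terms are involved; each term's offsets are consumed in order, so every passage is visited once.
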
You can customize the behavior by subclassing this highlighter. Some important hooks:
getBreakIterator(String): Customize how the text is divided into passages.
getScorer(String): Customize how passages are ranked.
getFormatter(String): Customize how snippets are formatted.
getIndexAnalyzer(String): Enable highlighting of MultiTermQuery instances such as WildcardQuery.
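
As a rough idea of what a custom formatter supplied through getFormatter(String) might do, the sketch below wraps match ranges in configurable tags. It uses only the JDK; the class SnippetFormatSketch and its format method are hypothetical and do not reproduce the PassageFormatter API.

```java
final class SnippetFormatSketch {

    /**
     * Wraps each [start, end) match range of the passage in the given tags.
     * matches must be sorted by start offset and must not overlap.
     */
    static String format(String passage, int[][] matches, String preTag, String postTag) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (int[] m : matches) {
            sb.append(passage, pos, m[0]); // text before the match
            sb.append(preTag).append(passage, m[0], m[1]).append(postTag);
            pos = m[1];
        }
        sb.append(passage, pos, passage.length()); // text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        String passage = "Offsets make highlighting fast.";
        int start = passage.indexOf("highlighting");
        int[][] matches = {{start, start + "highlighting".length()}};
        System.out.println(format(passage, matches, "<em>", "</em>"));
    }
}
```
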
WARNING: The code is very new and probably still has some exciting bugs!
Example usage:
// configure field with offsets at index time
FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
offsetsType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
Field body = new Field("body", "foobar", offsetsType);
// retrieve highlights at query time
PostingsHighlighter highlighter = new PostingsHighlighter();
Query query = new TermQuery(new Term("body", "highlighting"));
TopDocs topDocs = searcher.search(query, n);
String[] highlights = highlighter.highlight("body", query, searcher, topDocs);
This is thread-safe, and can be used across different readers.
| Modifier and Type | Class and Description |
|---|---|
| private static class | PostingsHighlighter.LimitedStoredFieldVisitor |
| private static class | PostingsHighlighter.OffsetsEnum |
| Modifier and Type | Field and Description |
|---|---|
| static int | DEFAULT_MAX_LENGTH - Default maximum content size to process. |
| private PassageFormatter | defaultFormatter - Set the first time getFormatter(java.lang.String) is called, and then reused. |
| private PassageScorer | defaultScorer - Set the first time getScorer(java.lang.String) is called, and then reused. |
| private static PostingsEnum | EMPTY |
| private static IndexSearcher | EMPTY_INDEXSEARCHER - for rewriting: we don't want slow processing from MTQs |
| private int | maxLength |
| Constructor and Description |
|---|
| PostingsHighlighter() - Creates a new highlighter with DEFAULT_MAX_LENGTH. |
| PostingsHighlighter(int maxLength) - Creates a new highlighter, specifying maximum content length. |
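
The constructor documentation states that maxLength must not be negative or Integer.MAX_VALUE. The sketch below shows one way such a limit could be validated and applied. The class MaxLengthSketch and both methods are hypothetical, and truncating content to maxLength characters is an assumption about what "maximum content size to process" means, not a statement of the highlighter's internals.

```java
final class MaxLengthSketch {

    /** Rejects the values the constructor documentation lists as illegal. */
    static int checkMaxLength(int maxLength) {
        if (maxLength < 0 || maxLength == Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                    "maxLength must be >= 0 and < Integer.MAX_VALUE, got " + maxLength);
        }
        return maxLength;
    }

    /** Assumed behavior: only the first maxLength characters are processed. */
    static String limit(String content, int maxLength) {
        checkMaxLength(maxLength);
        return content.length() <= maxLength ? content : content.substring(0, maxLength);
    }

    public static void main(String[] args) {
        System.out.println(limit("a long stored field value", 6));
    }
}
```
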
| Modifier and Type | Method and Description |
|---|---|
| protected java.text.BreakIterator | getBreakIterator(java.lang.String field) - Returns the BreakIterator to use for dividing text into passages. |
| protected Passage[] | getEmptyHighlight(java.lang.String fieldName, java.text.BreakIterator bi, int maxPassages) - Called to summarize a document when no hits were found. |
| protected PassageFormatter | getFormatter(java.lang.String field) - Returns the PassageFormatter to use for formatting passages into highlighted snippets. |
| protected Analyzer | getIndexAnalyzer(java.lang.String field) - Returns the analyzer originally used to index the content for field. |
| protected char | getMultiValuedSeparator(java.lang.String field) - Returns the logical separator between values for multi-valued fields. |
| protected PassageScorer | getScorer(java.lang.String field) - Returns the PassageScorer to use for ranking passages. |
| java.lang.String[] | highlight(java.lang.String field, Query query, IndexSearcher searcher, TopDocs topDocs) - Highlights the top passages from a single field. |
| java.lang.String[] | highlight(java.lang.String field, Query query, IndexSearcher searcher, TopDocs topDocs, int maxPassages) - Highlights the top-N passages from a single field. |
| private Passage[] | highlightDoc(java.lang.String field, BytesRef[] terms, int contentLength, java.text.BreakIterator bi, int doc, TermsEnum termsEnum, PostingsEnum[] postings, int n) |
| private java.util.Map<java.lang.Integer,java.lang.Object> | highlightField(java.lang.String field, java.lang.String[] contents, java.text.BreakIterator bi, BytesRef[] terms, int[] docids, java.util.List<LeafReaderContext> leaves, int maxPassages, Query query) |
| java.util.Map<java.lang.String,java.lang.String[]> | highlightFields(java.lang.String[] fieldsIn, Query query, IndexSearcher searcher, int[] docidsIn, int[] maxPassagesIn) - Highlights the top-N passages from multiple fields, for the provided int[] docids. |
| java.util.Map<java.lang.String,java.lang.String[]> | highlightFields(java.lang.String[] fields, Query query, IndexSearcher searcher, TopDocs topDocs) - Highlights the top passages from multiple fields. |
| java.util.Map<java.lang.String,java.lang.String[]> | highlightFields(java.lang.String[] fields, Query query, IndexSearcher searcher, TopDocs topDocs, int[] maxPassages) - Highlights the top-N passages from multiple fields. |
| protected java.util.Map<java.lang.String,java.lang.Object[]> | highlightFieldsAsObjects(java.lang.String[] fieldsIn, Query query, IndexSearcher searcher, int[] docidsIn, int[] maxPassagesIn) - Expert: highlights the top-N passages from multiple fields, for the provided int[] docids, to custom Object as returned by the PassageFormatter. |
| protected java.lang.String[][] | loadFieldValues(IndexSearcher searcher, java.lang.String[] fields, int[] docids, int maxLength) - Loads the String values for each field X docID to be highlighted. |
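
The highlightFields variants in the table above return a java.util.Map<java.lang.String,java.lang.String[]> keyed on field name. The sketch below regroups such a result per document, assuming each array holds one snippet per highlighted document in the same order as the documents passed in. The class HighlightResultSketch, its method, and the sample values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HighlightResultSketch {

    /** Collects the snippet of every field for the document at position docIndex. */
    static Map<String, String> snippetsForDoc(Map<String, String[]> byField, int docIndex) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> e : byField.entrySet()) {
            out.put(e.getKey(), e.getValue()[docIndex]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up stand-in for the map returned by highlightFields.
        Map<String, String[]> byField = new LinkedHashMap<>();
        byField.put("title", new String[] {"title snippet 0", "title snippet 1"});
        byField.put("body", new String[] {"body snippet 0", "body snippet 1"});
        System.out.println(snippetsForDoc(byField, 1));
    }
}
```
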
private static final IndexSearcher EMPTY_INDEXSEARCHER
public static final int DEFAULT_MAX_LENGTH
private final int maxLength
private PassageFormatter defaultFormatter
Set the first time getFormatter(java.lang.String) is called, and then reused.
private PassageScorer defaultScorer
Set the first time getScorer(java.lang.String) is called, and then reused.
private static final PostingsEnum EMPTY
public PostingsHighlighter()
Creates a new highlighter with DEFAULT_MAX_LENGTH.
public PostingsHighlighter(int maxLength)
Creates a new highlighter, specifying maximum content length.
Parameters:
maxLength - maximum content size to process.
Throws:
java.lang.IllegalArgumentException - if maxLength is negative or Integer.MAX_VALUE
protected java.text.BreakIterator getBreakIterator(java.lang.String field)
Returns the BreakIterator to use for dividing text into passages. This returns BreakIterator.getSentenceInstance(Locale) by default; subclasses can override to customize.
protected PassageFormatter getFormatter(java.lang.String field)
Returns the PassageFormatter to use for formatting passages into highlighted snippets. This returns a new PassageFormatter by default; subclasses can override to customize.
protected PassageScorer getScorer(java.lang.String field)
Returns the PassageScorer to use for ranking passages. This returns a new PassageScorer by default; subclasses can override to customize.
public java.lang.String[] highlight(java.lang.String field,
Query query,
IndexSearcher searcher,
TopDocs topDocs)
throws java.io.IOException
Highlights the top passages from a single field.
Parameters:
field - field name to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
topDocs - TopDocs containing the summary result documents to highlight.
Returns:
Array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first sentence for the field will be returned.
Throws:
java.io.IOException - if an I/O error occurred during processing
java.lang.IllegalArgumentException - if field was indexed without IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
public java.lang.String[] highlight(java.lang.String field,
Query query,
IndexSearcher searcher,
TopDocs topDocs,
int maxPassages)
throws java.io.IOException
Highlights the top-N passages from a single field.
Parameters:
field - field name to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
topDocs - TopDocs containing the summary result documents to highlight.
maxPassages - The maximum number of top-N ranked passages used to form the highlighted snippets.
Returns:
Array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
Throws:
java.io.IOException - if an I/O error occurred during processing
java.lang.IllegalArgumentException - if field was indexed without IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
public java.util.Map<java.lang.String,java.lang.String[]> highlightFields(java.lang.String[] fields,
Query query,
IndexSearcher searcher,
TopDocs topDocs)
throws java.io.IOException
Highlights the top passages from multiple fields.
Conceptually, this behaves as a more efficient form of:
Map<String,String[]> m = new HashMap<>();
for (String field : fields) {
  m.put(field, highlight(field, query, searcher, topDocs));
}
return m;
Parameters:
fields - field names to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
topDocs - TopDocs containing the summary result documents to highlight.
Returns:
Map keyed on field name, containing the array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first sentence from the field will be returned.
Throws:
java.io.IOException - if an I/O error occurred during processing
java.lang.IllegalArgumentException - if field was indexed without IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
public java.util.Map<java.lang.String,java.lang.String[]> highlightFields(java.lang.String[] fields,
Query query,
IndexSearcher searcher,
TopDocs topDocs,
int[] maxPassages)
throws java.io.IOException
Highlights the top-N passages from multiple fields.
Conceptually, this behaves as a more efficient form of:
Map<String,String[]> m = new HashMap<>();
for (int i = 0; i < fields.length; i++) {
  m.put(fields[i], highlight(fields[i], query, searcher, topDocs, maxPassages[i]));
}
return m;
Parameters:
fields - field names to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
topDocs - TopDocs containing the summary result documents to highlight.
maxPassages - The maximum number of top-N ranked passages per-field used to form the highlighted snippets.
Returns:
Map keyed on field name, containing the array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
Throws:
java.io.IOException - if an I/O error occurred during processing
java.lang.IllegalArgumentException - if field was indexed without IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
public java.util.Map<java.lang.String,java.lang.String[]> highlightFields(java.lang.String[] fieldsIn,
Query query,
IndexSearcher searcher,
int[] docidsIn,
int[] maxPassagesIn)
throws java.io.IOException
Highlights the top-N passages from multiple fields, for the provided int[] docids.
Parameters:
fieldsIn - field names to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
docidsIn - containing the document IDs to highlight.
maxPassagesIn - The maximum number of top-N ranked passages per-field used to form the highlighted snippets.
Returns:
Map keyed on field name, containing the array of formatted snippets corresponding to the documents in docidsIn. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
Throws:
java.io.IOException - if an I/O error occurred during processing
java.lang.IllegalArgumentException - if field was indexed without IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
protected java.util.Map<java.lang.String,java.lang.Object[]> highlightFieldsAsObjects(java.lang.String[] fieldsIn,
Query query,
IndexSearcher searcher,
int[] docidsIn,
int[] maxPassagesIn)
throws java.io.IOException
Expert: highlights the top-N passages from multiple fields, for the provided int[] docids, to custom Object as returned by the PassageFormatter. Use this API to render to something other than String.
Parameters:
fieldsIn - field names to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
docidsIn - containing the document IDs to highlight.
maxPassagesIn - The maximum number of top-N ranked passages per-field used to form the highlighted snippets.
Returns:
Map keyed on field name, containing the array of formatted snippets corresponding to the documents in docidsIn. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
Throws:
java.io.IOException - if an I/O error occurred during processing
java.lang.IllegalArgumentException - if field was indexed without IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
protected java.lang.String[][] loadFieldValues(IndexSearcher searcher, java.lang.String[] fields, int[] docids, int maxLength) throws java.io.IOException
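Loads the String values for each field X docID to be highlighted.
Throws:
java.io.IOException
protected char getMultiValuedSeparator(java.lang.String field)
Returns the logical separator between values for multi-valued fields. The default value is a space character, which means passages can span across values, but a subclass can override, for example with U+2029 PARAGRAPH SEPARATOR (PS) if each value holds a discrete passage for highlighting.
protected Analyzer getIndexAnalyzer(java.lang.String field)
Returns the analyzer originally used to index the content for field. This is used to highlight some MultiTermQueries.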
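
To illustrate the effect of the separator described for getMultiValuedSeparator above, the JDK-only sketch below joins the values of a multi-valued field with a given separator and then splits the result with the sentence BreakIterator. The class MultiValuedSeparatorSketch, its methods, and the sample values are hypothetical; how the highlighter itself joins values is not shown in this page.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class MultiValuedSeparatorSketch {

    /** Joins the values of a multi-valued field with a single separator character. */
    static String join(String[] values, char separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(values[i]);
        }
        return sb.toString();
    }

    /** Splits text into passages with the sentence BreakIterator. */
    static List<String> passages(String text) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ROOT);
        bi.setText(text);
        List<String> out = new ArrayList<>();
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] values = {"first value", "second value"};
        // With a space, nothing marks a boundary between the two values.
        System.out.println(passages(join(values, ' ')).size());
        // U+2029 PARAGRAPH SEPARATOR forces a boundary after each value.
        System.out.println(passages(join(values, '\u2029')).size());
    }
}
```
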
private java.util.Map<java.lang.Integer,java.lang.Object> highlightField(java.lang.String field,
java.lang.String[] contents,
java.text.BreakIterator bi,
BytesRef[] terms,
int[] docids,
java.util.List<LeafReaderContext> leaves,
int maxPassages,
Query query)
throws java.io.IOException
Throws:
java.io.IOException
private Passage[] highlightDoc(java.lang.String field, BytesRef[] terms, int contentLength, java.text.BreakIterator bi, int doc, TermsEnum termsEnum, PostingsEnum[] postings, int n) throws java.io.IOException
Throws:
java.io.IOException
protected Passage[] getEmptyHighlight(java.lang.String fieldName, java.text.BreakIterator bi, int maxPassages)
Called to summarize a document when no hits were found. By default this returns the first maxPassages sentences; subclasses can override to customize.
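
The default fallback described for getEmptyHighlight, returning the first maxPassages sentences, can be approximated with the JDK's BreakIterator. This is a sketch under stated assumptions: the class EmptyHighlightSketch and its method are hypothetical, and it returns plain strings rather than Passage objects.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class EmptyHighlightSketch {

    /** Returns up to maxPassages leading sentences of the content. */
    static List<String> firstSentences(String content, int maxPassages) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ROOT);
        bi.setText(content);
        List<String> out = new ArrayList<>();
        int start = bi.first();
        int end = bi.next();
        while (end != BreakIterator.DONE && out.size() < maxPassages) {
            out.add(content.substring(start, end).trim());
            start = end;
            end = bi.next();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(firstSentences("One fish. Two fish. Red fish.", 2));
    }
}
```
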