class WordSegmenter
extends java.lang.Object
| Modifier and Type | Field and Description |
|---|---|
| private HHMMSegmenter | hhmmSegmenter |
| private SegTokenFilter | tokenFilter |
| Constructor and Description |
|---|
| WordSegmenter() |
| Modifier and Type | Method and Description |
|---|---|
| SegToken | convertSegToken(SegToken st, java.lang.String sentence, int sentenceStartOffset): Process a SegToken so that it is ready for indexing. |
| java.util.List<SegToken> | segmentSentence(java.lang.String sentence, int startOffset): Segment a sentence into words with HHMMSegmenter. |
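The two methods suggest a two-stage pipeline: a segmenter splits the sentence into tokens whose offsets are relative to the sentence, and each token is then adjusted so its offsets point into the whole text. The sketch below illustrates that flow under stated assumptions. `Token`, `Segmenter`, and `perCharacter` are hypothetical stand-ins for SegToken and HHMMSegmenter, the toy segmenter emits one token per character instead of running an HHMM model, and the per-token step only shifts offsets. It is not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class WordSegmenterSketch {
    // Hypothetical token (stand-in for SegToken): text plus [startOffset, endOffset).
    static final class Token {
        final String text;
        final int startOffset;
        final int endOffset;

        Token(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return text + "[" + startOffset + "," + endOffset + ")";
        }
    }

    // Hypothetical stand-in for HHMMSegmenter: returns sentence-relative tokens.
    interface Segmenter {
        List<Token> process(String sentence);
    }

    // Toy segmenter, NOT an HHMM: emits one token per char of the sentence.
    static List<Token> perCharacter(String sentence) {
        List<Token> tokens = new ArrayList<>();
        for (int i = 0; i < sentence.length(); i++) {
            tokens.add(new Token(sentence.substring(i, i + 1), i, i + 1));
        }
        return tokens;
    }

    private final Segmenter segmenter;

    WordSegmenterSketch(Segmenter segmenter) {
        this.segmenter = segmenter;
    }

    // Segment the sentence, then shift every token by startOffset so the
    // offsets refer to the enclosing text rather than to the sentence.
    List<Token> segmentSentence(String sentence, int startOffset) {
        List<Token> result = new ArrayList<>();
        for (Token t : segmenter.process(sentence)) {
            result.add(new Token(t.text, t.startOffset + startOffset, t.endOffset + startOffset));
        }
        return result;
    }

    public static void main(String[] args) {
        WordSegmenterSketch ws = new WordSegmenterSketch(WordSegmenterSketch::perCharacter);
        // A four-character Chinese sentence that begins at offset 5 of the text.
        System.out.println(ws.segmentSentence("\u6211\u7231\u5317\u4eac", 5));
    }
}
```

Passing the sentence's start offset lets callers segment a long text one sentence at a time while still producing offsets into the original input, which is what an indexer needs for highlighting.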
private HHMMSegmenter hhmmSegmenter
private SegTokenFilter tokenFilter
public java.util.List<SegToken> segmentSentence(java.lang.String sentence, int startOffset)
Segment a sentence into words with HHMMSegmenter.
Parameters:
sentence - input sentence
startOffset - start offset of sentence
Returns:
List of SegToken

public SegToken convertSegToken(SegToken st, java.lang.String sentence, int sentenceStartOffset)
Process a SegToken so that it is ready for indexing.
This method calculates offsets and normalizes the token with SegTokenFilter.
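Offset calculation here plausibly means adding sentenceStartOffset to the token's sentence-relative offsets. The sketch below shows that arithmetic together with one possible normalization. The normalization rules are assumptions chosen for illustration, not SegTokenFilter's documented behavior: full-width ASCII variants (U+FF01 to U+FF5E) are mapped to half-width by subtracting 0xFEE0, and the result is lowercased. `Token`, `normalize`, and `convert` are hypothetical names, and re-reading the surface form from `sentence` is likewise an assumption about why the method receives the sentence.

```java
import java.util.Locale;

final class ConvertSegTokenSketch {
    // Hypothetical token (stand-in for SegToken): text plus [startOffset, endOffset).
    static final class Token {
        final String text;
        final int startOffset;
        final int endOffset;

        Token(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    // Assumed normalization (stand-in for SegTokenFilter): map full-width
    // ASCII variants U+FF01..U+FF5E to U+0021..U+007E, then lowercase.
    static String normalize(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0xFF01 && c <= 0xFF5E) {
                c = (char) (c - 0xFEE0);
            }
            sb.append(c);
        }
        return sb.toString().toLowerCase(Locale.ROOT);
    }

    // Re-read the surface form from the sentence, normalize it, and shift the
    // sentence-relative offsets by sentenceStartOffset.
    static Token convert(Token st, String sentence, int sentenceStartOffset) {
        String surface = sentence.substring(st.startOffset, st.endOffset);
        return new Token(normalize(surface),
                st.startOffset + sentenceStartOffset,
                st.endOffset + sentenceStartOffset);
    }

    public static void main(String[] args) {
        // Two Chinese characters followed by full-width "LUCENE".
        String sentence = "\u6211\u7528\uFF2C\uFF35\uFF23\uFF25\uFF2E\uFF25";
        Token raw = new Token("\uFF2C\uFF35\uFF23\uFF25\uFF2E\uFF25", 2, 8);
        Token out = convert(raw, sentence, 100);
        System.out.println(out.text + " [" + out.startOffset + ", " + out.endOffset + ")");
        // prints: lucene [102, 108)
    }
}
```

Keeping the original offsets while changing only the token text means a search hit on the normalized term `lucene` can still be highlighted at the full-width characters in the source text.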