public class ThaiTokenizer extends SegmentingTokenizerBase
Tokenizer that uses BreakIterator to tokenize Thai text.
WARNING: this tokenizer may not be supported by all JREs. It is known to work with Sun/Oracle and Harmony JREs. If your application needs to be fully portable, consider using ICUTokenizer instead, which uses an ICU Thai BreakIterator that will always be available.
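The portability concern above comes down to whether the JRE's word BreakIterator for Thai actually finds word boundaries inside unspaced Thai text. The following is a minimal sketch of how such a probe could look using only java.text.BreakIterator. The class name, method name, probe string, and offset are assumptions chosen for illustration; this page does not state how DBBI_AVAILABLE is computed.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class ThaiBreakIteratorProbe {

    /**
     * Hypothetical probe: returns true if the JRE's Thai word BreakIterator
     * reports a boundary between the two words of the probe string.
     * A JRE without dictionary-based Thai support would typically treat the
     * whole run of Thai letters as one word and report no boundary there.
     */
    static boolean thaiDictionaryBreaksAvailable() {
        BreakIterator words = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        // "ภาษาไทย" ("Thai language"), written with escapes so the source
        // compiles regardless of file encoding. It is two words: 4 + 3 chars.
        String probe = "\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22";
        words.setText(probe);
        // Offset 4 is the position between the two words.
        return words.isBoundary(4);
    }

    public static void main(String[] args) {
        // The result depends on the JRE, so no fixed output is shown here.
        System.out.println("Thai dictionary breaks available: "
                + thaiDictionaryBreaksAvailable());
    }
}
```

An application that must be portable could run a check like this at startup (or read DBBI_AVAILABLE) and fall back to ICUTokenizer when the result is false.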
Nested classes inherited: AttributeSource.State

| Modifier and Type | Field and Description |
|---|---|
| static boolean | DBBI_AVAILABLE: True if the JRE supports a working dictionary-based BreakIterator for Thai. |
| private OffsetAttribute | offsetAtt |
| private static java.text.BreakIterator | proto |
| (package private) int | sentenceEnd |
| private static java.text.BreakIterator | sentenceProto: Used for breaking the text into sentences. |
| (package private) int | sentenceStart |
| private CharTermAttribute | termAtt |
| private java.text.BreakIterator | wordBreaker |
| private CharArrayIterator | wrapper |
Fields inherited: buffer, BUFFERMAX, offset, DEFAULT_TOKEN_ATTRIBUTE_FACTORY

| Constructor and Description |
|---|
| ThaiTokenizer(): Creates a new ThaiTokenizer. |
| ThaiTokenizer(AttributeFactory factory): Creates a new ThaiTokenizer, supplying the AttributeFactory. |
| Modifier and Type | Method and Description |
|---|---|
| protected boolean | incrementWord(): Returns true if another word is available. |
| protected void | setNextSentence(int sentenceStart, int sentenceEnd): Provides the next input sentence for analysis. |
Methods inherited: end, incrementToken, isSafeEnd, reset, close, correctOffset, setReader, addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString

public static final boolean DBBI_AVAILABLE
private static final java.text.BreakIterator proto
private static final java.text.BreakIterator sentenceProto
private final java.text.BreakIterator wordBreaker
private final CharArrayIterator wrapper
int sentenceStart
int sentenceEnd
private final CharTermAttribute termAtt
private final OffsetAttribute offsetAtt
public ThaiTokenizer()
public ThaiTokenizer(AttributeFactory factory)
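protected void setNextSentence(int sentenceStart, int sentenceEnd)
Provides the next input sentence for analysis (description copied from class SegmentingTokenizerBase).
Specified by: setNextSentence in class SegmentingTokenizerBase

protected boolean incrementWord()
Returns true if another word is available (description copied from class SegmentingTokenizerBase).
Specified by: incrementWord in class SegmentingTokenizerBase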
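The setNextSentence and incrementWord pair suggests a two-level scheme: a sentence BreakIterator delimits a sentence, and a word BreakIterator then walks the words inside it. The standalone sketch below illustrates that general pattern with java.text.BreakIterator only. It is not the ThaiTokenizer implementation: the class and method names are hypothetical, it copies each sentence into a String instead of working over a shared buffer, and the rule of skipping segments that do not start with a letter or digit is an assumption.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceThenWordSketch {

    /** Splits text into sentences, then splits each sentence into word tokens. */
    static List<String> tokenize(String text, Locale locale) {
        List<String> tokens = new ArrayList<>();
        BreakIterator sentences = BreakIterator.getSentenceInstance(locale);
        BreakIterator words = BreakIterator.getWordInstance(locale);
        sentences.setText(text);

        int sentenceStart = sentences.first();
        for (int sentenceEnd = sentences.next();
                sentenceEnd != BreakIterator.DONE;
                sentenceStart = sentenceEnd, sentenceEnd = sentences.next()) {
            // Roughly the role of setNextSentence(sentenceStart, sentenceEnd):
            // point the word breaker at the current sentence.
            String sentence = text.substring(sentenceStart, sentenceEnd);
            words.setText(sentence);

            // Roughly the role of repeated incrementWord() calls:
            // advance one word segment at a time within the sentence.
            int start = words.first();
            for (int end = words.next();
                    end != BreakIterator.DONE;
                    start = end, end = words.next()) {
                // Assumed filter: drop whitespace and punctuation segments.
                if (Character.isLetterOrDigit(sentence.codePointAt(start))) {
                    tokens.add(sentence.substring(start, end));
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // English input keeps the result independent of Thai dictionary support.
        System.out.println(tokenize("Hello world. This is a test.", Locale.ENGLISH));
    }
}
```

Segmenting by sentence first keeps each word-breaking pass short, which matters for dictionary-based breaking on long inputs.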