public class PatternTokenizerFactory extends BaseTokenizerFactory
Factory for PatternTokenizer.
This tokenizer uses regex pattern matching to construct distinct tokens
for the input stream. It takes two arguments: "pattern" and "group".
With group=-1 (the default), the tokenizer behaves like "split": the tokens are equivalent to the output of String.split(java.lang.String), except that empty tokens are omitted.
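The following minimal sketch illustrates split mode using only `java.util.regex` and `String.split`, not the tokenizer itself. It assumes that "split mode minus empty tokens" means filtering the empty strings out of `String.split`'s result. The class name `SplitModeSketch` and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitModeSketch {
    // Approximates group=-1: split on the pattern, then drop zero-length pieces.
    static List<String> splitTokens(String regex, String input) {
        List<String> tokens = new ArrayList<>();
        for (String piece : input.split(regex)) {
            if (!piece.isEmpty()) { // the tokenizer never emits empty tokens
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = ",aaa,,bbb ccc,";
        // Plain String.split keeps the leading and inner empty strings.
        System.out.println(Arrays.toString(input.split("[,\\s]")));
        // Filtered result: only non-empty tokens remain.
        System.out.println(splitTokens("[,\\s]", input));
    }
}
```

Running it prints `[, aaa, , bbb, ccc]` followed by `[aaa, bbb, ccc]`. The empty strings come from the leading comma and the doubled comma.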
Using group >= 0 selects the matching group as the token. For example, if you have:

    pattern = \'([^\']+)\'
    group   = 0
    input   = aaa 'bbb' 'ccc'

the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but using group=1, the output would be: bbb and ccc (no ' marks).
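The following sketch reproduces the example above with a plain `Matcher.find()` loop. It is an illustration of group selection, not the tokenizer's own code. The class name `GroupModeSketch` is hypothetical. It also assumes that a group which does not participate in a match (`Matcher.group` returns null) is skipped in the same way as an empty one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupModeSketch {
    // Approximates group >= 0: each match of the pattern yields the text of the chosen group.
    static List<String> groupTokens(String regex, int group, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            String token = m.group(group);
            // Skip zero-length tokens; treating a null (non-participating) group alike is an assumption.
            if (token != null && !token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same pattern as the example; in Java regex, \' simply matches a quote.
        String regex = "\\'([^\\']+)\\'";
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(groupTokens(regex, 0, input)); // prints ['bbb', 'ccc']
        System.out.println(groupTokens(regex, 1, input)); // prints [bbb, ccc]
    }
}
```

Group 0 is the whole match, so the quotes are kept. Group 1 is the parenthesized part inside the quotes.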
NOTE: This Tokenizer does not output tokens that are of zero length.
```xml
<fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/>
  </analyzer>
</fieldType>
```

See Also: PatternTokenizer

| Modifier and Type | Field and Description |
|---|---|
| `protected int` | `group` |
| `static java.lang.String` | `GROUP` |
| `protected java.util.regex.Pattern` | `pattern` |
| `static java.lang.String` | `PATTERN` |
Inherited fields: log, args, luceneMatchVersion

| Constructor and Description |
|---|
| `PatternTokenizerFactory()` |
| Modifier and Type | Method and Description |
|---|---|
| `org.apache.lucene.analysis.Tokenizer` | `create(java.io.Reader in)`: Splits the input using the configured pattern. |
| `static java.util.List<org.apache.lucene.analysis.Token>` | `group(java.util.regex.Matcher matcher, java.lang.String input, int group)`: Deprecated. |
| `void` | `init(java.util.Map<java.lang.String,java.lang.String> args)`: Requires a configured pattern. |
| `static java.util.List<org.apache.lucene.analysis.Token>` | `split(java.util.regex.Matcher matcher, java.lang.String input)`: Deprecated. |
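Methods inherited from class BaseTokenStreamFactory: assureMatchVersion, getArgs, getBoolean, getBoolean, getInt, getInt, getInt, getSnowballWordSet, getWordSet, warnDeprecated
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface TokenizerFactory: getArgs

public static final java.lang.String PATTERN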
public static final java.lang.String GROUP
protected java.util.regex.Pattern pattern
protected int group
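The following minimal sketch shows how the `pattern` and `group` fields could be populated from an argument map, following the summary "Requires a configured pattern" and the documented default of group=-1. Several details are assumptions rather than the factory's real code: the constant values `"pattern"` and `"group"`, the class name `PatternArgsSketch`, and the use of `IllegalArgumentException` for a missing pattern. This page does not show which exception the actual `init` throws.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for the factory's init(Map) handling; not the Solr implementation.
public class PatternArgsSketch {
    // Assumed values for the PATTERN and GROUP constants.
    static final String PATTERN = "pattern";
    static final String GROUP = "group";

    final Pattern pattern;
    final int group;

    PatternArgsSketch(Map<String, String> args) {
        String regex = args.get(PATTERN);
        if (regex == null) {
            // Refuse to build without a configured pattern.
            throw new IllegalArgumentException("missing required argument: " + PATTERN);
        }
        pattern = Pattern.compile(regex);
        String g = args.get(GROUP);
        // No group argument means -1, i.e. split mode.
        group = (g == null) ? -1 : Integer.parseInt(g.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(PATTERN, "\\'([^\\']+)\\'");
        conf.put(GROUP, "1");
        PatternArgsSketch parsed = new PatternArgsSketch(conf);
        System.out.println(parsed.pattern.pattern() + " group=" + parsed.group);
    }
}
```

With the arguments from the XML example, this prints `\'([^\']+)\' group=1`. Compiling the pattern once at configuration time means each tokenizer created afterwards can reuse it.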
public void init(java.util.Map<java.lang.String,java.lang.String> args)
Specified by: init in interface TokenizerFactory
Overrides: init in class BaseTokenStreamFactory

public org.apache.lucene.analysis.Tokenizer create(java.io.Reader in)
@Deprecated
public static java.util.List<org.apache.lucene.analysis.Token> split(java.util.regex.Matcher matcher,
java.lang.String input)
@Deprecated
public static java.util.List<org.apache.lucene.analysis.Token> group(java.util.regex.Matcher matcher,
java.lang.String input,
int group)
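The deprecated `split` and `group` helpers take a `Matcher` and the input and return a list of tokens. The following sketch mirrors those two signatures using plain Java, so that the character offsets each token would carry are visible. It is not the deprecated code. `Span` is a hypothetical stand-in for `org.apache.lucene.analysis.Token`, and the offset handling is an assumption based on the split and group semantics described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OffsetSketch {
    // Hypothetical stand-in for a Lucene Token: term text plus start/end character offsets.
    static final class Span {
        final String text;
        final int start;
        final int end;
        Span(String text, int start, int end) { this.text = text; this.start = start; this.end = end; }
        @Override public String toString() { return text + "[" + start + "," + end + ")"; }
    }

    // Split mode: the text between matches becomes a token; empty pieces are dropped.
    static List<Span> split(Matcher matcher, String input) {
        List<Span> out = new ArrayList<>();
        int index = 0;
        while (matcher.find()) {
            if (matcher.start() > index) {
                out.add(new Span(input.substring(index, matcher.start()), index, matcher.start()));
            }
            index = matcher.end();
        }
        if (index < input.length()) { // trailing text after the last match
            out.add(new Span(input.substring(index), index, input.length()));
        }
        return out;
    }

    // Group mode: each match contributes the chosen group, with that group's own offsets.
    static List<Span> group(Matcher matcher, String input, int group) {
        List<Span> out = new ArrayList<>();
        while (matcher.find()) {
            int s = matcher.start(group);
            int e = matcher.end(group);
            if (s >= 0 && e > s) { // skip non-participating (-1) or empty groups
                out.add(new Span(input.substring(s, e), s, e));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(group(Pattern.compile("'([^']+)'").matcher(input), input, 1));
        System.out.println(split(Pattern.compile("\\s+").matcher("aa  bb"), "aa  bb"));
    }
}
```

This prints `[bbb[5,8), ccc[11,14)]` and `[aa[0,2), bb[4,6)]`. In group mode, the offsets point at the group inside the match rather than at the whole match, so the quote marks are excluded from the highlighted span.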