public class HyphenationCompoundWordTokenFilterFactory extends BaseTokenFilterFactory implements ResourceLoaderAware
Factory for HyphenationCompoundWordTokenFilter.
This factory accepts the following parameters:
- hyphenator (mandatory): path to the FOP XML hyphenation pattern file. See http://offo.sourceforge.net/hyphenation/.
- encoding (optional): encoding of the XML hyphenation file. Defaults to UTF-8.
- dictionary (optional): dictionary of words. Defaults to no dictionary.
- minWordSize (optional): minimum length a word must have to be decomposed. Defaults to 5.
- minSubwordSize (optional): minimum length of subwords. Defaults to 2.
- maxSubwordSize (optional): maximum length of subwords. Defaults to 15.
- onlyLongestMatch (optional): if true, adds only the longest matching subword to the stream. Defaults to false.
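To make the interplay of the size limits, the dictionary, and onlyLongestMatch concrete, here is a minimal, simplified sketch in plain Java. It is not the Lucene implementation: the class `SubwordSketch`, the method `decompose`, and the hard-coded break points are hypothetical, and the break points are assumed to have been supplied by a hyphenator rather than computed from an FOP pattern file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified illustration; not the Lucene filter itself.
public class SubwordSketch {

    /**
     * Collects the substrings of word that start and end at a break point,
     * satisfy the subword size limits, and (when a dictionary is given)
     * are contained in it. A null dictionary means "no dictionary".
     */
    public static List<String> decompose(String word, int[] breaks, Set<String> dictionary,
                                         int minWordSize, int minSubwordSize,
                                         int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> out = new ArrayList<>();
        if (word.length() < minWordSize) {
            return out; // words shorter than minWordSize are not decomposed
        }
        for (int i = 0; i < breaks.length; i++) {
            String longest = null; // longest match starting at breaks[i]
            for (int j = i + 1; j < breaks.length; j++) {
                int len = breaks[j] - breaks[i];
                if (len > maxSubwordSize) break;    // later candidates are only longer
                if (len < minSubwordSize) continue; // too short, try the next break
                String part = word.substring(breaks[i], breaks[j]);
                if (dictionary != null && !dictionary.contains(part)) continue;
                if (!onlyLongestMatch) {
                    out.add(part);
                } else if (longest == null || part.length() > longest.length()) {
                    longest = part;
                }
            }
            if (longest != null) out.add(longest);
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("fuss", "ball", "fussball", "pumpe"));
        int[] breaks = {0, 4, 8, 13}; // assumed break points for "fussballpumpe"
        // prints [fuss, fussball, ball, pumpe]
        System.out.println(decompose("fussballpumpe", breaks, dict, 5, 2, 15, false));
        // prints [fussball, ball, pumpe]
        System.out.println(decompose("fussballpumpe", breaks, dict, 5, 2, 15, true));
    }
}
```

With onlyLongestMatch set to true, only the longest dictionary match starting at each break point is kept, which is why "fuss" disappears from the second result.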
<fieldType name="text_hyphncomp" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.HyphenationCompoundWordTokenFilterFactory" hyphenator="hyphenator.xml" encoding="UTF-8"
            dictionary="dictionary.txt" minWordSize="5" minSubwordSize="2" maxSubwordSize="15" onlyLongestMatch="false"/>
  </analyzer>
</fieldType>

See also: HyphenationCompoundWordTokenFilter

| Modifier and Type | Field and Description |
|---|---|
| private java.lang.String | dictFile |
| private org.apache.lucene.analysis.CharArraySet | dictionary |
| private java.lang.String | encoding |
| private java.lang.String | hypFile |
| private org.apache.lucene.analysis.compound.hyphenation.HyphenationTree | hyphenator |
| private int | maxSubwordSize |
| private int | minSubwordSize |
| private int | minWordSize |
| private boolean | onlyLongestMatch |
Inherited fields: log, args, luceneMatchVersion

| Constructor and Description |
|---|
| HyphenationCompoundWordTokenFilterFactory() |

| Modifier and Type | Method and Description |
|---|---|
| org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilter | create(org.apache.lucene.analysis.TokenStream input) Transform the specified input TokenStream |
| void | inform(ResourceLoader loader) |
| void | init(java.util.Map<java.lang.String,java.lang.String> args) init will be called just once, immediately after creation. |
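
Methods inherited from class BaseTokenStreamFactory: assureMatchVersion, getArgs, getBoolean, getBoolean, getInt, getInt, getInt, getSnowballWordSet, getWordSet, warnDeprecated
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface TokenFilterFactory: getArgs

private org.apache.lucene.analysis.CharArraySet dictionary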
private org.apache.lucene.analysis.compound.hyphenation.HyphenationTree hyphenator
private java.lang.String dictFile
private java.lang.String hypFile
private java.lang.String encoding
private int minWordSize
private int minSubwordSize
private int maxSubwordSize
private boolean onlyLongestMatch
public HyphenationCompoundWordTokenFilterFactory()
public void init(java.util.Map<java.lang.String,java.lang.String> args)
Description copied from interface: TokenFilterFactory
init will be called just once, immediately after creation.
The args are user-level initialization parameters that may be specified when declaring the factory in the schema.xml.
Specified by: init in interface TokenFilterFactory
Overrides: init in class BaseTokenStreamFactory
public void inform(ResourceLoader loader)
Specified by: inform in interface ResourceLoaderAware
public org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilter create(org.apache.lucene.analysis.TokenStream input)
Description copied from interface: TokenFilterFactory
Transform the specified input TokenStream
Specified by: create in interface TokenFilterFactory
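
As an illustration of how the user-level args passed to init might map onto the private fields and documented defaults, here is a rough standalone sketch. It uses only the JDK and does not reproduce the factory's actual code: the class `HyphenationArgsSketch` and its helper `intArg` are hypothetical names, and the error raised for a missing hyphenator is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the argument handling done in init; not Solr code.
public class HyphenationArgsSketch {
    public final String hypFile;   // from "hyphenator" (mandatory)
    public final String encoding;  // from "encoding", default UTF-8
    public final String dictFile;  // from "dictionary", null means no dictionary
    public final int minWordSize;
    public final int minSubwordSize;
    public final int maxSubwordSize;
    public final boolean onlyLongestMatch;

    public HyphenationArgsSketch(Map<String, String> args) {
        hypFile = args.get("hyphenator");
        if (hypFile == null) {
            // Assumption: a missing mandatory parameter is reported as an error.
            throw new IllegalArgumentException("Missing required parameter: hyphenator");
        }
        encoding = args.getOrDefault("encoding", "UTF-8");
        dictFile = args.get("dictionary");
        minWordSize = intArg(args, "minWordSize", 5);
        minSubwordSize = intArg(args, "minSubwordSize", 2);
        maxSubwordSize = intArg(args, "maxSubwordSize", 15);
        onlyLongestMatch = Boolean.parseBoolean(args.getOrDefault("onlyLongestMatch", "false"));
    }

    // Parses an integer argument, falling back to the documented default.
    static int intArg(Map<String, String> args, String name, int defaultValue) {
        String value = args.get(name);
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    public static void main(String[] argv) {
        Map<String, String> args = new HashMap<>();
        args.put("hyphenator", "hyphenator.xml");
        args.put("maxSubwordSize", "10");
        HyphenationArgsSketch cfg = new HyphenationArgsSketch(args);
        // prints UTF-8 5 2 10 false
        System.out.println(cfg.encoding + " " + cfg.minWordSize + " " + cfg.minSubwordSize
                + " " + cfg.maxSubwordSize + " " + cfg.onlyLongestMatch);
    }
}
```

In the real factory, loading the hyphenation pattern file and the dictionary would presumably happen later, in inform(ResourceLoader), since that is the method that receives the resource loader.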