public class CachingNaiveBayesClassifier extends SimpleNaiveBayesClassifier
See http://en.wikipedia.org/wiki/Naive_Bayes_classifier for background on Naive Bayes classification.
This is NOT an online classifier.
| Modifier and Type | Field and Description |
|---|---|
| private java.util.ArrayList<BytesRef> | cclasses |
| private java.util.Map<BytesRef,java.lang.Double> | classTermFreq |
| private int | docsWithClassSize |
| private boolean | justCachedTerms |
| private java.util.Map<java.lang.String,java.util.Map<BytesRef,java.lang.Integer>> | termCClassHitCache |

Fields inherited from class SimpleNaiveBayesClassifier: analyzer, classFieldName, indexReader, indexSearcher, query, textFieldNames

| Constructor and Description |
|---|
| CachingNaiveBayesClassifier(IndexReader indexReader, Analyzer analyzer, Query query, java.lang.String classFieldName, java.lang.String... textFieldNames): Creates a new NaiveBayes classifier with internal caching. |

| Modifier and Type | Method and Description |
|---|---|
| protected java.util.List<ClassificationResult<BytesRef>> | assignClassNormalizedList(java.lang.String inputDocument): Calculates probabilities for all classes for a given input text. |
| private java.util.List<ClassificationResult<BytesRef>> | calculateLogLikelihood(java.lang.String[] tokenizedText) |
| private java.util.Map<BytesRef,java.lang.Integer> | getWordFreqForClassess(java.lang.String word) |
| void | reInitCache(int minTermOccurrenceInCache, boolean justCachedTerms): Builds the frame of the cache. |
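
assignClassNormalizedList is documented as returning probabilities for all classes, while calculateLogLikelihood works in log space. One common way to turn per-class log scores into probabilities that sum to 1 is the log-sum-exp trick, sketched below in plain Java. This is an illustration under that assumption, not a statement of how the inherited normClassificationResults method works; the class name LogScoreNormalizerSketch and its normalize method are hypothetical, and String labels stand in for BytesRef.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: not part of Lucene, and not confirmed to match its normalization.
class LogScoreNormalizerSketch {

    // Converts per-class log scores into probabilities that sum to 1.
    static Map<String, Double> normalize(Map<String, Double> logScores) {
        Map<String, Double> normalized = new LinkedHashMap<>();
        if (logScores.isEmpty()) {
            return normalized;
        }
        // Subtracting the maximum keeps exp() from underflowing on very negative scores.
        double max = Double.NEGATIVE_INFINITY;
        for (double score : logScores.values()) {
            max = Math.max(max, score);
        }
        double sum = 0.0;
        for (double score : logScores.values()) {
            sum += Math.exp(score - max);
        }
        // log of the total probability mass: max + log(sum of exp(score - max))
        double logTotal = max + Math.log(sum);
        for (Map.Entry<String, Double> entry : logScores.entrySet()) {
            normalized.put(entry.getKey(), Math.exp(entry.getValue() - logTotal));
        }
        return normalized;
    }
}
```

Working relative to the largest score matters in practice because summed log-likelihoods of long documents are often far below the range where `Math.exp` returns a non-zero value.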

Methods inherited from class SimpleNaiveBayesClassifier: assignClass, countDocsWithClass, getClasses, getClasses, normClassificationResults, tokenize

private final java.util.ArrayList<BytesRef> cclasses
private final java.util.Map<java.lang.String,java.util.Map<BytesRef,java.lang.Integer>> termCClassHitCache
private final java.util.Map<BytesRef,java.lang.Double> classTermFreq
private boolean justCachedTerms
private int docsWithClassSize
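
The field types above suggest a two-level cache: termCClassHitCache maps each term to per-class hit counts, and classTermFreq holds one number per class. As a rough sketch of how a structure of that shape could feed a Naive Bayes log-likelihood, the plain-Java example below uses String keys in place of BytesRef. Treating the per-class number as a total hit count, the add-one smoothing, and the names TermClassCacheSketch, addHit, and logLikelihood are all assumptions made for illustration; none of this is taken from the Lucene implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: String stands in for BytesRef, and nothing here calls Lucene.
class TermClassCacheSketch {
    // term -> (class label -> hit count), shaped like termCClassHitCache
    private final Map<String, Map<String, Integer>> termClassHits = new HashMap<>();
    // class label -> total hits across all terms (assumed meaning, shaped like classTermFreq)
    private final Map<String, Double> classTermTotals = new HashMap<>();

    // Records that `term` occurred `count` times in training text labeled `label`.
    void addHit(String term, String label, int count) {
        termClassHits.computeIfAbsent(term, t -> new HashMap<>()).merge(label, count, Integer::sum);
        classTermTotals.merge(label, (double) count, Double::sum);
    }

    // Sums add-one-smoothed log P(token | label) over the tokens.
    double logLikelihood(List<String> tokens, String label) {
        double classTotal = classTermTotals.getOrDefault(label, 0.0);
        // Guard against an empty cache so the denominator is never zero.
        int vocabularySize = Math.max(1, termClassHits.size());
        double sum = 0.0;
        for (String token : tokens) {
            Map<String, Integer> perClass =
                termClassHits.getOrDefault(token, Collections.<String, Integer>emptyMap());
            int hits = perClass.getOrDefault(label, 0);
            sum += Math.log((hits + 1.0) / (classTotal + vocabularySize));
        }
        return sum;
    }
}
```

Keeping the per-class counts nested under the term means a single map lookup per token yields the counts for every class, which is the access pattern a classifier needs when it scores all classes for the same input.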
public CachingNaiveBayesClassifier(IndexReader indexReader, Analyzer analyzer, Query query, java.lang.String classFieldName, java.lang.String... textFieldNames)

Creates a new NaiveBayes classifier with internal caching. See also reInitCache().

Parameters:
indexReader - the reader on the index to be used for classification
analyzer - an Analyzer used to analyze unseen text
query - a Query to optionally filter the docs used for training the classifier, or null if all the indexed docs should be used
classFieldName - the name of the field used as the output for the classifier
textFieldNames - the names of the fields used as the inputs for the classifier

protected java.util.List<ClassificationResult<BytesRef>> assignClassNormalizedList(java.lang.String inputDocument) throws java.io.IOException

Description copied from class: SimpleNaiveBayesClassifier
Calculates probabilities for all classes for a given input text.

Overrides:
assignClassNormalizedList in class SimpleNaiveBayesClassifier
Parameters:
inputDocument - the input text as a String
Returns:
a List of ClassificationResult, one for each existing class
Throws:
java.io.IOException - if assigning probabilities fails

private java.util.List<ClassificationResult<BytesRef>> calculateLogLikelihood(java.lang.String[] tokenizedText) throws java.io.IOException

Throws:
java.io.IOException

private java.util.Map<BytesRef,java.lang.Integer> getWordFreqForClassess(java.lang.String word) throws java.io.IOException

Throws:
java.io.IOException

public void reInitCache(int minTermOccurrenceInCache, boolean justCachedTerms) throws java.io.IOException

Builds the frame of the cache.

Parameters:
minTermOccurrenceInCache - a higher value yields a smaller cache
justCachedTerms - the switch to fully exclude low-occurrence docs
Throws:
java.io.IOException - if there is a low-level I/O error
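
One way to read the two reInitCache parameters is as a pruning rule: entries that occur fewer than minTermOccurrenceInCache times are left out of the cache, and justCachedTerms decides whether anything outside the cache is ignored entirely or fetched from the slower backing source. The plain-Java sketch below illustrates only that reading. The class CachePruningSketch, its lookup and cacheSize methods, the in-memory map standing in for the index, and the inclusive threshold are all assumptions, not Lucene code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative only: a plain map stands in for the index, and nothing here calls Lucene.
class CachePruningSketch {
    private final Map<String, Integer> allTermCounts; // hypothetical stand-in for the index
    private final Map<String, Integer> cache = new HashMap<>();
    private boolean justCachedTerms;

    CachePruningSketch(Map<String, Integer> allTermCounts) {
        this.allTermCounts = allTermCounts;
    }

    // Rebuilds the cache, keeping only terms at or above the occurrence threshold.
    void reInitCache(int minTermOccurrenceInCache, boolean justCachedTerms) {
        this.justCachedTerms = justCachedTerms;
        cache.clear();
        for (Map.Entry<String, Integer> entry : allTermCounts.entrySet()) {
            if (entry.getValue() >= minTermOccurrenceInCache) {
                cache.put(entry.getKey(), entry.getValue());
            }
        }
    }

    int cacheSize() {
        return cache.size();
    }

    // Returns the count for a term; uncached terms are skipped when justCachedTerms is set.
    Optional<Integer> lookup(String term) {
        Integer cached = cache.get(term);
        if (cached != null) {
            return Optional.of(cached);
        }
        if (justCachedTerms) {
            return Optional.empty();
        }
        // Fall back to the slower backing source for terms that were pruned from the cache.
        return Optional.ofNullable(allTermCounts.get(term));
    }
}
```

Under this reading, raising the threshold trades memory for either accuracy (when uncached terms are ignored) or speed (when they fall through to the backing source).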