com.jalios.jcms.search
Class NGramFingerPrint

java.lang.Object
  extended by java.util.Dictionary<K,V>
      extended by java.util.Hashtable<String,Integer>
          extended by com.jalios.jcms.search.NGramFingerPrint
All Implemented Interfaces:
Serializable, Cloneable, Map<String,Integer>

public class NGramFingerPrint
extends Hashtable<String,Integer>

A FingerPrint maps so-called NGrams to their number of occurrences in the corresponding text. It can categorize itself by comparing its FingerPrint with the FingerPrints of a collection of categories. See sdair-94-bc.pdf in the doc directory of the jar file for more information.

See Also:
Serialized Form

Constructor Summary
NGramFingerPrint()
           
 
Method Summary
 Map<String,Integer> categorize(Collection<NGramFingerPrint> categories)
          categorizes the FingerPrint by computing the distance to the FingerPrints in the passed Collection.
 void create(String text)
          fills the FingerPrint with all the NGrams and their number of occurrences in the passed text.
 String getCategory()
          returns the category of the FingerPrint, or "unknown" if the FingerPrint has not been categorized yet.
 Map<String,Integer> getCategoryDistances()
           
 int getPosition(String key)
          gets the position of the passed NGram in the FingerPrint.
 void load(String ngram)
           
protected  void setCategory(String category)
          sets the category of the FingerPrint
 String toString()
          returns the FingerPrint as a String in the FingerPrint file-format
 
Methods inherited from class java.util.Hashtable
clear, clone, contains, containsKey, containsValue, elements, entrySet, equals, get, hashCode, isEmpty, keys, keySet, put, putAll, rehash, remove, size, values
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

NGramFingerPrint

public NGramFingerPrint()
Method Detail

load

public void load(String ngram)

create

public void create(String text)
fills the FingerPrint with all the NGrams and their number of occurrences in the passed text.

Parameters:
text - text to be analysed
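
To illustrate what "NGrams and their number of occurrences" means, here is a minimal sketch that counts character n-grams of one fixed length. It is not the implementation of create: the class name NGramCountSketch, the helper countNGrams and the parameter n are hypothetical, and the n-gram lengths, padding and normalization that NGramFingerPrint actually uses are not stated on this page.

```java
import java.util.HashMap;
import java.util.Map;

public class NGramCountSketch {

    // Hypothetical helper, not part of NGramFingerPrint.
    // Counts every substring of length n in the text (assumption: no padding,
    // no case folding, a single fixed n-gram length).
    static Map<String, Integer> countNGrams(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= text.length(); i++) {
            String ngram = text.substring(i, i + n);
            counts.merge(ngram, 1, Integer::sum); // start at 1 or add 1
        }
        return counts;
    }

    public static void main(String[] args) {
        // The bigrams of "banana" are ba, an, na, an, na.
        System.out.println(countNGrams("banana", 2));
    }
}
```

In this sketch the resulting map plays the role of the Hashtable<String,Integer> that NGramFingerPrint extends: keys are n-grams, values are occurrence counts.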

categorize

public Map<String,Integer> categorize(Collection<NGramFingerPrint> categories)
categorizes the FingerPrint by computing the distance to the FingerPrints in the passed Collection. The category of the FingerPrint with the lowest distance is assigned to this FingerPrint.

Parameters:
categories - the FingerPrints of the candidate categories to compare against
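
The "lowest distance wins" rule can be sketched as follows. This page does not say how the distance is computed, so the sketch assumes a rank-based "out-of-place" distance, a common choice for comparing ranked n-gram profiles. The class RankDistanceSketch, its methods and the fixed penalty for missing n-grams are all hypothetical and are not part of NGramFingerPrint.

```java
import java.util.List;
import java.util.Map;

public class RankDistanceSketch {

    // Assumed distance: for each n-gram of the document profile, add the
    // difference between its rank in the document and its rank in the
    // category. N-grams missing from the category cost a fixed penalty
    // (here: the size of the category profile, an arbitrary choice).
    static int distance(List<String> doc, List<String> category) {
        int penalty = category.size();
        int total = 0;
        for (int i = 0; i < doc.size(); i++) {
            int j = category.indexOf(doc.get(i));
            total += (j < 0) ? penalty : Math.abs(i - j);
        }
        return total;
    }

    // Returns the name of the category profile with the lowest distance,
    // or "unknown" if no categories were passed.
    static String closest(List<String> doc, Map<String, List<String>> categories) {
        String best = "unknown";
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<String, List<String>> e : categories.entrySet()) {
            int d = distance(doc, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

Each profile here is a list of n-grams ordered from most to least frequent, which corresponds to the ordering described for getPosition.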

getCategoryDistances

public Map<String,Integer> getCategoryDistances()

getPosition

public int getPosition(String key)
gets the position of the passed NGram in the FingerPrint. The NGrams are sorted in descending order of their number of occurrences in the text that was used to create the FingerPrint.

Parameters:
key - the NGram
Returns:
the position of the NGram in the FingerPrint
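
The descending-by-count ordering can be sketched as below. This is only an illustration: the class RankSketch and its methods are hypothetical, and the zero-based positions, the -1 result for an unknown NGram and the alphabetical tie-break are assumptions that this page does not confirm for getPosition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class RankSketch {

    // Returns the n-grams ordered by descending count.
    // Assumption: ties are broken alphabetically to keep the order stable.
    static List<String> rankedKeys(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            keys.add(e.getKey());
        }
        return keys;
    }

    // Assumed convention: zero-based position, -1 if the n-gram is absent.
    static int position(Map<String, Integer> counts, String key) {
        return rankedKeys(counts).indexOf(key);
    }
}
```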

getCategory

public String getCategory()
returns the category of the FingerPrint, or "unknown" if the FingerPrint has not been categorized yet.

Returns:
the category of the FingerPrint

toString

public String toString()
returns the FingerPrint as a String in the FingerPrint file-format

Overrides:
toString in class Hashtable<String,Integer>

setCategory

protected void setCategory(String category)
sets the category of the FingerPrint

Parameters:
category - the category


Copyright © 2001-2010 Jalios SA. All Rights Reserved.