NERsuite 1.1.1
NERsuite

This is the main module of NERsuite.

Classes

class  NER::SentenceTagger
class  NER::FeatureExtractor
class  NER::Suite
struct  NER::COLUMN_INFO
class  NER::Dictionary
struct  NER::NE
class  NER::nersuite_exception
class  NER::OPTION_PARSER
class  NER::Tokenizer

Enumerations

enum  NER::NormalizeType {
  NER::NormalizeNone = 0, NER::NormalizeCase = 0x01, NER::NormalizeNumber = 0x02, NER::NormalizeSymbol = 0x04,
  NER::NormalizeToken = 0x08
}

Functions

template<typename T1 >
int NER::tokenize (T1 &V_STR, std::string &one_line, std::string del)
int NER::get_sent (std::istream &cin, V2_STR &one_sent)
int NER::get_sent (std::ifstream &cin, V2_STR &one_sent)

Detailed Description


The following diagram shows the overall structure of NERsuite. The components "nersuite", "nersuite_dic_compiler", "nersuite_dic_tagger", and "nersuite_tokenizer" are command executables. All of these executables depend on the library "nersuite_common".

"nersuite" uses the external library "CRFsuite" as its classifier; "CRFsuite" in turn requires "cqdb".

"nersuite_common" requires the internal library "cdbpp".

[Figure: Component Diagram of NERsuite]

Enumeration Type Documentation

enum NER::NormalizeType

String normalization type definitions. The members of this enum are bit flags: combine them with the bitwise OR operator to produce an int value.

Enumerator:
NormalizeNone 

No Normalization

NormalizeCase 

Case Normalization

NormalizeNumber 

Number Normalization

NormalizeSymbol 

Symbol Normalization

NormalizeToken 

Token Normalization
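As a minimal sketch of how the flags combine, the snippet below re-declares the enumerators with the values listed above (so it compiles without the NERsuite headers) and adds two hypothetical helpers, `has_flag` and `describe`, which are not part of NERsuite:

```cpp
#include <string>

namespace NER {
// Mirror of the documented enumerator values, re-declared here so the
// sketch compiles on its own (not the actual NERsuite header).
enum NormalizeType {
    NormalizeNone   = 0,
    NormalizeCase   = 0x01,
    NormalizeNumber = 0x02,
    NormalizeSymbol = 0x04,
    NormalizeToken  = 0x08
};
}  // namespace NER

// Hypothetical helper: true if the given flag bit is set in the mask.
inline bool has_flag(int mask, NER::NormalizeType flag) {
    return (mask & flag) != 0;
}

// Hypothetical helper: lists the active flags, e.g. "case+number".
inline std::string describe(int mask) {
    // NormalizeNone is 0, so it is detected by comparing the whole mask.
    if (mask == NER::NormalizeNone) return "none";
    const struct { NER::NormalizeType flag; const char* name; } table[] = {
        {NER::NormalizeCase, "case"},     {NER::NormalizeNumber, "number"},
        {NER::NormalizeSymbol, "symbol"}, {NER::NormalizeToken, "token"}};
    std::string out;
    for (const auto& entry : table) {
        if (has_flag(mask, entry.flag)) {
            if (!out.empty()) out += '+';
            out += entry.name;
        }
    }
    return out;
}
```

For example, `NER::NormalizeCase | NER::NormalizeNumber` yields the int 3, and `describe(3)` returns "case+number". Because `NormalizeNone` is 0, a bitwise AND can never detect it; compare the whole mask against it instead.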


Function Documentation

int NER::get_sent (std::istream &cin, V2_STR &one_sent)

Read a sentence from a stream and construct a token list for the sentence.

Each line consists of tab-separated columns: the first column is the label assigned to the feature list, and the remaining columns are the feature list.

Parameters:
[in] cin: Input stream; sentences must be separated by an empty line.
[in,out] one_sent: Token list for a sentence. Any existing contents are cleared before processing.
Returns:
Number of tokens in the sentence.
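The input format described above can be illustrated with a hypothetical `read_sentence` function. This is a sketch, not NERsuite's implementation: it assumes `V2_STR` is `std::vector<std::vector<std::string>>` and that extra blank lines before a sentence should be skipped. Because `std::ifstream` derives from `std::istream`, the same function also accepts a file stream.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Assumption: one row per token, one string per tab-separated column.
typedef std::vector<std::vector<std::string> > V2_STR;

// Hypothetical re-implementation for illustration only.
// Reads lines until an empty line or end of stream, splits each line on
// tabs, and returns the number of tokens stored in one_sent.
inline int read_sentence(std::istream& in, V2_STR& one_sent) {
    one_sent.clear();  // existing contents are discarded, as documented
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) {
            if (one_sent.empty()) continue;  // assumption: skip leading blank lines
            break;                           // a blank line ends the sentence
        }
        // Split the line into columns: label first, then features.
        std::vector<std::string> columns;
        std::string::size_type start = 0, pos;
        while ((pos = line.find('\t', start)) != std::string::npos) {
            columns.push_back(line.substr(start, pos - start));
            start = pos + 1;
        }
        columns.push_back(line.substr(start));
        one_sent.push_back(columns);
    }
    return static_cast<int>(one_sent.size());
}
```

A caller can loop `while (read_sentence(in, sent) > 0) { ... }` to process input sentence by sentence; in this sketch a return value of 0 means the stream is exhausted.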

int NER::get_sent (std::ifstream &cin, V2_STR &one_sent)

Read a sentence from a file stream and construct a token list for the sentence.

Each line consists of tab-separated columns: the first column is the label assigned to the feature list, and the remaining columns are the feature list.

Parameters:
[in] cin: Input file stream; sentences must be separated by an empty line.
[in,out] one_sent: Token list for a sentence. Any existing contents are cleared before processing.
Returns:
Number of tokens in the sentence.

template<typename T1 >
int NER::tokenize (T1 &V_STR, std::string &one_line, std::string del)

Tokenize a string.

Parameters:
[out] V_STR: List of strings to which the output is pushed (cleared before processing).
[in] one_line: Input string representing a line of text.
[in] del: A delimiter.
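The contract above can be sketched as a hypothetical `tokenize_sketch` template. Several details are assumptions not confirmed by this documentation: `del` is matched as a whole string rather than as a set of characters, empty fields between adjacent delimiters are kept, and the return value is the number of pieces produced. The sketch also takes `one_line` and `del` by const reference instead of the documented non-const reference and by-value parameters.

```cpp
#include <list>
#include <string>
#include <vector>

// Hypothetical sketch of the documented contract; not NERsuite's code.
// T1 may be any sequence container of std::string with clear() and
// push_back(), e.g. std::vector<std::string> or std::list<std::string>.
template <typename T1>
int tokenize_sketch(T1& V_STR, const std::string& one_line, const std::string& del) {
    V_STR.clear();  // output is cleared before processing, as documented
    if (del.empty()) {
        // Guard: an empty delimiter would never advance, so keep the whole line.
        V_STR.push_back(one_line);
        return static_cast<int>(V_STR.size());
    }
    std::string::size_type start = 0, pos;
    while ((pos = one_line.find(del, start)) != std::string::npos) {
        V_STR.push_back(one_line.substr(start, pos - start));  // may be empty
        start = pos + del.size();
    }
    V_STR.push_back(one_line.substr(start));  // text after the last delimiter
    return static_cast<int>(V_STR.size());
}
```

Keeping empty fields suits tab-separated column data, where an empty column still occupies a position; for example, splitting "a\tb\t\tc" on "\t" yields four pieces, the third of which is empty.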
