NERsuite
1.1.1
This is the main module of NERsuite.
Classes | |
class | NER::SentenceTagger |
class | NER::FeatureExtractor |
class | NER::Suite |
struct | NER::COLUMN_INFO |
class | NER::Dictionary |
struct | NER::NE |
class | NER::nersuite_exception |
class | NER::OPTION_PARSER |
class | NER::Tokenizer |
Enumerations | |
enum | NER::NormalizeType { NER::NormalizeNone = 0, NER::NormalizeCase = 0x01, NER::NormalizeNumber = 0x02, NER::NormalizeSymbol = 0x04, NER::NormalizeToken = 0x08 } |
Functions | |
template<typename T1 > | |
int | NER::tokenize (T1 &V_STR, std::string &one_line, std::string del) |
int | NER::get_sent (std::istream &cin, V2_STR &one_sent) |
int | NER::get_sent (std::ifstream &cin, V2_STR &one_sent) |
This is the main module of NERsuite.
The following diagram shows the overall structure of NERsuite. The components "nersuite", "nersuite_dic_compiler", "nersuite_dic_tagger", and "nersuite_tokenizer" are command-line executables. All of them require the library "nersuite_common".
"nersuite" uses the external library "CRFsuite" as its classifier; "CRFsuite" in turn requires "cqdb".
"nersuite_common" requires the internal library "cdbpp".
enum NER::NormalizeType |
String normalization type definitions. The members of this enum are bit flags: combine them with the bitwise OR operator to produce an int value.
NormalizeNone |
No Normalization |
NormalizeCase |
Case Normalization |
NormalizeNumber |
Number Normalization |
NormalizeSymbol |
Symbol Normalization |
NormalizeToken |
Token Normalization |
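As a minimal sketch of how such bit flags are typically combined and tested, the block below mirrors the documented enum values in a local namespace so that it compiles without the NERsuite headers. The namespace `flags_sketch` and the helper `has_flag` are hypothetical names introduced for illustration and are not part of NERsuite.

```cpp
#include <cassert>

namespace flags_sketch {

// Local mirror of the documented NER::NormalizeType values (copied from the
// enum listing above so the sketch builds without NERsuite headers).
enum NormalizeType {
    NormalizeNone   = 0,
    NormalizeCase   = 0x01,
    NormalizeNumber = 0x02,
    NormalizeSymbol = 0x04,
    NormalizeToken  = 0x08
};

// Hypothetical helper: true when `flag` is set in the combined int value.
inline bool has_flag(int flags, NormalizeType flag) {
    return (flags & flag) != 0;
}

// Usage: OR-ing two members of an unscoped enum yields a plain int.
inline void example() {
    int flags = NormalizeCase | NormalizeNumber;
    assert(flags == 0x03);
    assert(has_flag(flags, NormalizeCase));
    assert(!has_flag(flags, NormalizeSymbol));
}

}  // namespace flags_sketch
```

Because each member occupies its own bit, any subset of normalizations can be carried in a single int, and NormalizeNone (0) leaves every flag unset.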
int NER::get_sent (std::istream &cin, V2_STR &one_sent)
Read a sentence from a stream and construct the Token List for that sentence.
Each line consists of tab-separated columns (label + feature list). The first column is the label assigned to the feature list; the remaining columns are the feature list.
[in] | cin | Input stream (Each sentence must be separated with an empty line.) |
[in,out] | one_sent | Token List for a sentence. Any existing contents are cleared before processing. |
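The input format described above (tab-separated columns, one token per line, an empty line between sentences) can be sketched as follows. This is a stand-in written for illustration, not the NERsuite implementation: the function name `read_sentence`, the namespace `sent_sketch`, the assumed shape of `V2_STR` (a vector of string vectors), and the choice to return the number of rows read are all assumptions, since the documentation does not state what the int return value means.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

namespace sent_sketch {

// Assumed shape of V2_STR: one row per token, one string per column.
typedef std::vector<std::vector<std::string> > V2_STR;

// Hypothetical stand-in for NER::get_sent. Clears `one_sent`, reads lines
// until an empty line or end of input, and splits each line on tabs.
// Returns the number of token rows read (an assumption of this sketch).
inline int read_sentence(std::istream& in, V2_STR& one_sent) {
    one_sent.clear();
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) break;  // an empty line ends the sentence
        std::vector<std::string> cols;
        std::string::size_type start = 0, tab;
        while ((tab = line.find('\t', start)) != std::string::npos) {
            cols.push_back(line.substr(start, tab - start));
            start = tab + 1;
        }
        cols.push_back(line.substr(start));  // last column
        one_sent.push_back(cols);
    }
    return static_cast<int>(one_sent.size());
}

// Usage: two sentences separated by an empty line.
inline void example() {
    std::istringstream in("B-Gene\tw=IL-2\nO\tw=binds\n\nO\tw=next\n");
    V2_STR sent;
    assert(read_sentence(in, sent) == 2);
    assert(sent[0][0] == "B-Gene" && sent[0][1] == "w=IL-2");
    assert(read_sentence(in, sent) == 1);  // previous contents were cleared
    assert(sent[0][1] == "w=next");
}

}  // namespace sent_sketch
```

Taking a `std::istream&` lets the same routine serve string streams, standard input, and file streams alike.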
int NER::get_sent (std::ifstream &cin, V2_STR &one_sent)
Read a sentence from a file stream and construct the Token List for that sentence.
Each line consists of tab-separated columns (label + feature list). The first column is the label assigned to the feature list; the remaining columns are the feature list.
[in] | cin | Input stream (Each sentence must be separated with an empty line.) |
[in,out] | one_sent | Token List for a sentence. Any existing contents are cleared before processing. |
template<typename T1 > int NER::tokenize (T1 &V_STR, std::string &one_line, std::string del)
Tokenize a string.
[out] | V_STR | List of strings where the output is pushed. (Cleared before processing.) |
[in] | one_line | Input string representing a line of text |
[in] | del | A delimiter |
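A delimiter-based split into a caller-supplied container can be sketched as below. This is an illustrative stand-in, not the NERsuite implementation: the name `split_line` and the namespace `tok_sketch` are hypothetical, and the sketch assumes that `del` is matched as a whole string, that empty fields are kept, and that the return value is the number of pieces. The documentation above does not confirm any of these details.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace tok_sketch {

// Hypothetical stand-in for NER::tokenize. Works with any container T1 that
// provides clear() and push_back(std::string), e.g. std::vector<std::string>.
// Assumptions of this sketch: `del` is matched as a whole string, empty
// fields are kept, and the return value is the number of pieces.
template <typename T1>
int split_line(T1& out, const std::string& one_line, const std::string& del) {
    out.clear();  // output is cleared before processing
    if (del.empty()) {  // no delimiter: the whole line is one piece
        out.push_back(one_line);
        return 1;
    }
    std::string::size_type start = 0, pos;
    while ((pos = one_line.find(del, start)) != std::string::npos) {
        out.push_back(one_line.substr(start, pos - start));
        start = pos + del.size();
    }
    out.push_back(one_line.substr(start));  // trailing piece
    return static_cast<int>(out.size());
}

// Usage: split one tab-separated line into its columns.
inline void example() {
    std::vector<std::string> cols(1, "stale");
    assert(split_line(cols, std::string("O\tw=binds\tpos=VBZ"), std::string("\t")) == 3);
    assert(cols[0] == "O");
    assert(cols[2] == "pos=VBZ");
}

}  // namespace tok_sketch
```

Templating on the container type, as the documented signature does, lets callers collect pieces into whichever sequence type suits them.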