NERsuite 1.1.1
#include <tokenizer.h>
Public Member Functions
int tokenize (const std::string &raw_sent, V2_STR &data, size_t init_offset=0)
Tokenizer Utility
int NER::Tokenizer::tokenize (const std::string &raw_sent, V2_STR &data, size_t init_offset = 0)
Tokenize input text and generate the (begin_pos, end_pos, token) triplet for each token.
Parameters
[in]      raw_sent     Input sentence
[in,out]  data         A vector to store the result
[in]      init_offset  The offset value added to all begin_pos
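The shape of the output can be illustrated with a small stand-in. The sketch below is not the NERsuite implementation: `sketch_tokenize` is a hypothetical function that only mirrors the documented signature, it splits on whitespace only (the real tokenizer's rules are not described here), and it assumes that `V2_STR` is a vector of string vectors, that `end_pos` is exclusive and also shifted by `init_offset`, and that the return value is the token count. None of these points are confirmed by this page.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Assumption: V2_STR is a vector of string rows; the real typedef is
// defined in NERsuite's own headers and may differ.
typedef std::vector<std::vector<std::string> > V2_STR;

// Hypothetical stand-in for NER::Tokenizer::tokenize.
// Splits on whitespace only and appends one (begin_pos, end_pos, token)
// row per token to `data`. Returns the number of tokens appended
// (an assumption; the meaning of the real return value is not documented).
int sketch_tokenize(const std::string &raw_sent, V2_STR &data,
                    size_t init_offset = 0) {
    const size_t n = raw_sent.size();
    size_t i = 0;
    int count = 0;
    while (i < n) {
        // Skip whitespace between tokens.
        while (i < n && std::isspace(static_cast<unsigned char>(raw_sent[i]))) ++i;
        if (i >= n) break;
        const size_t begin = i;
        // Consume characters up to the next whitespace.
        while (i < n && !std::isspace(static_cast<unsigned char>(raw_sent[i]))) ++i;
        std::vector<std::string> row;
        row.push_back(std::to_string(begin + init_offset));  // begin_pos
        row.push_back(std::to_string(i + init_offset));      // end_pos (exclusive, assumed)
        row.push_back(raw_sent.substr(begin, i - begin));    // token text
        data.push_back(row);
        ++count;
    }
    return count;
}
```

Under these assumptions, calling `sketch_tokenize("IL-2 gene", data, 10)` appends the rows ("10", "14", "IL-2") and ("15", "19", "gene") to `data`. Passing the character position of the sentence within the document as `init_offset` keeps the positions document-relative when sentences are tokenized one at a time.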