reflex::Pattern Class Reference

updated Tue Sep 10 2019 by Robert van Engelen
 
Classes | Public Types | Public Member Functions | Protected Member Functions | Private Types | Private Member Functions | Static Private Member Functions | Private Attributes | Friends | List of all members
reflex::Pattern Class Reference

Pattern class holds a regex pattern and its compiled FSM opcode table or code for the reflex::Matcher engine. More...

#include <pattern.h>

Collaboration diagram for reflex::Pattern:
Collaboration graph
[legend]

Classes

struct  Const
 Common constants. More...
 
struct  Option
 Global modifier modes, syntax flags, and compiler options. More...
 
struct  Position
 Finite state machine construction position information. More...
 
struct  State
 Finite state machine. More...
 

Public Types

typedef uint8_t Pred
 predict match bits More...
 
typedef uint16_t Hash
 hash type (uint16_t), may be narrowed to uint8_t when Const::HASH == 0x100 More...
 
typedef uint16_t Index
 index into opcodes array Pattern::opc_ and subpattern indexing More...
 
typedef uint32_t Opcode
 32 bit opcode word More...
 
typedef void(* FSM) (class Matcher &)
 

Public Member Functions

 Pattern ()
 Construct an unset pattern. More...
 
 Pattern (const char *regex, const char *options=NULL)
 Construct a pattern object given a regex string. More...
 
 Pattern (const char *regex, const std::string &options)
 Construct a pattern object given a regex string. More...
 
 Pattern (const std::string &regex, const char *options=NULL)
 Construct a pattern object given a regex string. More...
 
 Pattern (const std::string &regex, const std::string &options)
 Construct a pattern object given a regex string. More...
 
 Pattern (const Opcode *code, const uint8_t *pred=NULL)
 Construct a pattern object given an opcode table. More...
 
 Pattern (FSM fsm, const uint8_t *pred=NULL)
 Construct a pattern object given a function pointer to FSM code. More...
 
virtual ~Pattern ()
 Destructor, deletes internal code array when owned and allocated. More...
 
void clear ()
 Clear and delete pattern data. More...
 
Patternassign (const char *regex, const char *options=NULL)
 Assign a (new) pattern. More...
 
Patternassign (const char *regex, const std::string &options)
 Assign a (new) pattern. More...
 
Patternassign (const std::string &regex, const char *options=NULL)
 Assign a (new) pattern. More...
 
Patternassign (const std::string &regex, const std::string &options)
 Assign a (new) pattern. More...
 
Patternassign (const Opcode *code, const uint8_t *pred=NULL)
 Assign a (new) pattern. More...
 
Patternassign (FSM fsm, const uint8_t *pred=NULL)
 Assign a (new) pattern. More...
 
Patternoperator= (const Pattern &pattern)
 Assign a (new) pattern. More...
 
Patternoperator= (const char *regex)
 Assign a (new) pattern. More...
 
Patternoperator= (const std::string &regex)
 Assign a (new) pattern. More...
 
Patternoperator= (const Opcode *code)
 Assign a (new) pattern. More...
 
Patternoperator= (FSM fsm)
 Assign a (new) pattern. More...
 
Index size () const
 Number of subpatterns of this pattern object. More...
 
const std::string operator[] (Index choice) const
 Get subpattern regex of this pattern object or the whole regex with index 0. More...
 
bool reachable (Index choice) const
 Check is subpattern is reachable by a match. More...
 
size_t nodes () const
 Get the number of finite state machine nodes (vertices). More...
 
size_t edges () const
 Get the number of finite state machine edges (transitions on input characters). More...
 
size_t words () const
 Get the code size in number of words. More...
 
float parse_time () const
 Get elapsed regex parsing and analysis time. More...
 
float nodes_time () const
 Get elapsed DFA vertices construction time. More...
 
float edges_time () const
 Get elapsed DFA edges construction time. More...
 
float words_time () const
 Get elapsed code words assembly time. More...
 

Protected Member Functions

virtual void error (regex_error_type code, size_t pos=0) const
 Throw an error. More...
 

Private Types

enum  Meta {
  META_MIN = 0x100, META_NWB = 0x101, META_NWE = 0x102, META_BWB = 0x103,
  META_EWB = 0x104, META_BWE = 0x105, META_EWE = 0x106, META_BOL = 0x107,
  META_EOL = 0x108, META_BOB = 0x109, META_EOB = 0x10A, META_UND = 0x10B,
  META_IND = 0x10C, META_DED = 0x10D, META_MAX
}
 Meta characters. More...
 
typedef unsigned int Char
 
typedef ORanges< CharChars
 represent char (0-255 + meta chars) set as a set of ranges More...
 
typedef size_t Location
 
typedef ORanges< LocationLocations
 
typedef std::set< LocationSet
 
typedef std::map< int, LocationsMap
 
typedef std::set< PositionPositions
 
typedef std::map< Position, PositionsFollow
 
typedef std::pair< Chars, PositionsMove
 
typedef std::list< MoveMoves
 

Private Member Functions

void init (const char *options, const uint8_t *pred=NULL)
 Initialize the pattern at construction. More...
 
void init_options (const char *options)
 
void parse (Positions &startpos, Follow &followpos, Map &modifiers, Map &lookahead)
 
void parse1 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Positions &lazypos, Map &modifiers, Locations &lookahead, Index &iter)
 
void parse2 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Positions &lazypos, Map &modifiers, Locations &lookahead, Index &iter)
 
void parse3 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Positions &lazypos, Map &modifiers, Locations &lookahead, Index &iter)
 
void parse4 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Positions &lazypos, Map &modifiers, Locations &lookahead, Index &iter)
 
void parse_esc (Location &loc) const
 
void compile (State &start, Follow &followpos, const Map &modifiers, const Map &lookahead)
 
void lazy (const Positions &lazypos, Positions &pos) const
 
void lazy (const Positions &lazypos, const Positions &pos, Positions &pos1) const
 
void greedy (Positions &pos) const
 
void trim_lazy (Positions &pos) const
 
void compile_transition (State *state, Follow &followpos, const Map &modifiers, const Map &lookahead, Moves &moves) const
 
void transition (Moves &moves, const Chars &chars, const Positions &follow) const
 
Char compile_esc (Location loc, Chars &chars) const
 
void compile_list (Location loc, Chars &chars, const Map &modifiers) const
 
void posix (size_t index, Chars &chars) const
 
void flip (Chars &chars) const
 
void assemble (State &start)
 
void compact_dfa (State &start)
 
void encode_dfa (State &start)
 
void gencode_dfa (const State &start) const
 
void gencode_dfa_closure (FILE *fd, const State *start, int nest) const
 
void delete_dfa (State &start)
 
void export_dfa (const State &start) const
 
void export_code () const
 
void predict_match_dfa (State &start)
 
void gen_predict_match (State *state)
 
void gen_predict_match_transitions (State *state, std::map< State *, ORanges< Char > > &states)
 
void gen_predict_match_transitions (Index level, State *state, ORanges< Char > &labels, std::map< State *, ORanges< Char > > &states)
 
void write_predictor (FILE *fd) const
 
void write_namespace_open (FILE *fd) const
 
void write_namespace_close (FILE *fd) const
 
Location find_at (Location loc, char c) const
 
Char at (Location k) const
 
bool eq_at (Location loc, const char *s) const
 
Char escape_at (Location loc) const
 
Char escapes_at (Location loc, const char *escapes) const
 

Static Private Member Functions

static bool is_modified (Char mode, const Map &modifiers, Location loc)
 
static void update_modified (Char mode, Map &modifiers, Location from, Location to)
 
static bool is_meta (Char c)
 
static Opcode opcode_take (Index index)
 
static Opcode opcode_redo ()
 
static Opcode opcode_tail (Index index)
 
static Opcode opcode_head (Index index)
 
static Opcode opcode_goto (Char lo, Char hi, Index index)
 
static Opcode opcode_halt ()
 
static bool is_opcode_redo (Opcode opcode)
 
static bool is_opcode_take (Opcode opcode)
 
static bool is_opcode_tail (Opcode opcode)
 
static bool is_opcode_head (Opcode opcode)
 
static bool is_opcode_halt (Opcode opcode)
 
static bool is_opcode_meta (Opcode opcode)
 
static bool is_opcode_meta (Opcode opcode, Char a)
 
static bool is_opcode_match (Opcode opcode, unsigned char c)
 
static Char meta_of (Opcode opcode)
 
static Char lo_of (Opcode opcode)
 
static Char hi_of (Opcode opcode)
 
static Index index_of (Opcode opcode)
 
static Char lowercase (Char c)
 
static Char uppercase (Char c)
 
static Char reversecase (Char c)
 
static Hash hash (Hash h1, Hash h2)
 
static Hash hash (Hash h)
 

Private Attributes

Option opt_
 pattern compiler options More...
 
std::string rex_
 regular expression string More...
 
std::vector< Locationend_
 entries point to the subpattern's ending '|' or '\0' More...
 
std::vector< bool > acc_
 true if subpattern n is accepting (state is reachable) More...
 
size_t vno_
 number of finite state machine vertices |V| More...
 
size_t eno_
 number of finite state machine edges |E| More...
 
const Opcodeopc_
 points to the opcode table More...
 
Index nop_
 number of opcodes generated More...
 
FSM fsm_
 function pointer to FSM code More...
 
std::string pre_
 pattern prefix, no more than 255 bytes More...
 
size_t min_
 patterns after the prefix are at least this long but no more than 8 More...
 
Pred bit_ [256]
 bitap array More...
 
Pred pmh_ [Const::HASH]
 predict-match hash array More...
 
Pred pma_ [Const::HASH]
 predict-match array More...
 
float pms_
 ms elapsed time to parse regex More...
 
float vms_
 ms elapsed time to compile DFA vertices More...
 
float ems_
 ms elapsed time to compile DFA edges More...
 
float wms_
 ms elapsed time to assemble code words More...
 

Friends

class Matcher
 permit access by the reflex::Matcher engine More...
 

Detailed Description

Pattern class holds a regex pattern and its compiled FSM opcode table or code for the reflex::Matcher engine.

More info TODO

Member Typedef Documentation

typedef unsigned int reflex::Pattern::Char
private

represent char (0-255 + meta chars) set as a set of ranges

typedef std::map<Position,Positions> reflex::Pattern::Follow
private
typedef void(* reflex::Pattern::FSM) (class Matcher &)

function pointer to FSM code

typedef uint16_t reflex::Pattern::Hash

hash type (uint16_t), may be narrowed to uint8_t when Const::HASH == 0x100

typedef uint16_t reflex::Pattern::Index

index into opcodes array Pattern::opc_ and subpattern indexing

typedef size_t reflex::Pattern::Location
private
typedef std::map<int,Locations> reflex::Pattern::Map
private
typedef std::pair<Chars,Positions> reflex::Pattern::Move
private
typedef std::list<Move> reflex::Pattern::Moves
private
typedef uint32_t reflex::Pattern::Opcode

32 bit opcode word

typedef std::set<Position> reflex::Pattern::Positions
private
typedef uint8_t reflex::Pattern::Pred

predict match bits

typedef std::set<Location> reflex::Pattern::Set
private

Member Enumeration Documentation

enum reflex::Pattern::Meta
private

Meta characters.

Enumerator
META_MIN 
META_NWB 

non-word boundary at begin \Bx

META_NWE 

non-word boundary at end x\B

META_BWB 

begin of word at begin \<x where =(<|>)x

META_EWB 

end of word at begin \>x

META_BWE 

begin of word at end x\< where x=x(<|>)

META_EWE 

end of word at end x\>

META_BOL 

begin of line ^

META_EOL 

end of line $

META_BOB 

begin of buffer \A

META_EOB 

end of buffer \Z

META_UND 

undent boundary \k

META_IND 

indent boundary \i (must be one but the largest META code)

META_DED 

dedent boundary \j (must be the largest META code)

META_MAX 

max meta characters

Constructor & Destructor Documentation

reflex::Pattern::Pattern ( )
inlineexplicit

Construct an unset pattern.

reflex::Pattern::Pattern ( const char *  regex,
const char *  options = NULL 
)
inlineexplicit

Construct a pattern object given a regex string.

reflex::Pattern::Pattern ( const char *  regex,
const std::string &  options 
)
inlineexplicit

Construct a pattern object given a regex string.

reflex::Pattern::Pattern ( const std::string &  regex,
const char *  options = NULL 
)
inlineexplicit

Construct a pattern object given a regex string.

reflex::Pattern::Pattern ( const std::string &  regex,
const std::string &  options 
)
inlineexplicit

Construct a pattern object given a regex string.

reflex::Pattern::Pattern ( const Opcode code,
const uint8_t *  pred = NULL 
)
inlineexplicit

Construct a pattern object given an opcode table.

reflex::Pattern::Pattern ( FSM  fsm,
const uint8_t *  pred = NULL 
)
inlineexplicit

Construct a pattern object given a function pointer to FSM code.

virtual reflex::Pattern::~Pattern ( )
inlinevirtual

Destructor, deletes internal code array when owned and allocated.

Member Function Documentation

void reflex::Pattern::assemble ( State start)
private
Pattern& reflex::Pattern::assign ( const char *  regex,
const char *  options = NULL 
)
inline

Assign a (new) pattern.

Pattern& reflex::Pattern::assign ( const char *  regex,
const std::string &  options 
)
inline

Assign a (new) pattern.

Pattern& reflex::Pattern::assign ( const std::string &  regex,
const char *  options = NULL 
)
inline

Assign a (new) pattern.

Pattern& reflex::Pattern::assign ( const std::string &  regex,
const std::string &  options 
)
inline

Assign a (new) pattern.

Pattern& reflex::Pattern::assign ( const Opcode code,
const uint8_t *  pred = NULL 
)
inline

Assign a (new) pattern.

Pattern& reflex::Pattern::assign ( FSM  fsm,
const uint8_t *  pred = NULL 
)
inline

Assign a (new) pattern.

Char reflex::Pattern::at ( Location  k) const
inlineprivate
void reflex::Pattern::clear ( )
inline

Clear and delete pattern data.

void reflex::Pattern::compact_dfa ( State start)
private
void reflex::Pattern::compile ( State start,
Follow followpos,
const Map modifiers,
const Map lookahead 
)
private
Char reflex::Pattern::compile_esc ( Location  loc,
Chars chars 
) const
private
void reflex::Pattern::compile_list ( Location  loc,
Chars chars,
const Map modifiers 
) const
private
void reflex::Pattern::compile_transition ( State state,
Follow followpos,
const Map modifiers,
const Map lookahead,
Moves moves 
) const
private
void reflex::Pattern::delete_dfa ( State start)
private
size_t reflex::Pattern::edges ( ) const
inline

Get the number of finite state machine edges (transitions on input characters).

Returns
number of edges or 0 when no finite state machine was constructed by this pattern.
float reflex::Pattern::edges_time ( ) const
inline

Get elapsed DFA edges construction time.

void reflex::Pattern::encode_dfa ( State start)
private
bool reflex::Pattern::eq_at ( Location  loc,
const char *  s 
) const
inlineprivate
virtual void reflex::Pattern::error ( regex_error_type  code,
size_t  pos = 0 
) const
protectedvirtual

Throw an error.

Parameters
codeerror code
posoptional location of the error in regex string Pattern::rex_
Char reflex::Pattern::escape_at ( Location  loc) const
inlineprivate
Char reflex::Pattern::escapes_at ( Location  loc,
const char *  escapes 
) const
inlineprivate
void reflex::Pattern::export_code ( ) const
private
void reflex::Pattern::export_dfa ( const State start) const
private
Location reflex::Pattern::find_at ( Location  loc,
char  c 
) const
inlineprivate
void reflex::Pattern::flip ( Chars chars) const
private
void reflex::Pattern::gen_predict_match ( State state)
private
void reflex::Pattern::gen_predict_match_transitions ( State state,
std::map< State *, ORanges< Char > > &  states 
)
private
void reflex::Pattern::gen_predict_match_transitions ( Index  level,
State state,
ORanges< Char > &  labels,
std::map< State *, ORanges< Char > > &  states 
)
private
void reflex::Pattern::gencode_dfa ( const State start) const
private
void reflex::Pattern::gencode_dfa_closure ( FILE *  fd,
const State start,
int  nest 
) const
private
void reflex::Pattern::greedy ( Positions pos) const
private
static Hash reflex::Pattern::hash ( Hash  h1,
Hash  h2 
)
inlinestaticprivate
static Hash reflex::Pattern::hash ( Hash  h)
inlinestaticprivate
static Char reflex::Pattern::hi_of ( Opcode  opcode)
inlinestaticprivate
static Index reflex::Pattern::index_of ( Opcode  opcode)
inlinestaticprivate
void reflex::Pattern::init ( const char *  options,
const uint8_t *  pred = NULL 
)
private

Initialize the pattern at construction.

void reflex::Pattern::init_options ( const char *  options)
private
static bool reflex::Pattern::is_meta ( Char  c)
inlinestaticprivate
static bool reflex::Pattern::is_modified ( Char  mode,
const Map modifiers,
Location  loc 
)
inlinestaticprivate
static bool reflex::Pattern::is_opcode_halt ( Opcode  opcode)
inlinestaticprivate
static bool reflex::Pattern::is_opcode_head ( Opcode  opcode)
inlinestaticprivate
static bool reflex::Pattern::is_opcode_match ( Opcode  opcode,
unsigned char  c 
)
inlinestaticprivate
static bool reflex::Pattern::is_opcode_meta ( Opcode  opcode)
inlinestaticprivate
static bool reflex::Pattern::is_opcode_meta ( Opcode  opcode,
Char  a 
)
inlinestaticprivate
static bool reflex::Pattern::is_opcode_redo ( Opcode  opcode)
inlinestaticprivate
static bool reflex::Pattern::is_opcode_tail ( Opcode  opcode)
inlinestaticprivate
static bool reflex::Pattern::is_opcode_take ( Opcode  opcode)
inlinestaticprivate
void reflex::Pattern::lazy ( const Positions lazypos,
Positions pos 
) const
private
void reflex::Pattern::lazy ( const Positions lazypos,
const Positions pos,
Positions pos1 
) const
private
static Char reflex::Pattern::lo_of ( Opcode  opcode)
inlinestaticprivate
static Char reflex::Pattern::lowercase ( Char  c)
inlinestaticprivate
static Char reflex::Pattern::meta_of ( Opcode  opcode)
inlinestaticprivate
size_t reflex::Pattern::nodes ( ) const
inline

Get the number of finite state machine nodes (vertices).

Returns
number of nodes or 0 when no finite state machine was constructed by this pattern.
float reflex::Pattern::nodes_time ( ) const
inline

Get elapsed DFA vertices construction time.

static Opcode reflex::Pattern::opcode_goto ( Char  lo,
Char  hi,
Index  index 
)
inlinestaticprivate
static Opcode reflex::Pattern::opcode_halt ( )
inlinestaticprivate
static Opcode reflex::Pattern::opcode_head ( Index  index)
inlinestaticprivate
static Opcode reflex::Pattern::opcode_redo ( )
inlinestaticprivate
static Opcode reflex::Pattern::opcode_tail ( Index  index)
inlinestaticprivate
static Opcode reflex::Pattern::opcode_take ( Index  index)
inlinestaticprivate
Pattern& reflex::Pattern::operator= ( const Pattern pattern)
inline

Assign a (new) pattern.

Pattern& reflex::Pattern::operator= ( const char *  regex)
inline

Assign a (new) pattern.

Pattern& reflex::Pattern::operator= ( const std::string &  regex)
inline

Assign a (new) pattern.

Pattern& reflex::Pattern::operator= ( const Opcode code)
inline

Assign a (new) pattern.

Pattern& reflex::Pattern::operator= ( FSM  fsm)
inline

Assign a (new) pattern.

const std::string reflex::Pattern::operator[] ( Index  choice) const

Get subpattern regex of this pattern object or the whole regex with index 0.

Returns
subpattern string or "" when not set.
void reflex::Pattern::parse ( Positions startpos,
Follow followpos,
Map modifiers,
Map lookahead 
)
private
void reflex::Pattern::parse1 ( bool  begin,
Location loc,
Positions firstpos,
Positions lastpos,
bool &  nullable,
Follow followpos,
Positions lazypos,
Map modifiers,
Locations lookahead,
Index iter 
)
private
void reflex::Pattern::parse2 ( bool  begin,
Location loc,
Positions firstpos,
Positions lastpos,
bool &  nullable,
Follow followpos,
Positions lazypos,
Map modifiers,
Locations lookahead,
Index iter 
)
private
void reflex::Pattern::parse3 ( bool  begin,
Location loc,
Positions firstpos,
Positions lastpos,
bool &  nullable,
Follow followpos,
Positions lazypos,
Map modifiers,
Locations lookahead,
Index iter 
)
private
void reflex::Pattern::parse4 ( bool  begin,
Location loc,
Positions firstpos,
Positions lastpos,
bool &  nullable,
Follow followpos,
Positions lazypos,
Map modifiers,
Locations lookahead,
Index iter 
)
private
void reflex::Pattern::parse_esc ( Location loc) const
private
float reflex::Pattern::parse_time ( ) const
inline

Get elapsed regex parsing and analysis time.

void reflex::Pattern::posix ( size_t  index,
Chars chars 
) const
private
void reflex::Pattern::predict_match_dfa ( State start)
private
bool reflex::Pattern::reachable ( Index  choice) const
inline

Check is subpattern is reachable by a match.

Returns
true if subpattern is reachable.
static Char reflex::Pattern::reversecase ( Char  c)
inlinestaticprivate
Index reflex::Pattern::size ( ) const
inline

Number of subpatterns of this pattern object.

Returns
number of subpatterns.
void reflex::Pattern::transition ( Moves moves,
const Chars chars,
const Positions follow 
) const
private
void reflex::Pattern::trim_lazy ( Positions pos) const
private
static void reflex::Pattern::update_modified ( Char  mode,
Map modifiers,
Location  from,
Location  to 
)
inlinestaticprivate
static Char reflex::Pattern::uppercase ( Char  c)
inlinestaticprivate
size_t reflex::Pattern::words ( ) const
inline

Get the code size in number of words.

Returns
number of words or 0 when no code was generated by this pattern.
float reflex::Pattern::words_time ( ) const
inline

Get elapsed code words assembly time.

void reflex::Pattern::write_namespace_close ( FILE *  fd) const
private
void reflex::Pattern::write_namespace_open ( FILE *  fd) const
private
void reflex::Pattern::write_predictor ( FILE *  fd) const
private

Friends And Related Function Documentation

friend class Matcher
friend

permit access by the reflex::Matcher engine

Member Data Documentation

std::vector<bool> reflex::Pattern::acc_
private

true if subpattern n is accepting (state is reachable)

Pred reflex::Pattern::bit_[256]
private

bitap array

float reflex::Pattern::ems_
private

ms elapsed time to compile DFA edges

std::vector<Location> reflex::Pattern::end_
private

entries point to the subpattern's ending '|' or '\0'

size_t reflex::Pattern::eno_
private

number of finite state machine edges |E|

FSM reflex::Pattern::fsm_
private

function pointer to FSM code

size_t reflex::Pattern::min_
private

patterns after the prefix are at least this long but no more than 8

Index reflex::Pattern::nop_
private

number of opcodes generated

const Opcode* reflex::Pattern::opc_
private

points to the opcode table

Option reflex::Pattern::opt_
private

pattern compiler options

Pred reflex::Pattern::pma_[Const::HASH]
private

predict-match array

Pred reflex::Pattern::pmh_[Const::HASH]
private

predict-match hash array

float reflex::Pattern::pms_
private

ms elapsed time to parse regex

std::string reflex::Pattern::pre_
private

pattern prefix, no more than 255 bytes

std::string reflex::Pattern::rex_
private

regular expression string

float reflex::Pattern::vms_
private

ms elapsed time to compile DFA vertices

size_t reflex::Pattern::vno_
private

number of finite state machine vertices |V|

float reflex::Pattern::wms_
private

ms elapsed time to assemble code words


The documentation for this class was generated from the following file: