FastqPuri
Sequence manipulation for alignment.
Classes
struct | _ad_seq
stores an adapter entry
Typedefs
typedef struct _ad_seq | Ad_seq
stores an adapter entry
Functions
void | init_alLUTs ()
look up table initialization for alignment (used for adapters)
int | process_seq (unsigned char *packed, unsigned char *read, int L, bool shift, bool isreverse)
Packs a sequence using alfw0, alfw1, albw0, albw1.
Ad_seq * | pack_adapter (Fa_data *ptr_fa)
reads a Fa_data with adapters and stores them in an array of Ad_seq structs.
double | obtain_score (Fq_read *seq, int pos_seq, Ad_seq *ptr_adap, int pos_ad)
computes score of a possible alignment, after having found a seed.
void init_alLUTs ( )
look up table initialization for alignment (used for adapters)
It initializes: fw_1B, bw_1B. They are uint8_t arrays with 256 elements. All elements are set to 0xFF except the ones corresponding to 'a', 'A', 'c', 'C', 'g', 'G', 't', 'T':
Var | a,A | c,C | g,G | t,T | Var | a,A | c,C | g,G | t,T |
---|---|---|---|---|---|---|---|---|---|
alfw0 | 0x01 | 0x02 | 0x04 | 0x08 | albw0 | 0x08 | 0x04 | 0x02 | 0x01 |
alfw1 | 0x10 | 0x20 | 0x40 | 0x80 | albw1 | 0x80 | 0x40 | 0x20 | 0x10 |
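With these variables we encode sequences so that they can be compared later on. Each nucleotide maps to a single set bit within its half byte, so when two encoded sequences are combined with the bitwise XOR operator, every mismatch amounts to exactly two bits set to 1 and every match to none.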
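The encoding and the XOR comparison can be illustrated with a minimal sketch. It is not the FastqPuri implementation: `demo_fw0`, `demo_fw1`, `demo_init_luts`, `demo_popcount8` and `demo_count_mismatches` are hypothetical names, and only the forward values from the table above are reproduced.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LUT_SIZE 256

/* Hypothetical stand-ins for the forward tables described above. */
static uint8_t demo_fw0[LUT_SIZE]; /* codes in the low half byte  */
static uint8_t demo_fw1[LUT_SIZE]; /* codes in the high half byte */

/* Fill both tables with 0xFF, then set the one-hot codes for a, c, g, t
 * (lower and upper case), following the values in the table. */
static void demo_init_luts(void) {
    static const char bases[] = "acgt";
    memset(demo_fw0, 0xFF, sizeof demo_fw0);
    memset(demo_fw1, 0xFF, sizeof demo_fw1);
    for (int i = 0; i < 4; i++) {
        uint8_t code = (uint8_t)(1u << i); /* 0x01, 0x02, 0x04, 0x08 */
        unsigned char lower = (unsigned char)bases[i];
        unsigned char upper = (unsigned char)(lower - 'a' + 'A');
        demo_fw0[lower] = demo_fw0[upper] = code;
        demo_fw1[lower] = demo_fw1[upper] = (uint8_t)(code << 4);
    }
}

/* Count the bits set to 1 in one byte. */
static int demo_popcount8(uint8_t x) {
    int n = 0;
    while (x) {
        n += x & 1u;
        x >>= 1;
    }
    return n;
}

/* Count mismatching positions between two strings of a, c, g, t of equal
 * length. Each mismatch leaves exactly two bits set in the XOR, so the
 * total bit count is halved. Other characters are not handled here. */
static int demo_count_mismatches(const char *a, const char *b, int len) {
    int bits = 0;
    assert(a != NULL && b != NULL);
    for (int i = 0; i < len; i++) {
        uint8_t x = (uint8_t)(demo_fw0[(unsigned char)a[i]] ^
                              demo_fw0[(unsigned char)b[i]]);
        bits += demo_popcount8(x);
    }
    return bits / 2;
}
```

After calling `demo_init_luts()`, comparing "ACGT" with "ACCA" gives XOR values 0x00, 0x00, 0x06 and 0x09, that is four set bits, or two mismatches.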
double obtain_score ( Fq_read * seq, int pos_seq, Ad_seq * ptr_adap, int pos_ad )
computes score of a possible alignment, after having found a seed.
The score is computed as follows:
seq | pointer to Fq_read. |
pos_seq | read starting position of the alignment |
ptr_adap | pointer to Ad_seq, contains the adapter info |
pos_ad | adapter starting position of the alignment (reverse) |
Ad_seq * pack_adapter ( Fa_data * ptr_fa )
reads a Fa_data with adapters and stores them in an array of Ad_seq structs.
It reads the fasta structure. For every entry, an Ad_seq structure is allocated and the sequences are processed to create the packed sequences.
ptr_fa | pointer to Fa_data structure |
int process_seq ( unsigned char * packed, unsigned char * sequence, int L, bool shift, bool isreverse )
Packs a sequence using alfw0, alfw1, albw0, albw1.
It takes a sequence of length L and packs it, using the look up tables, into an unsigned char array in which every byte corresponds to 2 nucleotides. It can also encode the reverse complement, or the sequence shifted by 1/2 byte.
packed | packed sequence |
sequence | original sequence |
L | original sequence length |
shift | 0 if taken as is, 1 if we want to shift the output 1/2 byte (>>4) |
isreverse | 0 if we want the forward sequence, 1 reverse complement |
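A minimal sketch of this kind of packing follows. It is not the FastqPuri source, and its layout choices are assumptions: the first nucleotide of each pair goes into the high half byte, the shifted variant starts one half byte later, and unused half bytes are left at 0. `demo_code` and `demo_pack` are hypothetical names, and a switch stands in for the look up tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One-hot 4-bit code for a base. With complement set, a<->t and c<->g are
 * swapped, which matches the backward values of the table. 0x0F is a
 * hypothetical marker for any other character. */
static uint8_t demo_code(unsigned char base, bool complement) {
    uint8_t c;
    switch (base) {
        case 'a': case 'A': c = 0x01; break;
        case 'c': case 'C': c = 0x02; break;
        case 'g': case 'G': c = 0x04; break;
        case 't': case 'T': c = 0x08; break;
        default: return 0x0F;
    }
    if (complement) {
        /* Mirror the 4-bit pattern: 0x01 <-> 0x08, 0x02 <-> 0x04. */
        c = (uint8_t)(((c & 0x01) << 3) | ((c & 0x02) << 1) |
                      ((c & 0x04) >> 1) | ((c & 0x08) >> 3));
    }
    return c;
}

/* Pack len bases of seq into out, two bases per byte.
 * Assumed layout: even half-byte slots use the high half of a byte, odd
 * slots the low half; shift moves every base one slot to the right.
 * isreverse walks the sequence backwards and uses complement codes.
 * out must hold at least (len + 2) / 2 bytes. Returns the bytes written. */
static int demo_pack(uint8_t *out, const unsigned char *seq, int len,
                     bool shift, bool isreverse) {
    assert(out != NULL && seq != NULL && len >= 0);
    int offset = shift ? 1 : 0;
    int nbytes = (len + offset + 1) / 2;
    for (int i = 0; i < nbytes; i++) {
        out[i] = 0;
    }
    for (int i = 0; i < len; i++) {
        unsigned char base = isreverse ? seq[len - 1 - i] : seq[i];
        uint8_t code = demo_code(base, isreverse);
        int slot = i + offset;
        if (slot % 2 == 0) {
            out[slot / 2] |= (uint8_t)(code << 4); /* high half byte */
        } else {
            out[slot / 2] |= code;                 /* low half byte  */
        }
    }
    return nbytes;
}
```

Under these assumptions "ACGT" packs to 0x12 0x48 as is and to 0x01 0x24 0x80 when shifted. Keeping both an unshifted and a shifted copy of a sequence lets it be compared bytewise against another packed sequence at even and odd nucleotide offsets.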