sequence manipulation for alignment
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include "adapters.h"
#include "Lmer.h"
void | init_alLUTs () | Look-up table initialization for alignment (used for adapters). |
int | process_seq (unsigned char *packed, unsigned char *sequence, int L, bool shift, bool isreverse) | Packs a sequence using alfw0, alfw1, albw0, albw1. |
Ad_seq * | pack_adapter (Fa_data *ptr_fa) | Reads a Fa_data with adapters and stores them in an array of Ad_seq structs. |
double | obtain_score (Fq_read *seq, int pos_seq, Ad_seq *ptr_adap, int pos_ad) | Computes the score of a possible alignment, after having found a seed. |
- Author
- Paula Perez paula.nosp@m.pere.nosp@m.zrubi.nosp@m.o@gm.nosp@m.ail.c.nosp@m.om
- Date
- 23.09.2017
◆ init_alLUTs()
look up table initialization for alignment (used for adapters)
It initializes the lookup tables fw_1B and bw_1B, which are uint8_t arrays with 256 elements. All elements are set to 0xFF except the ones corresponding to 'a', 'A', 'c', 'C', 'g', 'G', 't', 'T':
Var | a,A | c,C | g,G | t,T | Var | a,A | c,C | g,G | t,T |
alfw0 | 0x01 | 0x02 | 0x04 | 0x08 | albw0 | 0x08 | 0x04 | 0x02 | 0x01 |
alfw1 | 0x10 | 0x20 | 0x40 | 0x80 | albw1 | 0x80 | 0x40 | 0x20 | 0x10 |
With these variables we encode sequences so that they can be compared later on. Each nucleotide is encoded as a single set bit within its half byte, so under the bitwise XOR operator every mismatch amounts to exactly two bits set to 1.
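The encoding above can be illustrated with a minimal sketch. It is not the library's implementation: the function names `init_tables_sketch` and `mismatches_in_byte` are hypothetical, the four arrays are local stand-ins named after the rows of the table, and placing one nucleotide in each half of a byte is an assumption made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins named after the rows of the table above;
 * the real declarations in the source may differ. */
static uint8_t alfw0[256], alfw1[256], albw0[256], albw1[256];

/* Fill the four tables: 0xFF everywhere, one-hot codes for a/c/g/t. */
static void init_tables_sketch(void)
{
    static const char bases[4] = { 'a', 'c', 'g', 't' };

    memset(alfw0, 0xFF, sizeof alfw0);
    memset(alfw1, 0xFF, sizeof alfw1);
    memset(albw0, 0xFF, sizeof albw0);
    memset(albw1, 0xFF, sizeof albw1);

    for (int i = 0; i < 4; i++) {
        uint8_t fw = (uint8_t)(1u << i);       /* a=0x01, c=0x02, g=0x04, t=0x08 */
        uint8_t bw = (uint8_t)(1u << (3 - i)); /* complement: a=0x08 ... t=0x01 */
        int lo = (unsigned char)bases[i];
        int up = toupper(lo);

        alfw0[lo] = alfw0[up] = fw;
        alfw1[lo] = alfw1[up] = (uint8_t)(fw << 4);
        albw0[lo] = albw0[up] = bw;
        albw1[lo] = albw1[up] = (uint8_t)(bw << 4);
    }
}

/* Count mismatching nucleotides between two bytes that each hold two
 * valid one-hot codes: every mismatch leaves two set bits in the XOR. */
static int mismatches_in_byte(uint8_t x, uint8_t y)
{
    uint8_t diff = (uint8_t)(x ^ y);
    int bits = 0;

    while (diff) {
        bits += diff & 1u;
        diff >>= 1;
    }
    assert(bits % 2 == 0); /* holds only for valid packed inputs */
    return bits / 2;
}
```

For instance, with the codes from the table, "AC" packs to 0x10 | 0x02 = 0x12 and "AG" to 0x10 | 0x04 = 0x14; their XOR is 0x06, which has two set bits and therefore counts as one mismatch.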
◆ obtain_score()
double obtain_score (Fq_read *seq, int pos_seq, Ad_seq *ptr_adap, int pos_ad)
computes score of a possible alignment, after having found a seed.
The score is computed as follows:
- matching bases: score += log_10(4)
- mismatching bases: score -= Q/10, where Q is the quality score.
- Parameters
seq | pointer to Fq_read. |
pos_seq | read starting position of the alignment |
ptr_adap | pointer to Ad_seq, contains the adapter info |
pos_ad | adapter starting position of the alignment (reverse) |
- Returns
- score of the alignment
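The scoring rule described for obtain_score() can be sketched on plain arrays. This is only an illustration under stated assumptions: `score_sketch` is a hypothetical helper, it compares unpacked characters instead of the packed Fq_read and Ad_seq data, it takes qualities as already decoded integer Phred values, and it ignores the pos_seq and pos_ad offsets. The constant for log10(4) is written out so the example does not depend on the math library.

```c
#include <assert.h>
#include <stddef.h>

/* log10(4): gain per matching base, as stated in the scoring rule. */
#define LOG10_4 0.6020599913279624

/* Hypothetical helper: score n aligned positions of a read against an
 * adapter. qual[i] is the Phred quality Q of read[i] (assumed already
 * decoded from the FASTQ quality string). */
static double score_sketch(const char *read, const int *qual,
                           const char *adapter, size_t n)
{
    double score = 0.0;

    assert(read != NULL && qual != NULL && adapter != NULL);
    for (size_t i = 0; i < n; i++) {
        if (read[i] == adapter[i])
            score += LOG10_4;        /* matching base */
        else
            score -= qual[i] / 10.0; /* mismatching base: Q/10 penalty */
    }
    return score;
}
```

As a worked case, aligning "ACGT" against "ACGA" with a quality of 20 at the last position gives three matches and one mismatch: 3 × log10(4) − 20/10 ≈ 1.806 − 2.0 ≈ −0.194. Since a Phred score satisfies Q = −10·log10(p), the penalty Q/10 equals −log10 of the error probability p of that base.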
◆ pack_adapter()
reads a Fa_data with adapters and stores them in an array of Ad_seq structs.
It reads the fasta structure. For every entry, an Ad_seq structure is allocated and the sequences are processed to create the packed sequences.
- Parameters
ptr_fa | pointer to Fa_data structure |
- Returns
- pointer to Ad_seq, where the information is stored.
◆ process_seq()
int process_seq (unsigned char *packed, unsigned char *sequence, int L, bool shift, bool isreverse)
Packs a sequence using alfw0, alfw1, albw0, albw1.
It takes a sequence of length L and packs it, using the lookup tables, into an unsigned char array in which every byte corresponds to 2 nucleotides. One can encode the reverse complement or the sequence shifted by 1/2 byte.
- Parameters
packed | packed sequence |
sequence | original sequence |
L | original sequence length |
shift | 0 if taken as is, 1 if we want to shift the output 1/2 byte (>>4) |
isreverse | 0 if we want the forward sequence, 1 reverse complement |
- Returns
- Lhalf, length in Bytes of the packed sequence
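A rough sketch of this kind of packing is given below. It is not the actual process_seq(): `pack_sketch` and `nibble_code` are hypothetical names, a switch statement replaces the lookup tables to keep the example self-contained, and three layout choices are assumptions: the first nucleotide of each pair goes into the upper half of the byte, shifting leaves the first half byte empty, and unknown characters become 0x0F.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* One-hot 4-bit code of a base; the complement mirrors the four bits
 * (a<->t, c<->g). Unknown characters map to 0x0F (an assumption). */
static uint8_t nibble_code(unsigned char base, bool complement)
{
    uint8_t code;

    switch (base) {
    case 'a': case 'A': code = 0x01; break;
    case 'c': case 'C': code = 0x02; break;
    case 'g': case 'G': code = 0x04; break;
    case 't': case 'T': code = 0x08; break;
    default:            return 0x0F;
    }
    /* 8/1=8, 8/2=4, 8/4=2, 8/8=1: swaps a<->t and c<->g. */
    return complement ? (uint8_t)(8 / code) : code;
}

/* Pack seq[0..L-1] two nucleotides per byte; returns the packed length
 * in bytes. The caller must provide at least L/2 + 1 bytes in packed. */
static int pack_sketch(uint8_t *packed, const unsigned char *seq, int L,
                       bool shift, bool isreverse)
{
    int offset = shift ? 1 : 0;        /* half bytes left empty in front */
    int nbytes = (L + offset + 1) / 2;

    assert(packed != NULL && seq != NULL && L >= 0);
    memset(packed, 0, (size_t)nbytes);

    for (int i = 0; i < L; i++) {
        /* Reverse complement: read the input from the end. */
        unsigned char base = isreverse ? seq[L - 1 - i] : seq[i];
        uint8_t code = nibble_code(base, isreverse);
        int pos = i + offset;          /* half-byte index in the output */

        if (pos % 2 == 0)
            packed[pos / 2] |= (uint8_t)(code << 4); /* upper half */
        else
            packed[pos / 2] |= code;                 /* lower half */
    }
    return nbytes;
}
```

Under these assumptions "ACGT" packs to the two bytes 0x12 0x48, and to the three bytes 0x01 0x24 0x80 when shifted by half a byte. Keeping both the shifted and unshifted forms of a sequence allows a byte-wise XOR comparison at any nucleotide offset, whether it is even or odd.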
◆ albw0
variable for backward packing, first half
◆ albw1
variable for backward packing, second half
◆ alfw0
variable for forward packing, first half
◆ alfw1
variable for forward packing, second half
◆ bw_1B
global variable. Lookup table.
◆ fw_1B
global variable. Lookup table.