FastqPuri
adapters.h File Reference

sequence manipulation for alignment

#include "fq_read.h"
#include "fa_read.h"
#include "defines.h"

Classes

struct  _ad_seq
 stores an adapter entry
 

Typedefs

typedef struct _ad_seq Ad_seq
 stores an adapter entry
 

Functions

void init_alLUTs ()
 look up table initialization for alignment (used for adapters)
 
int process_seq (unsigned char *packed, unsigned char *read, int L, bool shift, bool isreverse)
 Packs a sequence using alfw0, alfw1, albw0, albw1.
 
Ad_seq * pack_adapter (Fa_data *ptr_fa)
 reads a Fa_data with adapters and stores them in an array of Ad_seq structs.
 
double obtain_score (Fq_read *seq, int pos_seq, Ad_seq *ptr_adap, int pos_ad)
 computes score of a possible alignment, after having found a seed.
 

Detailed Description

sequence manipulation for alignment

Date
22.09.2017

Function Documentation

◆ init_alLUTs()

void init_alLUTs ( )

look up table initialization for alignment (used for adapters)

It initializes fw_1B and bw_1B. They are uint8_t arrays with 256 elements. All elements are set to 0xFF except those corresponding to 'a', 'A', 'c', 'C', 'g', 'G', 't', 'T':

Var    a,A   c,C   g,G   t,T      Var    a,A   c,C   g,G   t,T
alfw0  0x01  0x02  0x04  0x08     albw0  0x08  0x04  0x02  0x01
alfw1  0x10  0x20  0x40  0x80     albw1  0x80  0x40  0x20  0x10

With these variables we encode sequences so that they can be compared later on. Each base sets exactly one bit of its half byte, so under the bitwise XOR operator every mismatch amounts to two bits set to 1.
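
The encoding above can be illustrated with a minimal sketch. It is not the FastqPuri implementation: the table names fw0 and fw1, the helper functions, and the choice of which nucleotide goes into which half byte are assumptions made for illustration. Only the table values are taken from the description above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the forward tables described above. */
static uint8_t fw0[256], fw1[256];

/* Set every entry to 0xFF, then fill in a/A, c/C, g/G, t/T. */
static void init_tables_sketch(void) {
    static const char upper[] = "ACGT";
    static const char lower[] = "acgt";
    memset(fw0, 0xFF, sizeof fw0);
    memset(fw1, 0xFF, sizeof fw1);
    for (int i = 0; i < 4; i++) {
        uint8_t low = (uint8_t)(0x01 << i);   /* 0x01, 0x02, 0x04, 0x08 */
        uint8_t high = (uint8_t)(0x10 << i);  /* 0x10, 0x20, 0x40, 0x80 */
        fw0[(unsigned char)upper[i]] = low;
        fw0[(unsigned char)lower[i]] = low;
        fw1[(unsigned char)upper[i]] = high;
        fw1[(unsigned char)lower[i]] = high;
    }
}

/* Count the bits set to 1 in a byte. */
static int popcount8(uint8_t x) {
    int n = 0;
    while (x) {
        n += x & 1;
        x >>= 1;
    }
    return n;
}

/* Mismatching bases between two bytes that each hold two encoded
 * nucleotides (a, c, g or t only): every mismatch leaves two bits set. */
static int mismatches_in_byte(uint8_t a, uint8_t b) {
    return popcount8((uint8_t)(a ^ b)) / 2;
}
```

For example, with the first base in the low half byte (an assumption of this sketch), "AC" encodes as 0x21 and "AG" as 0x41; their XOR is 0x60, which has two bits set and therefore counts as one mismatch.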

◆ obtain_score()

double obtain_score ( Fq_read *  seq,
int  pos_seq,
Ad_seq *  ptr_adap,
int  pos_ad 
)

computes score of a possible alignment, after having found a seed.

The score is computed as follows:

  • matching bases: score += log_10(4)
  • mismatching bases: score -= Q/10, where Q is the quality score.
Parameters
seq       pointer to Fq_read.
pos_seq   read starting position of the alignment
ptr_adap  pointer to Ad_seq, contains the adapter info
pos_ad    adapter starting position of the alignment (reverse)
Returns
score of the alignment
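
The scoring rule can be sketched as follows. This is a simplified illustration, not the library routine: score_sketch is a hypothetical helper that works on plain arrays of bases and numeric quality scores instead of Fq_read and Ad_seq, and it ignores the starting positions and the reverse traversal of the adapter.

```c
#include <stddef.h>

/* log_10(4), written as a constant so the sketch needs no math library. */
static const double LOG10_OF_4 = 0.6020599913279624;

/* Hypothetical helper: compares n aligned bases.
 * qual[i] is assumed to hold the numeric quality score Q of read[i]. */
static double score_sketch(const char *read, const int *qual,
                           const char *adapter, size_t n) {
    double score = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (read[i] == adapter[i]) {
            score += LOG10_OF_4;        /* matching base */
        } else {
            score -= qual[i] / 10.0;    /* mismatching base costs Q/10 */
        }
    }
    return score;
}
```

With three matches and one mismatch at Q = 30, the sketch yields 3 * log_10(4) - 3, so a single high-quality mismatch outweighs several matches.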

◆ pack_adapter()

Ad_seq* pack_adapter ( Fa_data *  ptr_fa)

reads a Fa_data with adapters and stores them in an array of Ad_seq structs.

It reads the fasta structure. For every entry, an Ad_seq structure is allocated and the sequences are processed to create the packed sequences.

Parameters
ptr_fa  pointer to Fa_data structure
Returns
pointer to Ad_seq, where the information is stored.

◆ process_seq()

int process_seq ( unsigned char *  packed,
unsigned char *  sequence,
int  L,
bool  shift,
bool  isreverse 
)

Packs a sequence using alfw0, alfw1, albw0, albw1.

It takes a sequence of length L and packs it, using the look up tables, into an unsigned char array where every byte corresponds to 2 nucleotides. One can encode the reverse complement or the sequence shifted by 1/2 byte.

Parameters
packed     packed sequence
sequence   original sequence
L          original sequence length
shift      0 if taken as is, 1 if we want to shift the output 1/2 byte (>>4)
isreverse  0 if we want the forward sequence, 1 reverse complement
Returns
Lhalf, length in bytes of the packed sequence
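
A minimal sketch of the two-nucleotides-per-byte packing is given below. It covers only the unshifted forward case and makes several assumptions that are not confirmed by this page: the names nibble_code and pack_sketch are hypothetical, even positions go to the low half byte, a sequence of odd length leaves the last high half byte at 0, and any character other than a, c, g, t is coded as 0xF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-hot code of a base within a half byte. */
static uint8_t nibble_code(char base) {
    switch (base) {
        case 'a': case 'A': return 0x1;
        case 'c': case 'C': return 0x2;
        case 'g': case 'G': return 0x4;
        case 't': case 'T': return 0x8;
        default:            return 0xF;  /* assumption for other symbols */
    }
}

/* Packs n bases into out, two per byte, and returns the number of
 * bytes written, (n + 1) / 2. out must have room for that many bytes. */
static size_t pack_sketch(uint8_t *out, const char *seq, size_t n) {
    size_t nbytes = (n + 1) / 2;
    for (size_t i = 0; i < nbytes; i++) {
        out[i] = 0;
    }
    for (size_t i = 0; i < n; i++) {
        uint8_t code = nibble_code(seq[i]);
        if (i % 2 == 0) {
            out[i / 2] |= code;                  /* low half byte */
        } else {
            out[i / 2] |= (uint8_t)(code << 4);  /* high half byte */
        }
    }
    return nbytes;
}
```

Under these assumptions, "ACGT" packs into the two bytes 0x21 and 0x84. Two sequences packed this way can then be compared byte by byte with XOR.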