FastqPuri
adapters.c File Reference

sequence manipulation for alignment

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include "adapters.h"
#include "Lmer.h"

Functions

void init_alLUTs ()
 look up table initialization for alignment (used for adapters)
 
int process_seq (unsigned char *packed, unsigned char *sequence, int L, bool shift, bool isreverse)
 Packs a sequence using alfw0, alfw1, albw0, albw1.
 
Ad_seq * pack_adapter (Fa_data *ptr_fa)
 reads a Fa_data with adapters and stores them in an array of Ad_seq structs.
 
double obtain_score (Fq_read *seq, int pos_seq, Ad_seq *ptr_adap, int pos_ad)
 computes score of a possible alignment, after having found a seed.
 

Variables

static uint8_t alfw0 [256]
 
static uint8_t alfw1 [256]
 
static uint8_t albw0 [256]
 
static uint8_t albw1 [256]
 
uint8_t fw_1B [256]
 
uint8_t bw_1B [256]
 

Detailed Description

sequence manipulation for alignment

Author
Paula Perez paulaperezrubio@gmail.com
Date
23.09.2017

Function Documentation

◆ init_alLUTs()

void init_alLUTs ( )

look up table initialization for alignment (used for adapters)

It initializes fw_1B and bw_1B, which are uint8_t arrays with 256 elements. All elements are set to 0xFF except those corresponding to 'a', 'A', 'c', 'C', 'g', 'G', 't', 'T':

Var    a,A   c,C   g,G   t,T
alfw0  0x01  0x02  0x04  0x08
alfw1  0x10  0x20  0x40  0x80
albw0  0x08  0x04  0x02  0x01
albw1  0x80  0x40  0x20  0x10

With these variables we encode sequences so that they can be compared later on. Each nucleotide is stored as a single set bit within a half byte, so under the bitwise XOR operator every mismatch amounts to two bits set to 1.
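The XOR property can be illustrated with a minimal sketch. The names demo_fw0, demo_fw1, demo_init_luts, demo_pack_pair and demo_mismatches are hypothetical and not part of adapters.c. The sketch reuses the alfw0 and alfw1 values from the table above and assumes that the first nucleotide of a pair goes into the low half byte and the second into the high half byte; the actual layout in the library may differ.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical demo tables mirroring the documented alfw0/alfw1 values. */
static uint8_t demo_fw0[256];
static uint8_t demo_fw1[256];

/* Fill both tables with 0xFF, then set one-hot codes for a/A, c/C, g/G, t/T. */
void demo_init_luts(void)
{
    const char *bases = "aAcCgGtT";
    memset(demo_fw0, 0xFF, sizeof demo_fw0);
    memset(demo_fw1, 0xFF, sizeof demo_fw1);
    for (int i = 0; i < 8; i++) {
        uint8_t code = (uint8_t)(1u << (i / 2));   /* 0x01, 0x02, 0x04, 0x08 */
        demo_fw0[(unsigned char)bases[i]] = code;  /* low half byte */
        demo_fw1[(unsigned char)bases[i]] = (uint8_t)(code << 4); /* high half byte */
    }
}

/* Assumption: first base in the low half byte, second base in the high one.
 * Only meaningful for a, c, g, t in either case. */
uint8_t demo_pack_pair(char first, char second)
{
    return (uint8_t)(demo_fw0[(unsigned char)first] |
                     demo_fw1[(unsigned char)second]);
}

/* Each mismatching base leaves two bits set in the XOR of two packed bytes,
 * so the number of mismatches is half the number of set bits. */
int demo_mismatches(uint8_t x, uint8_t y)
{
    uint8_t diff = (uint8_t)(x ^ y);
    int bits = 0;
    while (diff) {
        bits += (int)(diff & 1u);
        diff = (uint8_t)(diff >> 1);
    }
    return bits / 2;
}
```

After calling demo_init_luts(), demo_mismatches(demo_pack_pair('A', 'C'), demo_pack_pair('A', 'G')) compares 0x21 with 0x41; the XOR is 0x60, which has two bits set and therefore counts as one mismatch.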

◆ obtain_score()

double obtain_score ( Fq_read *  seq,
int  pos_seq,
Ad_seq *  ptr_adap,
int  pos_ad 
)

computes score of a possible alignment, after having found a seed.

The score is computed as follows:

  • matching bases: score += log_10(4)
  • mismatching bases: score -= Q/10, where Q is the quality score.
Parameters
seq       pointer to Fq_read.
pos_seq   read starting position of the alignment
ptr_adap  pointer to Ad_seq, contains the adapter info
pos_ad    adapter starting position of the alignment (reverse)
Returns
score of the alignment
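
The scoring rule above can be sketched as follows. This is not the library's obtain_score: the function demo_alignment_score and its parameters are hypothetical, the bases are compared as plain characters rather than packed bytes, and the qualities are assumed to be given as numeric Q values rather than encoded characters. The constant is log_10(4) written out so that the sketch does not depend on the math library.

```c
#include <stddef.h>

/* log_10(4), written as a literal to avoid linking against libm. */
#define DEMO_LOG10_4 0.6020599913279624

/* Hypothetical helper: scores n aligned positions.
 * read, adapter: base characters of the aligned stretch.
 * qual: numeric quality score Q for each read base (assumption). */
double demo_alignment_score(const char *read, const int *qual,
                            const char *adapter, size_t n)
{
    double score = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (read[i] == adapter[i])
            score += DEMO_LOG10_4;      /* matching base */
        else
            score -= qual[i] / 10.0;    /* mismatching base, penalized by Q/10 */
    }
    return score;
}
```

For example, three matches and one mismatch at Q = 20 give 3 * log_10(4) - 2, roughly -0.19, so a single confident mismatch outweighs three matches.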

◆ pack_adapter()

Ad_seq* pack_adapter ( Fa_data *  ptr_fa )

reads a Fa_data with adapters and stores them in an array of Ad_seq structs.

It reads the fasta structure. For every entry, an Ad_seq structure is allocated and the sequences are processed to create the packed sequences.

Parameters
ptr_fa  pointer to Fa_data structure
Returns
pointer to Ad_seq, where the information is stored.

◆ process_seq()

int process_seq ( unsigned char *  packed,
unsigned char *  sequence,
int  L,
bool  shift,
bool  isreverse 
)

Packs a sequence using alfw0, alfw1, albw0, albw1.

It takes a sequence of length L and packs it, using the lookup tables, into an unsigned char array in which every byte corresponds to 2 nucleotides. One can also encode the reverse complement, or the sequence shifted by 1/2 byte.

Parameters
packed     packed sequence
sequence   original sequence
L          original sequence length
shift      0 if the sequence is taken as is, 1 if we want to shift the output 1/2 byte (>>4)
isreverse  0 if we want the forward sequence, 1 for the reverse complement
Returns
Lhalf, the length in bytes of the packed sequence
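
A minimal sketch of this kind of packing is shown below. The functions demo_nibble and demo_pack are hypothetical and are not the library's process_seq. The sketch assumes that the first nucleotide of each pair occupies the low half byte, that a shift leaves one empty half byte at the start of the output, and that unused half bytes are zero; the code 0x0F for characters other than a, c, g, t is also a choice made only for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* One-hot half-byte code for a base; 0x0F for any other character
 * (sketch-only choice). With complement set, A<->T and C<->G are swapped,
 * which matches the albw0 column order in the table above. */
uint8_t demo_nibble(char base, bool complement)
{
    uint8_t code;
    switch (base) {
    case 'a': case 'A': code = 0x01; break;
    case 'c': case 'C': code = 0x02; break;
    case 'g': case 'G': code = 0x04; break;
    case 't': case 'T': code = 0x08; break;
    default: return 0x0F;
    }
    return complement ? (uint8_t)(0x08 / code) : code;
}

/* Packs L bases into `packed`, two per byte, and returns the byte count.
 * `packed` must hold at least (L + 2) / 2 bytes. */
int demo_pack(uint8_t *packed, const char *sequence, int L,
              bool shift, bool isreverse)
{
    int offset = shift ? 1 : 0;         /* empty leading half bytes */
    int Lhalf = (L + offset + 1) / 2;   /* bytes needed for the output */
    memset(packed, 0, (size_t)Lhalf);
    for (int i = 0; i < L; i++) {
        /* Reverse complement: walk the input backwards and complement. */
        char base = isreverse ? sequence[L - 1 - i] : sequence[i];
        uint8_t code = demo_nibble(base, isreverse);
        int slot = i + offset;          /* half-byte index in the output */
        if (slot % 2 == 1)
            code = (uint8_t)(code << 4);
        packed[slot / 2] = (uint8_t)(packed[slot / 2] | code);
    }
    return Lhalf;
}
```

Under these assumptions, packing "ACGT" without shift yields the two bytes 0x21 0x84, and with shift the three bytes 0x10 0x42 0x08. Keeping both a shifted and an unshifted copy of a sequence allows byte-wise comparison against a partner that starts at either an even or an odd position.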

Variable Documentation

◆ albw0

uint8_t albw0[256]
static

variable for backward packing, first half

◆ albw1

uint8_t albw1[256]
static

variable for backward packing, second half

◆ alfw0

uint8_t alfw0[256]
static

variable for forward packing, first half

◆ alfw1

uint8_t alfw1[256]
static

variable for forward packing, second half

◆ bw_1B

uint8_t bw_1B[256]

global variable. Lookup table.

◆ fw_1B

uint8_t fw_1B[256]

global variable. Lookup table.