FastqPuri
Trims/filters sequences based on quality, N's, and contaminations.
#include <string.h>
#include <stdlib.h>
#include "trim.h"
#include "str_manip.h"
#include "defines.h"
#include "config.h"
#include "struct_trimFilter.h"
Macros | |
#define | TRIM_STRING 20 |
Functions | |
static int | no_N (Fq_read *seq) |
checks if a sequence contains any non-standard base calls (N's) | |
static int | Nfree_Lmer (Fq_read *seq, int minL) |
finds the largest N-free sub-sequence and keeps it if larger than minL | |
static int | Ntrim_ends (Fq_read *seq, int minL) |
trims a read if N's are at the ends and the remaining sub-sequence is >= minL | |
static int | no_lowQ (Fq_read *seq, int minQ) |
checks if a sequence contains lowQ nucleotides | |
static int | Qtrim_ends (Fq_read *seq, int minQ, int minL) |
trims a read if lowQs are at the ends and the remaining sub-sequence is >= minL | |
static int | Qtrim_frac (Fq_read *seq, int minQ, int nlowQ) |
accepts the sequence as is if it contains fewer than nlowQ lowQ nucleotides | |
static int | Qtrim_endsfrac (Fq_read *seq, int minQ, int minL, int nlowQ) |
int | Qtrim_global (Fq_read *seq, int left, int right, char type) |
trims left nucleotides from the left and right nucleotides from the right | |
static int | align_uint32 (Fq_read *seq, Ad_seq *ptr_adap, bool all) |
alignment search between a fq read and an adapter sequence, with a seed of 8 nucleotides | |
static int | align_uint64 (Fq_read *seq, Ad_seq *ptr_adap) |
alignment search between a fq read and an adapter sequence, with a seed of 16 nucleotides | |
int | trim_adapter (Fq_read *seq, Ad_seq *adap_list) |
trims sequence based on presence of adapter contamination | |
int | trim_sequenceN (Fq_read *seq) |
trims sequence based on presence of N nucleotides | |
int | trim_sequenceQ (Fq_read *seq) |
trims sequence based on lowQ base calls | |
bool | is_read_inTree (Tree *tree_ptr, Fq_read *seq) |
checks if a read is contained in the tree. It computes the score for the read and its reverse complement; if either of them exceeds the user-selected threshold, it returns true. Otherwise, it returns false. | |
bool | is_read_inBloom (Bfilter *ptr_bf, Fq_read *seq, Bfkmer *ptr_bfkmer) |
checks if a read is in the Bloom filter. It computes the score for the read and returns true if it exceeds the user-selected threshold. Returns false otherwise. | |
Variables | |
int | Nencode |
Iparam_trimFilter | par_TF |
Trims/filters sequences based on quality, N's, and contaminations.
#define TRIM_STRING 20
maximal length of trimming info string.
Alignment search between a fq read and an adapter sequence, with a seed of 8 nucleotides.
This function checks whether there is adapter contamination in a given read. It works stand-alone if the adapter is shorter than 16 nucleotides, and is called from align_uint64 when no 16-nucleotide seeds are found. The criteria are the same as in align_uint64, except that the seed is 8 nucleotides long instead of 16. See the align_uint64 documentation for more details.
seq | pointer to Fq_read |
ptr_adap | pointer to Ad_seq |
all | true if the whole read has to be swept, false if only the ends. When this function is called from align_uint64, only the ends need to be considered. |
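As an illustration of the seed idea, here is a minimal sketch that slides an 8-nucleotide seed along a read and reports the first window with at most a given number of mismatches. It works on plain character strings rather than on the Fq_read and Ad_seq structures, and the names sketch_find_seed and SKETCH_SEED_LEN are hypothetical. The encoded comparison that align_uint32 actually performs, and its scoring, are not reproduced here.

```c
#include <stddef.h>

#define SKETCH_SEED_LEN 8 /* seed length used by this illustration only */

/* Hypothetical helper, not part of FastqPuri.
 * Returns the index of the first window of `read` that differs from `seed`
 * in at most `max_mismatch` positions, or -1 if there is no such window. */
static long sketch_find_seed(const char *read, size_t read_len,
                             const char *seed, int max_mismatch)
{
    if (read_len < SKETCH_SEED_LEN)
        return -1; /* read too short to hold a seed */

    for (size_t pos = 0; pos + SKETCH_SEED_LEN <= read_len; pos++) {
        int mismatches = 0;
        for (size_t k = 0; k < SKETCH_SEED_LEN; k++) {
            if (read[pos + k] != seed[k])
                mismatches++;
        }
        if (mismatches <= max_mismatch)
            return (long)pos;
    }
    return -1;
}
```

A character-by-character comparison keeps the sketch easy to follow; a packed integer comparison, as the function names suggest, would presumably be faster.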
Alignment search between a fq read and an adapter sequence, with a seed of 16 nucleotides.
seq | pointer to Fq_read |
ptr_adap | pointer to Ad_seq |
This function checks whether there is adapter contamination in a given read. We start by looking for 16-nucleotide seeds, allowing a user-defined number of mismatches. If a seed is found, a score is computed. If the score is larger than the user-defined threshold and the number of matched nucleotides exceeds MIN_NMATCHES (12), the read is trimmed if the remaining part is longer than minL (user defined) and discarded otherwise. If no 16-nucleotide seeds are found, we proceed with 8-nucleotide seeds (see align_uint32) and apply the same criteria to trim or discard the read. A list of possible situations follows, to illustrate how it works (minL=25, mismatches=2):
Checks if a read is in the Bloom filter. It computes the score for the read and returns true if it exceeds the user-selected threshold. Returns false otherwise.
ptr_bf | pointer to Bfilter |
seq | fastq read |
ptr_bfkmer | pointer to Procs_kmer structure (will store global) |
Checks if a read is contained in the tree. It computes the score for the read and its reverse complement; if either of them exceeds the user-selected threshold, it returns true. Otherwise, it returns false.
tree_ptr | pointer to Tree structure |
seq | fastq read |
static int Nfree_Lmer (Fq_read *seq, int minL)
Finds the largest Nfree sub-seq and keeps it if larger than minL.
seq | fastq read |
minL | minimum accepted trimmed length |
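A minimal sketch of the longest N-free search follows. It assumes a plain character buffer where only the literal 'N' marks a non-standard base, and it treats a run of exactly minL as acceptable. The function name, the output parameters and the return convention (1 = keep, 0 = discard) are hypothetical and do not mirror the Fq_read-based implementation.

```c
#include <stddef.h>

/* Hypothetical helper, not part of FastqPuri.
 * Finds the longest run of `bases` without 'N'. The run is reported through
 * *start and *out_len. Returns 1 if the run has at least minL bases
 * (assumed convention), 0 otherwise. */
static int sketch_Nfree_Lmer(const char *bases, size_t len, size_t minL,
                             size_t *start, size_t *out_len)
{
    size_t best_len = 0, best_start = 0, run_start = 0;

    /* i == len acts as a sentinel that closes the last run */
    for (size_t i = 0; i <= len; i++) {
        if (i == len || bases[i] == 'N') {
            size_t run_len = i - run_start;
            if (run_len > best_len) {
                best_len = run_len;
                best_start = run_start;
            }
            run_start = i + 1;
        }
    }
    *start = best_start;
    *out_len = best_len;
    return best_len >= minL;
}
```

For the read ACNNACGTANA the longest N-free run is ACGTA, so the read would be kept for minL = 5 and discarded for minL = 6.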
static int no_lowQ (Fq_read *seq, int minQ)
checks if a sequence contains lowQ nucleotides
seq | fastq read |
minQ | minimum accepted quality value |
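The check can be sketched as below, on a plain quality string. The quality offset is passed explicitly because this page does not state which encoding the library assumes; 33 (Phred+33) is used in the usage note only as an example. The name sketch_no_lowQ and the return convention (1 = no low-quality base found) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper, not part of FastqPuri.
 * Returns 1 if every quality value in `qual` is >= minQ, 0 otherwise.
 * `offset` is the ASCII offset of the quality encoding (an assumption). */
static int sketch_no_lowQ(const char *qual, size_t len, int minQ, int offset)
{
    for (size_t i = 0; i < len; i++) {
        if ((int)qual[i] - offset < minQ)
            return 0; /* found a low-quality nucleotide */
    }
    return 1;
}
```

With offset 33, the character 'I' decodes to quality 40 and '#' to quality 2, so "IIII" passes a minQ of 20 while "II#I" does not.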
static int no_N (Fq_read *seq)
checks if a sequence contains any non standard base callings (N's)
This function checks if any of the base callings in a given fastq read is different from A, C, G, T. Basically, any char different from the former ones is classified as N.
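The classification rule can be sketched on a plain character buffer as follows. The sketch only recognizes uppercase A, C, G, T, and its name and return convention (1 = no N found) are hypothetical; the library itself works on the Fq_read structure and may rely on an encoded lookup rather than a character comparison.

```c
#include <stddef.h>

/* Hypothetical helper, not part of FastqPuri.
 * Returns 1 if every base is one of A, C, G, T, and 0 as soon as any other
 * character (classified as N) is found. */
static int sketch_no_N(const char *bases, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        switch (bases[i]) {
        case 'A':
        case 'C':
        case 'G':
        case 'T':
            break;    /* standard base call */
        default:
            return 0; /* anything else counts as N */
        }
    }
    return 1;
}
```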
static int Ntrim_ends (Fq_read *seq, int minL)
trims a read if N's are at the ends and the remaining sub-seq >= minL
seq | fastq read |
minL | minimum accepted trimmed length |
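End trimming can be sketched as two scans that move inwards from each end. The sketch below assumes a plain character buffer with 'N' as the only non-standard base and does not look at N's left in the interior of the read, which the real function may treat differently. Names and the return convention (1 = keep, 0 = discard) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper, not part of FastqPuri.
 * Skips leading and trailing 'N' characters. The remaining window is
 * reported through *start and *out_len. Returns 1 if the window has at
 * least minL bases, 0 otherwise. Interior N's are not examined. */
static int sketch_Ntrim_ends(const char *bases, size_t len, size_t minL,
                             size_t *start, size_t *out_len)
{
    size_t lo = 0, hi = len; /* window is [lo, hi) */

    while (lo < hi && bases[lo] == 'N')
        lo++;
    while (hi > lo && bases[hi - 1] == 'N')
        hi--;

    *start = lo;
    *out_len = hi - lo;
    return *out_len >= minL;
}
```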
static int Qtrim_ends (Fq_read *seq, int minQ, int minL)
trims a read if lowQs are at the ends and remaining sub-seq >= minL
seq | fastq read |
minQ | minimum accepted quality value |
minL | minimum accepted trimmed length |
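The same two-sided scan can be sketched for quality values. The quality offset is an explicit parameter because the encoding is not stated on this page, low-quality bases remaining inside the window are not examined, and all names and the return convention (1 = keep, 0 = discard) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper, not part of FastqPuri.
 * Skips leading and trailing positions whose quality is below minQ.
 * The remaining window is reported through *start and *out_len.
 * Returns 1 if the window has at least minL positions, 0 otherwise. */
static int sketch_Qtrim_ends(const char *qual, size_t len, int minQ,
                             size_t minL, int offset,
                             size_t *start, size_t *out_len)
{
    size_t lo = 0, hi = len; /* window is [lo, hi) */

    while (lo < hi && (int)qual[lo] - offset < minQ)
        lo++;
    while (hi > lo && (int)qual[hi - 1] - offset < minQ)
        hi--;

    *start = lo;
    *out_len = hi - lo;
    return *out_len >= minL;
}
```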
static int Qtrim_frac (Fq_read *seq, int minQ, int nlowQ)
accepts the sequence as is if it contains fewer than nlowQ lowQ nucleotides
seq | fastq read |
minQ | minimum accepted quality value |
nlowQ | threshold on lowQ nucleotides (>= NOT allowed) |
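The threshold rule (a count equal to or above nlowQ is not allowed) can be sketched as a simple counter. As in the other sketches, the quality offset is an assumed explicit parameter, and the name and return convention (1 = accept, 0 = reject) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper, not part of FastqPuri.
 * Counts positions whose quality is below minQ and accepts the read
 * (returns 1) only if that count is strictly smaller than nlowQ. */
static int sketch_Qtrim_frac(const char *qual, size_t len, int minQ,
                             int nlowQ, int offset)
{
    int n_low = 0;

    for (size_t i = 0; i < len; i++) {
        if ((int)qual[i] - offset < minQ)
            n_low++;
    }
    return n_low < nlowQ;
}
```

For the quality string "II#I#" with offset 33 and minQ = 20 there are two low-quality positions, so the read is accepted for nlowQ = 3 and rejected for nlowQ = 2.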
int Qtrim_global (Fq_read *seq, int left, int right, char type)
trims left from the left and right from the right
seq | fastq read |
left | number of nucleotides to be trimmed from the left |
right | number of nucleotides to be trimmed from the right |
type | char indicating the type of trimming (Q,A). |
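A fixed-length trim from both ends can be sketched on a pair of NUL-terminated strings of equal length (bases and qualities). The sketch leaves out the type argument and the trimming info string, since their handling is not described here. The name sketch_trim_global and the return convention (1 = trimmed, 0 = nothing would remain) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of FastqPuri.
 * Removes `left` characters from the start and `right` characters from the
 * end of both strings, in place. `bases` and `qual` are assumed to have the
 * same length. Returns 0 without modifying anything if no characters would
 * remain, 1 otherwise. */
static int sketch_trim_global(char *bases, char *qual,
                              size_t left, size_t right)
{
    size_t len = strlen(bases);

    if (left + right >= len)
        return 0; /* nothing would remain */

    size_t new_len = len - left - right;
    memmove(bases, bases + left, new_len); /* regions may overlap */
    bases[new_len] = '\0';
    memmove(qual, qual + left, new_len);
    qual[new_len] = '\0';
    return 1;
}
```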
trims sequence based on presence of adapter contamination
if (adapter length < 16)
   -> search for seeds 8 nucleotides long
else
   -> search for seeds 16 nucleotides long
   if (seed found)
      -> calculate score
      if score > threshold
         -> adapter found, trim / discard and exit.
   else
      -> search for seeds 8 nucleotides long
seq | pointer to Fq_read |
adap_list | array of Ad_seq |
int trim_sequenceN (Fq_read *seq)
trims sequence based on presence of N nucleotides
seq | fastq read |
This function calls a different function depending on the method passed as input par_TF.trimN:
int trim_sequenceQ (Fq_read *seq)
trims sequence based on lowQ base callings
seq | fastq read |
This function calls a different function depending on the method passed as input par_TF.trimQ:
int Nencode |
global variable: encoding for N's (\004).
Iparam_trimFilter par_TF |
global variable: Input parameters of trimFilter.
global variable: Input parameters of makeTree.