FastqPuri
fa_read.h File Reference

reads in and stores fasta files

#include <stdint.h>

Classes

struct  _fa_entry
 fasta entry
 
struct  _fa_data
 stores sequences of a fasta file
 

Typedefs

typedef struct _fa_entry Fa_entry
 fasta entry
 
typedef struct _fa_data Fa_data
 stores sequences of a fasta file
 

Functions

int read_fasta (char *filename, Fa_data *ptr_fa)
 reads a fasta file and stores the contents in a Fa_data structure.
 
uint64_t size_fasta (Fa_data *ptr_fa)
 computes length of genome in fasta structure
 
uint64_t nkmers (Fa_data *ptr_fa, int kmersize)
 number of kmers of length kmersize contained in a fasta structure
 
void free_fasta (Fa_data *ptr_fa)
 free fasta file
 

Detailed Description

reads in and stores fasta files

Author
Paula Perez paulaperezrubio@gmail.com
Date
16.08.2017

Function Documentation

◆ free_fasta()

void free_fasta ( Fa_data * ptr_fa )

free fasta file

Parameters
ptr_fa  pointer to Fa_data structure.

The dynamically allocated memory in a Fa_data struct is deallocated and counted.

◆ nkmers()

uint64_t nkmers ( Fa_data * ptr_fa,
int  kmersize 
)

number of kmers of length kmersize contained in a fasta structure

Returns
number of kmers of length kmersize contained in a fasta structure
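
As a rough illustration of what such a count involves, the sketch below assumes that a sequence of length N contributes N - k + 1 overlapping k-mers, and none when N < k. Whether nkmers() applies exactly this rule is not stated here. The function name count_kmers and the plain array of entry lengths are hypothetical stand-ins, since the fields of Fa_data are not documented on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of fa_read.h: counts the k-mers in a set of
 * sequences given only their lengths. Assumes each sequence of length N
 * contributes N - k + 1 overlapping k-mers, and none if N < k. */
uint64_t count_kmers(const uint64_t *entrylen, size_t nentries, int kmersize)
{
    assert(kmersize > 0);
    uint64_t k = (uint64_t)kmersize;
    uint64_t total = 0;
    for (size_t i = 0; i < nentries; i++) {
        if (entrylen[i] >= k) {
            total += entrylen[i] - k + 1;
        }
    }
    return total;
}
```

For entries of length 10, 3 and 5 and kmersize 4, this gives 7 + 0 + 2 = 9 k-mers under the stated assumption.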

◆ read_fasta()

int read_fasta ( char *  filename,
Fa_data * ptr_fa
)

reads a fasta file and stores the contents in a Fa_data structure.

Parameters
filename  path to a fasta input file.
ptr_fa    pointer to Fa_data structure.
Returns
number of entries in the fasta file.

A fasta file is read and stored in a Fa_data structure. The basic problem with reading FASTA files is that there is no end-of-record indicator: when reading sequence n, you do not know you are done until you have read the header line for sequence n+1, which is not parsed until later (when sequence n+1 is read in).

The solution implemented here is to read the file twice. In the first pass (sweep_fa), Fa_data is initialized and the following parameters are stored:

  • nlines: number of lines of the fasta file.
  • nentries: number of entries in the fasta file.
  • linelen: length of a line in the considered fasta file.
  • entrylen: array containing the lengths of every entry.

With this information, the memory for the pointer to Fa_entry can be allocated. The file is then read a second time and the entries are stored in the structure.
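
The two-pass idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the types Sketch_entry and Sketch_data and the functions sweep_lengths, fill_entries, sketch_read_fasta and sketch_free are all hypothetical, the file is read character by character for simplicity, and nlines and linelen are not tracked.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical simplified types; the real Fa_entry / Fa_data may differ. */
typedef struct {
    uint64_t N;  /* number of nucleotides in the entry */
    char *seq;   /* NUL-terminated sequence */
} Sketch_entry;

typedef struct {
    int nentries;        /* number of '>' records */
    uint64_t *entrylen;  /* sequence length of every entry */
    Sketch_entry *entry; /* filled during the second pass */
} Sketch_data;

/* Pass 1: count entries and record each entry's sequence length. */
static int sweep_lengths(FILE *fp, Sketch_data *fa)
{
    int c, at_line_start = 1, in_header = 0;
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') { at_line_start = 1; in_header = 0; continue; }
        if (at_line_start && c == '>') {
            size_t n = (size_t)fa->nentries + 1;
            uint64_t *tmp = realloc(fa->entrylen, n * sizeof *tmp);
            if (tmp == NULL) return -1;
            fa->entrylen = tmp;
            fa->entrylen[fa->nentries++] = 0;
            in_header = 1;
        } else if (!in_header && c != '\r' && fa->nentries > 0) {
            fa->entrylen[fa->nentries - 1]++;
        }
        at_line_start = 0;
    }
    return fa->nentries;
}

/* Pass 2: allocate every sequence with its known length, then copy it. */
static int fill_entries(FILE *fp, Sketch_data *fa)
{
    int c, at_line_start = 1, in_header = 0, idx = -1;
    uint64_t pos = 0;
    if (fa->nentries == 0) return 0;
    fa->entry = calloc((size_t)fa->nentries, sizeof *fa->entry);
    if (fa->entry == NULL) return -1;
    for (int i = 0; i < fa->nentries; i++) {
        fa->entry[i].N = fa->entrylen[i];
        fa->entry[i].seq = calloc(fa->entrylen[i] + 1, 1);
        if (fa->entry[i].seq == NULL) return -1;
    }
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') { at_line_start = 1; in_header = 0; continue; }
        if (at_line_start && c == '>') {
            idx++; pos = 0; in_header = 1;
        } else if (!in_header && c != '\r' && idx >= 0) {
            fa->entry[idx].seq[pos++] = (char)c;
        }
        at_line_start = 0;
    }
    return fa->nentries;
}

/* Returns the number of entries, or -1 on I/O or allocation failure.
 * On failure the caller should still call sketch_free(). */
int sketch_read_fasta(const char *filename, Sketch_data *fa)
{
    assert(filename != NULL && fa != NULL);
    fa->nentries = 0;
    fa->entrylen = NULL;
    fa->entry = NULL;
    FILE *fp = fopen(filename, "r");
    if (fp == NULL) return -1;
    int n = sweep_lengths(fp, fa);
    if (n >= 0) {
        rewind(fp);  /* back to the start for the second read */
        n = fill_entries(fp, fa);
    }
    fclose(fp);
    return n;
}

/* Releases everything allocated by sketch_read_fasta(). */
void sketch_free(Sketch_data *fa)
{
    for (int i = 0; fa->entry != NULL && i < fa->nentries; i++)
        free(fa->entry[i].seq);
    free(fa->entry);
    free(fa->entrylen);
    fa->entry = NULL;
    fa->entrylen = NULL;
    fa->nentries = 0;
}
```

Because every entry length is known after the first pass, the second pass can allocate each sequence exactly once and never needs to grow a buffer while reading. The price is that the file is traversed twice.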

◆ size_fasta()

uint64_t size_fasta ( Fa_data * ptr_fa )

computes length of genome in fasta structure

Parameters
ptr_fa  pointer to Fa_data
Returns
total number of nucleotides
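
Conceptually, the total is the sum of the per-entry sequence lengths. The sketch below shows that sum over a plain array of lengths. The helper name total_nucleotides and its parameters are hypothetical, and how size_fasta() walks the Fa_data structure internally is not documented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of fa_read.h: total number of nucleotides
 * computed as the sum of the per-entry sequence lengths. */
uint64_t total_nucleotides(const uint64_t *entrylen, size_t nentries)
{
    assert(entrylen != NULL || nentries == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < nentries; i++) {
        total += entrylen[i];
    }
    return total;
}
```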