FastqPuri
fa_read.c File Reference

reads in and stores fasta files

#include <stdlib.h>
#include <string.h>
#include "fa_read.h"
#include "defines.h"
#include "fopen_gen.h"

Functions

static int ignore_line (char *line)
 ignore header lines.
 
static void init_fa (Fa_data *ptr_fa)
 Initialization of Fa_data.
 
static void realloc_fa (Fa_data *ptr_fa)
 Reallocation of Fa_data, in case the length of entrylen is exhausted.
 
static void init_entries (Fa_data *ptr_fa)
 Allocation of Fa_entries.
 
static uint64_t sweep_fa (char *filename, Fa_data *ptr_fa)
 this function sweeps a fasta file to obtain structure details.
 
int read_fasta (char *filename, Fa_data *ptr_fa)
 reads a fasta file and stores the contents in a Fa_data structure.
 
uint64_t size_fasta (Fa_data *ptr_fa)
 computes length of genome in fasta structure
 
uint64_t nkmers (Fa_data *ptr_fa, int kmersize)
 number of kmers of length kmersize contained in a fasta structure
 
void free_fasta (Fa_data *ptr_fa)
 free fasta file
 

Variables

uint64_t alloc_mem
 

Detailed Description

reads in and stores fasta files

Author
Paula Perez paula.nosp@m.pere.nosp@m.zrubi.nosp@m.o@gm.nosp@m.ail.c.nosp@m.om
Date
18.08.2017

Function Documentation

◆ free_fasta()

void free_fasta ( Fa_data *  ptr_fa)

free fasta file

Parameters
ptr_fa    pointer to Fa_data structure.

The dynamically allocated memory in a Fa_data struct is deallocated and counted, so that we can

◆ ignore_line()

static int ignore_line ( char *  line)
static

ignore header lines.

Parameters
line    string of characters.
Returns
number of characters to jump until a '\n' is found.
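
As an illustration of the return value described above, here is a minimal sketch. The function name skip_to_newline is hypothetical, and whether the real ignore_line counts the '\n' itself is not stated here; this version assumes it does not.

```c
/* Hypothetical stand-in for ignore_line(): returns the number of
 * characters before the first '\n' (or before the end of the string
 * if no newline is present). Assumption: the newline is not counted. */
static int skip_to_newline(const char *line) {
    int n = 0;
    while (line[n] != '\0' && line[n] != '\n') {
        n++;
    }
    return n;
}
```

A caller positioned at the '>' of a header could advance by the returned count to land on the newline that ends the header.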

◆ init_entries()

static void init_entries ( Fa_data *  ptr_fa)
static

Allocation of Fa_entries.

Parameters
ptr_fa    pointer to Fa_data structure.

Once we have swept the fasta file, we can allocate the memory for the entries, since their lengths are now known.

◆ init_fa()

static void init_fa ( Fa_data *  ptr_fa)
static

Initialization of Fa_data.

Parameters
ptr_fa    pointer to Fa_data structure.

Initializes nlines, linelen, nentries to 0 and allocates memory for entrylen (FA_ENTRY_BUF entries).
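
The initialization described above might look roughly like the following sketch. FaSketch is a hypothetical stand-in for Fa_data (only the four field names come from this page; their types are assumptions), and SKETCH_ENTRY_BUF is an assumed placeholder for FA_ENTRY_BUF, whose real value is defined in the project headers.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed placeholder value; the real FA_ENTRY_BUF lives in the project. */
#define SKETCH_ENTRY_BUF 1000

/* Hypothetical stand-in for Fa_data; field types are assumptions. */
typedef struct {
    uint64_t nlines;     /* number of lines in the file */
    int linelen;         /* length of a line */
    int nentries;        /* number of entries */
    uint64_t *entrylen;  /* length of every entry */
} FaSketch;

/* Zero the counters and reserve SKETCH_ENTRY_BUF slots for entrylen.
 * Returns 0 on success, -1 if the allocation fails. */
static int init_sketch(FaSketch *fa) {
    fa->nlines = 0;
    fa->linelen = 0;
    fa->nentries = 0;
    fa->entrylen = calloc(SKETCH_ENTRY_BUF, sizeof *fa->entrylen);
    return fa->entrylen != NULL ? 0 : -1;
}
```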

◆ nkmers()

uint64_t nkmers ( Fa_data *  ptr_fa,
int  kmersize 
)

number of kmers of length kmersize contained in a fasta structure

Returns
number of kmers of length kmersize contained in a fasta structure
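
How the count is computed is not spelled out here. One plausible reading, sketched below, is that an entry of length L contributes L - kmersize + 1 windows, and entries shorter than kmersize contribute none. The function name count_kmer_windows and its array-based signature are hypothetical; the sketch counts positions, not distinct kmers.

```c
#include <stdint.h>

/* Hypothetical sketch: sums the number of kmer start positions over all
 * entries. An entry of length L holds L - kmersize + 1 windows when
 * L >= kmersize, and none otherwise. */
static uint64_t count_kmer_windows(const uint64_t *entrylen, int nentries,
                                   int kmersize) {
    uint64_t total = 0;
    if (kmersize <= 0) {
        return 0;
    }
    for (int i = 0; i < nentries; i++) {
        if (entrylen[i] >= (uint64_t)kmersize) {
            total += entrylen[i] - (uint64_t)kmersize + 1;
        }
    }
    return total;
}
```

For entries of lengths 10, 3 and 5 with kmersize 4, this gives 7 + 0 + 2 = 9.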

◆ read_fasta()

int read_fasta ( char *  filename,
Fa_data *  ptr_fa 
)

reads a fasta file and stores the contents in a Fa_data structure.

Parameters
filename    path to a fasta input file.
ptr_fa    pointer to Fa_data structure.
Returns
number of entries in the fasta file.

A fasta file is read and stored in a Fa_data structure.

The basic problem with reading FASTA files is that there is no end-of-record indicator. When you are reading sequence n, you do not know you are done until you have read the header line for sequence n+1, which you will not parse until later (when you are reading sequence n+1).

The solution implemented here is to read the file twice. The first time (sweep_fa), we initialize Fa_data and store the parameters:

  • nlines: number of lines of the fasta file.
  • nentries: number of entries in the fasta file.
  • linelen: length of a line in the considered fasta file.
  • entrylen: array containing the lengths of every entry.

With this information, the pointer to Fa_entry can be allocated. The file is then read a second time and the entries are stored in the structure.
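
The two-pass idea can be sketched as follows. This is not the project's code: FaSketch, sweep_sketch and read_sketch are hypothetical names, the seq field is an assumption, and the sketch takes an already opened FILE * instead of a filename so that it stays self-contained. Error paths do not free memory, for brevity.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for Fa_data; seq is an assumption of this sketch. */
typedef struct {
    int nentries;
    size_t *entrylen;  /* residues per entry */
    char **seq;        /* NUL-terminated sequence per entry */
} FaSketch;

/* Pass 1: count the entries and the residues in each one. */
static int sweep_sketch(FILE *fp, FaSketch *fa) {
    int c, at_line_start = 1, in_header = 0;
    fa->nentries = 0;
    fa->entrylen = NULL;
    fa->seq = NULL;
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') {
            at_line_start = 1;
            in_header = 0;
            continue;
        }
        if (at_line_start && c == '>') {
            /* Growing one slot at a time keeps the sketch short. */
            size_t *tmp = realloc(fa->entrylen,
                                  ((size_t)fa->nentries + 1) * sizeof *tmp);
            if (tmp == NULL) return -1;
            fa->entrylen = tmp;
            fa->entrylen[fa->nentries++] = 0;
            in_header = 1;
        } else if (!in_header && fa->nentries > 0 && c != '\r') {
            fa->entrylen[fa->nentries - 1]++;
        }
        at_line_start = 0;
    }
    return 0;
}

/* Pass 2: allocate exactly what pass 1 measured, then copy the residues.
 * Returns the number of entries, or -1 on allocation failure. */
static int read_sketch(FILE *fp, FaSketch *fa) {
    int c, i, at_line_start = 1, in_header = 0, cur = -1;
    size_t pos = 0;

    rewind(fp);
    if (sweep_sketch(fp, fa) != 0) return -1;
    fa->seq = calloc(fa->nentries > 0 ? (size_t)fa->nentries : 1,
                     sizeof *fa->seq);
    if (fa->seq == NULL) return -1;
    for (i = 0; i < fa->nentries; i++) {
        fa->seq[i] = malloc(fa->entrylen[i] + 1);
        if (fa->seq[i] == NULL) return -1;
    }

    rewind(fp);
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') {
            at_line_start = 1;
            in_header = 0;
            continue;
        }
        if (at_line_start && c == '>') {
            if (cur >= 0) fa->seq[cur][pos] = '\0';  /* close previous entry */
            cur++;
            pos = 0;
            in_header = 1;
        } else if (!in_header && cur >= 0 && c != '\r') {
            fa->seq[cur][pos++] = (char)c;
        }
        at_line_start = 0;
    }
    if (cur >= 0) fa->seq[cur][pos] = '\0';  /* close the last entry */
    return fa->nentries;
}
```

Because the first pass already knows every entry length, the second pass never has to guess where a record ends or resize a buffer while copying.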

◆ realloc_fa()

static void realloc_fa ( Fa_data *  ptr_fa)
static

Reallocation of Fa_data, in case the length of entrylen is exhausted.

Parameters
ptr_fa    pointer to Fa_data structure.
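
The growth policy is not described here. A minimal sketch, assuming entrylen grows by a fixed block each time it fills up, could look like this. FaSketch, its capacity field, SKETCH_ENTRY_BUF and both function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed block size, kept small so the growth is easy to follow. */
#define SKETCH_ENTRY_BUF 4

/* Hypothetical stand-in for Fa_data; capacity is an assumption. */
typedef struct {
    int nentries;        /* slots in use */
    int capacity;        /* slots allocated */
    uint64_t *entrylen;  /* length of every entry */
} FaSketch;

/* Extend entrylen by one block. On failure the old buffer is left
 * untouched and -1 is returned. */
static int grow_entrylen(FaSketch *fa) {
    size_t new_cap = (size_t)fa->capacity + SKETCH_ENTRY_BUF;
    uint64_t *tmp = realloc(fa->entrylen, new_cap * sizeof *tmp);
    if (tmp == NULL) {
        return -1;
    }
    fa->entrylen = tmp;
    fa->capacity = (int)new_cap;
    return 0;
}

/* Record one entry length, growing the buffer when it is exhausted. */
static int push_entry_len(FaSketch *fa, uint64_t len) {
    if (fa->nentries == fa->capacity && grow_entrylen(fa) != 0) {
        return -1;
    }
    fa->entrylen[fa->nentries++] = len;
    return 0;
}
```

Assigning the result of realloc to a temporary first avoids losing the original buffer when the call fails.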

◆ size_fasta()

uint64_t size_fasta ( Fa_data *  ptr_fa)

computes length of genome in fasta structure

Parameters
ptr_fa    pointer to Fa_data
Returns
total number of nucleotides

◆ sweep_fa()

static uint64_t sweep_fa ( char *  filename,
Fa_data *  ptr_fa 
)
static

this function sweeps a fasta file to obtain structure details.

Parameters
filename    path to a fasta input file.
ptr_fa    pointer to Fa_data structure.
Returns
size of fasta file.

This function sweeps over the fasta file once to record how many entries there are, how long they are, how many characters there are per line, and how many lines the file has.
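
A single sweep that gathers these figures can be sketched as below. SweepStats and sweep_stats are hypothetical names, and two readings are assumptions of the sketch: linelen is taken as the longest sequence line, and the returned size as the number of bytes read.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the figures collected during the sweep. */
typedef struct {
    uint64_t nlines;  /* number of lines */
    int linelen;      /* longest sequence line (assumed meaning) */
    int nentries;     /* number of '>' headers */
    uint64_t nchars;  /* bytes read (assumed meaning of "size") */
} SweepStats;

/* Read the stream once, character by character, and tally the figures. */
static SweepStats sweep_stats(FILE *fp) {
    SweepStats s = {0, 0, 0, 0};
    int c, col = 0, in_header = 0;
    while ((c = fgetc(fp)) != EOF) {
        s.nchars++;
        if (c == '\n') {
            s.nlines++;
            if (!in_header && col > s.linelen) s.linelen = col;
            col = 0;
            in_header = 0;
        } else {
            if (col == 0 && c == '>') {  /* header starts a new entry */
                s.nentries++;
                in_header = 1;
            }
            col++;
        }
    }
    if (col > 0) {  /* last line without a trailing newline */
        s.nlines++;
        if (!in_header && col > s.linelen) s.linelen = col;
    }
    return s;
}
```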

Variable Documentation

◆ alloc_mem

uint64_t alloc_mem

global variable. Memory allocated in the heap.