Reads in and stores fasta files.
#include <stdlib.h>
#include <string.h>
#include "fa_read.h"
#include "defines.h"
#include "fopen_gen.h"
- Author
- Paula Perez paulaperezrubio@gmail.com
- Date
- 18.08.2017
◆ free_fasta()
void free_fasta(Fa_data *ptr_fa)
Frees a fasta structure.
- Parameters
  - ptr_fa: pointer to the Fa_data structure.
The dynamically allocated memory in a Fa_data struct is deallocated and counted, so that we can
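A minimal sketch of the "deallocate and count" idea follows. It assumes a hypothetical Fa_data holding an `entry` array of per-entry strings (`entrylen[i] + 1` bytes each) and treats `alloc_mem` as a signed byte counter. The real struct in fa_read.h, the type of alloc_mem, and the exact bookkeeping may all differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the global alloc_mem byte counter. */
static long long alloc_mem = 0;

/* Hypothetical Fa_data layout; the real definition lives in fa_read.h. */
typedef struct {
    uint64_t nentries;  /* number of entries */
    uint64_t *entrylen; /* nentries lengths */
    char **entry;       /* nentries strings of entrylen[i] + 1 bytes */
} Fa_data;

/* Frees every buffer owned by ptr_fa and subtracts its size from alloc_mem. */
void free_fasta(Fa_data *ptr_fa)
{
    assert(ptr_fa != NULL);
    for (uint64_t i = 0; i < ptr_fa->nentries; i++) {
        free(ptr_fa->entry[i]);
        alloc_mem -= (long long)(ptr_fa->entrylen[i] + 1);
    }
    free(ptr_fa->entry);
    alloc_mem -= (long long)(ptr_fa->nentries * sizeof(char *));
    free(ptr_fa->entrylen);
    alloc_mem -= (long long)(ptr_fa->nentries * sizeof(uint64_t));

    /* Leave the struct in a safe, empty state. */
    ptr_fa->entry = NULL;
    ptr_fa->entrylen = NULL;
    ptr_fa->nentries = 0;
}
```

If every allocation adds its size to the counter and every free subtracts it, the counter returns to zero once all fasta data is released. This makes leaks easy to spot.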
◆ ignore_line()
static int ignore_line(char *line)
Ignores header lines.
- Parameters
  - line: string of characters.
- Returns
  - number of characters to skip until a '\n' is found.
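Below is a sketch of how the skip count could be computed. It assumes the return value is the offset of the first '\n', so that `line + ignore_line(line)` lands on the newline, and that the string length is returned when no newline is present. Whether the real function also counts the '\n' itself is not stated above.

```c
#include <assert.h>
#include <string.h>

/* Sketch: number of characters from the start of line up to, but not
 * including, the first '\n'. Returns strlen(line) if there is no '\n'. */
static int ignore_line(char *line)
{
    assert(line != NULL);
    /* strcspn returns the length of the prefix containing no '\n' */
    return (int)strcspn(line, "\n");
}
```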
◆ init_entries()
static void init_entries(Fa_data *ptr_fa)
Allocation of Fa_entries.
- Parameters
  - ptr_fa: pointer to the Fa_data structure.
Once the fasta file has been swept, the length of every entry is known, so the memory for the entries can be allocated.
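The following sketch shows the allocation step. It assumes a hypothetical layout in which each entry is stored as one NUL-terminated string of `entrylen[i] + 1` bytes in an `entry` array. The real Fa_entry type is not shown in this page.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical Fa_data layout; the real one lives in fa_read.h. */
typedef struct {
    uint64_t nentries;  /* filled in by the first sweep */
    uint64_t *entrylen; /* filled in by the first sweep */
    char **entry;       /* allocated here */
} Fa_data;

/* Allocates one buffer per entry, sized from the lengths recorded by the sweep. */
static void init_entries(Fa_data *ptr_fa)
{
    assert(ptr_fa != NULL && ptr_fa->entrylen != NULL);
    ptr_fa->entry = malloc(ptr_fa->nentries * sizeof(char *));
    if (ptr_fa->entry == NULL && ptr_fa->nentries > 0)
        abort();
    for (uint64_t i = 0; i < ptr_fa->nentries; i++) {
        /* +1 for the terminating '\0' */
        ptr_fa->entry[i] = malloc(ptr_fa->entrylen[i] + 1);
        if (ptr_fa->entry[i] == NULL)
            abort();
        ptr_fa->entry[i][0] = '\0'; /* start each entry as an empty string */
    }
}
```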
◆ init_fa()
static void init_fa(Fa_data *ptr_fa)
Initialization of Fa_data.
- Parameters
  - ptr_fa: pointer to the Fa_data structure.
Initializes nlines, linelen, nentries to 0 and allocates memory for entrylen (FA_ENTRY_BUF entries).
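A minimal sketch of this initialization follows. The value of FA_ENTRY_BUF is a hypothetical placeholder, since the real constant is defined in a header not shown here. The struct layout is also assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FA_ENTRY_BUF 100 /* hypothetical value; the real constant is in a header */

/* Hypothetical Fa_data layout; the real one lives in fa_read.h. */
typedef struct {
    uint64_t nlines;
    uint64_t linelen;
    uint64_t nentries;
    uint64_t *entrylen;
} Fa_data;

static void init_fa(Fa_data *ptr_fa)
{
    assert(ptr_fa != NULL);
    ptr_fa->nlines = 0;
    ptr_fa->linelen = 0;
    ptr_fa->nentries = 0;
    /* calloc so that unused slots read as zero-length entries */
    ptr_fa->entrylen = calloc(FA_ENTRY_BUF, sizeof(uint64_t));
    if (ptr_fa->entrylen == NULL)
        abort();
}
```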
◆ nkmers()
uint64_t nkmers(Fa_data *ptr_fa, int kmersize)
Computes the number of kmers of length kmersize contained in a fasta structure.
- Returns
- number of kmers of length kmersize contained in a fasta structure
◆ read_fasta()
int read_fasta(char *filename, Fa_data *ptr_fa)
Reads a fasta file and stores its contents in a Fa_data structure.
- Parameters
  - filename: path to a fasta input file.
  - ptr_fa: pointer to the Fa_data structure.
- Returns
- number of entries in the fasta file.
A fasta file is read and stored in a Fa_data structure. The basic problem with reading FASTA files is that there is no end-of-record indicator: while reading sequence n, you do not know it has ended until you reach the header line of sequence n+1, which is not parsed until later (when sequence n+1 is read). The solution implemented here is to read the file twice. The first pass (sweep_fa) initializes Fa_data and stores the parameters:
- nlines: number of lines of the fasta file.
- nentries: number of entries in the fasta file.
- linelen: length of a line in the considered fasta file.
- entrylen: array containing the lengths of every entry.

With this information, the pointer to Fa_entry can be allocated; the file is then read a second time and the entries are stored in the structure.
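The two-pass strategy can be sketched on an already-open stream. The function name read_fasta_stream and the simplified struct are hypothetical. The sketch takes a FILE * instead of a filename so it can be tested, and it merges the sweep, the allocation and the storing pass into one function. It also assumes every line fits in a 4096-byte buffer. It illustrates the idea and is not the library's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified Fa_data; the real layout is in fa_read.h. */
typedef struct {
    uint64_t nentries;
    uint64_t *entrylen; /* filled in by the first pass */
    char **entry;       /* filled in by the second pass */
} Fa_data;

/* Reads a FASTA stream twice: first to size every entry, then to copy the
 * sequences. Returns the number of entries. Lines must fit in buf. */
int read_fasta_stream(FILE *fp, Fa_data *fa)
{
    char buf[4096];
    uint64_t cap = 16;
    assert(fp != NULL && fa != NULL);

    /* Pass 1: count entries and accumulate each entry's length. */
    fa->nentries = 0;
    fa->entrylen = malloc(cap * sizeof(uint64_t));
    if (fa->entrylen == NULL) abort();
    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t n = strcspn(buf, "\r\n"); /* ignore the line terminator */
        if (buf[0] == '>') {             /* a header opens a new entry */
            if (fa->nentries == cap) {
                cap *= 2;
                uint64_t *tmp = realloc(fa->entrylen, cap * sizeof(uint64_t));
                if (tmp == NULL) abort();
                fa->entrylen = tmp;
            }
            fa->entrylen[fa->nentries++] = 0;
        } else if (fa->nentries > 0) {
            fa->entrylen[fa->nentries - 1] += n;
        }
    }

    /* Allocate exactly what pass 1 measured (+1 for '\0'). */
    fa->entry = malloc(fa->nentries * sizeof(char *));
    if (fa->entry == NULL && fa->nentries > 0) abort();
    for (uint64_t i = 0; i < fa->nentries; i++) {
        fa->entry[i] = malloc(fa->entrylen[i] + 1);
        if (fa->entry[i] == NULL) abort();
    }

    /* Pass 2: rewind and copy sequence lines into the sized buffers. */
    rewind(fp);
    uint64_t cur = 0, pos = 0;
    int started = 0;
    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t n = strcspn(buf, "\r\n");
        if (buf[0] == '>') {
            if (started) fa->entry[cur++][pos] = '\0'; /* close previous entry */
            started = 1;
            pos = 0;
        } else if (started) {
            memcpy(fa->entry[cur] + pos, buf, n);
            pos += n;
        }
    }
    if (started) fa->entry[cur][pos] = '\0';
    return (int)fa->nentries;
}
```

The first pass is what makes the second one safe. By the time a sequence line is copied, its entry's buffer already has the final size, so no reallocation happens while storing.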
◆ realloc_fa()
static void realloc_fa(Fa_data *ptr_fa)
Reallocation of Fa_data, in case the length of entrylen is exhausted.
- Parameters
  - ptr_fa: pointer to the Fa_data structure.
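The sketch below shows the reallocation step. It assumes a hypothetical capacity field `nalloc` and growth by another FA_ENTRY_BUF slots each time. FA_ENTRY_BUF is given a small hypothetical value here so that growth is visible. The real growth policy is not documented; doubling the capacity would be the common alternative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FA_ENTRY_BUF 4 /* hypothetical; small so growth is visible */

/* Hypothetical Fa_data layout; the real one lives in fa_read.h. */
typedef struct {
    uint64_t nentries;
    uint64_t nalloc;    /* hypothetical: current capacity of entrylen */
    uint64_t *entrylen;
} Fa_data;

/* Grows entrylen by FA_ENTRY_BUF slots once it is full. */
static void realloc_fa(Fa_data *ptr_fa)
{
    assert(ptr_fa != NULL);
    uint64_t newcap = ptr_fa->nalloc + FA_ENTRY_BUF;
    uint64_t *tmp = realloc(ptr_fa->entrylen, newcap * sizeof(uint64_t));
    if (tmp == NULL)
        abort(); /* the old block is still valid, but we give up here */
    /* zero the new tail so fresh slots start as empty entries */
    memset(tmp + ptr_fa->nalloc, 0, FA_ENTRY_BUF * sizeof(uint64_t));
    ptr_fa->entrylen = tmp;
    ptr_fa->nalloc = newcap;
}
```

A caller checks `nentries == nalloc` before recording a new entry and calls realloc_fa only then.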
◆ size_fasta()
uint64_t size_fasta(Fa_data *ptr_fa)
Computes the length of the genome in a fasta structure.
- Parameters
  - ptr_fa: pointer to the Fa_data structure.
- Returns
  - total number of nucleotides.
◆ sweep_fa()
static uint64_t sweep_fa(char *filename, Fa_data *ptr_fa)
Sweeps a fasta file to obtain its structure details.
- Parameters
  - filename: path to a fasta input file.
  - ptr_fa: pointer to the Fa_data structure.
- Returns
- size of fasta file.
This function sweeps over the fasta file once to annotate how many entries there are, how long they are, how many characters there are per line, and how many lines the file has.
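A single-pass sweep can be sketched as follows. The function name sweep_fa_stream, the `nalloc` capacity field and the FA_ENTRY_BUF value are hypothetical. The sketch reads from an open FILE * so that it can be tested.

It makes three further assumptions:
- "size of fasta file" means the number of bytes read.
- linelen is taken from the first sequence line, which presumes fixed-width lines.
- Every line fits in the buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FA_ENTRY_BUF 16 /* hypothetical value */

/* Hypothetical Fa_data layout; the real one lives in fa_read.h. */
typedef struct {
    uint64_t nlines, linelen, nentries;
    uint64_t nalloc;    /* hypothetical: capacity of entrylen */
    uint64_t *entrylen;
} Fa_data;

/* Reads fp once, recording lines, entries, line width and entry lengths.
 * Returns the number of bytes read. */
static uint64_t sweep_fa_stream(FILE *fp, Fa_data *fa)
{
    char buf[4096];
    uint64_t bytes = 0;
    assert(fp != NULL && fa != NULL);
    fa->nlines = fa->linelen = fa->nentries = 0;
    fa->nalloc = FA_ENTRY_BUF;
    fa->entrylen = malloc(fa->nalloc * sizeof(uint64_t));
    if (fa->entrylen == NULL) abort();

    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t n = strcspn(buf, "\r\n"); /* characters without terminator */
        bytes += strlen(buf);
        fa->nlines++;
        if (buf[0] == '>') {             /* header: start a new entry */
            if (fa->nentries == fa->nalloc) {
                fa->nalloc += FA_ENTRY_BUF;
                uint64_t *tmp = realloc(fa->entrylen, fa->nalloc * sizeof(uint64_t));
                if (tmp == NULL) abort();
                fa->entrylen = tmp;
            }
            fa->entrylen[fa->nentries++] = 0;
        } else if (fa->nentries > 0) {   /* sequence line */
            if (fa->linelen == 0) fa->linelen = n;
            fa->entrylen[fa->nentries - 1] += n;
        }
    }
    return bytes;
}
```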
◆ alloc_mem
Global variable holding the amount of memory allocated in the heap.