Reads in and stores FASTA files.
#include <stdint.h>
- Author
- Paula Perez paulaperezrubio@gmail.com
- Date
- 16.08.2017
◆ free_fasta()
void free_fasta(Fa_data *ptr_fa)
Frees the memory held by a FASTA data structure.
- Parameters
  - ptr_fa: pointer to Fa_data structure.
The dynamically allocated memory in a Fa_data struct is deallocated and counted, so that we can
◆ nkmers()
uint64_t nkmers(Fa_data *ptr_fa, int kmersize)
Computes the number of k-mers of length kmersize contained in a FASTA structure.
- Returns
- number of kmers of length kmersize contained in a fasta structure
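The counting rule can be illustrated with a minimal sketch. It assumes that k-mers are counted separately within each entry, so they never span two entries, and that no filtering of ambiguous bases takes place. Neither assumption is confirmed by this page. The function name `count_kmers` and its array-based signature are hypothetical and stand in for the real `nkmers`, which takes a `Fa_data` pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the documented API.
 * A sequence of length len contains len - k + 1 k-mers when len >= k,
 * and none otherwise. Entries are treated independently (assumption). */
uint64_t count_kmers(const uint64_t *entrylen, size_t nentries, int kmersize)
{
    uint64_t total = 0;

    if (kmersize <= 0)
        return 0; /* no meaningful k-mers for a non-positive k */

    for (size_t i = 0; i < nentries; i++) {
        if (entrylen[i] >= (uint64_t)kmersize)
            total += entrylen[i] - (uint64_t)kmersize + 1;
    }
    return total;
}
```

Under these assumptions, three entries of lengths 10, 3 and 5 with kmersize 4 contribute 7, 0 and 2 k-mers respectively.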
◆ read_fasta()
int read_fasta(char *filename, Fa_data *ptr_fa)
Reads a FASTA file and stores its contents in a Fa_data structure.
- Parameters
  - filename: path to a FASTA input file.
  - ptr_fa: pointer to Fa_data structure.
- Returns
- number of entries in the fasta file.
A FASTA file is read and stored in a Fa_data structure. The basic problem with reading FASTA files is that there is no end-of-record indicator: when reading sequence n, you do not know it has ended until you reach the header line of sequence n+1, which is only parsed later, when sequence n+1 itself is read. The solution implemented here is to read the file twice. On the first pass (sweep_fa), Fa_data is initialized and the following parameters are stored:
- nlines: number of lines of the fasta file.
- nentries: number of entries in the fasta file.
- linelen: length of a line in the considered fasta file.
- entrylen: array containing the lengths of every entry.

With this information, the pointer to Fa_entry can be allocated. The file is then read again and the entries are stored in the structure.
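The two-pass idea can be sketched as follows. This is not the library's implementation: the type `fa_sketch` and the functions `sweep`, `fill` and `read_fasta_sketch` are hypothetical stand-ins, since the layout of `Fa_data` and `Fa_entry` is not shown on this page. The sketch tracks only the entry count and per-entry lengths (not nlines or linelen), and it assumes the file does not change between the two passes.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Fa_data; the real layout is not shown here. */
typedef struct {
    int nentries;       /* number of '>' header lines */
    uint64_t *entrylen; /* residues per entry, line breaks excluded */
    char **seq;         /* one NUL-terminated sequence per entry */
} fa_sketch;

/* Pass 1: count the entries and measure the length of each sequence. */
static int sweep(FILE *f, fa_sketch *fa)
{
    int c, at_line_start = 1, in_header = 0;
    size_t cap = 0;

    fa->nentries = 0;
    fa->entrylen = NULL;
    while ((c = fgetc(f)) != EOF) {
        if (at_line_start && c == '>') {
            if ((size_t)fa->nentries == cap) { /* grow the length array */
                cap = cap ? 2 * cap : 8;
                uint64_t *tmp = realloc(fa->entrylen, cap * sizeof *tmp);
                if (!tmp)
                    return -1;
                fa->entrylen = tmp;
            }
            fa->entrylen[fa->nentries++] = 0;
            in_header = 1;
        } else if (c == '\n') {
            in_header = 0;
        } else if (!in_header && c != '\r' && fa->nentries > 0) {
            fa->entrylen[fa->nentries - 1]++;
        }
        at_line_start = (c == '\n');
    }
    return 0;
}

/* Pass 2: allocate exact-size buffers, rewind, and copy the residues. */
static int fill(FILE *f, fa_sketch *fa)
{
    int c, at_line_start = 1, in_header = 0, cur = -1;
    uint64_t pos = 0;

    fa->seq = calloc(fa->nentries ? (size_t)fa->nentries : 1, sizeof *fa->seq);
    if (!fa->seq)
        return -1;
    for (int i = 0; i < fa->nentries; i++) {
        fa->seq[i] = malloc(fa->entrylen[i] + 1);
        if (!fa->seq[i])
            return -1; /* partial allocations are not freed in this sketch */
        fa->seq[i][fa->entrylen[i]] = '\0';
    }
    rewind(f);
    while ((c = fgetc(f)) != EOF) {
        if (at_line_start && c == '>') {
            cur++;
            pos = 0;
            in_header = 1;
        } else if (c == '\n') {
            in_header = 0;
        } else if (!in_header && c != '\r' && cur >= 0) {
            fa->seq[cur][pos++] = (char)c;
        }
        at_line_start = (c == '\n');
    }
    return 0;
}

/* Returns the number of entries, or -1 on failure. */
int read_fasta_sketch(const char *filename, fa_sketch *fa)
{
    FILE *f = fopen(filename, "r");
    if (!f)
        return -1;
    int rc = sweep(f, fa);
    if (rc == 0)
        rc = fill(f, fa);
    fclose(f);
    return rc == 0 ? fa->nentries : -1;
}
```

Because the first pass already knows every entry length, the second pass can allocate each buffer once at its exact size and never needs to look ahead for the next header line.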
◆ size_fasta()
uint64_t size_fasta(Fa_data *ptr_fa)
Computes the length of the genome stored in a FASTA structure.
- Returns
- total number of nucleotides
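As a rough illustration, the total can be obtained by summing the per-entry lengths. The function name `total_nucleotides` and its array-based signature are hypothetical. The real `size_fasta` takes a `Fa_data` pointer, and this page does not confirm how it computes the total.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the documented API:
 * sums the lengths of all entries to get the genome size. */
uint64_t total_nucleotides(const uint64_t *entrylen, size_t nentries)
{
    uint64_t total = 0;

    for (size_t i = 0; i < nentries; i++)
        total += entrylen[i];
    return total;
}
```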