Sanger::CGP::TraFiC::Formatter
use Sanger::CGP::TraFiC::Formatter;
my $formatter = Sanger::CGP::TraFiC::Formatter->new;
$formatter->set_output($output_dir);
$formatter->format_rm_results($repeatmasker_files, $fasta_files);
Designed to take one or more RepeatMasker *.out files, along with the fasta files that were presented to RepeatMasker, and generate two output files (positive anchors and negative anchors). These contain the full information about the mapped end of each read that was successfully processed by RepeatMasker.
All the data required is encoded in the fasta header for each read:
2, 3 and 5 are based on the data for the mapped end of the pair.
This means that only the readname and repeat family from the RepeatMasker output need to be held in memory; the rest can be read back from the RepeatMasker input files with no memory overhead.
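The approach described above can be sketched as a hash lookup performed while streaming the fasta input. This is a minimal illustration, not the module's own code: annotate_headers is a hypothetical helper, and it assumes the readname is the header text between '>' and the first whitespace character. The split into positive and negative anchors is not shown.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Sanger::CGP::TraFiC::Formatter.
# Streams fasta text from $in_fh one line at a time and, for every header
# whose readname is a key of %$family_of, prints the header (without '>')
# followed by a tab and the repeat family to $out_fh.
# Assumption: the readname is the header text up to the first whitespace.
sub annotate_headers {
  my ($in_fh, $out_fh, $family_of) = @_;
  my $written = 0;
  while (my $line = <$in_fh>) {
    next unless $line =~ /^>(\S+)/;              # skip sequence lines
    my $readname = $1;
    next unless exists $family_of->{$readname};  # read was not masked
    chomp $line;
    print {$out_fh} join("\t", substr($line, 1), $family_of->{$readname}), "\n";
    $written++;
  }
  return $written;                               # number of headers written
}
```

Only the readname-to-family hash is resident in memory; each fasta line is discarded as soon as it has been examined.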
Initialises the object and ensures that unix sort is available in path.
my $formatter = Sanger::CGP::TraFiC::Formatter->new;
Will croak if unable to find sort in path with this message:
Unable to find standard unix 'sort' in path
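A check of this kind could be written with core modules only, as in the sketch below. The helper name require_in_path is hypothetical and the module's actual implementation may differ; only the error message is taken from the text above.

```perl
use strict;
use warnings;
use Carp qw(croak);
use File::Spec;

# Hypothetical helper, not part of Sanger::CGP::TraFiC::Formatter.
# Returns the full path of the first executable file named $prog found in
# the directories listed in $ENV{PATH}, or croaks if there is none.
sub require_in_path {
  my ($prog) = @_;
  for my $dir (File::Spec->path) {
    my $candidate = File::Spec->catfile($dir, $prog);
    return $candidate if -f $candidate && -x _;  # regular file and executable
  }
  croak "Unable to find standard unix '$prog' in path";
}
```

A caller that prefers to handle the failure can wrap the call in eval, for example eval { require_in_path('sort'); 1 } or warn $@;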
Specify the output folder for collated results.
$formatter->set_output($output_path);
Allows the user to silence internal messages. Mainly added for testing.
$formatter->silence;
Allows the user to resume internal messages.
$formatter->unsilence;
Main processing function. set_output must be called prior to this.
$formatter->format_rm_results(\@rm_out_files, \@rm_in_fasta);
rm_out_files - \@ of repeat masker output files (*.out).
rm_in_fasta - \@ of the fasta files presented to repeat masker.
Nothing is returned; results are written to the specified output location as pos/neg_hits.txt.
Load the repeat-masker output file data. Only best/longest hits are retained. Not really intended for external use, but there is no reason you can't use it if you want to parse an RM file for the basic data retrieved.
rm_out_files - \@ of repeat masker output files (*.out).
\% where key is readname and value is the family that masked this read.
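For illustration, a loader with this kind of return value might look like the sketch below. It is a hypothetical stand-in rather than the module's code, and it rests on several assumptions: the standard whitespace-separated RepeatMasker .out layout (query name in column 5, query begin and end in columns 6 and 7, repeat class/family in column 11), "longest" meaning the widest span on the query, and ties resolved in favour of the hit seen first.

```perl
use strict;
use warnings;

# Hypothetical stand-in, not the module's own loader.
# Reads RepeatMasker .out text from $fh and returns a hash reference
# mapping each readname to the class/family of its longest hit.
# Assumptions: standard .out column order; "longest" is the widest
# query span (end - begin + 1); ties keep the first hit seen.
sub longest_hits {
  my ($fh) = @_;
  my (%family_of, %span_of);
  while (my $line = <$fh>) {
    next unless $line =~ /^\s*\d/;   # skip header and blank lines
    my @f = split ' ', $line;        # also discards leading whitespace
    next if @f < 11;                 # not a complete hit line
    my ($read, $begin, $end, $family) = @f[4, 5, 6, 10];
    my $span = $end - $begin + 1;
    next if exists $span_of{$read} && $span_of{$read} >= $span;
    $span_of{$read}   = $span;
    $family_of{$read} = $family;
  }
  return \%family_of;
}
```

Keeping a second hash of spans is what allows each line to be judged as it is read, so no file has to be held in memory or scanned twice.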