Sanger::CGP::TraFiC::Cluster
  my $cluster = Sanger::CGP::TraFiC::Cluster->new;
  $cluster->set_input_output($output_folder); # which should already contain data from Formatter.pm
  $cluster->set_min_reads($minimum_reads_required);
  $cluster->cluster_hits;
  $cluster->reciprocal_clusters;

Takes the merged repeat-masker *.out and fasta data generated by Formatter.pm and generates clusters of nearby or overlapping events.
Generates three files:
pos_clusters.txt: Data for reads anchored on the positive strand that form a cluster of at least min_reads reads.
neg_clusters.txt: Data for reads anchored on the negative strand that form a cluster of at least min_reads reads.
reciprocal_clusters.txt: Where entries in pos_clusters.txt and neg_clusters.txt support each other, they are additionally output here. The positive cluster is always the primary entry.
The output files all follow the same format.
These are present at the start of each line in every file.
Chromosome or sequence identifier.
Repeat type that masking identified the reads as hitting.
These are present in the pos/neg_clusters.txt files as described here.
In the reciprocal_clusters.txt file these are presented twice, the positive cluster first (prefixed P_) and then the negative cluster data (prefixed N_).
Leftmost position of the cluster.
Rightmost position of the cluster.
Total number of reads that support this cluster.
Number of single end mapped reads that contribute to this cluster.
Number of inter-chromosomal mapped reads that contribute to this cluster.
Number of aberrantly paired reads that contribute to this cluster.
Names of reads contributing to SINGLE_END_COUNT.
Names of reads contributing to INTER_CHROM_COUNT.
Names of reads contributing to ABERRANT_COUNT.
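As a rough illustration of the layout described above, the following sketch splits one line of a pos/neg cluster file into named fields. It rests on several assumptions that the text does not confirm: that columns are tab-separated, that they appear in the order they are described, and that read names within a column are comma-separated. The hash keys and the example line are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical field names, in the order the columns are described above.
my @FIELDS = qw(chr family l_pos r_pos total_reads
                single_end_count inter_chrom_count aberrant_count
                single_end_reads inter_chrom_reads aberrant_reads);

# Split one line of a pos/neg cluster file into a hash reference.
# ASSUMPTIONS: tab-separated columns; comma-separated read-name lists.
sub parse_cluster_line {
    my ($line) = @_;
    chomp $line;
    my @values = split /\t/, $line, -1;   # -1 keeps trailing empty columns
    die 'Expected ' . scalar(@FIELDS) . ' columns, got ' . scalar(@values) . "\n"
        unless @values == @FIELDS;
    my %rec;
    @rec{@FIELDS} = @values;
    # Turn the three read-name columns into array references.
    for my $key (qw(single_end_reads inter_chrom_reads aberrant_reads)) {
        $rec{$key} = [ split /,/, $rec{$key} ];
    }
    return \%rec;
}

# Invented example line, for illustration only.
my $example = join "\t", 'chr1', 'L1', 1000, 1450, 5, 2, 2, 1,
                         'readA,readB', 'readC,readD', 'readE';
my $rec = parse_cluster_line($example);
printf "%s:%d-%d %s total=%d\n",
    @{$rec}{qw(chr l_pos r_pos family total_reads)};
```

For reciprocal_clusters.txt the same idea would apply, with the cluster-specific fields read twice (once for the P_ block, once for the N_ block) after the shared leading columns.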
No options, sets up object with default values.
Define the path to the input files (generated by Sanger::CGP::TraFiC::Formatter). The output is written to the same location, as these files are expected to be generated in a single step.
Path to output of Sanger::CGP::TraFiC::Formatter
Define the minimum number of reads required to generate a cluster. Defaults to 5.
Integer, number of reads that must exist within a cluster.
Load the hit files, cluster them, and write the result in cluster format.
No arguments as paths are determined at higher level.
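To make the role of min_reads concrete, here is a minimal sketch of position-based clustering with a read-count threshold. It is not the module's implementation: the gap parameter `$max_gap`, the one-position-per-read model, and the function name `cluster_positions` are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Group positions into clusters; a new cluster starts when the gap to the
# previous position exceeds $max_gap. Clusters with fewer than $min_reads
# members are discarded.
# ASSUMPTIONS: $max_gap and the one-position-per-read model are
# illustrative and not taken from the module.
sub cluster_positions {
    my ($positions, $min_reads, $max_gap) = @_;
    my @sorted = sort { $a <=> $b } @{$positions};
    my (@clusters, @current);
    for my $pos (@sorted) {
        if (@current && $pos - $current[-1] > $max_gap) {
            push @clusters, [@current] if @current >= $min_reads;
            @current = ();
        }
        push @current, $pos;
    }
    push @clusters, [@current] if @current >= $min_reads;
    # Report each cluster as [leftmost, rightmost, read count].
    return map { [ $_->[0], $_->[-1], scalar @{$_} ] } @clusters;
}

# Invented positions: one group of five reads and two sparse groups.
my @clusters = cluster_positions(
    [100, 120, 150, 180, 200, 5000, 5010, 9000], 5, 200);
printf "%d-%d (%d reads)\n", @{$_} for @clusters;
```

With the default threshold of 5, only the first group survives in this example; the groups of two reads and one read are dropped.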
Compare the clustered positive- and negative-strand data to identify reciprocal entries. A reciprocal output file is generated. See the formatting section.
No arguments as paths are determined at higher level.
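One way to picture reciprocal matching is sketched below: a positive and a negative cluster are paired when they share a chromosome and repeat type and lie close together. The criteria used by the module are not stated here, so the distance test, the `$max_dist` parameter, the hash keys, and the example data are all hypothetical.

```perl
use strict;
use warnings;

# Pair positive- and negative-strand clusters that share chromosome and
# repeat type and lie within $max_dist of each other.
# ASSUMPTIONS: hash keys, $max_dist and the distance test are hypothetical.
sub reciprocal_pairs {
    my ($pos_clusters, $neg_clusters, $max_dist) = @_;
    my @pairs;
    for my $p (@{$pos_clusters}) {
        for my $n (@{$neg_clusters}) {
            next unless $p->{chr} eq $n->{chr};
            next unless $p->{family} eq $n->{family};
            # Gap between the two intervals; zero or negative on overlap.
            my $gap = ($p->{l_pos} > $n->{l_pos})
                    ? $p->{l_pos} - $n->{r_pos}
                    : $n->{l_pos} - $p->{r_pos};
            next if $gap > $max_dist;
            push @pairs, [$p, $n];   # positive cluster listed first
        }
    }
    return @pairs;
}

# Invented clusters, for illustration only.
my @pos = (
    { chr => 'chr1', family => 'L1',  l_pos => 1000, r_pos => 1450 },
    { chr => 'chr2', family => 'Alu', l_pos => 500,  r_pos => 700  },
);
my @neg = (
    { chr => 'chr1', family => 'L1', l_pos => 1500, r_pos => 1900 },
    { chr => 'chr2', family => 'L1', l_pos => 600,  r_pos => 800  },
);
my @pairs = reciprocal_pairs(\@pos, \@neg, 200);
for my $pair (@pairs) {
    my ($p, $n) = @{$pair};
    printf "%s\t%s\tP:%d-%d\tN:%d-%d\n", $p->{chr}, $p->{family},
        $p->{l_pos}, $p->{r_pos}, $n->{l_pos}, $n->{r_pos};
}
```

The nested loop is quadratic in the number of clusters, which is acceptable for a sketch; sorting both lists by chromosome and position would allow a linear sweep instead.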