
NAME

Sanger::CGP::TraFiC::Cluster

SYNOPSIS

 my $cluster = Sanger::CGP::TraFiC::Cluster->new;

 $cluster->set_input_output($output_folder);
    # which should already contain data from Formatter.pm

 $cluster->set_min_reads($minimum_reads_required);
 $cluster->cluster_hits;
 $cluster->reciprocal_clusters;

GENERAL

Takes the merged RepeatMasker *.out and FASTA data generated by Formatter.pm and generates clusters of nearby or overlapping events.

Generates three files:

pos_clusters.txt

Data for reads anchored on the positive strand that form clusters of at least min_reads reads.

neg_clusters.txt

Data for reads anchored on the negative strand that form clusters of at least min_reads reads.

reciprocal_clusters.txt

Where entries in pos_clusters.txt and neg_clusters.txt support each other, they are additionally output here. The positive cluster is always the primary entry.

Format of output

The output files all follow the same format.

Core fields

These are present at the start of each line in every file.

CHR

Chromosome or sequence identifier.

FAMILY

Repeat family that masking identified the reads as hitting.

Data fields

These are present in the pos/neg_clusters.txt files as described here.

In the reciprocal_clusters.txt file these are presented twice, the positive cluster first (prefixed P_) and then the negative cluster data (prefixed N_).

L_POS

Left most position of cluster.

R_POS

Right most position of cluster.

TOTAL_READS

Total number of reads that support this cluster.

SINGLE_END_COUNT

Number of single end mapped reads that contribute to this cluster.

INTER_CHROM_COUNT

Number of inter-chromosomal mapped reads that contribute to this cluster.

ABERRANT_COUNT

Number of aberrantly paired reads that contribute to this cluster.

SINGLE_END_READS

Names of reads contributing to SINGLE_END_COUNT.

INTER_CHROM_READS

Names of reads contributing to INTER_CHROM_COUNT.

ABERRANT_READS

Names of reads contributing to ABERRANT_COUNT.
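
The field list above can be turned into a small reader for pos_clusters.txt or neg_clusters.txt lines. The following is a minimal sketch only: it assumes the fields are tab-delimited in the order documented here and that the read-name fields are comma-separated lists, neither of which is confirmed by this document. parse_cluster_line is a hypothetical helper, not part of Sanger::CGP::TraFiC::Cluster, and the example line contains made-up values.

```perl
use strict;
use warnings;

# Field order as listed in this document. The tab delimiter and the
# comma-separated read-name lists are ASSUMPTIONS, not confirmed here.
my @CORE_FIELDS = qw(CHR FAMILY);
my @DATA_FIELDS = qw(
  L_POS R_POS TOTAL_READS
  SINGLE_END_COUNT INTER_CHROM_COUNT ABERRANT_COUNT
  SINGLE_END_READS INTER_CHROM_READS ABERRANT_READS
);

# Hypothetical helper (not part of the module): one line in, one hashref out.
sub parse_cluster_line {
  my ($line) = @_;
  chomp $line;
  my @values = split /\t/, $line, -1;    # -1 keeps trailing empty fields
  my @names  = (@CORE_FIELDS, @DATA_FIELDS);
  die sprintf("Expected %d fields, got %d\n", scalar @names, scalar @values)
    unless @values == @names;

  my %record;
  @record{@names} = @values;

  # Split the read-name fields into array references (TOTAL_READS is a count).
  for my $key (grep { /_READS$/ && $_ ne 'TOTAL_READS' } @names) {
    $record{$key} = [ grep { length } split /,/, $record{$key} ];
  }
  return \%record;
}

# Made-up example line, for illustration only.
my $example = join "\t", 'chr1', 'L1', 1000, 1250, 6, 2, 1, 3,
  'r1,r2', 'r3', 'r4,r5,r6';
my $rec = parse_cluster_line($example);
printf "%s:%d-%d %s reads=%d\n",
  @{$rec}{qw(CHR L_POS R_POS FAMILY TOTAL_READS)};
```

If the real files use a different delimiter or list separator, only the two split patterns need to change.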

METHODS

Constructor/configuration

new

No options, sets up object with default values.

set_input_output

Description

Define the path to the input files (generated by Sanger::CGP::TraFiC::Formatter). The output is written to the same location, as these files are expected to be generated in a single step.

Args

Path to output of Sanger::CGP::TraFiC::Formatter

set_min_reads

Description

Define the minimum number of reads required to generate a cluster. Defaults to 5.

Args

Integer, number of reads that must exist within a cluster.

Processing

cluster_hits

Description

Load the hit files, cluster the hits, and write the results in cluster format.

Note

Takes no arguments, as paths are determined at a higher level.
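
To illustrate how a minimum-read threshold interacts with position-based clustering, here is a minimal sketch. It is not the module's algorithm: the grouping rule (a new cluster starts when the gap to the previous position exceeds $max_gap), the $max_gap parameter, and the cluster_positions function are all assumptions made for illustration. Only the idea that clusters with fewer than min_reads reads are discarded comes from this document.

```perl
use strict;
use warnings;

# Hypothetical sketch: group sorted positions whose neighbours lie within
# $max_gap bases of each other, then keep only groups with at least
# $min_reads members. Not the module's actual implementation.
sub cluster_positions {
  my ($positions, $min_reads, $max_gap) = @_;
  my @sorted = sort { $a <=> $b } @{$positions};
  my (@groups, @current);

  for my $pos (@sorted) {
    # Close the current group when the next position is too far away.
    if (@current && $pos - $current[-1] > $max_gap) {
      push @groups, [@current] if @current >= $min_reads;
      @current = ();
    }
    push @current, $pos;
  }
  push @groups, [@current] if @current >= $min_reads;

  # Summarise each surviving group using the documented field names.
  return map {
    +{ L_POS => $_->[0], R_POS => $_->[-1], TOTAL_READS => scalar @{$_} }
  } @groups;
}

# Made-up anchor positions: three groups, of which one has too few reads.
my @positions = (100, 120, 130, 150, 160,
                 5000, 5010,
                 9000, 9020, 9040, 9050, 9060, 9100);
my @kept = cluster_positions(\@positions, 5, 200);
printf "%d-%d (%d reads)\n", @{$_}{qw(L_POS R_POS TOTAL_READS)} for @kept;
```

With the documented default of 5 for min_reads, the two-read group around position 5000 is dropped and the other two groups are reported.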

reciprocal_clusters

Description

Compare the positive and negative clustered data to identify reciprocal entities. A reciprocal output file is generated; see the "Format of output" section.

Note

Takes no arguments, as paths are determined at a higher level.
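
The column layout of reciprocal_clusters.txt described in the format section (core fields once, then the data fields for the positive cluster prefixed P_, then for the negative cluster prefixed N_) can be sketched as follows. The names are the labels used in this document; whether the file itself carries a header row, and the tab delimiter used when printing, are assumptions.

```perl
use strict;
use warnings;

# Field labels as documented in the "Format of output" section.
my @core = qw(CHR FAMILY);
my @data = qw(
  L_POS R_POS TOTAL_READS
  SINGLE_END_COUNT INTER_CHROM_COUNT ABERRANT_COUNT
  SINGLE_END_READS INTER_CHROM_READS ABERRANT_READS
);

# Core fields once, then positive-cluster data, then negative-cluster data.
my @reciprocal_columns = (
  @core,
  (map { "P_$_" } @data),
  (map { "N_$_" } @data),
);

# Tab-delimited output is an assumption made for this illustration.
print join("\t", @reciprocal_columns), "\n";
```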
