Module implementing the TFFMs.
platform: | Unix |
---|---|
synopsis: | Define the class representing the Transcription Factor Flexible Models and the necessary functions to manipulate them. |
todo: | Allow the construction of TFFMs using a different de novo motif finding tool than MEME. |
Bases: ghmm.DiscreteEmissionHMM
Define the Transcription Factor Flexible Models.
Note
Instances of this class have to be created through the functions tffm_from_xml() or tffm_from_meme().
Delete the underlying C structures.
Note: | The destruction is performed by the ghmm.DiscreteEmissionHMM destructor. |
---|
Construct an instance of the TFFM class.
Parameters: |
|
---|---|
Raises: | exceptions.TFFMKindError when the given kind is neither ‘1st-order’ nor ‘detailed’. |
Give the length of the TFFM, i.e. the number of nucleotides in the model excluding the background.
Compute the TFBS hits on a sequence given the posterior probabilities and construct the corresponding instances of HIT.
Parameters: |
|
---|---|
Returns: | The list of TFBS hits predicted on the sequence strand. |
Return type: | list of HIT |
Predict TFBS hits in the sequence given the TFFM.
Parameters: |
|
---|---|
Returns: | The list of TFBS hits predicted on the sequence strand. |
Return type: | list of HIT |
Get the posterior probabilities at each nucleotide position given the TFFM.
Parameters: | sequence_split (list) – The sequence split into subsequences so that non-ACGT nucleotides are not considered. |
---|---|
Returns: | The posterior probabilities at each position of the sequence. |
Return type: | list of list |
Note: | One example of a sequence_split is ["ACT", "N", "ATC"]. |
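The following is a minimal sketch of how a sequence_split such as the one in the note could be produced. The helper name split_sequence is hypothetical and not part of this module, and upper-casing the input is an assumption; the module may build the list differently.

```python
import re


def split_sequence(sequence):
    """Split a sequence into alternating runs of ACGT and non-ACGT characters.

    Hypothetical helper, not part of the TFFM module. Upper-casing the
    input first is an assumption made for this sketch.
    """
    # Each match is either a maximal run of ACGT or a maximal run of
    # anything else (for instance N), so the runs alternate.
    return re.findall(r"[ACGT]+|[^ACGT]+", sequence.upper())


print(split_sequence("ACTNATC"))  # ['ACT', 'N', 'ATC']
```

Only the ACGT runs would then be scored; the other runs are kept so that positions along the original sequence can still be recovered.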
Return the new trimmed HMM.
Parameters: |
|
---|---|
Returns: | The new trimmed HMM. |
Return type: | ghmm.DiscreteEmissionHMM |
Todo: | Raise an error rather than calling sys.exit() when the trimmed HMM becomes empty. |
Return the emission probabilities of the nucleotides in the background state.
Returns: | A dictionary with characters ‘A’, ‘C’, ‘G’, and ‘T’ as keys and the corresponding probabilities as values. |
---|---|
Return type: | dict |
Give the list of final states in the HMM (i.e. corresponding to the last matching position in the TFFM).
Returns: | A list of final states as int. |
---|---|
Return type: | list |
Get the emission probabilities of ACGT at position position and update the emission probabilities in position_proba given the emission probabilities at the previous position (previous_position_proba).
Note: | This function is called state by state. Because several states represent the same position in a detailed TFFM, the probabilities listed in position_proba need to be updated. |
---|---|
Parameters: |
|
Returns: | The emission probabilities of ACGT by the state indexed by index at position position in the TFFM. |
Return type: | list |
Give the information content of the whole TFFM.
Returns: | A float corresponding to the information content of the TFFM. |
---|---|
Return type: | float |
Give the position of the first matching state.
Returns: | The position of the first matching state of the TFFM. |
---|---|
Return type: | float |
Warning: | The position is given 0-based. |
Give the information content for every position of the motif modeled by the TFFM.
Returns: | A list of floats giving the information contents of the positions. |
---|---|
Return type: | list |
Note: | The output is an ordered list following the order of the positions within the motif. |
Get the first and last significant positions of the TFFM, where the insignificant positions are the ones on the edges with low information content.
Parameters: | threshold (float) – The minimal information content to consider a position to be significant. |
---|---|
Returns: | The first and last positions that are to be considered significant (given in this order). |
Return type: | tuple |
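As an illustration of the edge scan described above, here is a minimal sketch that takes a list of per-position information contents and a threshold. The function name is hypothetical, and several details are assumptions not confirmed by this documentation: positions are 0-based, a position is significant when its information content is greater than or equal to the threshold, and None is returned when no position qualifies.

```python
def significant_positions(information_contents, threshold):
    """Return (first, last) indices of the significant positions.

    Hypothetical sketch: 0-based indices, ">= threshold" counts as
    significant, and None is returned if nothing is significant.
    """
    first = 0
    last = len(information_contents) - 1
    # Skip low-information positions on the left edge.
    while first <= last and information_contents[first] < threshold:
        first += 1
    # Skip low-information positions on the right edge.
    while last >= first and information_contents[last] < threshold:
        last -= 1
    if first > last:
        return None
    return first, last


print(significant_positions([0.1, 0.2, 1.5, 0.3, 1.8, 0.1], 1.0))  # (2, 4)
```

Note that only the edges are scanned: the low-information position at index 3 is kept because it lies between two significant positions.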
Trim the current TFFM by removing edges with low information content.
Parameters: |
|
---|---|
Returns: | A TFFM corresponding to the current TFFM trimmed. |
Return type: | |
See also: |
Apply the TFFM on the fasta sequences and return the Pocc value (probability of occupancy) for each sequence.
Parameters: |
|
---|---|
Returns: | Pocc values through a generator. |
Return type: | Generator of HIT |
Note: | The threshold must satisfy 0.0 <= threshold <= 1.0. |
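The documentation above does not give the formula for the Pocc value. The sketch below assumes the common definition of a probability of occupancy, namely the probability that at least one site in the sequence is bound, computed as 1 minus the product of (1 - p) over the per-hit probabilities p. The function name and the input format are hypothetical, and how the threshold is applied is left out.

```python
def pocc(hit_probabilities):
    """Probability that at least one site is occupied.

    Assumed definition (not confirmed by this documentation):
    Pocc = 1 - prod(1 - p_i) over the per-hit probabilities p_i.
    """
    unbound = 1.0
    for probability in hit_probabilities:
        # Probability that none of the sites seen so far is occupied.
        unbound *= 1.0 - probability
    return 1.0 - unbound


print(pocc([0.5, 0.5]))  # 0.75
```

With this definition, a sequence without any hit gets a Pocc of 0.0, and a single hit with probability 1.0 is enough to give a Pocc of 1.0.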
Print the svg code of the corresponding dense logo (i.e. displaying the dinucleotide dependencies captured by the TFFM).
Parameters: | output (file) – Stream where to output the svg (default: sys.stdout). |
---|---|
Note: | The output argument is not a file name but an already open file stream. |
Print the svg code of the corresponding summary logo (i.e. similar to a regular sequence logo).
Parameters: | output (file) – Stream where to output the svg (default: sys.stdout). |
---|---|
Note: | The output argument is not a file name but an already open file stream. |
Apply the TFFM on the fasta sequence and return the TFBS hits.
Parameters: |
|
---|---|
Returns: | TFBS hits. |
Return type: | list of HIT |
Note: | The threshold must satisfy 0.0 <= threshold <= 1.0. |
Apply the TFFM on the fasta sequences and return the TFBS hits.
Parameters: |
|
---|---|
Returns: | TFBS hits through a generator. |
Return type: | Generator of HIT |
Note: | The threshold must satisfy 0.0 <= threshold <= 1.0. |
Train the TFFM using the fasta sequences to learn emission and transition probabilities.
Note: | The training of the underlying HMM is performed with the Baum-Welch algorithm. |
---|---|
Parameters: |
|
Trim the current TFFM by removing edges with low information content.
Parameters: | threshold (float) – The minimal information content value for an edge TFFM match position to be kept. |
---|---|
Warning: | Trims the TFFM in place. To preserve the TFFM, use the get_trimmed() method which returns a trimmed copy of the TFFM but does not alter this TFFM. |
See also: | get_trimmed() |
Give the best hit in a sequence by considering both positive and negative strands.
Parameters: |
|
---|---|
Returns: | The best hit (None if no hit). |
Return type: | HIT |
Compute the entropy given the emission probabilities of the ACGT nucleotides.
Parameters: | emissions (list of float) – Emission probabilities of the ACGT nucleotides. |
---|---|
Returns: | The computed entropy. |
Return type: | float |
Warning: | The list gives the probabilities corresponding to A, C, G, and T in this order. |
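For reference, the Shannon entropy of a list of four emission probabilities can be sketched as follows. The use of base-2 logarithms (entropy in bits) and the convention that zero probabilities contribute nothing are assumptions; this is not the module's own implementation.

```python
import math


def entropy(emissions):
    """Shannon entropy, in bits, of the A, C, G, T emission probabilities.

    Sketch under assumptions: base-2 logarithm, and a probability of 0
    contributes 0 to the sum (the usual 0 * log(0) = 0 convention).
    """
    return -sum(p * math.log2(p) for p in emissions if p > 0.0)


uniform = entropy([0.25, 0.25, 0.25, 0.25])
print(uniform)  # 2.0
```

Under these assumptions the entropy ranges from 0 bits (one nucleotide has probability 1) to 2 bits (uniform distribution), which is why the information content of a position is often taken as 2 minus its entropy.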
Create a 0-order HMM initialized from a MEME result.
Parameters: |
|
---|---|
Returns: | The constructed HMM. |
Return type: | ghmm.DiscreteEmissionHMM |
Create a 1st-order HMM initialized from a MEME result.
Parameters: |
|
---|---|
Returns: | The constructed HMM. |
Return type: | ghmm.DiscreteEmissionHMM |
Create a detailed HMM initialized from a MEME result.
Parameters: |
|
---|---|
Returns: | The constructed HMM. |
Return type: | ghmm.DiscreteEmissionHMM |
Merge the hits from both strands.
Parameters: |
|
---|---|
Returns: | A list containing the TFBS hits (empty if no hit). |
Return type: | list |
Note: | The two input lists must be ordered by position on the sequence. The best hit per position is given. When no hit has been found at a position, the constant None is used. |
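The merge described in the note can be sketched as below. The Hit tuple, its score field, and the function name are hypothetical stand-ins for the module's HIT class. The sketch assumes both lists hold one entry per position (a hit or None) and that the best hit is the one with the higher score, with ties going to the positive strand.

```python
from collections import namedtuple

# Hypothetical stand-in for the module's HIT class.
Hit = namedtuple("Hit", ["position", "strand", "score"])


def merge_hits(positive_hits, negative_hits):
    """Keep, for each position, the better of the two strand hits.

    Assumes both lists are ordered by position and have one entry per
    position, where None means that no hit was found on that strand.
    """
    merged = []
    for pos_hit, neg_hit in zip(positive_hits, negative_hits):
        if pos_hit is None:
            # Either the negative-strand hit or None if both are missing.
            merged.append(neg_hit)
        elif neg_hit is None or pos_hit.score >= neg_hit.score:
            merged.append(pos_hit)
        else:
            merged.append(neg_hit)
    return merged


plus = [Hit(0, "+", 0.9), None, Hit(2, "+", 0.4)]
minus = [Hit(0, "-", 0.7), None, Hit(2, "-", 0.8)]
merged = merge_hits(plus, minus)
print([hit.strand if hit else None for hit in merged])  # ['+', None, '-']
```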
Construct a TFFM from the output of MEME on ChIP-seq data.
Parameters: |
|
---|---|
Returns: | The TFFM initialized from MEME results. |
Return type: | |
Note: | As the PFM is used to initialize the TFFM, a pseudocount of 1 is added to all the values in the PFM. |
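To illustrate the effect of the pseudocount, the sketch below converts a PFM into per-position probabilities after adding 1 to every count. The function name and the PFM layout (a list of positions, each a list of A, C, G, T counts) are assumptions made for this example and may differ from the module's internal representation.

```python
def pfm_to_probabilities(pfm, pseudocount=1.0):
    """Turn counts into probabilities after adding a pseudocount.

    Hypothetical helper: pfm is assumed to be a list of positions, each
    a list of counts in A, C, G, T order.
    """
    probabilities = []
    for counts in pfm:
        # The pseudocount is added to every cell, so the column total
        # grows by pseudocount times the number of nucleotides.
        total = sum(counts) + pseudocount * len(counts)
        probabilities.append([(c + pseudocount) / total for c in counts])
    return probabilities


probs = pfm_to_probabilities([[8, 0, 0, 0], [2, 2, 2, 2]])
print(probs[0])  # [0.75, 0.0833..., 0.0833..., 0.0833...]
```

The pseudocount guarantees that no nucleotide starts with a probability of exactly 0, which would otherwise prevent training from ever assigning it a non-zero emission probability.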
Construct an initialized TFFM from a PFM.
Parameters: |
|
---|---|
Returns: | The TFFM initialized from the PFM. |
Return type: | |
See also: | |
Note: | As the PFM is used to initialize the TFFM, a pseudocount of 1 is added to all the values in the PFM. |
Construct a TFFM described in an XML file.
Parameters: |
|
---|---|
Returns: | The TFFM described in the XML file. |
Return type: |
Module author: Anthony Mathelier <amathelier@cmmt.ubc.ca>