Welcome to TFFM’s documentation!¶
We provide here the documentation of the TFFM-framework developed in Python. The Transcription Factor Flexible Models (TFFMs) represent TFBSs and are based on hidden Markov models (HMM). They are flexible and are able to model both position interdependence within TFBSs and variable length motifs within a single dedicated framework.
The framework also implements methods to generate a new graphical representation of the modeled motifs that convey properties of position interdependences.
TFFMs have been assessed on ChIP-seq data sets coming from the ENCODE project, revealing that the new HMM-based framework performs, in most cases, better than both PWMs and the dinucleotide weight matrix (DWM) extension in discriminating motifs within ChIP-seq sequences from background sequences. Under the assumption that ChIP-seq signal values are correlated with the affinity of the TF-DNA binding, we find that TFFM scores correlate with ChIP-seq peak signals. Moreover, using available TF-DNA affinity measurements for the Max TF, we observe that TFFMs constructed from ChIP-seq data correlate with published experimentally measured DNA-binding affinities. These results demonstrate the capacity of TFFMs to accurately model DNA-protein interactions, while providing a single unified framework suitable for the next generation of TFBS predictions. All the details have been published in Mathelier and Wasserman, The Next Generation of Transcription Binding Site Prediction, PLOS Computational Biology , Sept. 2013, 9(9):e1003214, DOI:10.1371/journal.pcbi.1003214.
TFFMs can be saved and opened from files using the XML format already used by the GHMM library.
System requirements¶
The TFFM-framework 2.0 has been developed and tested under Ubuntu Linux operating system. It has also been tested on CentOS.
Python should be installed (version 2.7 has been used successfully).
Biopython (at least version 1.61) should be installed and accessible from your Python executable. See http://biopython.org for instructions on how to install it.
The GHMM library should be installed and accessible from Python. See http://ghmm.org for instructions on how to install it.
Contents¶
Installation using conda and launching the tutorial¶
First, install conda and bioconda using the instructions:
conda create --name tffm1 python=2.7 biopython=1.70 ghmm
This will require agreeing to install many packages, and takes a minute or two.
Activate the conda environment:
conda activate tffm
Download TFFM as a git repository:
git clone https://github.com/wassermanlab/TFFM.git
Change to tutorial directory and launch python:
cd TFFM/tutorial
python
Tutorial¶
Data for this tutorial can be found in the tutorial directory of the TFFM-framework package at https://github.com/wassermanlab/TFFM/tree/master/tutorial. To run this example, you need to be located in the tutorial directory. The location to be added to sys.path (second line of the tutorial) corresponds to the directory containing the TFFM-framework.
>>> import sys
>>> sys.path.append(<TFFM-framework repository path>)
>>> import tffm_module
>>> from constants import TFFM_KIND
>>>
>>> tffm_first_order = tffm_module.tffm_from_meme("meme.txt", TFFM_KIND.FIRST_ORDER)
>>> tffm_first_order.write("tffm_first_order_initial.xml")
>>> tffm_first_order.train("train.fa")
>>> tffm_first_order.write("tffm_first_order.xml")
>>> out = open("tffm_first_order_summary_logo.svg", "w")
>>> tffm_first_order.print_summary_logo(out)
>>> out.close()
>>> out = open("tffm_first_order_dense_logo.svg", "w")
>>> tffm_first_order.print_dense_logo(out)
>>> out.close()
>>>
>>> tffm_detailed = tffm_module.tffm_from_meme("meme.txt", TFFM_KIND.DETAILED)
>>> tffm_detailed.write("tffm_detailed_initial.xml")
>>> tffm_detailed.train("train.fa")
>>> tffm_detailed.write("tffm_detailed.xml")
>>> out = open("tffm_detailed_summary_logo.svg", "w")
>>> tffm_detailed.print_summary_logo(out)
>>> out.close()
>>> out = open("tffm_detailed_dense_logo.svg", "w")
>>> tffm_detailed.print_dense_logo(out)
>>> out.close()
>>>
>>> tffm_first_order = tffm_module.tffm_from_xml("tffm_first_order.xml",
... TFFM_KIND.FIRST_ORDER)
>>> print "1st-order all"
1st-order all
>>> for hit in tffm_first_order.scan_sequences("test.fa"):
... if hit:
... print hit
...
hg19_ct_UserTrack_3545_1911 1 15 - CCCAATCTGGGGGAG TFFM 16 2.0412237861770574e-05
hg19_ct_UserTrack_3545_1911 2 16 - TCCCAATCTGGGGGA TFFM 16 5.972631515438409e-10
hg19_ct_UserTrack_3545_1911 3 17 + CCCCCAGATTGGGAG TFFM 16 7.302219996694176e-10
hg19_ct_UserTrack_3545_1911 3 17 - CTCCCAATCTGGGGG TFFM 16 4.283052089644644e-10
hg19_ct_UserTrack_3545_1911 4 18 + CCCCAGATTGGGAGA TFFM 16 2.3867153510744474e-11
hg19_ct_UserTrack_3545_1911 4 18 - TCTCCCAATCTGGGG TFFM 16 1.809727472566856e-09
...
...
hg19_ct_UserTrack_3545_3739 84 98 + AGGAAGGCAGTCTGA TFFM 16 1.4846886217808363e-13
hg19_ct_UserTrack_3545_3739 84 98 - TCAGACTGCCTTCCT TFFM 16 1.221285402074912e-07
hg19_ct_UserTrack_3545_3739 85 99 + GGAAGGCAGTCTGAG TFFM 16 4.6592155264443255e-12
hg19_ct_UserTrack_3545_3739 85 99 - CTCAGACTGCCTTCC TFFM 16 9.886433151452122e-09
hg19_ct_UserTrack_3545_3739 86 100 + GAAGGCAGTCTGAGC TFFM 16 4.299143127628496e-07
hg19_ct_UserTrack_3545_3739 87 101 + AAGGCAGTCTGAGCC TFFM 16 3.147869508275758e-08
>>> print "1st-order best"
1st-order best
>>> for hit in tffm_first_order.scan_sequences("test.fa", only_best=True):
... if hit:
... print hit
...
hg19_ct_UserTrack_3545_1911 33 47 - TAGGCCCGGAGGGAG TFFM 16 0.7487725826821764
hg19_ct_UserTrack_3545_3739 29 43 + CCTGCCCCCAGGGTG TFFM 16 0.9670956290505575
>>> tffm_detailed = tffm_module.tffm_from_xml("tffm_detailed.xml",
... TFFM_KIND.DETAILED)
>>> print "detailed all"
detailed all
>>> for hit in tffm_detailed.scan_sequences("test.fa"):
... if hit:
... print hit
...
hg19_ct_UserTrack_3545_1911 1 15 + CTCCCCCAGATTGGG TFFM 62 1.2342473696772854e-08
hg19_ct_UserTrack_3545_1911 1 15 - CCCAATCTGGGGGAG TFFM 62 6.845206246155106e-06
hg19_ct_UserTrack_3545_1911 2 16 + TCCCCCAGATTGGGA TFFM 60 9.158988688768735e-10
hg19_ct_UserTrack_3545_1911 2 16 - TCCCAATCTGGGGGA TFFM 60 2.6048552465107907e-11
hg19_ct_UserTrack_3545_1911 3 17 + CCCCCAGATTGGGAG TFFM 62 1.841150116737821e-09
hg19_ct_UserTrack_3545_1911 3 17 - CTCCCAATCTGGGGG TFFM 62 8.645334758769172e-12
...
...
hg19_ct_UserTrack_3545_3739 86 100 + GAAGGCAGTCTGAGC TFFM 61 1.5796330106346655e-05
hg19_ct_UserTrack_3545_3739 86 100 - GCTCAGACTGCCTTC TFFM 61 1.1487113198838786e-10
hg19_ct_UserTrack_3545_3739 87 101 + AAGGCAGTCTGAGCC TFFM 61 4.790574256236894e-07
hg19_ct_UserTrack_3545_3739 87 101 - GGCTCAGACTGCCTT TFFM 63 1.5113773626351342e-09
>>> print "detailed best"
detailed best
>>> for hit in tffm_detailed.scan_sequences("test.fa", only_best=True):
... if hit:
... print hit
...
hg19_ct_UserTrack_3545_1911 31 45 + CTCTCCCTCCGGGCC TFFM 61 0.8256588185762648
hg19_ct_UserTrack_3545_3739 29 43 + CCTGCCCCCAGGGTG TFFM 62 0.9683269774058026
Download¶
The TFFM-framework can be downloaded from GitHub at https://github.com/wassermanlab/TFFM.
Reference¶
The TFFMs and the TFFM-framework have been described in Mathelier and Wasserman, The Next Generation of Transcription Binding Site Prediction, PLOS Computational Biology, Sept. 2013, 9(9):e1003214, DOI:10.1371/journal.pcbi.1003214. Please cite this publication when publishing results using the TFFM-framework.
Licence¶
The TFFM-framework has been developed under the GNU Lesser General Public Licence (see http://www.gnu.org/copyleft/lesser.html and LICENCE).
The TFFM-framework uses the GHMM library which is also licenced under the GNU LGPL.