Analyze Protein Missense Variants


Overview:


The Functional Analysis through Hidden Markov Models (fathmm) software and server predicts the functional effects of protein missense mutations. It combines sequence conservation within hidden Markov models (HMMs), which represent the alignment of homologous sequences and conserved protein domains, with "pathogenicity weights", which represent the overall tolerance of the corresponding model to mutations.

For more information, please refer to the following publication:

Shihab HA, Gough J, Cooper DN, Stenson PD, Barker GLA, Edwards KJ, Day INM, Gaunt TR. (2013). Predicting the Functional, Molecular and Phenotypic Consequences of Amino Acid Substitutions using Hidden Markov Models. Hum. Mutat., 34, 57-65.



Input Format:


Our software accepts one of the following formats (see the VCF Annotation section for annotating VCF files):

  • <protein> <substitution>
  • dbSNP rs identifiers

In the above, <protein> is the protein identifier and <substitution> is the amino acid substitution in the conventional one-letter format (wild-type residue, position, substituted residue). At present, our server accepts SwissProt/TrEMBL, RefSeq and Ensembl protein identifiers, e.g.:

P43026 L441P
or:

rs137854462
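
As a rough illustration of the two accepted formats, the following Python sketch classifies a single input line. The function name `classify_input` and both regular expressions are assumptions made for this example and are not part of fathmm; the server's own validation may differ, and the protein identifier is deliberately left unchecked.

```python
import re

# Assumed patterns, for illustration only; the server's own checks may differ.
RS_ID = re.compile(r"^rs\d+$")
# One-letter wild-type residue, position, one-letter substituted residue.
SUBSTITUTION = re.compile(
    r"^[ACDEFGHIKLMNPQRSTVWY]\d+[ACDEFGHIKLMNPQRSTVWY]$"
)


def classify_input(line):
    """Return ('dbsnp', rs_id) or ('protein', protein_id, substitution)."""
    fields = line.split()
    if len(fields) == 1 and RS_ID.match(fields[0]):
        return ("dbsnp", fields[0])
    if len(fields) == 2 and SUBSTITUTION.match(fields[1]):
        return ("protein", fields[0], fields[1])
    raise ValueError("unrecognised input line: %r" % line)
```

Lines that match neither format raise a `ValueError`, so malformed input can be caught before submission.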



Batch Submission:


It is possible to submit multiple amino acid substitutions as a 'Batch Submission' via our server. Here, all amino acid substitutions for a protein are entered on a single line, separated by commas, e.g.:

P43026 L441P
ENSP00000325527 N548I,E1073K,C2307S 

Note: this option is not available when analysing dbSNP rs identifiers.
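
To make the batch layout concrete, the sketch below expands batch-formatted lines into individual (protein, substitution) pairs. The function name `expand_batch` is hypothetical and the code only mirrors the format described above; it is not taken from the fathmm source.

```python
def expand_batch(lines):
    """Yield (protein_id, substitution) pairs from batch-formatted lines."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError("expected '<protein> <substitutions>': %r" % line)
        protein_id, substitutions = fields
        # Substitutions for one protein are separated by commas.
        for substitution in substitutions.split(","):
            if substitution:
                yield protein_id, substitution
```

Applied to the two example lines above, this would yield one pair for P43026 and three pairs for ENSP00000325527.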



Prediction Algorithm:


As described in our paper, our software comprises two algorithms: one based on sequence conservation alone (Unweighted) and one that combines our Unweighted algorithm with pathogenicity weights (Weighted). In short, our Weighted algorithm adjusts our conservation-based predictions to account for the tolerance of related sequences to mutations. For example, mutations falling within diverse regions of the Cellular Tumor Antigen P53 can be up-weighted according to the critical role the protein plays in cell regulation. In contrast, mutations falling within conserved regions of the MHC Antigen-Recognition Domain can be down-weighted according to the hypervariable nature of the domain.



Phenotype Associations:


Our software not only predicts the potentially deleterious nature of protein variants but also annotates the molecular and phenotypic consequences of these mutations via several domain-centric ontologies (dcGO). Here, the consequences of mutations are statistically inferred by mapping SUPERFAMILY domains onto the Gene Ontology, the Human Phenotype Ontology, the Mammalian Phenotype Ontology and other ontologies.

For more information on these mappings, please refer to the following publications:

Fang H, Gough J. (2012). dcGO: database of domain-centric ontologies on functions, phenotypes, diseases and more. Nucleic Acids Res., 41, D536-544.

Gough J, Karplus K, Hughey R, Chothia C. (2001). Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J. Mol. Biol., 313, 903-919.



VCF Annotation:


Unfortunately, due to disk space constraints, we are unable to annotate Variant Call Format (VCF) files on your behalf. However, the consequences of all VCF variants can be derived using the Ensembl Variant Effect Predictor (VEP). Once annotated, our parseVCF.py script can parse these annotations and provide a list of protein consequences, which can then be used as input to our server/software.

Additional help on using our script is available by typing the following command:

python parseVCF.py --help
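
For readers who want to see roughly what such a conversion involves, here is a minimal sketch that turns VEP output into `<protein> <substitution>` lines. It is not the parseVCF.py script and makes several assumptions: the input is VEP's default tab-delimited output, VEP was run with its `--protein` option so that an `ENSP=` entry appears in the Extra column, and only single-residue `missense_variant` consequences are wanted. The function name `vep_to_fathmm` is hypothetical.

```python
def vep_to_fathmm(vep_lines):
    """Yield '<protein> <substitution>' strings from VEP tab-delimited output."""
    for line in vep_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 14:
            continue  # not a complete default-format record
        # Assumed default column order: Consequence, Protein_position,
        # Amino_acids and Extra are columns 7, 10, 11 and 14 (1-based).
        consequence, position, amino_acids, extra = (
            cols[6], cols[9], cols[10], cols[13]
        )
        if "missense_variant" not in consequence.split(","):
            continue
        extra_fields = dict(
            item.split("=", 1) for item in extra.split(";") if "=" in item
        )
        protein = extra_fields.get("ENSP")  # present only with --protein
        if protein is None or "/" not in amino_acids:
            continue
        ref, alt = amino_acids.split("/", 1)
        yield "%s %s%s%s" % (protein, ref, position, alt)
```

Each yielded string follows the `<protein> <substitution>` format described under Input Format, so the results can be pasted into the server directly.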

