Our disease-specific algorithm ranks protein missense mutations according to seventeen disease concepts. It combines sequence
conservation within hidden Markov models (HMMs), which represent the alignment of homologous sequences and conserved protein domains, with "pathogenicity weights",
which represent the overall tolerance of the corresponding model to disease concepts, e.g. musculoskeletal and/or metabolic disease.
For more information, please refer to the following publications:
Shihab HA, Gough J, Mort M, Cooper DN, Day INM, Gaunt TR. A Method for Ranking Non-Synonymous Single Nucleotide Polymorphisms based on Disease Concepts. (submitted)
Shihab HA, Gough J, Cooper DN, Stenson PD, Barker GLA, Edwards KJ, Day INM, Gaunt TR. (2013). Predicting the Functional, Molecular and Phenotypic Consequences of Amino Acid Substitutions using Hidden Markov Models. Hum. Mutat., 34:57-65
Our software accepts one of the following formats (see here for annotating VCF files):
<protein> <substitution>
dbSNP rs identifiers
where <protein> is the protein identifier and <substitution> is the amino acid substitution in the conventional one-letter format.
At present, our server accepts SwissProt/TrEMBL, RefSeq and Ensembl protein identifiers, e.g.:
P43026 L441P
or:
rs137854462
It is possible to submit multiple amino acid substitutions as a 'Batch Submission' via our server. Here, all amino acid substitutions for a protein can be
entered on a single line and should be separated by a comma, e.g.:
P43026 L441P
ENSP00000325527 N548I,E1073K,C2307S
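If your substitutions are held as (protein, substitution) pairs, the batch format described above can be assembled with a few lines of Python. This is a minimal sketch, not part of our software: the function name `build_batch_lines` and the validation pattern are illustrative assumptions, and the pattern only checks for the one-letter substitution shape (e.g. L441P) using the twenty standard amino acids.

```python
import re

# Assumption: substitutions use the twenty standard one-letter amino acid codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SUBSTITUTION = re.compile(r"^[{0}]\d+[{0}]$".format(AMINO_ACIDS))


def build_batch_lines(pairs):
    """Group (protein, substitution) pairs into one batch line per protein.

    Hypothetical helper: substitutions for the same protein are joined with
    commas, and proteins keep the order in which they were first seen.
    """
    grouped = {}
    for protein, substitution in pairs:
        if not SUBSTITUTION.match(substitution):
            raise ValueError("not a one-letter substitution: %r" % substitution)
        substitutions = grouped.setdefault(protein, [])
        if substitution not in substitutions:  # skip exact duplicates
            substitutions.append(substitution)
    return ["%s %s" % (protein, ",".join(subs)) for protein, subs in grouped.items()]


if __name__ == "__main__":
    pairs = [
        ("P43026", "L441P"),
        ("ENSP00000325527", "N548I"),
        ("ENSP00000325527", "E1073K"),
        ("ENSP00000325527", "C2307S"),
    ]
    print("\n".join(build_batch_lines(pairs)))
```

Run on the identifiers from the example above, the sketch prints one line per protein in the same form as the batch submission shown.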
Our disease-specific predictions are still experimental; therefore, we have not defined clear prediction thresholds for identifying whether or not a mutation is associated with your disease of interest. However, predictions scoring less than zero indicate that the mutation may be associated with your disease of interest, with lower scores indicating greater confidence in the association.
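In practice, this means predictions are best read as a ranking rather than as a yes/no call. The sketch below illustrates one way to do this, assuming you have already collected the predictions as (label, score) pairs. The function name `rank_candidates` and the scores in the usage example are made up for illustration and are not real predictions.

```python
def rank_candidates(predictions):
    """Return predictions scoring below zero, most negative (most confident) first.

    `predictions` is an iterable of (label, score) pairs; this layout is an
    assumption for illustration, not the server's output format.
    """
    flagged = [(label, score) for label, score in predictions if score < 0]
    return sorted(flagged, key=lambda item: item[1])


if __name__ == "__main__":
    # Illustrative scores only.
    example = [("variant A", 0.8), ("variant B", -2.1), ("variant C", -0.3)]
    for label, score in rank_candidates(example):
        print(label, score)
```

No cut-off other than zero is applied here, in keeping with the absence of defined thresholds.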
Unfortunately, due to disk space constraints, we are unable to annotate Variant Call Format (VCF) files on your behalf. However, the consequences of all VCF variants
can be derived using the Ensembl Variant Effect Predictor (VEP).
Once annotated, the following script (available here) parses these annotations and provides a list of protein
consequences, which can then be used as input to our server/software.
Additional help on using our script is available by typing the following command:
python parseVCF.py --help
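As a rough illustration of the conversion described above, the following sketch turns VEP annotations into `<protein> <substitution>` lines. It is not the parseVCF.py script and makes no claim about how that script works. It assumes VEP's default tab-delimited output, run with the `--protein` option so that the Extra column carries an `ENSP=` entry. The function name `vep_to_fathmm` is hypothetical.

```python
def vep_to_fathmm(lines):
    """Yield '<protein> <substitution>' strings from VEP tab-delimited output.

    Assumed column layout (VEP default output): Consequence is column 7,
    Protein_position column 10, Amino_acids column 11, Extra column 14.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 14:
            continue  # not a complete annotation row
        consequence, position, amino_acids, extra = cols[6], cols[9], cols[10], cols[13]
        if "missense_variant" not in consequence.split(","):
            continue  # only missense mutations are relevant here
        if "/" not in amino_acids or not position.isdigit():
            continue
        ref, alt = amino_acids.split("/", 1)
        if len(ref) != 1 or len(alt) != 1:
            continue  # keep single-residue substitutions only
        fields = dict(kv.split("=", 1) for kv in extra.split(";") if "=" in kv)
        protein = fields.get("ENSP")
        if protein:
            yield "%s %s%s%s" % (protein, ref, position, alt)


if __name__ == "__main__":
    import sys

    # Usage (assumed): python this_sketch.py < vep_output.txt
    for query in vep_to_fathmm(sys.stdin):
        print(query)
```

Rows without a protein identifier, or with anything other than a single-residue missense change, are silently skipped in this sketch.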