Genome-wide association studies (GWAS) have become an essential tool in assisting the improvement of understanding of neuropathology of diseases. Moreover, the imputation of genotyping data with missing information can enhance the power of GWAS by using haplotype relationships between genetic variants [1,2]. In the meantime, the imputation can substantially reduce the cost of genotyping large number of samples. Furthermore, the comparison of significant findings across different cohorts and genotyping platforms is also achievable. Currently large amounts of genotyping data are produced at an unprecedented rate to be analyzed; however, the preparation of ready-to-use data sets needs to be of high quality. Thus the implementation of an efficient and reliable genetic data pre-processing and imputation pipeline is desirable.
In order to ensure the reliability and reproducibility of genome-wide association study (GWAS) data, we set up an efficient, automatic and comprehensive genotype data processing and imputation pipeline termed Gimpute. It consists of pre-processing (genetic variant information updating/matching/liftOver, quality control of genetic variants and study samples, the alignment of variants to the imputation references), pre-phasing, imputation and post-imputation quality control [3], as well as an extension to the Genipe framework in this vignette, which is easy-to-follow and user-friendly.
Gimpute runs on any 64-bit x86 Linux distribution and it requires the following tools [4,5,6,7]:
You can install the tools from the command line as the following examples:
## PLINK
wget https://www.cog-genomics.org/static/bin/plink180717/plink_linux_x86_64.zip
unzip plink_linux_x86_64.zip
## GCTA64
wget http://cnsgenomics.com/software/gcta/gcta_1.91.5beta.zip
unzip gcta_1.91.5beta.zip
## SHAPEIT
wget https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.v2.r904.glibcv2.17.linux.tar.gz
tar -zxvf shapeit.v2.r904.glibcv2.17.linux.tar.gz
## IMPUTE2
wget https://mathgen.stats.ox.ac.uk/impute/impute_v2.3.2_x86_64_static.tgz
tar -zxvf impute_v2.3.2_x86_64_static.tgz
## GTOOL
wget http://www.well.ox.ac.uk/~cfreeman/software/gwas/gtool_v0.7.5_x86_64.tgz
tar -zxvf gtool_v0.7.5_x86_64.tgz
## QCTOOLv2
wget http://www.well.ox.ac.uk/~gav/resources/qctool_v2.0-CentOS6.8-x86_64.tgz
tar -zxvf qctool_v2.0-CentOS6.8-x86_64.tgz
Note that, IMPUTE4 can be downloaded only after applying for access. IMPUTE4 and QCTOOLv2 are not required if you are using IMPUTE2 for the imputation.
Development version from Github:
install.packages("devtools")
library("devtools")
install_github("transbioZI/Gimpute", build_vignettes=TRUE)
git clone https://github.com/transbioZI/Gimpute
R CMD build Gimpute
R CMD INSTALL Gimpute_*.tar.gz
Gimpute runs on any 64-bit x86 Linux distribution. Additional dependencies are described below.
PLINK format files (PED/MAP or BED/BIM/FAM) are required as the input genotyping data for our pipeline. Free datasets are available under ./extdata/ in our package. The way for loading these PLINK binary files is shown as below. Since sharing of genetic genotyping data is strictly prohibited, we did not upload any private genotyping data for this demonstration.
library(Gimpute)
bedFile <- system.file("extdata", "dataChr21.bed", package="Gimpute")
bimFile <- system.file("extdata", "dataChr21.bim", package="Gimpute")
famFile <- system.file("extdata", "dataChr21.fam", package="Gimpute")
The files processed and generated are named in a consistent way across all datasets as follows:
The recommended directories are named as follows:
## Gimpute has the following dependencies:
library(lattice)
library(doParallel)
## Define required tools
toolDIR <- "/home/Gimpute/tools/"
plink <- paste0(toolDIR, "plink")
gcta <- paste0(toolDIR, "gcta64")
shapeit <- paste0(toolDIR, "shapeit")
gtool <- paste0(toolDIR, "gtool")
qctool <- paste0(toolDIR, "qctool")
imputeTool <- "impute4" ##
if (imputeTool == "impute2"){
impute <- paste0(toolDIR, "impute2")
} else if (imputeTool == "impute4"){
impute <- paste0(toolDIR, "impute4.1_r291.2")
}
Genotyping chip annotation file: Most of common chip annotation files can be directly downloaded from Dr. William Rayner’page. This annotation file is particularly important for strand checking and genomic information of genotypes on a variety of genome builds. Gimpute requires the chip annotation file that is organized into two different types:
Imputation reference files: Publicly avaiable reference datasets can be downloaded from IMPUTE2’s reference page. The reference files from “1000 Genome Project Phase1v3” are the default panel used in our package. 1000 Genomes Phase 3 is an alternative reference panel.
## Download imputation reference panels
## Reference panel 1: 1000Gphase1v3_macGT1
wget https://mathgen.stats.ox.ac.uk/impute/ALL_1000G_phase1integrated_v3_impute_macGT1.tgz
tar -zxvf ALL_1000G_phase1integrated_v3_impute_macGT1.tgz
## Reference panel 2: 1000Gphase3
wget https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.tgz
wget https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3_chrX.tgz
tar -zxvf 1000GP_Phase3.tgz
tar -zxvf 1000GP_Phase3_chrX.tgz
A string indicating the type of imputation reference panels is used: c(“1000Gphase1v3_macGT1”, “1000Gphase3”). This must be defined before running pipeline.
## Define the directory where you place the imputation reference files
impRefDIR <- "/home/imputeRef/1000Gphase1v3_macGT1/"
referencePanel <- "1000Gphase1v3_macGT1"
## Alternatively,
impRefDIR <- "/home/imputeRef/1000GP_Phase3/"
referencePanel <- "1000Gphase3"
All files named in this subsection are in the directory 1-genoUpdate/.
In this subsection, the function updateGenoInfo()
is mandatory and updates genotype information of original genotype data by the use of metadata and configuration files.
## Not run
bedFile <- system.file("extdata", "dataChr21.bed", package="Gimpute")
bimFile <- system.file("extdata", "dataChr21.bim", package="Gimpute")
famFile <- system.file("extdata", "dataChr21.fam", package="Gimpute")
metadataFile <- system.file("extdata", "1_01_metaData.txt", package="Gimpute")
removedSampIDFile <- system.file("extdata", "excludedSampIDs.txt",
package="Gimpute")
excludedProbeIdsFile <- system.file("extdata", "excludedProbeIDs.txt",
package="Gimpute")
chipAnnoFile <- system.file("extdata", "coriellAffyChip.txt",
package="Gimpute")
inputPrefix <- "dataChr21"
ancestrySymbol <- "EUR"
outputPrefix <- "1_11_removedYMtSnp"
metaDataFile <- "1_01_metaData.txt"
chipType <- "rsIDstudy"
updateGenoInfo(plink, inputPrefix, metaDataFile, removedSampIDFile,
ancestrySymbol, excludedProbeIdsFile, chipAnnoFile,
chipType, outputPrefix, keepInterFile=TRUE)
Note that, the output files of each step are the input of the next step.
system(paste0("scp ", metadataFile, " ."))
inputPrefix <- "dataChr21"
outputPrefix2 <- "1_02_removedExclInst"
removedSampIDFile <- system.file("extdata", "excludedSampIDs.txt",
package="Gimpute")
removeSampID(plink, removedSampIDFile, inputPrefix,
outputPrefix=outputPrefix2)
metaDataFile <- "1_01_metaData.txt"
outputPrefix3 <- "1_03_replacedGroupAndSex"
updateGroupIdAndSex(plink, inputPrefix=outputPrefix2,
metaDataFile, outputPrefix=outputPrefix3)
outputPrefix4 <- "1_04_removedNoGroupId"
removeNoGroupId(plink, inputPrefix=outputPrefix3, outputPrefix=outputPrefix4)
outputPrefix5 <- "1_05_removedWrongAnceInst"
removedWrongAnceInst(plink, inputPrefix=outputPrefix4, metaDataFile,
ancestrySymbol, outputPrefix=outputPrefix5)
outputPrefix6 <- "1_06_removedExclProbe"
removedExclProbe(plink, inputPrefix=outputPrefix5,
excludedProbeIdsFile, outputPrefix=outputPrefix6)
outputPrefix7 <- "1_07_removedUnmapProbes"
outputSNPfile7 <- "1_07_probesUnmapped2ChipRef.txt"
removedUnmapProbes(plink, inputPrefix=outputPrefix6, chipAnnoFile,
outputPrefix=outputPrefix7, outputSNPfile=outputSNPfile7)
outputSNPdupFile8 <- "1_08_probesDouble.txt"
outputPrefix8 <- "1_08_removedDoubleProbes"
removedDoubleProbes(plink, inputPrefix=outputPrefix7, chipAnnoFile,
chipType, outputSNPdupFile=outputSNPdupFile8,
outputPrefix=outputPrefix8)
outputPrefix9 <- "1_09_updatedSnpInfo"
updatedSnpInfo(plink, inputPrefix=outputPrefix8,
chipAnnoFile, chipType, outputPrefix=outputPrefix9)
outputPrefix10 <- "1_10_splitXchr"
splitXchr(plink, inputPrefix=outputPrefix9, outputPrefix=outputPrefix10)
outputPrefix11 <- "1_11_removedYMtSnp"
removedYMtSnp(plink, inputPrefix=outputPrefix10, outputPrefix=outputPrefix11)
All files named in this subsection are in the directory 2-genoQC/.
In this subsection, the function genoQC()
is mandatory and performs QC on the genotype data. removedSnpHetX()
, removedMaleHetX()
, and removeOutlierByPCs()
are optional and can be used part of QC . removedSnpHetX()
and removedMaleHetX()
exclude heterozygous markers in male, and males with haploid heterozygous SNPs on chromosome X. Our default QC procedure does not remove any population outliers before imputation. However, if your study data includes significant population outliers and this may affect the qualtiy of subquent imputation analysis. For this, it is recommended to remove them by the function removeOutlierByPCs()
after looking at PCA plot generated by plotPCA4plink()
.
This step does QC in a single function, including all required substeps to ensure that the QC-ed genotype data is of high quality prior to imputation.
inputPrefix <- "2_02_removedInstHetX"
outputPrefix <- "2_13_removedSnpHweFemaleX"
genoQC(plink, inputPrefix,
snpMissCutOffpre=0.05,
sampleMissCutOff=0.02,
Fhet=0.2, cutoffSubject=0.05, cutoffSNP=0.1,
snpMissCutOffpost=0.02,
snpMissDifCutOff=0.02,
femaleChrXmissCutoff=0.05,
pval4autoCtl=0.000001,
pval4femaleXctl=0.000001, outputPrefix, keepInterFile=TRUE)
Alternatively, we elaborate step by step instructions of QC procedure in genoQC()
for better understanding. Note that, the output files of each step are the input of the next step.
inputPrefix <- "1_11_removedYMtSnp"
hhCutOff <- 0.005 ## can be tuned
outputPrefix <- "2_01_removedSnpHetX"
outputHetSNPfile <- "2_01_snpHHfreqAll.txt"
outputRetainSNPfile <- "2_01_snpHHfreqRetained.txt"
removedSnpHetX(plink, inputPrefix, hhCutOff, outputPrefix,
outputHetSNPfile, outputRetainSNPfile)
inputPrefix <- "2_01_removedSnpHetX"
hhSubjCutOff <- 15
outputPrefix <- "2_02_removedInstHetX"
outputSubjHetFile <- "2_02_instHetXfreqAll.txt"
outputRetainSubjectFile <- "2_02_instHetXfreqRetained.txt"
outputHetSNPfile <- "2_02_snpHHfreqAll.txt"
removedMaleHetX(plink, inputPrefix, hhSubjCutOff,
outputPrefix, outputSubjHetFile,
outputRetainSubjectFile, outputHetSNPfile)
These two steps are automatically skipped if you are working on the autosomal data.
outputPrefix3 <- "2_03_setHeteroHaploMissing"
setHeteroHaploMissing(plink, inputPrefix, outputPrefix=outputPrefix3)
outputPrefix4 <- "2_04_removedSnpMissPre"
removedSnpMiss(plink, snpMissCutOff=0.05,
inputPrefix=outputPrefix3, outputPrefix=outputPrefix4)
outputPrefix5 <- "2_05_removedInstMiss"
removedInstMiss(plink, sampleMissCutOff=0.02,
inputPrefix=outputPrefix4, outputPrefix=outputPrefix5)
outputPrefix6 <- "2_06_removedInstFhet"
removedInstFhet(plink, Fhet=0.2,
inputPrefix=outputPrefix5, outputPrefix=outputPrefix6)
outputPrefix7 <- "2_07_removedParentIdsMiss"
removedParentIdsMiss(plink, inputPrefix=outputPrefix6,
outputPrefix=outputPrefix7)
outputPrefix8 <- "2_08_removedMendelErr"
removedMendelErr(plink, inputPrefix=outputPrefix7, cutoffSubject,
cutoffSNP, outputPrefix=outputPrefix8)
outputPrefix9 <- "2_09_removedSnpMissPost"
removedSnpMiss(plink, snpMissCutOff=0.02,
inputPrefix=outputPrefix8, outputPrefix=outputPrefix9)
outputPrefix10 <- "2_10_removedSnpMissDiff"
groupLabel <- getGroupLabel(inputFAMfile=paste0(outputPrefix8, ".fam"))
removedSnpMissDiff(plink, inputPrefix=outputPrefix9, snpMissDifCutOff=0.02,
outputPrefix=outputPrefix10, groupLabel)
outputPrefix11 <- "2_11_removedSnpFemaleChrXmiss"
removedSnpFemaleChrXmiss(plink, femaleChrXmissCutoff=0.05,
inputPrefix=outputPrefix10,
outputPrefix=outputPrefix11)
outputPvalFile <- "2_12_snpHwePvalAuto.txt"
outputSNPfile <- "2_12_snpRemovedHweAuto.txt"
outputPrefix12 <- "2_12_removedSnpHweAuto"
if (groupLabel == "control" | groupLabel == "caseControl"){
removedSnpHWEauto(groupLabel="control", plink,
inputPrefix=outputPrefix11,
pval=0.000001, outputPvalFile,
outputSNPfile, outputPrefix=outputPrefix12)
}
outputPrefix <- "2_13_removedSnpHweFemaleX"
outputPvalFile <- "2_13_snpHwePvalfemaleXct.txt"
outputSNPfile <- "2_13_snpRemovedHweFemaleXct.txt"
removedSnpFemaleChrXhweControl(plink, inputPrefix=outputPrefix12,
pval=0.000001, outputPvalFile,
outputSNPfile, outputPrefix)
inputPrefix <- "2_13_removedSnpHweFemaleX"
outputPC4subjFile <- "2_14_eigenvalAfterQC.txt"
outputPCplotFile <- "2_14_eigenvalAfterQC.png"
plotPCA4plink(gcta, inputPrefix, nThread=20, outputPC4subjFile, outputPCplotFile)
cutoff <- c(-0.4, 0.2)
cutoffSign <- "greater" ## not used if cutoff == NULL, and with two values
inputPC4subjFile <- "2_14_eigenvalAfterQC.txt"
outputPC4outlierFile <- "2_14_eigenval4outliers.txt"
outputPCplotFile <- "2_14_removedOutliers.png"
outputPrefix <- "2_14_removedOutliers"
removeOutlierByPCs(plink, gcta, inputPrefix, nThread=20,
cutoff, cutoffSign, inputPC4subjFile,
outputPC4outlierFile, outputPCplotFile, outputPrefix)
All files named in this subsection are in the directory 3-checkAlign/.
The function checkAlign2ref()
performs the alignment against a given reference
panel by considering the following parameters: variant name, genomic position and
the allele presentation. Output files are generated in order.
renamePlinkBFile(inputPrefix="2_14_removedOutliers",
outputPrefix="3_1_QCdata", action="move")
inputFile <- paste0(impRefDIR,"*.legend.gz")
## To recode the intermediate disk usage
## keep in the same directory.
bimReferenceFile <- "bimImputeRef.txt"
.prepareLegend2bim(inputFile, referencePanel,
outputFile=bimReferenceFile, ncore=nCores)
inputPrefix <- "3_1_QCdata"
out2.snp <- "3_2_snpSameNameDiffPos"
out2 <- "3_2_removedSnpSameNameDiffPos"
out3 <- "3_3_removedSnpMissPos"
out3.snp <- "3_3_snpMissPos"
out4 <- "3_4_removedSnpDiffAlleles"
out4.snp <- "3_4_snpDiffAlleles"
out4.snpRetained <- "3_4_snpImpRefAlleles"
nCores <- detectCores()
checkAlign2ref(plink, inputPrefix, referencePanel, bimReferenceFile,
out2, out2.snp, out3, out3.snp,
out4, out4.snp, out4.snpRetained, nCore=nCores)
All files named in this subsection are in the directory 4-imputation/.
In this subsection, three major functions removedMonoSnp()
, phaseImpute()
and postImpQC()
are used for the purpose of excluding monomorphic SNPs, phasing, imputation and post imputation QC.
outputPrefix <- "4_1_removedMonoSnp"
outputMonoSNPfile <- "4_1_snpMonoRemoved.txt"
removedMonoSnp(plink, inputPrefix="3_4_removedSnpDiffAlleles",
outputPrefix, outputSNPfile=outputMonoSNPfile)
inputPrefix <- "4_1_removedMonoSnp"
outputPrefix <- "4_2_imputedDataset"
outputInfoFile <- "4_2_snpImputedInfoScore.txt"
tmpImputeDir <- paste0(imputeTool, "_", referencePanel)
nCores <- detectCores()
phaseImpute(inputPrefix, outputPrefix, autosome=TRUE,
plink, shapeit, imputeTool, impute, qctool, gtool,
windowSize=3000000, effectiveSize=20000,
nCore=nCores, threshold=0.9, outputInfoFile,
referencePanel, impRefDIR, tmpImputeDir, keepTmpDir=TRUE)
inputPrefix <- "4_2_imputedDataset"
out1 <- "4_3_wellImputeData"
out2 <- "4_4_removedMonoSnpAfter"
out3 <- "4_5_addedMonoSnpAfter"
out4 <- "4_6_removedSnpMissPostImp"
outputInfoFile <- "4_2_snpImputedInfoScore.txt"
outputMonoSNPfile <- "4_1_snpMonoRemoved.txt"
prefixAlign2ref <- "3_4_removedSnpDiffAlleles"
outRemovedSNPfile <- "4_6_snpRemovedMissPostImp.txt"
outRetainSNPfile <- "4_6_snpRetainMissPostImp.txt"
postImpQC(plink, inputPrefix, out1, out2, out3, out4,
outputInfoFile, infoScore=0.6, outputMonoSNPfile,
prefixAlign2ref, missCutoff=20, outRemovedSNPfile,
outRetainSNPfile, referencePanel)
All files named in this subsection are in the directory 5-reductAndExpand/.
reductExpand()
function extracts the genotypes from well imputed data so that PLINK data with only SNPs before imputation are remained. On the basis of extracted genotypes, a subset of non-imputed genotypes that are different from the imputation reference panel are then added, including SNPs with different alleles, missing positions, and different positions.
inputPrefix <- "4_6_removedSnpMissPostImp"
inputQCprefix <- "3_1_QCdata"
snpRefAlleleFile <- "3_4_snpImpRefAlleles.txt"
snpDiffAlleleFile <- "3_4_snpDiffAlleles.txt"
snpMissPosFile <- "3_3_snpMissPos.txt"
snpSameNameDifPosFile <- "3_2_snpSameNameDiffPos.txt"
dir5 <- "./5-reductAndExpand/"
## imputed dataset
system(paste0("scp ./4-imputation/", inputPrefix, ".* ", dir5))
system(paste0("scp ./3-checkAlign/", inputQCprefix, ".* ", dir5))
system(paste0("scp ./3-checkAlign/", snpRefAlleleFile, " ", dir5))
system(paste0("scp ./3-checkAlign/", snpDiffAlleleFile, " ", dir5))
system(paste0("scp ./3-checkAlign/", snpMissPosFile, " ", dir5))
system(paste0("scp ./3-checkAlign/", snpSameNameDifPosFile, " ", dir5))
setwd(dir5)
reducedToSpecificfn <- "5_1_reducedToSpecific"
specificDiffAllelefn <- "5_2_specificDiffAllele"
specificMissPosfn <- "5_3_specificMissPos"
specificDiffPosfn <- "5_4_specificDiffPos"
reductExpand(plink, referencePanel, inputPrefix, inputQCprefix,
snpRefAlleleFile, snpDiffAlleleFile,
snpMissPosFile, snpSameNameDifPosFile,
reducedToSpecificfn, specificDiffAllelefn,
specificMissPosfn, specificDiffPosfn)
setwd("..")
Copy and rename the results of interest to the directory ./6-finalResults.
The metadata information file, the final well imputed and a subset of imputed dataset (dataset-specific) are essential for the downstream analysis.
dir6 <- "./6-finalResults/"
system(paste0("scp ./1-genoUpdate/1_01_metaData.txt ", dir6))
system(paste0("scp ./4-imputation/4_6_removedSnpMissPostImp.* ", dir6))
system(paste0("scp ./5-reductAndExpand/5_4_specificDiffPos.* ", dir6))
setwd(dir6)
renamePlinkBFile(inputPrefix="4_6_removedSnpMissPostImp",
outputPrefix="imputedSnpsDataset", action="move")
renamePlinkBFile(inputPrefix="5_4_specificDiffPos",
outputPrefix="specificSnpsDataset", action="move")
Gimpute’s modular structure allows the integration of other existing imputation workflows, allowing users to choose their preferred imputation strategy. For demonstration, we have embedded Genipe as an external imputation tool [8].
To set up Genipe, please refer to Genipe’s installation page for more details.
The imputation reference can be downloaded directly from IMPUTE2’s reference page: 1000 Genome phase 3 or Genipe’s page.
This reference genome is used for the strand check, you can download from Genipe’s page.
## example
impRefDIR4genipe <- "../1000GP_Phase3/"
fastaFile <- "../hg19/hg19.fasta"
chrs =22
inputPrefix <- "3_4_removedSnpDiffAlleles"
thread4impute2 <- 20 ## tune by yourself
thread4shapeit <- 30
segmentSize <- 3000000
imputeTool <- "impute2" ## only impute2
imputedByGenipe(chrs, impRefDIR4genipe, inputPrefix, shapeit, impute, plink, fastaFile, segmentSize, thread4impute2, thread4shapeit)
chr <- 2
inputImpute2 <- 'chr2.33000001_36000000.impute2'
probability <- 0.9
completionRate <- 0.98
# info <- 0.6
outputPrefix <- paste0('imputedChr', chr)
mergeByGenipe(inputImpute2, chr, probability, completionRate, info, outputPrefix)
## extract imputed markers using Genipe
chr <- 3
inputImpute2 <- paste0('chr', chr,'.imputed.impute2')
inputMAP <- paste0('chr', chr,'.imputed.map')
format <- 'bed'
prob <- 0.9
outputPrefix <- paste0('imputedChr', chr)
extractByGenipe(inputImpute2, inputMAP, outputPrefix, format, prob)
## sessionInfo()
# R version 3.5.0 (2018-04-23)
# Platform: x86_64-redhat-linux-gnu (64-bit)
# Running under: CentOS Linux 7 (Core)
# Matrix products: default
# BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
# locale:
# [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
# [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
# [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
# [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
# [9] LC_ADDRESS=C LC_TELEPHONE=C
# [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
# attached base packages:
# [1] parallel stats graphics grDevices utils datasets methods
# [8] base
# other attached packages:
# [1] doParallel_1.0.11 iterators_1.0.10 foreach_1.4.4 lattice_0.20-35
# [5] Gimpute_0.99.9
# loaded via a namespace (and not attached):
# [1] compiler_3.5.0 codetools_0.2-15 grid_3.5.0
[1] Marchini, J., et al. (2010). Genotype imputation for genome-wide association studies. Nat Rev Genet 11(7): 499-511.
[2] Marchini, J., et al. (2007). A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 39(7): 906-913.
[3] Schizophrenia Working Group of the Psychiatric Genomics, C. (2014). Biological insights from 108 schizophrenia-associated genetic loci. Nature 511(7510): 421-427.
[4] Purcell, Shaun, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics 81.3 (2007): 559-575.
[5] Howie, B., et al. (2012). Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet 44(8): 955-959.
[6] Howie, B. N., et al. (2009). A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet 5(6): e1000529.
[7] Bycroft, Clare, Colin Freeman, Desislava Petkova, Gavin Band, Lloyd T. Elliott, Kevin Sharp, Allan Motyer et al. Genome-wide genetic data on~ 500,000 UK Biobank participants. BioRxiv (2017): 166298.
[8] Lemieux Perreault, L. P., et al. (2016). genipe: an automated genome-wide imputation pipeline with automatic reporting and statistical tools. Bioinformatics, 32(23), 3661-3663.
If you have any trouble or comment ? –> Contact: junfang.chen@zi-mannheim.de