This vignette might be a bit overwhelming for a first-time user.
You might want to skim through this vignette and then take a look at the same two examples stripped to their essential analyses. These can be found at https://github.com/statOmics/MSqRobData/blob/master/inst/extdata/Francisella/analysis_Francisella.Rmd and https://github.com/statOmics/MSqRobData/blob/master/inst/extdata/CPTAC/analysis_CPTAC.Rmd for the Francisella and CPTAC example respectively.
Note that we mainly focus on MaxQuant output, but that any type type of search engine output can be used with MSqRob, see “1.2. Importing other data types” for more info.
If anything is unclear, please do not hesitate to contact me at ludger.goeminne@vib-ugent.be or to open an issue on GitHub. Your feedback is highly welcomed!
This vignette will guide you through the main features of the MSqRob package. We will highlight two examples in order to help you familiarize with its main features.
The MSqRob package is designed for the analysis of differential abundance in label-free mass spectrometry-based shotgun proteomics experiments. These experiments typically output large spectral files (e.g. “.mzML” files) that need to be searched for identifications by specialized peak detection and peptide identification software such as Andromeda (used in the MaxQuant software package) and Mascot Distiller. These algorithms output lists of putative peptides accompanied by their corresponding overall intensity estimates.
Often, researchers further summarize these peptide intensities to protein intensities. However, as we have shown before, this summarization-based approaches tend to perform suboptimal compared to peptide-based models, wherein the peptide intensities are directly provided as input to the statistical model used for differential quantification (Goeminne et al., 2015). Nonetheless, most of these models still suffer from overfitting issues.
This package implements the robust ridge approach that prevents overfitting in sparse data but also accurately fits highly abundant proteins as described in Goeminne et al. (2016).
MSqRob is completely free and open source, but please make a reference to Goeminne et al. (2016) when using our package in any kind of publication.
On Windows, make sure that RTools is installed. Go to: https://cran.r-project.org/bin/windows/Rtools/ to download RTools. A user guide on how to install RTools on Windows can be found at: https://github.com/stan-dev/rstan/wiki/Install-Rtools-for-Windows. Errors in MSqRob on Windows related to unable to zip the results Excel file might be related to errors in configuring RTools.
First, we need to install the package devtools:
#Change to eval=TRUE to execute this piece of code.
install.packages("devtools", repos = "http://cran.us.r-project.org")
library(devtools)
Then, we call this to install MSqRob from GitHub:
#Change to eval=TRUE to execute this piece of code.
devtools::install_github("statOmics/MSqRob")
library(MSqRob)
Note that the latest version of MSqRob can always be found at: https://github.com/statOmics/MSqRob.
The latest version of this vignette can be found at: https://github.com/statOmics/MSqRob/blob/master/vignettes/MSqRob.Rmd.
Loading the package is done via the library
command, as with any other package.
library(MSqRob)
## Loading required package: lme4
## Loading required package: Matrix
Additionally, we load the packages Biobase
, MSnbase
and limma
, as we will use a few of their functions in this vignette.
library(Biobase)
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:Matrix':
##
## colMeans, colSums, rowMeans, rowSums, which
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## Filter, Find, Map, Position, Reduce, anyDuplicated, append,
## as.data.frame, basename, cbind, colMeans, colSums, colnames,
## dirname, do.call, duplicated, eval, evalq, get, grep, grepl,
## intersect, is.unsorted, lapply, lengths, mapply, match, mget,
## order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind,
## rowMeans, rowSums, rownames, sapply, setdiff, sort, table,
## tapply, union, unique, unsplit, which, which.max, which.min
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
library(MSnbase)
## Loading required package: mzR
## Loading required package: Rcpp
## Warning in fun(libname, pkgname): mzR has been built against a different Rcpp version (0.12.16)
## than is installed on your system (0.12.17). This might lead to errors
## when loading mzR. If you encounter such issues, please send a report,
## including the output of sessionInfo() to the Bioc support forum at
## https://support.bioconductor.org/. For details see also
## https://github.com/sneumann/mzR/wiki/mzR-Rcpp-compiler-linker-issue.
## Loading required package: BiocParallel
## Loading required package: ProtGenerics
##
## This is MSnbase version 2.6.1
## Visit https://lgatto.github.io/MSnbase/ to get started.
##
## Attaching package: 'MSnbase'
## The following object is masked from 'package:stats':
##
## smooth
## The following object is masked from 'package:base':
##
## trimws
library(limma)
##
## Attaching package: 'limma'
## The following object is masked from 'package:BiocGenerics':
##
## plotMA
R packages uploaded on CRAN have a maximum size limit of 5 MB. As proteomics datasets are typically several hunderds of GB in size and even an output file of a peptide search can be tens of MBs in size, we uploaded our example data in a separate data package called MSqRobData. If you want to run the examples in this vignette, you also need the data package. This can also be downloaded from GitHub in the following way:
#Change to eval=TRUE to execute this piece of code.
devtools::install_github("statOmics/MSqRobData")
library(MSqRobData)
If you want to work with the Shiny graphical user interface (MaxQuant output only), use the following command:
#Change to eval=TRUE to execute this piece of code.
shiny::runApp(system.file('App-MSqRob', package='MSqRob'))
The bacterium Francisella tularensis is a facultative intracellular parasite of macrophages and must import host cell arginine in order to be able to multiply inside the host cell. Ramond et al. (2015) investigated the effect of gene deletion of a newly identified arginine transporter called ArgP. Their data is deposited in the PRIDE repository at http://www.ebi.ac.uk/pride/archive/projects/PXD001584. We used the authors’ supplied peptides.txt file which is the result of preprocessing with MaxQuant version 1.4.1.2. As the authors used now outdated GI numbers, we modified this file to incorporate NCBI accessions and protein names (September 25, 2016).
MSqRob data input can come from any kind of source, as long as feature-level or peptide-level data is available. I will here focus on a MaxQuant example, but in fact, output from any search engine can be used.
First, specify the location of the peptides.txt MaxQuant output file. By default, MaxQuant creates a “combined” folder in the same folder as the raw files are (or if the raw files are in different directories, in the directory of the first raw file). The peptides.txt file can then be found in “path_to_raw_files/combined/txt/peptides.txt”. Just provide the path to where the file is located on your computer. The system.file
function is used here only to access the example tab-delimited peptides.txt file that is delivered with our package.
file_peptides_txt <- system.file("extdata/Francisella", "peptides.txt", package = "MSqRobData")
file_peptides_txt
## [1] "/Library/Frameworks/R.framework/Versions/3.5/Resources/library/MSqRobData/extdata/Francisella/peptides.txt"
Import the data using the import2MSnSet
function. The pattern
argument should contain a text string that is present in all column names that contain peptide intensities but not in the other column names. This defaults to "Intensity\ "
for MaxQuant data. The default setting remove_pattern=TRUE
then removes the pattern from these column names so that the names equal the names that were entered in MaxQuant’s “Experiment” column when conducting the search. In this particular Francisella experiment, our column names start with a number (either “1” or “3”), which is not a valid column name in R. Therefore, an “X” is prepended to these names.
peptidesFranc <- import2MSnSet(file_peptides_txt, filetype = "MaxQuant", remove_pattern = TRUE)
import2MSnSet
creates an MSnSet
object with 3 slots:
exprs
: contains a matrix of peptide intensities of which the rows correspond to the peptides present in the dataset and the columns correspond to the different mass spec runs.
fData
: contains a data frame containing all additional information (such as peptide sequences, protein names, scores, number of missed cleavages, etc.) present in the input file.
pData
: contains a data frame with 0 columns and as many rows as there are mass spec runs. This slot will be used to store the experimental annotation.
If you want to assess whether importing happened in a correct way, you can always check the exprs
, fData
and pData
slots.
The exprs
slot should contain all the peptide intensities grouped per column.
head(exprs(peptidesFranc))
## X1WT_20_2h_n3_1 X1WT_20_2h_n3_2 X1WT_20_2h_n3_3 X1WT_20_2h_n4_1
## 1 0 0 9290400 92996000
## 2 0 20679000 0 0
## 3 28853000 28091000 30436000 19476000
## 4 339830000 420390000 393930000 235060000
## 5 240280000 0 241890000 222250000
## 6 0 31069000 31448000 0
## X1WT_20_2h_n4_2 X1WT_20_2h_n4_3 X1WT_20_2h_n5_1 X1WT_20_2h_n5_2
## 1 0 98059000 0 80803000
## 2 0 0 0 17358000
## 3 22050000 20687000 36588000 28654000
## 4 381090000 360270000 255710000 311370000
## 5 189930000 0 160440000 147230000
## 6 27721000 28657000 0 0
## X1WT_20_2h_n5_3 X3D8_20_2h_n3_1 X3D8_20_2h_n3_2 X3D8_20_2h_n3_3
## 1 0 0 95433000 0
## 2 13841000 12278000 10712000 0
## 3 7009100 14174000 8548000 7859200
## 4 271440000 225140000 289880000 282500000
## 5 143340000 135910000 185740000 168620000
## 6 0 0 0 0
## X3D8_20_2h_n4_1 X3D8_20_2h_n4_2 X3D8_20_2h_n4_3 X3D8_20_2h_n5_1
## 1 0 0 0 0
## 2 9500700 11111000 8723500 8125200
## 3 14871000 11690000 4470500 0
## 4 209490000 223610000 219700000 169120000
## 5 160240000 158000000 0 92542000
## 6 0 18253000 0 0
## X3D8_20_2h_n5_2 X3D8_20_2h_n5_3
## 1 0 0
## 2 0 0
## 3 4048100 3422300
## 4 178150000 170400000
## 5 102350000 100090000
## 6 0 0
The fData
slot contains all other columns.
head(fData(peptidesFranc))
## Sequence Amino.acid.before First.amino.acid
## 1 AAAEELDTR K A
## 2 AAAGFVITASHNK R A
## 3 AAANEYELALAYSIEEVAPDLHK K A
## 4 AAANNPQLEAFK K A
## 5 AAASAGLVDEK K A
## 6 AAASLDLYSYPK K A
## Second.amino.acid Second.last.amino.acid Last.amino.acid
## 1 A T R
## 2 A N K
## 3 A H K
## 4 A F K
## 5 A E K
## 6 A P K
## Amino.acid.after A.Count R.Count N.Count D.Count C.Count Q.Count E.Count
## 1 K 3 1 0 1 0 0 2
## 2 F 4 0 1 0 0 0 0
## 3 Y 6 0 1 1 0 0 4
## 4 K 4 0 2 0 0 1 1
## 5 A 4 0 0 1 0 0 1
## 6 V 3 0 0 1 0 0 0
## G.Count H.Count I.Count L.Count K.Count M.Count F.Count P.Count S.Count
## 1 0 0 0 1 0 0 0 0 0
## 2 1 1 1 0 1 0 1 0 1
## 3 0 1 1 3 1 0 0 1 1
## 4 0 0 0 1 1 0 1 1 0
## 5 1 0 0 1 1 0 0 0 1
## 6 0 0 0 2 1 0 0 1 2
## T.Count W.Count Y.Count V.Count U.Count Length Missed.cleavages
## 1 1 0 0 0 0 9 0
## 2 1 0 0 1 0 13 0
## 3 0 0 2 1 0 23 0
## 4 0 0 0 0 0 12 0
## 5 0 0 0 1 0 11 0
## 6 0 0 2 0 0 12 0
## Mass Proteins GI.number Leading.razor.protein
## 1 974.4669 WP_003038655 gi|118497196 gi|118497196
## 2 1285.6779 WP_003041237 gi|118498194 gi|118498194
## 3 2516.2435 WP_003038915 gi|118497331 gi|118497331
## 4 1272.6463 WP_003038264 gi|118496879 gi|118496879
## 5 1030.5295 WP_003035781 gi|118497152 gi|118497152
## 6 1297.6554 WP_003039212 gi|118497492 gi|118497492
## Protein.names
## 1 hypothetical protein [Francisella tularensis]
## 2 phosphoglucosamine mutase [Francisella tularensis]
## 3 glycine--tRNA ligase subunit beta [Francisella tularensis]
## 4 molecular chaperone HtpG [Francisella tularensis]
## 5 delta-aminolevulinic acid dehydratase [Francisella tularensis]
## 6 phosphorylase [Francisella tularensis]
## Unique..Groups. Unique..Proteins. Charges PEP Score
## 1 yes yes 1,2 1.2531e-04 59.35
## 2 yes yes 2 5.3020e-05 45.825
## 3 yes yes 3 1.0517e-45 84.327
## 4 yes yes 2 8.7549e-24 108.56
## 5 yes yes 2 9.5062e-05 67.385
## 6 yes yes 2 1.2563e-02 31.675
## Experiment.1WT_20_2h_n3_1 Experiment.1WT_20_2h_n3_2
## 1 NA NA
## 2 NA 1
## 3 1 1
## 4 1 1
## 5 1 NA
## 6 NA 1
## Experiment.1WT_20_2h_n3_3 Experiment.1WT_20_2h_n4_1
## 1 1 1
## 2 NA NA
## 3 2 1
## 4 1 1
## 5 1 1
## 6 1 NA
## Experiment.1WT_20_2h_n4_2 Experiment.1WT_20_2h_n4_3
## 1 NA 1
## 2 NA NA
## 3 1 1
## 4 1 1
## 5 1 NA
## 6 1 1
## Experiment.1WT_20_2h_n5_1 Experiment.1WT_20_2h_n5_2
## 1 NA 1
## 2 NA 1
## 3 2 2
## 4 1 1
## 5 1 1
## 6 NA NA
## Experiment.1WT_20_2h_n5_3 Experiment.3D8_20_2h_n3_1
## 1 NA NA
## 2 1 1
## 3 1 1
## 4 1 1
## 5 1 1
## 6 NA NA
## Experiment.3D8_20_2h_n3_2 Experiment.3D8_20_2h_n3_3
## 1 1 NA
## 2 1 NA
## 3 2 1
## 4 1 1
## 5 1 1
## 6 NA NA
## Experiment.3D8_20_2h_n4_1 Experiment.3D8_20_2h_n4_2
## 1 NA NA
## 2 1 1
## 3 1 1
## 4 1 1
## 5 1 1
## 6 NA 1
## Experiment.3D8_20_2h_n4_3 Experiment.3D8_20_2h_n5_1
## 1 NA NA
## 2 1 1
## 3 1 NA
## 4 1 1
## 5 NA 1
## 6 NA NA
## Experiment.3D8_20_2h_n5_2 Experiment.3D8_20_2h_n5_3 Intensity Reverse
## 1 NA NA 1.3295e+09
## 2 NA NA 1.6719e+08
## 3 1 1 5.5675e+08
## 4 1 1 1.4830e+10
## 5 1 1 4.8209e+09
## 6 NA NA 1.3715e+08
## Contaminant id Protein.group.IDs Mod..peptide.IDs
## 1 0 419 0
## 2 1 1150 1
## 3 2 513 2
## 4 3 199 3
## 5 4 388 4
## 6 5 624 5
## Evidence.IDs
## 1 0;1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18
## 2 19;20;21;22;23;24;25;26;27;28;29;30;31;32;33
## 3 34;35;36;37;38;39;40;41;42;43;44;45;46;47;48;49;50;51;52;53;54;55;56;57;58;59;60;61;62;63;64;65;66;67;68;69;70;71;72;73;74;75;76
## 4 77;78;79;80;81;82;83;84;85;86;87;88;89;90;91;92;93;94;95;96;97;98;99;100;101;102;103;104;105;106;107;108;109;110;111;112;113;114;115;116;117;118;119;120;121;122;123;124
## 5 125;126;127;128;129;130;131;132;133;134;135;136;137;138;139;140;141;142;143;144;145;146;147;148;149;150;151;152;153;154;155
## 6 156;157;158;159;160
## MS.MS.IDs
## 1 0;1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18
## 2 19;20;21;22;23;24;25;26;27;28;29;30;31;32;33
## 3 34;35;36;37;38;39;40;41;42;43;44;45;46;47;48;49;50;51;52;53;54;55;56;57;58;59;60;61;62;63;64;65;66;67;68;69;70;71;72;73;74;75;76;77;78;79
## 4 80;81;82;83;84;85;86;87;88;89;90;91;92;93;94;95;96;97;98;99;100;101;102;103;104;105;106;107;108;109;110;111;112;113;114;115;116;117;118;119;120;121;122;123;124;125;126;127;128;129;130;131;132;133;134;135;136;137;138;139;140;141;142;143;144;145;146;147;148;149;150;151;152;153;154;155;156;157;158;159;160;161;162;163;164;165;166;167;168;169;170;171;172
## 5 173;174;175;176;177;178;179;180;181;182;183;184;185;186;187;188;189;190;191;192;193;194;195;196;197;198;199;200;201;202;203
## 6 204;205;206;207;208
## Best.MS.MS Oxidation..M..site.IDs
## 1 15
## 2 26
## 3 39
## 4 159
## 5 191
## 6 205
The pData
slot will contain the experimental annotation. Right now, it has 18 rows (equal to the number of mass spec runs) and 0 columns (as the experimental annotation is not yet loaded).
pData(peptidesFranc)
## data frame with 0 columns and 18 rows
The import2MSnSet
has built-in support for moFF (Argentini et al., 2016), mzTab (Griss et al., 2014) and Progenesis (Nonlinear Dynamics, Newcastle upon Tyne, U.K.) output. You should then simply set filetype
to "moFF"
, "mzTab"
or "Progenesis"
respectively.
If you are using yet another data type, you can use MSqRob’s read2MSnSet
function to import your data. Hereby, you have to specify (1) your file, (2) a regular expression pattern that indicates the columns with the peptide intensities or simply the indices of these columns and (3) the separator (e.g. "\t"
, ","
, …).
Now, we are going to create a data frame that contains the experimental annotation. This data frame should contain all explanatory variables that groups different runs together. This is important in order to be able to include factors on which you will do inference (such as treatment effects or effects over time, or, in our case: the effect of the genetic knock-out) as well as other factors that might have an influence (such as biological repeats, technical repeats and potential confounders such as batch effects).
Note that the “experimental annotation” can also be given as an Excel file or a tab delimited file! This file should have the column names as a header (i.e. the first row in the file) and the same structure as the exp_annotation
data frame we create below. If you have your experimental annotation in a file, just provide the filepath to this annotation file in the preprocess_MaxQuant
function.
Also note that exactly one column in the experimental annotation should contain the names of the mass spec runs. These names should be equal to the names you entered in the “Experiment” column in MaxQuant (i.e. the column names of the exprs
slot of the peptidesFranc MSnSet
object). In our case, we will call this column “run” (see below). By default, MSqRob guesses this column by taking the only column of which its elements correspond to the column names of the exprs
slot. If there is no such a column, or there are multiple such columns, MSqRob throws an error.
In this example, we create the experimental annotation data frame ourselves.
Extract the names of the mass spec runs. The mass spec run names are equal to the column names of the exprs
slot of the peptidesFranc MSnSet
object. Other columns in the experimental annotation data frame will typically be derived from this column.
runs <- sampleNames(peptidesFranc)
runs
## [1] "X1WT_20_2h_n3_1" "X1WT_20_2h_n3_2" "X1WT_20_2h_n3_3"
## [4] "X1WT_20_2h_n4_1" "X1WT_20_2h_n4_2" "X1WT_20_2h_n4_3"
## [7] "X1WT_20_2h_n5_1" "X1WT_20_2h_n5_2" "X1WT_20_2h_n5_3"
## [10] "X3D8_20_2h_n3_1" "X3D8_20_2h_n3_2" "X3D8_20_2h_n3_3"
## [13] "X3D8_20_2h_n4_1" "X3D8_20_2h_n4_2" "X3D8_20_2h_n4_3"
## [16] "X3D8_20_2h_n5_1" "X3D8_20_2h_n5_2" "X3D8_20_2h_n5_3"
colnames(exprs(peptidesFranc))
## [1] "X1WT_20_2h_n3_1" "X1WT_20_2h_n3_2" "X1WT_20_2h_n3_3"
## [4] "X1WT_20_2h_n4_1" "X1WT_20_2h_n4_2" "X1WT_20_2h_n4_3"
## [7] "X1WT_20_2h_n5_1" "X1WT_20_2h_n5_2" "X1WT_20_2h_n5_3"
## [10] "X3D8_20_2h_n3_1" "X3D8_20_2h_n3_2" "X3D8_20_2h_n3_3"
## [13] "X3D8_20_2h_n4_1" "X3D8_20_2h_n4_2" "X3D8_20_2h_n4_3"
## [16] "X3D8_20_2h_n5_1" "X3D8_20_2h_n5_2" "X3D8_20_2h_n5_3"
Note again that the sample names have a prepended “X” because a syntactically valid name in R cannot start with a number.
Add a factor for the genetic knock-out, which we will call “genotype” (i.e. wild type (WT) or knock-out (KO)).
#Create the "genotype" vector
genotype <- factor(c(rep("WT",9),rep("KO",9)), levels=c("WT","KO"))
genotype
## [1] WT WT WT WT WT WT WT WT WT KO KO KO KO KO KO KO KO KO
## Levels: WT KO
Add a factor for each of the 6 biological replicates (i.e. 3 biological repeats for each genotype).
#Create a factor for the biological replicates
n <- nchar(runs)
biorep <- as.factor(paste0("b_",rep(1:6,each=3)))
biorep
## [1] b_1 b_1 b_1 b_2 b_2 b_2 b_3 b_3 b_3 b_4 b_4 b_4 b_5 b_5 b_5 b_6 b_6
## [18] b_6
## Levels: b_1 b_2 b_3 b_4 b_5 b_6
#Alternative: extract the biological repeat based on the mass spec run names.
# biorep <- as.factor(paste0("b_",factor(as.numeric(factor(paste0(substr(runs, 1,1),substr(runs, n-2,n-2))))+c(rep(0,9),rep(3,9)))))
Each of the 18 technical repeats are here represented by the 18 mass spec runs.
Create an experimental annotation data frame.
exp_annotation <- data.frame(run=runs, genotype=genotype, biorep=biorep)
exp_annotation
## run genotype biorep
## 1 X1WT_20_2h_n3_1 WT b_1
## 2 X1WT_20_2h_n3_2 WT b_1
## 3 X1WT_20_2h_n3_3 WT b_1
## 4 X1WT_20_2h_n4_1 WT b_2
## 5 X1WT_20_2h_n4_2 WT b_2
## 6 X1WT_20_2h_n4_3 WT b_2
## 7 X1WT_20_2h_n5_1 WT b_3
## 8 X1WT_20_2h_n5_2 WT b_3
## 9 X1WT_20_2h_n5_3 WT b_3
## 10 X3D8_20_2h_n3_1 KO b_4
## 11 X3D8_20_2h_n3_2 KO b_4
## 12 X3D8_20_2h_n3_3 KO b_4
## 13 X3D8_20_2h_n4_1 KO b_5
## 14 X3D8_20_2h_n4_2 KO b_5
## 15 X3D8_20_2h_n4_3 KO b_5
## 16 X3D8_20_2h_n5_1 KO b_6
## 17 X3D8_20_2h_n5_2 KO b_6
## 18 X3D8_20_2h_n5_3 KO b_6
Note that you can also import the experimental annotation from a tab-delimited or Excel file. As an example, we show you here how you can import data from an experimental annotation file delivered with our package. First, provide the path to where the data is saved.
file_exp_annotation <- system.file("extdata/Francisella", "label-free_Francisella_annotation.xlsx", package = "MSqRobData")
file_exp_annotation
## [1] "/Library/Frameworks/R.framework/Versions/3.5/Resources/library/MSqRobData/extdata/Francisella/label-free_Francisella_annotation.xlsx"
Then, import the data using the read.xlsx
function from the openxlsx
package and convert the columns of the data frame to factors.
#Import Excel file
exp_annotation_xlsx <- openxlsx::read.xlsx(file_exp_annotation)
#Convert columns of data frame to factors
exp_annotation_xlsx <- as.data.frame(unclass(exp_annotation_xlsx))
exp_annotation_xlsx
## run genotype biorep
## 1 1WT_20_2h_n3_1 WT b_1
## 2 1WT_20_2h_n3_2 WT b_1
## 3 1WT_20_2h_n3_3 WT b_1
## 4 1WT_20_2h_n4_1 WT b_2
## 5 1WT_20_2h_n4_2 WT b_2
## 6 1WT_20_2h_n4_3 WT b_2
## 7 1WT_20_2h_n5_1 WT b_3
## 8 1WT_20_2h_n5_2 WT b_3
## 9 1WT_20_2h_n5_3 WT b_3
## 10 3D8_20_2h_n3_1 KO b_4
## 11 3D8_20_2h_n3_2 KO b_4
## 12 3D8_20_2h_n3_3 KO b_4
## 13 3D8_20_2h_n4_1 KO b_5
## 14 3D8_20_2h_n4_2 KO b_5
## 15 3D8_20_2h_n4_3 KO b_5
## 16 3D8_20_2h_n5_1 KO b_6
## 17 3D8_20_2h_n5_2 KO b_6
## 18 3D8_20_2h_n5_3 KO b_6
Note that here, there is no “X” prepended to the run names. However, names will be changed automatically to syntactically valid names by MSqRob during preprocessing.
The identification of peptides that carry one or more chemical modifications is often not as reliable as the identification of unmodified peptides. Therefore, a common step during the preprocessing of proteomics data consists of removing the proteins of which all identified peptides have one or more modification sites. If we want to remove these proteins that are only identified by modified peptides from the dataset, we also need the proteinGroups.txt file. This file can be found in the same location as the peptides.txt file (“path_to_raw_files/combined/txt/proteinGroups.txt”). Here, we make use of the proteinGroups.txt file that is delivered with the package.
Note: if you don’t want to remove proteins that are only identified by modified peptides, set the remove_only_site
argument of the preprocess_MaxQuant
function to FALSE
and leave the file_proteinGroups
argument at its default value NULL
.
Define the path to the proteinGroups.txt file.
file_proteinGroups <- system.file("extdata/Francisella", "proteinGroups.txt", package = "MSqRobData")
file_proteinGroups
## [1] "/Library/Frameworks/R.framework/Versions/3.5/Resources/library/MSqRobData/extdata/Francisella/proteinGroups.txt"
Add to the useful_properties
argument all column names of the fData
slot of the peptidesFranc MSnSet
object that you would like to keep (such as gene names, protein names, ontologies, etc.). Other columns will be removed for improved efficiency and memory usage. Here, we would like to keep the GI number and the protein names. These 2 columns are present in our peptides.txt file, but note that you can use basic R manipulations to add other columns to the fData slot if you want to. We certainly want to keep the “Sequence” slot, as peptide-specific effects must be included in our model.
peptidesFranc2 <- preprocess_MaxQuant(peptidesFranc, accession="Proteins", exp_annotation=exp_annotation, logtransform=TRUE, base=2, normalisation="quantiles", smallestUniqueGroups=TRUE, useful_properties=c("GI number","Protein.names","Sequence"), filter=c("Contaminant","Reverse"), remove_only_site=TRUE, file_proteinGroups=file_proteinGroups, filter_symbol="+", minIdentified=2)
preprocess_MaxQuant
internally executes the following preprocessing steps in this order:
Peptide intensities are (log2-)transformed (logtransform=TRUE, base=2
).
As a normalization approach, we choose to quantile normalize the peptide intensities (normalisation="quantiles"
). Other options are "sum"
, "max"
, "center.mean"
, "center.median"
, "quantiles.robust"
, "vsn"
or "none"
(see MSnbase::normalize
for more information).
Handling overlapping protein groups (smallestUniqueGroups=TRUE
): in our approach a peptide can map to multiple accessions (accession="Proteins"
), as long as there is none of these proteins present in a smaller subgroup. Peptides that map to protein groups for wich also subgroups are present in the dataset are removed from the dataset. (Proteins in a protein group are separated by a ;
).
Remove contaminants and reverse sequences (filter=c("Contaminant","Reverse"), filter_symbol="+"
): each row with the filter symbol (“+” in our case) in the columns “Contaminant” and “Reverse” will be removed from the data. Note that the “Contaminant” column in older MaxQuant peptides.txt files can also be called “Potential.contaminant”.
Remove all proteins that are only identified by peptides carrying a modification site (remove_only_site=TRUE
, file_proteinGroups=file_proteinGroups
).
Remove all superfluous columns in the fData
slot, except the “Proteins”, “GI.number”, “Protein.names” and “Sequence” columns (useful_properties=c("GI.number","Protein.names","Sequence")
). Note that the “Proteins” column does not have to be provided: the accession
column is always retained.
Remove all peptides that are only identified once in the dataset (minIdentified=2
).
Add experimental annotation (exp_annotation=exp_annotation
).
We can now make density plots to inspect the effect of the preprocessing on the distribution of our log2-transformed peptide intensities.
Here we take a closer look at how you can make the diagnostic plots from the Shiny app. We will plot both the density plots and the MDS plot.
#Make sure the order of the rows in your exp_annotation is the same as the columns of the exprs slot in your peptides object.
all(exp_annotation[,"run"]==colnames(exprs(peptidesFranc)))
## [1] TRUE
#Color by biological repeat
colors <- as.numeric(exp_annotation[,"biorep"])
### Density plot before preprocessing ###
densAll <- apply(log2(exprs(peptidesFranc)),2,density,na.rm=TRUE)
#Automatically determine the plot window size
ymax=max(vapply(densAll,function(d) max(d$y),1))
rangematrix <- vapply(densAll,function(d) range(d$x, na.rm=TRUE), c(1,1)) #no longer
xlim=range(rangematrix,na.rm=TRUE)
ylim=c(0,ymax)
plot(densAll[[1]],col=colors[1],xlim=xlim,ylim=ylim, las=1, frame.plot=FALSE, main="Density before preprocessing")
for (i in 2:length(densAll)) lines(densAll[[i]],col=colors[i])
######
### Density plot after preprocessing ###
densAll <- apply(log2(exprs(peptidesFranc2)),2,density,na.rm=TRUE)
#Automatically determine the plot window size
ymax=max(vapply(densAll,function(d) max(d$y),1))
rangematrix <- vapply(densAll,function(d) range(d$x, na.rm=TRUE), c(1,1)) #no longer
xlim=range(rangematrix,na.rm=TRUE)
ylim=c(0,ymax)
plot(densAll[[1]],col=colors[1],xlim=xlim,ylim=ylim, las=1, frame.plot=FALSE, main="Density after preprocessing")
for (i in 2:length(densAll)) lines(densAll[[i]],col=colors[i])
######
We can also construct an MDS plot. This plot will cluster similar runs together.
plotMDS(exprs(peptidesFranc2), col=colors, las=1)
The accession
argument denotes by which column the individual peptide observations should be grouped. The annotations
argument indicates which columns in peptidesFranc2 are annotations linked to the accessions (in our case: the GI number and the protein names). These annotations will be added to a separate annotation
slot.
proteinsFranc <- MSnSet2protdata(peptidesFranc2, accession="Proteins", annotations = c("GI.number", "Protein.names"))
Imagine you forgot a factor in your experimental annotation. Should you redo the whole preprocessing part? No, often, you can still add this factor thanks to the addVarFromVar
function. We will now show how we can add an extra (redundant) factor called techrep
.
Create a factor for each of the 18 technical replicates (each sample was analyzed in triplicate). This factor is completely redundant, as the “run” column in the experimental annotation already has 18 different levels and is thus already equal to the factor for the technical replicates.
techrep <- as.factor(paste0("t_",1:18))
names(techrep) <- levels(exp_annotation$run)
techrep
## X1WT_20_2h_n3_1 X1WT_20_2h_n3_2 X1WT_20_2h_n3_3 X1WT_20_2h_n4_1
## t_1 t_2 t_3 t_4
## X1WT_20_2h_n4_2 X1WT_20_2h_n4_3 X1WT_20_2h_n5_1 X1WT_20_2h_n5_2
## t_5 t_6 t_7 t_8
## X1WT_20_2h_n5_3 X3D8_20_2h_n3_1 X3D8_20_2h_n3_2 X3D8_20_2h_n3_3
## t_9 t_10 t_11 t_12
## X3D8_20_2h_n4_1 X3D8_20_2h_n4_2 X3D8_20_2h_n4_3 X3D8_20_2h_n5_1
## t_13 t_14 t_15 t_16
## X3D8_20_2h_n5_2 X3D8_20_2h_n5_3
## t_17 t_18
## 18 Levels: t_1 t_10 t_11 t_12 t_13 t_14 t_15 t_16 t_17 t_18 t_2 ... t_9
proteinsFranc2 <- addVarFromVar(proteinsFranc, basecol="run", name="techrep", vector=techrep)
proteinsFranc2
## @accession
##
## [1] "WP_003038655"
##
## @annotation
##
## GI.number
## "gi|118497196"
## Protein.names
## "hypothetical protein [Francisella tularensis]"
##
## @data
##
## quant_value Sequence run genotype biorep techrep
## 1.1 25.11978 AVVVFFDYGCGK X1WT_20_2h_n3_1 WT b_1 t_1
## 1.2 25.34586 DLLSDPATPTVGPQDANK X1WT_20_2h_n3_1 WT b_1 t_1
## 1.3 24.11491 ETNGELTVQDVDNVVK X1WT_20_2h_n3_1 WT b_1 t_1
## 1.4 26.50674 IIAIPDIVK X1WT_20_2h_n3_1 WT b_1 t_1
## 1.5 24.69046 QTNVITDNYQQAAK X1WT_20_2h_n3_1 WT b_1 t_1
## 2.1 24.75196 AVVVFFDYGCGK X1WT_20_2h_n3_2 WT b_1 t_2
##
## 96 more rows...
##
## @accession
##
## [1] "WP_003041237"
##
## @annotation
##
## GI.number
## "gi|118498194"
## Protein.names
## "phosphoglucosamine mutase [Francisella tularensis]"
##
## @data
##
## quant_value Sequence run genotype biorep techrep
## 1.1 25.81732 ADLGISLDGDADR X1WT_20_2h_n3_1 WT b_1 t_1
## 1.2 27.36716 FVIVGQDTR X1WT_20_2h_n3_1 WT b_1 t_1
## 1.3 24.91571 GEVANSTITAEFTQK X1WT_20_2h_n3_1 WT b_1 t_1
## 1.4 27.38539 ILANAIDEYIESIHSR X1WT_20_2h_n3_1 WT b_1 t_1
## 1.6 27.93398 SLATNEAEYLVEK X1WT_20_2h_n3_1 WT b_1 t_1
## 1.7 26.59163 VASDVNDVEK X1WT_20_2h_n3_1 WT b_1 t_1
##
## 187 more rows...
##
## @accession
##
## [1] "WP_003038915"
##
## @annotation
##
## GI.number
## "gi|118497331"
## Protein.names
## "glycine--tRNA ligase subunit beta [Francisella tularensis]"
##
## @data
##
## quant_value Sequence run genotype biorep
## 1 24.37301 AAANEYELALAYSIEEVAPDLHK X1WT_20_2h_n3_1 WT b_1
## 1.3 22.60222 EEGIAVDIFEAINNTNYDSIK X1WT_20_2h_n3_1 WT b_1
## 1.4 25.34002 EGEPTQVGLGFAK X1WT_20_2h_n3_1 WT b_1
## 1.5 25.13163 EYSYALELLTCLDK X1WT_20_2h_n3_1 WT b_1
## 1.6 24.09977 FHHPQAIVIDNISDYIK X1WT_20_2h_n3_1 WT b_1
## 1.8 25.75712 IIDITLESYK X1WT_20_2h_n3_1 WT b_1
## techrep
## 1 t_1
## 1.3 t_1
## 1.4 t_1
## 1.5 t_1
## 1.6 t_1
## 1.8 t_1
##
## 215 more rows...
##
## @accession
##
## [1] "WP_003038264"
##
## @annotation
##
## GI.number
## "gi|118496879"
## Protein.names
## "molecular chaperone HtpG [Francisella tularensis]"
##
## @data
##
## quant_value Sequence run genotype biorep techrep
## 1 27.92076 AAANNPQLEAFK X1WT_20_2h_n3_1 WT b_1 t_1
## 1.2 26.48741 EGQDTIYYITSDSYK X1WT_20_2h_n3_1 WT b_1 t_1
## 1.4 27.33299 EHLDLLEYHVLK X1WT_20_2h_n3_1 WT b_1 t_1
## 1.5 26.37656 EILQHNK X1WT_20_2h_n3_1 WT b_1 t_1
## 1.6 28.11488 ELLPPYLR X1WT_20_2h_n3_1 WT b_1 t_1
## 1.7 28.77850 ELVSNSSDAIEK X1WT_20_2h_n3_1 WT b_1 t_1
##
## 338 more rows...
##
## @accession
##
## [1] "WP_003035781"
##
## @annotation
##
## GI.number
## "gi|118497152"
## Protein.names
## "delta-aminolevulinic acid dehydratase [Francisella tularensis]"
##
## @data
##
## quant_value Sequence run genotype biorep techrep
## 1 27.47327 AAASAGLVDEK X1WT_20_2h_n3_1 WT b_1 t_1
## 1.1 24.05872 VILTYTALDVANWLK X1WT_20_2h_n3_1 WT b_1 t_1
## 2.1 23.88890 VILTYTALDVANWLK X1WT_20_2h_n3_2 WT b_1 t_2
## 2.2 25.37749 YASAYYGPFR X1WT_20_2h_n3_2 WT b_1 t_2
## 3 27.47269 AAASAGLVDEK X1WT_20_2h_n3_3 WT b_1 t_3
## 3.1 24.93174 VILTYTALDVANWLK X1WT_20_2h_n3_3 WT b_1 t_3
##
## 41 more rows...
##
## @accession
##
## [1] "WP_003039212"
##
## @annotation
##
## GI.number
## "gi|118497492"
## Protein.names
## "phosphorylase [Francisella tularensis]"
##
## @data
##
## quant_value Sequence run genotype biorep
## 1.2 22.96510 FPAGLNDVEFVAEHIFQHSK X1WT_20_2h_n3_1 WT b_1
## 1.3 25.10257 LQDLLTFIGK X1WT_20_2h_n3_1 WT b_1
## 1.6 24.55598 VVFSVDYR X1WT_20_2h_n3_1 WT b_1
## 2 24.53369 AAASLDLYSYPK X1WT_20_2h_n3_2 WT b_1
## 2.1 24.68029 EVLFLNR X1WT_20_2h_n3_2 WT b_1
## 2.2 23.16038 FPAGLNDVEFVAEHIFQHSK X1WT_20_2h_n3_2 WT b_1
## techrep
## 1.2 t_1
## 1.3 t_1
## 1.6 t_1
## 2 t_2
## 2.1 t_2
## 2.2 t_2
##
## 56 more rows...
##
## 1182 more elements...
Here, we illustrate how you can reproduce the detail plots from the Shiny app. We will take a closer look at the protein “WP_003039212”.
protein_name <- "WP_003039212"
data <- getData(proteinsFranc[protein_name])
#We plot the peptide sequences as different point symbols
points <- as.numeric(droplevels(as.factor(data$Sequence)))
#We don't want to use repeated plot symbols or invisible symbols
pch_vals <- c(0:25,32:127)
#Repeat pch_vals if there would be more than 122 unique levels
pch_vals <- rep_len(pch_vals, length(unique(points)))
pch_vals <- pch_vals[points]
#We color by mass spec run
colors <- grDevices::colorRampPalette(RColorBrewer::brewer.pal(8,"Spectral"))(length(unique(droplevels(as.factor(data$run)))))
colors <- colors[as.numeric(droplevels(as.factor(data$run)))]
#We make a different boxplot for each genotype
boxplot(data$quant_value~as.factor(data$genotype), outline=FALSE, ylim=c(min(data$quant_value)-0.2,max(data$quant_value)+0.2), ylab="preprocessed peptide intensity", xlab="", main=paste0("Detail plot for protein ", protein_name), las=2, frame.plot=FALSE, frame=FALSE, col="grey", pars=list(boxcol="white"))
#Add the data as separate points
points(jitter((as.numeric(data$genotype)), factor=2),data$quant_value, col=colors, pch=pch_vals)
Fit the robust ridge models for each protein. The fit.model
function will fit a model to each protein in the protdata
object proteinsFranc
. The default setting reg="ridge"
indicates that ridge regression should be used. The default setting weights="Huber"
indicates that an M estimation algorithm should be used that assigns Huber weights to each peptide observation based on their residuals.
Caution: fitting the models takes some time (less than 15 minutes for the Francisella tularensis dataset on our system).
system.time(protLMFranc <- fit.model(proteinsFranc, response="quant_value", fixed=c("genotype"), random=c("biorep","run","Sequence"), add.intercept=TRUE, weights="Huber"))
## user system elapsed
## 384.990 1.433 396.099
The system.time
function was only added to output the duration of the model fitting. If you are not interested in this, you can of course omit it.
protLMFranc
is a protLM
object.
class(protLMFranc)
## [1] "protLM"
## attr(,"package")
## [1] "MSqRob"
Printing out a protLM
object returns an overview of the first 6 accessions with their annotations and their corresponding models.
protLMFranc
## @accession
##
## [1] "WP_003038655"
##
## @annotation
##
## GI.number
## "gi|118497196"
## Protein.names
## "hypothetical protein [Francisella tularensis]"
##
## @model
##
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 66.4301
## Random effects:
## Groups Name Std.Dev.
## run (Intercept) 0.0000
## Sequence (Intercept) 0.6682
## biorep (Intercept) 0.2247
## ridgeGroup.1 (Intercept) 1.0983
## Residual 0.2524
## Number of obs: 102, groups:
## run, 18; Sequence, 7; biorep, 6; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob
## -260
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
##
## @accession
##
## [1] "WP_003041237"
##
## @annotation
##
## GI.number
## "gi|118498194"
## Protein.names
## "phosphoglucosamine mutase [Francisella tularensis]"
##
## @model
##
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 136.4625
## Random effects:
## Groups Name Std.Dev.
## run (Intercept) 0.00000
## Sequence (Intercept) 1.28720
## biorep (Intercept) 0.09322
## ridgeGroup.1 (Intercept) 0.00000
## Residual 0.27463
## Number of obs: 193, groups:
## run, 18; Sequence, 12; biorep, 6; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob
## -370.9
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
##
## @accession
##
## [1] "WP_003038915"
##
## @annotation
##
## GI.number
## "gi|118497331"
## Protein.names
## "glycine--tRNA ligase subunit beta [Francisella tularensis]"
##
## @model
##
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 340.3599
## Random effects:
## Groups Name Std.Dev.
## Sequence (Intercept) 1.07373
## run (Intercept) 0.00000
## biorep (Intercept) 0.05502
## ridgeGroup.1 (Intercept) 2.11989
## Residual 0.40125
## Number of obs: 221, groups:
## Sequence, 18; run, 18; biorep, 6; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob
## -365.9
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
##
## @accession
##
## [1] "WP_003038264"
##
## @annotation
##
## GI.number
## "gi|118496879"
## Protein.names
## "molecular chaperone HtpG [Francisella tularensis]"
##
## @model
##
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 378.4025
## Random effects:
## Groups Name Std.Dev.
## Sequence (Intercept) 1.02426
## run (Intercept) 0.09019
## biorep (Intercept) 0.11985
## ridgeGroup.1 (Intercept) 0.38438
## Residual 0.31793
## Number of obs: 344, groups:
## Sequence, 23; run, 18; biorep, 6; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob
## -489.8
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
##
## @accession
##
## [1] "WP_003035781"
##
## @annotation
##
## GI.number
## "gi|118497152"
## Protein.names
## "delta-aminolevulinic acid dehydratase [Francisella tularensis]"
##
## @model
##
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 57.1314
## Random effects:
## Groups Name Std.Dev.
## run (Intercept) 0.0000
## biorep (Intercept) 0.2059
## Sequence (Intercept) 1.9048
## ridgeGroup.1 (Intercept) 0.5694
## Residual 0.3522
## Number of obs: 47, groups:
## run, 18; biorep, 6; Sequence, 3; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob
## -173.3
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
##
## @accession
##
## [1] "WP_003039212"
##
## @annotation
##
## GI.number
## "gi|118497492"
## Protein.names
## "phosphorylase [Francisella tularensis]"
##
## @model
##
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 9.2867
## Random effects:
## Groups Name Std.Dev.
## run (Intercept) 0.0000
## Sequence (Intercept) 0.6635
## biorep (Intercept) 0.0000
## ridgeGroup.1 (Intercept) 0.3159
## Residual 0.2026
## Number of obs: 62, groups:
## run, 18; Sequence, 7; biorep, 6; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob
## -192.6
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
##
## 1182 more elements...
The numbers in the Std.Dev.
column give the estimated standard deviations of the random effects. Effects starting with ridgeGroup
indicate shrunken fixed effects (we make use of the random effects framework of the lm4
package in order to estimate Ridge penalties). However, as the fixed effects in the ridgeGroups are orthogonalized using the Gramm-Schmidt procedure, these numbers are not very informative.
A protLM
object has three slots:
accession
slot: a vector containing all the accessionsmodel
slot: a list of all fitted models (one for each accession)annotation
slot: data frame of which each row (one for each accession) corresponds to the annotations of a different accession.These slots can be accessed using the getAccessions
, getModels
and getAnnotations
functions respectively. Here we give the accessions, models and annotations for the first 5 models.
getAccessions(protLMFranc)[1:5]
## [1] WP_003038655 WP_003041237 WP_003038915 WP_003038264 WP_003035781
## 1188 Levels: WP_003013731 WP_003013860 WP_003013909 ... WP_011733734
getModels(protLMFranc)[1:5]
## [[1]]
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 66.4301
## Random effects:
## Groups Name Std.Dev.
## run (Intercept) 0.0000
## Sequence (Intercept) 0.6682
## biorep (Intercept) 0.2247
## ridgeGroup.1 (Intercept) 1.0983
## Residual 0.2524
## Number of obs: 102, groups:
## run, 18; Sequence, 7; biorep, 6; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob
## -260
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
##
## [[2]]
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 136.4625
## Random effects:
## Groups Name Std.Dev.
## run (Intercept) 0.00000
## Sequence (Intercept) 1.28720
## biorep (Intercept) 0.09322
## ridgeGroup.1 (Intercept) 0.00000
## Residual 0.27463
## Number of obs: 193, groups:
## run, 18; Sequence, 12; biorep, 6; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob
## -370.9
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
##
## [[3]]
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 340.3599
## Random effects:
## Groups Name Std.Dev.
## Sequence (Intercept) 1.07373
## run (Intercept) 0.00000
## biorep (Intercept) 0.05502
## ridgeGroup.1 (Intercept) 2.11989
## Residual 0.40125
## Number of obs: 221, groups:
## Sequence, 18; run, 18; biorep, 6; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob
## -365.9
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
##
## [[4]]
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 378.4025
## Random effects:
## Groups Name Std.Dev.
## Sequence (Intercept) 1.02426
## run (Intercept) 0.09019
## biorep (Intercept) 0.11985
## ridgeGroup.1 (Intercept) 0.38438
## Residual 0.31793
## Number of obs: 344, groups:
## Sequence, 23; run, 18; biorep, 6; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob
## -489.8
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
##
## [[5]]
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 57.1314
## Random effects:
## Groups Name Std.Dev.
## run (Intercept) 0.0000
## biorep (Intercept) 0.2059
## Sequence (Intercept) 1.9048
## ridgeGroup.1 (Intercept) 0.5694
## Residual 0.3522
## Number of obs: 47, groups:
## run, 18; biorep, 6; Sequence, 3; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob
## -173.3
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
getAnnotations(protLMFranc)[1:5,]
## GI.number
## [1,] "gi|118497196"
## [2,] "gi|118498194"
## [3,] "gi|118497331"
## [4,] "gi|118496879"
## [5,] "gi|118497152"
## Protein.names
## [1,] "hypothetical protein [Francisella tularensis]"
## [2,] "phosphoglucosamine mutase [Francisella tularensis]"
## [3,] "glycine--tRNA ligase subunit beta [Francisella tularensis]"
## [4,] "molecular chaperone HtpG [Francisella tularensis]"
## [5,] "delta-aminolevulinic acid dehydratase [Francisella tularensis]"
protLM
objects can be subsetted using either their numeric index or their corresponding accession.
Retain only the part corresponding to model 24 and model 50.
protLMFranc[c(24,50)]
## @accession
##
## [1] "WP_003035135"
##
## @annotation
##
## GI.number
## "gi|118498204"
## Protein.names
## "superoxide dismutase [Francisella tularensis]"
##
## @model
##
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 130.6537
## Random effects:
## Groups Name Std.Dev.
## run (Intercept) 0.00000
## Sequence (Intercept) 1.48360
## biorep (Intercept) 0.05107
## ridgeGroup.1 (Intercept) 1.40062
## Residual 0.29509
## Number of obs: 148, groups:
## run, 18; Sequence, 10; biorep, 6; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob
## -344.8
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
##
## @accession
##
## [1] "WP_003041024"
##
## @annotation
##
## GI.number
## "gi|118496740"
## Protein.names
## "acetate kinase [Francisella tularensis]"
##
## @model
##
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 26.4739
## Random effects:
## Groups Name Std.Dev.
## run (Intercept) 0.0000
## biorep (Intercept) 0.2982
## Sequence (Intercept) 0.3214
## ridgeGroup.1 (Intercept) 0.0000
## Residual 0.2708
## Number of obs: 36, groups:
## run, 16; biorep, 6; Sequence, 4; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob
## -147.4
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
Inspect the protLM
object corresponding to accession "WP_003033643"
.
protLMFranc["WP_003033643"]
## @accession
##
## [1] "WP_003033643"
##
## @annotation
##
## GI.number
## "gi|118497379"
## Protein.names
## "isochorismatase [Francisella tularensis]"
##
## @model
##
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 5.8471
## Random effects:
## Groups Name Std.Dev.
## run (Intercept) 0.0000
## biorep (Intercept) 0.1829
## Sequence (Intercept) 1.1305
## ridgeGroup.1 (Intercept) 0.3079
## Residual 0.9620
## Number of obs: 3, groups: run, 3; biorep, 2; Sequence, 1; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob
## -36.64
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
Say we want to inspect cell division protein "WP_003040227"
. We can then select protein "WP_003040227"
from the protLM
object. We can extract the model via the getModels
function. simplify=TRUE
is the default setting and indicates that if you select one model, you only want this model and not a list with as only element this model. Note that if you select multiple models, you automatically receive a list of models.
modelWP_003040227 <- getModels(protLMFranc["WP_003040227"], simplify=TRUE)
modelWP_003040227
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 35.3401
## Random effects:
## Groups Name Std.Dev.
## run (Intercept) 0.0000
## biorep (Intercept) 0.5312
## Sequence (Intercept) 0.0000
## ridgeGroup.1 (Intercept) 0.8636
## Residual 0.4931
## Number of obs: 20, groups:
## run, 15; biorep, 5; Sequence, 2; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob
## -111.7
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
The betaBVcovDf
function extracts a list from a model containing the following four elements:
NULL
.betaBVcovDf <- getBetaVcovDf(modelWP_003040227)
betaBVcovDf$beta
## [,1]
## (Intercept) 2.478692e+01
## runX1WT_20_2h_n4_1 -3.268340e-19
## runX1WT_20_2h_n4_2 6.312569e-19
## runX1WT_20_2h_n4_3 -5.514836e-19
## runX1WT_20_2h_n5_1 -2.653938e-19
## runX1WT_20_2h_n5_2 1.300820e-19
## runX1WT_20_2h_n5_3 3.002867e-20
## runX3D8_20_2h_n3_1 6.312569e-19
## runX3D8_20_2h_n3_2 5.345092e-20
## runX3D8_20_2h_n3_3 -7.547956e-20
## runX3D8_20_2h_n4_1 3.677317e-19
## runX3D8_20_2h_n4_2 1.418812e-19
## runX3D8_20_2h_n4_3 -3.821573e-19
## runX3D8_20_2h_n5_1 -3.498709e-19
## runX3D8_20_2h_n5_2 -1.035900e-19
## runX3D8_20_2h_n5_3 6.912076e-20
## biorepb_2 -2.866945e-01
## biorepb_3 -1.221728e-01
## biorepb_4 7.069615e-01
## biorepb_5 1.479023e-01
## biorepb_6 -4.459965e-01
## SequenceMVFTFYNNLK 6.400946e-19
## SequenceYTYIYQIASFR -6.400946e-19
## genotypeKO 2.573084e-01
betaBVcovDf$vcov
## (Intercept) runX1WT_20_2h_n4_1 runX1WT_20_2h_n4_2
## (Intercept) 4.644336e-01 -1.133292e-19 -7.590798e-20
## runX1WT_20_2h_n4_1 -1.133292e-19 1.000000e-18 2.081844e-37
## runX1WT_20_2h_n4_2 -7.590798e-20 2.081844e-37 1.000000e-18
## runX1WT_20_2h_n4_3 -1.133292e-19 3.108154e-37 2.081844e-37
## runX1WT_20_2h_n5_1 -1.036390e-19 2.528956e-38 1.693897e-38
## runX1WT_20_2h_n5_2 -1.036390e-19 2.528956e-38 1.693897e-38
## runX1WT_20_2h_n5_3 -1.036390e-19 2.528956e-38 1.693897e-38
## runX3D8_20_2h_n3_1 -1.868609e-20 4.559704e-39 3.054093e-39
## runX3D8_20_2h_n3_2 -4.832685e-20 1.179252e-38 7.898639e-39
## runX3D8_20_2h_n3_3 -4.832685e-20 1.179252e-38 7.898639e-39
## runX3D8_20_2h_n4_1 -2.678135e-20 6.535074e-39 4.377198e-39
## runX3D8_20_2h_n4_2 -5.356270e-20 1.307015e-38 8.754397e-39
## runX3D8_20_2h_n4_3 -5.356270e-20 1.307015e-38 8.754397e-39
## runX3D8_20_2h_n5_1 -4.575672e-20 1.116537e-38 7.478572e-39
## runX3D8_20_2h_n5_2 -4.575672e-20 1.116537e-38 7.478572e-39
## runX3D8_20_2h_n5_3 -4.575672e-20 1.116537e-38 7.478572e-39
## biorepb_2 -3.511045e-01 -1.974862e-19 -1.322764e-19
## biorepb_3 -3.607947e-01 8.803963e-20 5.896902e-20
## biorepb_4 -1.338428e-01 3.265976e-20 2.187553e-20
## biorepb_5 -1.553883e-01 3.791720e-20 2.539697e-20
## biorepb_6 -1.592912e-01 3.886959e-20 2.603489e-20
## SequenceMVFTFYNNLK -5.668853e-19 -1.056869e-37 -7.078916e-38
## SequenceYTYIYQIASFR -4.331147e-19 1.056869e-37 7.078916e-38
## genotypeKO -2.822640e-01 6.887691e-20 4.613381e-20
## runX1WT_20_2h_n4_3 runX1WT_20_2h_n5_1
## (Intercept) -1.133292e-19 -1.036390e-19
## runX1WT_20_2h_n4_1 3.108154e-37 2.528956e-38
## runX1WT_20_2h_n4_2 2.081844e-37 1.693897e-38
## runX1WT_20_2h_n4_3 1.000000e-18 2.528956e-38
## runX1WT_20_2h_n5_1 2.528956e-38 1.000000e-18
## runX1WT_20_2h_n5_2 2.528956e-38 2.820767e-37
## runX1WT_20_2h_n5_3 2.528956e-38 2.820767e-37
## runX3D8_20_2h_n3_1 4.559704e-39 4.169826e-39
## runX3D8_20_2h_n3_2 1.179252e-38 1.078420e-38
## runX3D8_20_2h_n3_3 1.179252e-38 1.078420e-38
## runX3D8_20_2h_n4_1 6.535074e-39 5.976293e-39
## runX3D8_20_2h_n4_2 1.307015e-38 1.195259e-38
## runX3D8_20_2h_n4_3 1.307015e-38 1.195259e-38
## runX3D8_20_2h_n5_1 1.116537e-38 1.021067e-38
## runX3D8_20_2h_n5_2 1.116537e-38 1.021067e-38
## runX3D8_20_2h_n5_3 1.116537e-38 1.021067e-38
## biorepb_2 -1.974862e-19 7.834942e-20
## biorepb_3 8.803963e-20 -1.784378e-19
## biorepb_4 3.265976e-20 2.986719e-20
## biorepb_5 3.791720e-20 3.467509e-20
## biorepb_6 3.886959e-20 3.554605e-20
## SequenceMVFTFYNNLK -1.056869e-37 1.265012e-37
## SequenceYTYIYQIASFR 1.056869e-37 -1.265012e-37
## genotypeKO 6.887691e-20 6.298759e-20
## runX1WT_20_2h_n5_2 runX1WT_20_2h_n5_3
## (Intercept) -1.036390e-19 -1.036390e-19
## runX1WT_20_2h_n4_1 2.528956e-38 2.528956e-38
## runX1WT_20_2h_n4_2 1.693897e-38 1.693897e-38
## runX1WT_20_2h_n4_3 2.528956e-38 2.528956e-38
## runX1WT_20_2h_n5_1 2.820767e-37 2.820767e-37
## runX1WT_20_2h_n5_2 1.000000e-18 2.820767e-37
## runX1WT_20_2h_n5_3 2.820767e-37 1.000000e-18
## runX3D8_20_2h_n3_1 4.169826e-39 4.169826e-39
## runX3D8_20_2h_n3_2 1.078420e-38 1.078420e-38
## runX3D8_20_2h_n3_3 1.078420e-38 1.078420e-38
## runX3D8_20_2h_n4_1 5.976293e-39 5.976293e-39
## runX3D8_20_2h_n4_2 1.195259e-38 1.195259e-38
## runX3D8_20_2h_n4_3 1.195259e-38 1.195259e-38
## runX3D8_20_2h_n5_1 1.021067e-38 1.021067e-38
## runX3D8_20_2h_n5_2 1.021067e-38 1.021067e-38
## runX3D8_20_2h_n5_3 1.021067e-38 1.021067e-38
## biorepb_2 7.834942e-20 7.834942e-20
## biorepb_3 -1.784378e-19 -1.784378e-19
## biorepb_4 2.986719e-20 2.986719e-20
## biorepb_5 3.467509e-20 3.467509e-20
## biorepb_6 3.554605e-20 3.554605e-20
## SequenceMVFTFYNNLK 1.265012e-37 1.265012e-37
## SequenceYTYIYQIASFR -1.265012e-37 -1.265012e-37
## genotypeKO 6.298759e-20 6.298759e-20
## runX3D8_20_2h_n3_1 runX3D8_20_2h_n3_2
## (Intercept) -1.868609e-20 -4.832685e-20
## runX1WT_20_2h_n4_1 4.559704e-39 1.179252e-38
## runX1WT_20_2h_n4_2 3.054093e-39 7.898639e-39
## runX1WT_20_2h_n4_3 4.559704e-39 1.179252e-38
## runX1WT_20_2h_n5_1 4.169826e-39 1.078420e-38
## runX1WT_20_2h_n5_2 4.169826e-39 1.078420e-38
## runX1WT_20_2h_n5_3 4.169826e-39 1.078420e-38
## runX3D8_20_2h_n3_1 1.000000e-18 1.287695e-37
## runX3D8_20_2h_n3_2 1.287695e-37 1.000000e-18
## runX3D8_20_2h_n3_3 1.287695e-37 3.330298e-37
## runX3D8_20_2h_n4_1 5.397077e-39 1.395817e-38
## runX3D8_20_2h_n4_2 1.079415e-38 2.791634e-38
## runX3D8_20_2h_n4_3 1.079415e-38 2.791634e-38
## runX3D8_20_2h_n5_1 9.221064e-39 2.384795e-38
## runX3D8_20_2h_n5_2 9.221064e-39 2.384795e-38
## runX3D8_20_2h_n5_3 9.221064e-39 2.384795e-38
## biorepb_2 1.412639e-20 3.653433e-20
## biorepb_3 1.451627e-20 3.754265e-20
## biorepb_4 -9.205803e-20 -2.380848e-19
## biorepb_5 3.131442e-20 8.098680e-20
## biorepb_6 3.210096e-20 8.302100e-20
## SequenceMVFTFYNNLK -3.713523e-38 -9.604086e-38
## SequenceYTYIYQIASFR 3.713523e-38 9.604086e-38
## genotypeKO -1.802540e-20 -4.661813e-20
## runX3D8_20_2h_n3_3 runX3D8_20_2h_n4_1
## (Intercept) -4.832685e-20 -2.678135e-20
## runX1WT_20_2h_n4_1 1.179252e-38 6.535074e-39
## runX1WT_20_2h_n4_2 7.898639e-39 4.377198e-39
## runX1WT_20_2h_n4_3 1.179252e-38 6.535074e-39
## runX1WT_20_2h_n5_1 1.078420e-38 5.976293e-39
## runX1WT_20_2h_n5_2 1.078420e-38 5.976293e-39
## runX1WT_20_2h_n5_3 1.078420e-38 5.976293e-39
## runX3D8_20_2h_n3_1 1.287695e-37 5.397077e-39
## runX3D8_20_2h_n3_2 3.330298e-37 1.395817e-38
## runX3D8_20_2h_n3_3 1.000000e-18 1.395817e-38
## runX3D8_20_2h_n4_1 1.395817e-38 1.000000e-18
## runX3D8_20_2h_n4_2 2.791634e-38 3.566651e-37
## runX3D8_20_2h_n4_3 2.791634e-38 3.566651e-37
## runX3D8_20_2h_n5_1 2.384795e-38 1.321585e-38
## runX3D8_20_2h_n5_2 2.384795e-38 1.321585e-38
## runX3D8_20_2h_n5_3 2.384795e-38 1.321585e-38
## biorepb_2 3.653433e-20 2.024628e-20
## biorepb_3 3.754265e-20 2.080506e-20
## biorepb_4 -2.380848e-19 3.865760e-20
## biorepb_5 8.098680e-20 -1.257168e-19
## biorepb_6 8.302100e-20 4.600785e-20
## SequenceMVFTFYNNLK -9.604086e-38 -3.944178e-37
## SequenceYTYIYQIASFR 9.604086e-38 3.944178e-37
## genotypeKO -4.661813e-20 -2.583443e-20
## runX3D8_20_2h_n4_2 runX3D8_20_2h_n4_3
## (Intercept) -5.356270e-20 -5.356270e-20
## runX1WT_20_2h_n4_1 1.307015e-38 1.307015e-38
## runX1WT_20_2h_n4_2 8.754397e-39 8.754397e-39
## runX1WT_20_2h_n4_3 1.307015e-38 1.307015e-38
## runX1WT_20_2h_n5_1 1.195259e-38 1.195259e-38
## runX1WT_20_2h_n5_2 1.195259e-38 1.195259e-38
## runX1WT_20_2h_n5_3 1.195259e-38 1.195259e-38
## runX3D8_20_2h_n3_1 1.079415e-38 1.079415e-38
## runX3D8_20_2h_n3_2 2.791634e-38 2.791634e-38
## runX3D8_20_2h_n3_3 2.791634e-38 2.791634e-38
## runX3D8_20_2h_n4_1 3.566651e-37 3.566651e-37
## runX3D8_20_2h_n4_2 1.000000e-18 7.133303e-37
## runX3D8_20_2h_n4_3 7.133303e-37 1.000000e-18
## runX3D8_20_2h_n5_1 2.643169e-38 2.643169e-38
## runX3D8_20_2h_n5_2 2.643169e-38 2.643169e-38
## runX3D8_20_2h_n5_3 2.643169e-38 2.643169e-38
## biorepb_2 4.049255e-20 4.049255e-20
## biorepb_3 4.161011e-20 4.161011e-20
## biorepb_4 7.731521e-20 7.731521e-20
## biorepb_5 -2.514336e-19 -2.514336e-19
## biorepb_6 9.201570e-20 9.201570e-20
## SequenceMVFTFYNNLK 2.111644e-37 2.111644e-37
## SequenceYTYIYQIASFR -2.111644e-37 -2.111644e-37
## genotypeKO -5.166885e-20 -5.166885e-20
## runX3D8_20_2h_n5_1 runX3D8_20_2h_n5_2
## (Intercept) -4.575672e-20 -4.575672e-20
## runX1WT_20_2h_n4_1 1.116537e-38 1.116537e-38
## runX1WT_20_2h_n4_2 7.478572e-39 7.478572e-39
## runX1WT_20_2h_n4_3 1.116537e-38 1.116537e-38
## runX1WT_20_2h_n5_1 1.021067e-38 1.021067e-38
## runX1WT_20_2h_n5_2 1.021067e-38 1.021067e-38
## runX1WT_20_2h_n5_3 1.021067e-38 1.021067e-38
## runX3D8_20_2h_n3_1 9.221064e-39 9.221064e-39
## runX3D8_20_2h_n3_2 2.384795e-38 2.384795e-38
## runX3D8_20_2h_n3_3 2.384795e-38 2.384795e-38
## runX3D8_20_2h_n4_1 1.321585e-38 1.321585e-38
## runX3D8_20_2h_n4_2 2.643169e-38 2.643169e-38
## runX3D8_20_2h_n4_3 2.643169e-38 2.643169e-38
## runX3D8_20_2h_n5_1 1.000000e-18 6.055208e-37
## runX3D8_20_2h_n5_2 6.055208e-37 1.000000e-18
## runX3D8_20_2h_n5_3 6.055208e-37 6.055208e-37
## biorepb_2 3.459136e-20 3.459136e-20
## biorepb_3 3.554605e-20 3.554605e-20
## biorepb_4 6.604765e-20 6.604765e-20
## biorepb_5 7.667975e-20 7.667975e-20
## biorepb_6 -2.128648e-19 -2.128648e-19
## SequenceMVFTFYNNLK 3.465504e-38 3.465504e-38
## SequenceYTYIYQIASFR -3.465504e-38 -3.465504e-38
## genotypeKO -4.413888e-20 -4.413888e-20
## runX3D8_20_2h_n5_3 biorepb_2 biorepb_3
## (Intercept) -4.575672e-20 -3.511045e-01 -3.607947e-01
## runX1WT_20_2h_n4_1 1.116537e-38 -1.974862e-19 8.803963e-20
## runX1WT_20_2h_n4_2 7.478572e-39 -1.322764e-19 5.896902e-20
## runX1WT_20_2h_n4_3 1.116537e-38 -1.974862e-19 8.803963e-20
## runX1WT_20_2h_n5_1 1.021067e-38 7.834942e-20 -1.784378e-19
## runX1WT_20_2h_n5_2 1.021067e-38 7.834942e-20 -1.784378e-19
## runX1WT_20_2h_n5_3 1.021067e-38 7.834942e-20 -1.784378e-19
## runX3D8_20_2h_n3_1 9.221064e-39 1.412639e-20 1.451627e-20
## runX3D8_20_2h_n3_2 2.384795e-38 3.653433e-20 3.754265e-20
## runX3D8_20_2h_n3_3 2.384795e-38 3.653433e-20 3.754265e-20
## runX3D8_20_2h_n4_1 1.321585e-38 2.024628e-20 2.080506e-20
## runX3D8_20_2h_n4_2 2.643169e-38 4.049255e-20 4.161011e-20
## runX3D8_20_2h_n4_3 2.643169e-38 4.049255e-20 4.161011e-20
## runX3D8_20_2h_n5_1 6.055208e-37 3.459136e-20 3.554605e-20
## runX3D8_20_2h_n5_2 6.055208e-37 3.459136e-20 3.554605e-20
## runX3D8_20_2h_n5_3 1.000000e-18 3.459136e-20 3.554605e-20
## biorepb_2 3.459136e-20 5.485906e-01 2.727550e-01
## biorepb_3 3.554605e-20 2.727550e-01 5.392324e-01
## biorepb_4 6.604765e-20 1.011830e-01 1.039756e-01
## biorepb_5 7.667975e-20 1.174711e-01 1.207132e-01
## biorepb_6 -2.128648e-19 1.204216e-01 1.237452e-01
## SequenceMVFTFYNNLK 3.465504e-38 -3.274279e-19 4.403841e-19
## SequenceYTYIYQIASFR -3.465504e-38 3.274279e-19 -4.403841e-19
## genotypeKO -4.413888e-20 2.133871e-01 2.192765e-01
## biorepb_4 biorepb_5 biorepb_6
## (Intercept) -1.338428e-01 -1.553883e-01 -1.592912e-01
## runX1WT_20_2h_n4_1 3.265976e-20 3.791720e-20 3.886959e-20
## runX1WT_20_2h_n4_2 2.187553e-20 2.539697e-20 2.603489e-20
## runX1WT_20_2h_n4_3 3.265976e-20 3.791720e-20 3.886959e-20
## runX1WT_20_2h_n5_1 2.986719e-20 3.467509e-20 3.554605e-20
## runX1WT_20_2h_n5_2 2.986719e-20 3.467509e-20 3.554605e-20
## runX1WT_20_2h_n5_3 2.986719e-20 3.467509e-20 3.554605e-20
## runX3D8_20_2h_n3_1 -9.205803e-20 3.131442e-20 3.210096e-20
## runX3D8_20_2h_n3_2 -2.380848e-19 8.098680e-20 8.302100e-20
## runX3D8_20_2h_n3_3 -2.380848e-19 8.098680e-20 8.302100e-20
## runX3D8_20_2h_n4_1 3.865760e-20 -1.257168e-19 4.600785e-20
## runX3D8_20_2h_n4_2 7.731521e-20 -2.514336e-19 9.201570e-20
## runX3D8_20_2h_n4_3 7.731521e-20 -2.514336e-19 9.201570e-20
## runX3D8_20_2h_n5_1 6.604765e-20 7.667975e-20 -2.128648e-19
## runX3D8_20_2h_n5_2 6.604765e-20 7.667975e-20 -2.128648e-19
## runX3D8_20_2h_n5_3 6.604765e-20 7.667975e-20 -2.128648e-19
## biorepb_2 1.011830e-01 1.174711e-01 1.204216e-01
## biorepb_3 1.039756e-01 1.207132e-01 1.237452e-01
## biorepb_4 5.010379e-01 2.242956e-01 2.299293e-01
## biorepb_5 2.242956e-01 4.309991e-01 2.669425e-01
## biorepb_6 2.299293e-01 2.669425e-01 4.193827e-01
## SequenceMVFTFYNNLK -2.659883e-19 3.238866e-20 1.206434e-19
## SequenceYTYIYQIASFR 2.659883e-19 -3.238866e-20 -1.206434e-19
## genotypeKO -1.291104e-01 -1.498941e-01 -1.536591e-01
## SequenceMVFTFYNNLK SequenceYTYIYQIASFR genotypeKO
## (Intercept) -5.668853e-19 -4.331147e-19 -2.822640e-01
## runX1WT_20_2h_n4_1 -1.056869e-37 1.056869e-37 6.887691e-20
## runX1WT_20_2h_n4_2 -7.078916e-38 7.078916e-38 4.613381e-20
## runX1WT_20_2h_n4_3 -1.056869e-37 1.056869e-37 6.887691e-20
## runX1WT_20_2h_n5_1 1.265012e-37 -1.265012e-37 6.298759e-20
## runX1WT_20_2h_n5_2 1.265012e-37 -1.265012e-37 6.298759e-20
## runX1WT_20_2h_n5_3 1.265012e-37 -1.265012e-37 6.298759e-20
## runX3D8_20_2h_n3_1 -3.713523e-38 3.713523e-38 -1.802540e-20
## runX3D8_20_2h_n3_2 -9.604086e-38 9.604086e-38 -4.661813e-20
## runX3D8_20_2h_n3_3 -9.604086e-38 9.604086e-38 -4.661813e-20
## runX3D8_20_2h_n4_1 -3.944178e-37 3.944178e-37 -2.583443e-20
## runX3D8_20_2h_n4_2 2.111644e-37 -2.111644e-37 -5.166885e-20
## runX3D8_20_2h_n4_3 2.111644e-37 -2.111644e-37 -5.166885e-20
## runX3D8_20_2h_n5_1 3.465504e-38 -3.465504e-38 -4.413888e-20
## runX3D8_20_2h_n5_2 3.465504e-38 -3.465504e-38 -4.413888e-20
## runX3D8_20_2h_n5_3 3.465504e-38 -3.465504e-38 -4.413888e-20
## biorepb_2 -3.274279e-19 3.274279e-19 2.133871e-01
## biorepb_3 4.403841e-19 -4.403841e-19 2.192765e-01
## biorepb_4 -2.659883e-19 2.659883e-19 -1.291104e-01
## biorepb_5 3.238866e-20 -3.238866e-20 -1.498941e-01
## biorepb_6 1.206434e-19 -1.206434e-19 -1.536591e-01
## SequenceMVFTFYNNLK 1.000000e-18 3.142651e-36 -7.108561e-20
## SequenceYTYIYQIASFR 3.142651e-36 1.000000e-18 7.108561e-20
## genotypeKO -7.108561e-20 7.108561e-20 4.579926e-01
betaBVcovDf$df
## [1] 18.69393
betaBVcovDf$df_exp
## NULL
betaBVcovDf$sigma
## [1] 0.4618902
The getBetaVcovDfList
function can be applied on (a subset of) the protLM
object protLMFranc
. This returns a list containing one or more lists similar to the betaBVcovDf
list.
betaVcovDfList <- getBetaVcovDfList(protLMFranc[1:2])
#Inspect the structure of the list without printing it out compeletely:
str(betaVcovDfList)
## List of 2
## $ WP_003038655:List of 6
## ..$ beta : num [1:33, 1] 2.57e+01 -9.42e-20 1.96e-19 -5.17e-19 -1.58e-19 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:33] "(Intercept)" "runX1WT_20_2h_n3_1" "runX1WT_20_2h_n3_2" "runX1WT_20_2h_n3_3" ...
## .. .. ..$ : NULL
## ..$ vcov : num [1:33, 1:33] 1.23 -7.08e-20 -1.04e-19 -8.46e-20 -9.38e-20 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:33] "(Intercept)" "runX1WT_20_2h_n3_1" "runX1WT_20_2h_n3_2" "runX1WT_20_2h_n3_3" ...
## .. .. ..$ : chr [1:33] "(Intercept)" "runX1WT_20_2h_n3_1" "runX1WT_20_2h_n3_2" "runX1WT_20_2h_n3_3" ...
## ..$ df : num 93.4
## ..$ sigma : num 0.252
## ..$ df_exp : NULL
## ..$ df_pars: Named num [1:33] 1.00 3.19e-18 3.70e-18 3.48e-18 3.31e-18 ...
## .. ..- attr(*, "names")= chr [1:33] "(Intercept)" "runX1WT_20_2h_n3_1" "runX1WT_20_2h_n3_2" "runX1WT_20_2h_n3_3" ...
## $ WP_003041237:List of 6
## ..$ beta : num [1:38, 1] 2.67e+01 -6.09e-19 -2.51e-20 3.31e-19 -1.80e-18 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:38] "(Intercept)" "runX1WT_20_2h_n3_1" "runX1WT_20_2h_n3_2" "runX1WT_20_2h_n3_3" ...
## .. .. ..$ : NULL
## ..$ vcov : num [1:38, 1:38] 1.86 -5.44e-20 -6.04e-20 -5.10e-20 -3.68e-20 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:38] "(Intercept)" "runX1WT_20_2h_n3_1" "runX1WT_20_2h_n3_2" "runX1WT_20_2h_n3_3" ...
## .. .. ..$ : chr [1:38] "(Intercept)" "runX1WT_20_2h_n3_1" "runX1WT_20_2h_n3_2" "runX1WT_20_2h_n3_3" ...
## ..$ df : num 180
## ..$ sigma : num 0.274
## ..$ df_exp : NULL
## ..$ df_pars: Named num [1:38] 1.00 7.14e-18 7.28e-18 6.98e-18 5.14e-18 ...
## .. ..- attr(*, "names")= chr [1:38] "(Intercept)" "runX1WT_20_2h_n3_1" "runX1WT_20_2h_n3_2" "runX1WT_20_2h_n3_3" ...
You can save your proteinsFranc
and protLMFranc
objects to a .RDatas object that can be loaded in R or in the Shiny graphical user interface. If you want your .RDatas object to be loaded into the Shiny App, you should also specify the levelOptions
, random
and plot2DependentVars
objects. These objects are settings that are saved automatically when using the MSqRob Shiny App. In this example, I save the .RDatas object to my desktop.
#Change to eval=TRUE to execute this piece of code.
result_files_new <- list()
result_files_new$proteins <- proteinsFranc
result_files_new$models <- protLMFranc
### This part is only needed if you want to load your .RDatas file in the MSqRob Shiny App ###
#Should contain all contrast options for the fixed effects (see also 5. Statistical inference)
result_files_new$levelOptions <- c("genotypeWT","genotypeKO")
#Should contain all random effects
result_files_new$random <- c("Sequence","biorep","run")
#Should contain a list with all fixed and random effects
result_files_new$plot2DependentVars <- as.list(c("genotype","Sequence","biorep","run"))
#############################################################################################
saves_MSqRob(result_files, file="/Users/lgoeminn/Desktop/project_Francisella.RDatas") #Change file path to your desired save location!
If you want to load your saved file, first specify the folder where it is saved.
#Change to eval=TRUE to execute this piece of code.
result_path <- "/Users/lgoeminn/Desktop/project_Francisella.RDatas"
To inspect the different objects in the file, you can use the inspect_loads_MSqRob
function.
#Change to eval=TRUE to execute this piece of code.
inspect_loads_MSqRob(result_path)
To load this file and assign new protdata
and protLM
object to it, execute the following piece of code.
#Change to eval=TRUE to execute this piece of code.
#To load the "project_Francisella.RDatas" file
result_files <- loads_MSqRob(result_path)
#assign a new protdata object to the saved proteins
proteinsFranc_result <- result_files$proteins
#assign a new protLM object to the saved models
protLMFranc_result <- result_files$models
We are interested in which proteins differ significantly in abundance between ArgP mutant and wild type Francisella tularensis. Via the "MSqRob_levels"
attribute, you can check which factor levels are present in each model. Pick a model for which there are no missing levels for the factors for which you want to test contrasts and inspect the names. Note that these names are always the combination of the parameter name and the factor level.
attr(modelWP_003040227,"MSqRob_levels")
## [1] "(Intercept)" "genotypeWT" "genotypeKO"
## [4] "runX1WT_20_2h_n4_1" "runX1WT_20_2h_n4_2" "runX1WT_20_2h_n4_3"
## [7] "runX1WT_20_2h_n5_1" "runX1WT_20_2h_n5_2" "runX1WT_20_2h_n5_3"
## [10] "runX3D8_20_2h_n3_1" "runX3D8_20_2h_n3_2" "runX3D8_20_2h_n3_3"
## [13] "runX3D8_20_2h_n4_1" "runX3D8_20_2h_n4_2" "runX3D8_20_2h_n4_3"
## [16] "runX3D8_20_2h_n5_1" "runX3D8_20_2h_n5_2" "runX3D8_20_2h_n5_3"
## [19] "biorepb_2" "biorepb_3" "biorepb_4"
## [22] "biorepb_5" "biorepb_6" "SequenceMVFTFYNNLK"
## [25] "SequenceYTYIYQIASFR"
Here, we notice that the mutant is encoded as "genotypeKO"
and the wild type as "genotypeWT"
. We construct a contrast matrix L
for inferring on the fold change of interest. We are interested in the log2 fold change between mutant and wild type. As all intensities are log2-transformed, we should take the difference between "genotypeKO"
and "genotypeWT"
(because of the following property of logarithms: log2(a/b)=log2(a)-log2(b)).
L <- makeContrast(contrasts=c("genotypeKO-genotypeWT"),
levels=c("genotypeWT","genotypeKO"))
L
## Contrasts
## Levels genotypeKO-genotypeWT
## genotypeWT -1
## genotypeKO 1
Imagine we would want to test the average log2 protein intensity between mutant and wild type, then L
would be constructed as follows:
L2 <- makeContrast(contrasts=c("genotypeKO/2+genotypeWT/2"),
levels=c("genotypeWT","genotypeKO"))
L2
## Contrasts
## Levels genotypeKO/2+genotypeWT/2
## genotypeWT 0.5
## genotypeKO 0.5
Important notification: do not use the estimated parameters beta to determine the factor levels! You will notice for example that the level "genotypeWT"
is missing in the column names of the betas. This is because "genotypeWT"
is the reference class, giving the parameter corresponding to "genotypeKO"
the interpretation of the effect of "genotypeKO"
relative to "genotypeWT"
. It is important however to include "genotypeWT"
in the contrast matrix because for proteins where factor levels are missing, these parameters can get different interpretations. The makeContrast
function takes this into account and will return NA
when a contrast cannot be properly estimated.
betaBVcovDf$beta
## [,1]
## (Intercept) 2.478692e+01
## runX1WT_20_2h_n4_1 -3.268340e-19
## runX1WT_20_2h_n4_2 6.312569e-19
## runX1WT_20_2h_n4_3 -5.514836e-19
## runX1WT_20_2h_n5_1 -2.653938e-19
## runX1WT_20_2h_n5_2 1.300820e-19
## runX1WT_20_2h_n5_3 3.002867e-20
## runX3D8_20_2h_n3_1 6.312569e-19
## runX3D8_20_2h_n3_2 5.345092e-20
## runX3D8_20_2h_n3_3 -7.547956e-20
## runX3D8_20_2h_n4_1 3.677317e-19
## runX3D8_20_2h_n4_2 1.418812e-19
## runX3D8_20_2h_n4_3 -3.821573e-19
## runX3D8_20_2h_n5_1 -3.498709e-19
## runX3D8_20_2h_n5_2 -1.035900e-19
## runX3D8_20_2h_n5_3 6.912076e-20
## biorepb_2 -2.866945e-01
## biorepb_3 -1.221728e-01
## biorepb_4 7.069615e-01
## biorepb_5 1.479023e-01
## biorepb_6 -4.459965e-01
## SequenceMVFTFYNNLK 6.400946e-19
## SequenceYTYIYQIASFR -6.400946e-19
## genotypeKO 2.573084e-01
Build a data frame containing estimate, standard error, degrees of freedom, T-value and p-value for each protein in the protLM
object protLMFranc, by using the test.protLMcontrast
function.
resultFranc <- test.protLMcontrast(protLMFranc, L)
The prot.p.adjust
function adds a new column "qval"
to the data frame based on an existing "pval"
column. The new column contains adjusted p-values using one of the methods in the p.adjust.methods
vector.
resultFranc <- prot.p.adjust(resultFranc, method="fdr")
The prot.signif
function needs a matrix (or a list of matrices) containing a column named “pval” and a column named "qval"
. The "pval"
column will be used to sort the matrix. The function generates a new column "signif"
containing 1 for all values of "qval"
with a value lower than the specified FDR level (default: 0.05) and 0 otherwise.
resultFranc <- prot.signif(resultFranc, level = 0.05)
Inspect the first 6 significant proteins:
head(resultFranc,6)
## GI.number
## WP_003040894 gi|118496676
## WP_003026016 gi|118497407
## WP_003023392 gi|118497654
## WP_003040849 gi|118496659
## WP_003020026 gi|118496882
## WP_003038350 gi|118496950
## Protein.names
## WP_003040894 3-isopropylmalate dehydratase large subunit [Francisella tularensis]
## WP_003026016 biotin synthase [Francisella tularensis]
## WP_003023392 tRNA (N6-isopentenyl adenosine(37)-C2)-methylthiotransferase MiaB [Francisella tularensis]
## WP_003040849 hypothetical protein [Francisella tularensis]
## WP_003020026 CTP synthetase [Francisella tularensis]
## WP_003038350 hypothetical protein [Francisella tularensis]
## estimate se df Tval pval
## WP_003040894 -0.9057261 0.05555518 121.14412 -16.30318 2.457599e-32
## WP_003026016 -0.9617749 0.07478739 133.04725 -12.86012 4.225036e-25
## WP_003023392 -1.3965407 0.08530562 44.43458 -16.37103 1.941754e-20
## WP_003040849 -1.1356799 0.09816434 101.51013 -11.56917 3.023500e-20
## WP_003020026 -0.5163597 0.04825840 150.49382 -10.69989 3.184831e-20
## WP_003038350 0.8107740 0.06684619 72.40727 12.12895 4.127913e-19
## qval signif
## WP_003040894 2.831154e-29 TRUE
## WP_003026016 2.433621e-22 TRUE
## WP_003023392 7.337851e-18 TRUE
## WP_003040849 7.337851e-18 TRUE
## WP_003020026 7.337851e-18 TRUE
## WP_003038350 7.925594e-17 TRUE
It is also possible to import your data as a simple data frame and make use of some custom preprocessing pipeline. Here, we will show the same preprocessing pipeline for the Francisella dataset as outlined above, but note that adding, removing or modifying steps in the preprocessing pipeline is now very simple as the data is in data frame format.
First, we import the Francisella dataset as a data frame via the base R function read.table
.
pepFrancDf <- read.table(file = file_peptides_txt, sep = "\t", header = TRUE, quote="", comment.char = "")
Of course, other filetypes can also be imported. Say for example your input file is a comma separated file stored in My Documents on a Windows computer, you could use for example:
#Change to eval=TRUE to execute this piece of code.
pepFrancDf <- read.table("C:/Users/Ludger/Documents/peptides.csv", sep = ",", dec=".", header = TRUE, quote="", comment.char = "")
The data frame pepFrancD
is in wide format.
head(pepFrancDf)
## Sequence Amino.acid.before First.amino.acid
## 1 AAAEELDTR K A
## 2 AAAGFVITASHNK R A
## 3 AAANEYELALAYSIEEVAPDLHK K A
## 4 AAANNPQLEAFK K A
## 5 AAASAGLVDEK K A
## 6 AAASLDLYSYPK K A
## Second.amino.acid Second.last.amino.acid Last.amino.acid
## 1 A T R
## 2 A N K
## 3 A H K
## 4 A F K
## 5 A E K
## 6 A P K
## Amino.acid.after A.Count R.Count N.Count D.Count C.Count Q.Count E.Count
## 1 K 3 1 0 1 0 0 2
## 2 F 4 0 1 0 0 0 0
## 3 Y 6 0 1 1 0 0 4
## 4 K 4 0 2 0 0 1 1
## 5 A 4 0 0 1 0 0 1
## 6 V 3 0 0 1 0 0 0
## G.Count H.Count I.Count L.Count K.Count M.Count F.Count P.Count S.Count
## 1 0 0 0 1 0 0 0 0 0
## 2 1 1 1 0 1 0 1 0 1
## 3 0 1 1 3 1 0 0 1 1
## 4 0 0 0 1 1 0 1 1 0
## 5 1 0 0 1 1 0 0 0 1
## 6 0 0 0 2 1 0 0 1 2
## T.Count W.Count Y.Count V.Count U.Count Length Missed.cleavages
## 1 1 0 0 0 0 9 0
## 2 1 0 0 1 0 13 0
## 3 0 0 2 1 0 23 0
## 4 0 0 0 0 0 12 0
## 5 0 0 0 1 0 11 0
## 6 0 0 2 0 0 12 0
## Mass Proteins GI.number Leading.razor.protein
## 1 974.4669 WP_003038655 gi|118497196 gi|118497196
## 2 1285.6779 WP_003041237 gi|118498194 gi|118498194
## 3 2516.2435 WP_003038915 gi|118497331 gi|118497331
## 4 1272.6463 WP_003038264 gi|118496879 gi|118496879
## 5 1030.5295 WP_003035781 gi|118497152 gi|118497152
## 6 1297.6554 WP_003039212 gi|118497492 gi|118497492
## Protein.names
## 1 hypothetical protein [Francisella tularensis]
## 2 phosphoglucosamine mutase [Francisella tularensis]
## 3 glycine--tRNA ligase subunit beta [Francisella tularensis]
## 4 molecular chaperone HtpG [Francisella tularensis]
## 5 delta-aminolevulinic acid dehydratase [Francisella tularensis]
## 6 phosphorylase [Francisella tularensis]
## Unique..Groups. Unique..Proteins. Charges PEP Score
## 1 yes yes 1,2 1.2531e-04 59.35
## 2 yes yes 2 5.3020e-05 45.825
## 3 yes yes 3 1.0517e-45 84.327
## 4 yes yes 2 8.7549e-24 108.56
## 5 yes yes 2 9.5062e-05 67.385
## 6 yes yes 2 1.2563e-02 31.675
## Experiment.1WT_20_2h_n3_1 Experiment.1WT_20_2h_n3_2
## 1 NA NA
## 2 NA 1
## 3 1 1
## 4 1 1
## 5 1 NA
## 6 NA 1
## Experiment.1WT_20_2h_n3_3 Experiment.1WT_20_2h_n4_1
## 1 1 1
## 2 NA NA
## 3 2 1
## 4 1 1
## 5 1 1
## 6 1 NA
## Experiment.1WT_20_2h_n4_2 Experiment.1WT_20_2h_n4_3
## 1 NA 1
## 2 NA NA
## 3 1 1
## 4 1 1
## 5 1 NA
## 6 1 1
## Experiment.1WT_20_2h_n5_1 Experiment.1WT_20_2h_n5_2
## 1 NA 1
## 2 NA 1
## 3 2 2
## 4 1 1
## 5 1 1
## 6 NA NA
## Experiment.1WT_20_2h_n5_3 Experiment.3D8_20_2h_n3_1
## 1 NA NA
## 2 1 1
## 3 1 1
## 4 1 1
## 5 1 1
## 6 NA NA
## Experiment.3D8_20_2h_n3_2 Experiment.3D8_20_2h_n3_3
## 1 1 NA
## 2 1 NA
## 3 2 1
## 4 1 1
## 5 1 1
## 6 NA NA
## Experiment.3D8_20_2h_n4_1 Experiment.3D8_20_2h_n4_2
## 1 NA NA
## 2 1 1
## 3 1 1
## 4 1 1
## 5 1 1
## 6 NA 1
## Experiment.3D8_20_2h_n4_3 Experiment.3D8_20_2h_n5_1
## 1 NA NA
## 2 1 1
## 3 1 NA
## 4 1 1
## 5 NA 1
## 6 NA NA
## Experiment.3D8_20_2h_n5_2 Experiment.3D8_20_2h_n5_3 Intensity
## 1 NA NA 1.3295e+09
## 2 NA NA 1.6719e+08
## 3 1 1 5.5675e+08
## 4 1 1 1.4830e+10
## 5 1 1 4.8209e+09
## 6 NA NA 1.3715e+08
## Intensity.1WT_20_2h_n3_1 Intensity.1WT_20_2h_n3_2
## 1 0 0
## 2 0 20679000
## 3 28853000 28091000
## 4 339830000 420390000
## 5 240280000 0
## 6 0 31069000
## Intensity.1WT_20_2h_n3_3 Intensity.1WT_20_2h_n4_1
## 1 9290400 92996000
## 2 0 0
## 3 30436000 19476000
## 4 393930000 235060000
## 5 241890000 222250000
## 6 31448000 0
## Intensity.1WT_20_2h_n4_2 Intensity.1WT_20_2h_n4_3
## 1 0 98059000
## 2 0 0
## 3 22050000 20687000
## 4 381090000 360270000
## 5 189930000 0
## 6 27721000 28657000
## Intensity.1WT_20_2h_n5_1 Intensity.1WT_20_2h_n5_2
## 1 0 80803000
## 2 0 17358000
## 3 36588000 28654000
## 4 255710000 311370000
## 5 160440000 147230000
## 6 0 0
## Intensity.1WT_20_2h_n5_3 Intensity.3D8_20_2h_n3_1
## 1 0 0
## 2 13841000 12278000
## 3 7009100 14174000
## 4 271440000 225140000
## 5 143340000 135910000
## 6 0 0
## Intensity.3D8_20_2h_n3_2 Intensity.3D8_20_2h_n3_3
## 1 95433000 0
## 2 10712000 0
## 3 8548000 7859200
## 4 289880000 282500000
## 5 185740000 168620000
## 6 0 0
## Intensity.3D8_20_2h_n4_1 Intensity.3D8_20_2h_n4_2
## 1 0 0
## 2 9500700 11111000
## 3 14871000 11690000
## 4 209490000 223610000
## 5 160240000 158000000
## 6 0 18253000
## Intensity.3D8_20_2h_n4_3 Intensity.3D8_20_2h_n5_1
## 1 0 0
## 2 8723500 8125200
## 3 4470500 0
## 4 219700000 169120000
## 5 0 92542000
## 6 0 0
## Intensity.3D8_20_2h_n5_2 Intensity.3D8_20_2h_n5_3 Reverse Contaminant id
## 1 0 0 0
## 2 0 0 1
## 3 4048100 3422300 2
## 4 178150000 170400000 3
## 5 102350000 100090000 4
## 6 0 0 5
## Protein.group.IDs Mod..peptide.IDs
## 1 419 0
## 2 1150 1
## 3 513 2
## 4 199 3
## 5 388 4
## 6 624 5
## Evidence.IDs
## 1 0;1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18
## 2 19;20;21;22;23;24;25;26;27;28;29;30;31;32;33
## 3 34;35;36;37;38;39;40;41;42;43;44;45;46;47;48;49;50;51;52;53;54;55;56;57;58;59;60;61;62;63;64;65;66;67;68;69;70;71;72;73;74;75;76
## 4 77;78;79;80;81;82;83;84;85;86;87;88;89;90;91;92;93;94;95;96;97;98;99;100;101;102;103;104;105;106;107;108;109;110;111;112;113;114;115;116;117;118;119;120;121;122;123;124
## 5 125;126;127;128;129;130;131;132;133;134;135;136;137;138;139;140;141;142;143;144;145;146;147;148;149;150;151;152;153;154;155
## 6 156;157;158;159;160
## MS.MS.IDs
## 1 0;1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18
## 2 19;20;21;22;23;24;25;26;27;28;29;30;31;32;33
## 3 34;35;36;37;38;39;40;41;42;43;44;45;46;47;48;49;50;51;52;53;54;55;56;57;58;59;60;61;62;63;64;65;66;67;68;69;70;71;72;73;74;75;76;77;78;79
## 4 80;81;82;83;84;85;86;87;88;89;90;91;92;93;94;95;96;97;98;99;100;101;102;103;104;105;106;107;108;109;110;111;112;113;114;115;116;117;118;119;120;121;122;123;124;125;126;127;128;129;130;131;132;133;134;135;136;137;138;139;140;141;142;143;144;145;146;147;148;149;150;151;152;153;154;155;156;157;158;159;160;161;162;163;164;165;166;167;168;169;170;171;172
## 5 173;174;175;176;177;178;179;180;181;182;183;184;185;186;187;188;189;190;191;192;193;194;195;196;197;198;199;200;201;202;203
## 6 204;205;206;207;208
## Best.MS.MS Oxidation..M..site.IDs
## 1 15
## 2 26
## 3 39
## 4 159
## 5 191
## 6 205
If we want to mimick the MaxQuant preprocessing pipeline, we need to add a factor that indicates whether a protein is only identified by modified peptides. This requires some basic R programming. The information about which proteins are only identified by modified peptides can be found in the proteinGroups.txt file. We again make use of the proteinGroups.txt file that is delivered with the package.
#Import the proteinGroups.txt file delivered with the MSqRobData package
file_proteinGroups <- system.file("extdata/Francisella", "proteinGroups.txt", package = "MSqRobData")
proteinGroups <- read.table(file_proteinGroups, sep="\t", header=TRUE, quote="", comment.char = "")
#Extract the column that indicates which proteins are only identified by a modification site
only_site <- proteinGroups$Only.identified.by.site
#If there are no such proteins (as is the case here), the column will be completely empty and R will import this by default as "NA". However, we want the an empty value instead of "NA" to keep this column consistent with the "Contaminant" and "Reverse" columns.
only_site[is.na(only_site)] <- ""
#Select the protein accessions that are only identified by one or more modified peptides
filter_symbol <- "+"
removed_proteins <- proteinGroups$Protein.IDs[only_site==filter_symbol]
#Create a logical variable "rem" that holds "TRUE" if a row in the pepFrancDf data frame should be removed and FALSE otherwise.
accession <- "Proteins"
rem <- as.character(pepFrancDf[,accession]) %in% as.character(removed_proteins)
#Create a new column "only_site" in "pepFrancDf" that indicates with a "+" which rows should be removed.
pepFrancDf$only_site <- ""
pepFrancDf$only_site[rem] <- "+"
Note that in this particular case, the previous chunck of code was superfluous, as there are no proteins in the dataset that are only identified by modified peptides.
We can easily preprocess the data frame using the preprocess_wide
function (for data in “long” format, use the preprocess_long
function). This function uses our standard preprocessing pipeline on data frames in “wide” format (i.e. the data frame has one observation row per subject with each measurement present as a different variable).
We already made the exp_annotation
object before (see “2.1. Create an experimental annotation data frame.”), so we will not repeat this code again here. Importantly, however, contrary to the exprs
slot in the peptidesFranc2
object created before, the names of all intensity columns in data frame pepFrancDf
start with “Intensity.”.
This is because the default setting remove_pattern=TRUE
in the import2MSnSet
function removes the prefix "Intensity\ "
from the columns containing the intensities as this allows users that are unfamiliar with R to just copy and paste the names they provided in MaxQuant’s “Experiment” column as an input. Here, we just imported the peptides.txt file as it is. We thus create a new experimental annotation data frame called exp_annotation2
with run names starting with “Intensity.” (a space is not syntactically valid in R, so it is automatically converted to a dot).
runs2 <- colnames(pepFrancDf)[grepl("Intensity.",colnames(pepFrancDf))]
runs2
## [1] "Intensity.1WT_20_2h_n3_1" "Intensity.1WT_20_2h_n3_2"
## [3] "Intensity.1WT_20_2h_n3_3" "Intensity.1WT_20_2h_n4_1"
## [5] "Intensity.1WT_20_2h_n4_2" "Intensity.1WT_20_2h_n4_3"
## [7] "Intensity.1WT_20_2h_n5_1" "Intensity.1WT_20_2h_n5_2"
## [9] "Intensity.1WT_20_2h_n5_3" "Intensity.3D8_20_2h_n3_1"
## [11] "Intensity.3D8_20_2h_n3_2" "Intensity.3D8_20_2h_n3_3"
## [13] "Intensity.3D8_20_2h_n4_1" "Intensity.3D8_20_2h_n4_2"
## [15] "Intensity.3D8_20_2h_n4_3" "Intensity.3D8_20_2h_n5_1"
## [17] "Intensity.3D8_20_2h_n5_2" "Intensity.3D8_20_2h_n5_3"
#Extract the relevant part of the mass spec run names that points to the genotype.
genotype2 <- factor(substr(runs2, 11,13))
genotype2
## [1] 1WT 1WT 1WT 1WT 1WT 1WT 1WT 1WT 1WT 3D8 3D8 3D8 3D8 3D8 3D8 3D8 3D8
## [18] 3D8
## Levels: 1WT 3D8
n <- nchar(runs2)
biorep2 <- as.factor(paste0("b_",factor(as.numeric(factor(paste0(substr(runs2, 10,10),substr(runs2, n-2,n-2))))+c(rep(0,9),rep(3,9)))))
biorep
## [1] b_1 b_1 b_1 b_2 b_2 b_2 b_3 b_3 b_3 b_4 b_4 b_4 b_5 b_5 b_5 b_6 b_6
## [18] b_6
## Levels: b_1 b_2 b_3 b_4 b_5 b_6
exp_annotation2 <- data.frame(run=runs2, genotype=genotype2, biorep=biorep2)
exp_annotation2
## run genotype biorep
## 1 Intensity.1WT_20_2h_n3_1 1WT b_1
## 2 Intensity.1WT_20_2h_n3_2 1WT b_1
## 3 Intensity.1WT_20_2h_n3_3 1WT b_1
## 4 Intensity.1WT_20_2h_n4_1 1WT b_2
## 5 Intensity.1WT_20_2h_n4_2 1WT b_2
## 6 Intensity.1WT_20_2h_n4_3 1WT b_2
## 7 Intensity.1WT_20_2h_n5_1 1WT b_3
## 8 Intensity.1WT_20_2h_n5_2 1WT b_3
## 9 Intensity.1WT_20_2h_n5_3 1WT b_3
## 10 Intensity.3D8_20_2h_n3_1 3D8 b_4
## 11 Intensity.3D8_20_2h_n3_2 3D8 b_4
## 12 Intensity.3D8_20_2h_n3_3 3D8 b_4
## 13 Intensity.3D8_20_2h_n4_1 3D8 b_5
## 14 Intensity.3D8_20_2h_n4_2 3D8 b_5
## 15 Intensity.3D8_20_2h_n4_3 3D8 b_5
## 16 Intensity.3D8_20_2h_n5_1 3D8 b_6
## 17 Intensity.3D8_20_2h_n5_2 3D8 b_6
## 18 Intensity.3D8_20_2h_n5_3 3D8 b_6
The preprocess_wide
function allows for the same preprocessing as the preprocess_MaxQuant
function on a regular data frame. If you inspect this function, you will notice that the same 8 preprocessing steps are executed. Remember that you can do any preprocessing you like using basic R data frame manipulations. The split
argument indicates which character string is used to separate the accession groups. The exp_annotation
argument, like before, contains the experimental annotation data frame or a character string indicating the filepath where the experimental annotation is saved as an Excel file or a tab-delimited file. The quant_cols
argument contains a vector of names or a vector of numeric indices pointing to the columns containing the quantitative values of interest (mostly intensities or areas under the curve). If quant_cols
contains only one character string, each column containing this pattern will be regarded as a column containing the quantitative values of interest. aggr_by
indicates the column by which the data should be aggregated. Here, the data is already aggregated to the peptide level (i.e. over different charge states and modification statuses). We need no further aggregation, thus we provide the “Sequence” column (the level to which aggregation has already been done). aggr_function
will be superfluous here, but if further aggregation would have been necessary, raw intensities would be summed up (aggr_function="sum"
). The other parameters are similar to the preprocess_MaxQuant
function.
pepFrancDf2 <- preprocess_wide(pepFrancDf, accession="Proteins", split=";", exp_annotation=exp_annotation2, quant_cols="Intensity.", aggr_by="Sequence", aggr_function="sum", logtransform=TRUE, base=2, normalisation="quantiles", smallestUniqueGroups=TRUE, useful_properties=c("GI.number","Protein.names","Sequence"), filter=c("Contaminant","Reverse","only_site"), filter_symbol="+", minIdentified=2)
Note that the experimental annotation is now present as an attribute of the pepFrancDf2
data frame.
attr(pepFrancDf2,"MSqRob_exp_annotation")
## run genotype biorep
## 1 Intensity.1WT_20_2h_n3_1 1WT b_1
## 2 Intensity.1WT_20_2h_n3_2 1WT b_1
## 3 Intensity.1WT_20_2h_n3_3 1WT b_1
## 4 Intensity.1WT_20_2h_n4_1 1WT b_2
## 5 Intensity.1WT_20_2h_n4_2 1WT b_2
## 6 Intensity.1WT_20_2h_n4_3 1WT b_2
## 7 Intensity.1WT_20_2h_n5_1 1WT b_3
## 8 Intensity.1WT_20_2h_n5_2 1WT b_3
## 9 Intensity.1WT_20_2h_n5_3 1WT b_3
## 10 Intensity.3D8_20_2h_n3_1 3D8 b_4
## 11 Intensity.3D8_20_2h_n3_2 3D8 b_4
## 12 Intensity.3D8_20_2h_n3_3 3D8 b_4
## 13 Intensity.3D8_20_2h_n4_1 3D8 b_5
## 14 Intensity.3D8_20_2h_n4_2 3D8 b_5
## 15 Intensity.3D8_20_2h_n4_3 3D8 b_5
## 16 Intensity.3D8_20_2h_n5_1 3D8 b_6
## 17 Intensity.3D8_20_2h_n5_2 3D8 b_6
## 18 Intensity.3D8_20_2h_n5_3 3D8 b_6
After custom preprocessing, you can transform the data to a protdata
object via the df2protdata
function.
The acc_col
argument in the df2protdata
function indicates the column index or column name of the accessions (i.e. the protein (group) identifiers). The quant_cols
argument is a vector of indices pointing to all columns containing peptide intensities. In the example below, the column named “Proteins” will be added to the accession
slot while all columns with indices in the quant_cols
vector will be added as a matrix to the intensities
slot. quant_name
indicates the name that will be given to the column containing the (preprocessed) intensities and run_name
indicates the name that will be given to the column containing the mass spec run names in the new protdata
object. Like before, we keep “GI.number” and “Protein.names” as annotations.
#Select columns with intensities.
quant_cols <- which(grepl("Intensity.", colnames(pepFrancDf2)))
quant_cols
## [1] 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
#Call df2protdata function
proteinsFrancDf <- df2protdata(pepFrancDf2, acc_col="Proteins", quant_cols=quant_cols, quant_name="quant_value", run_name="run", annotations=c("GI.number","Protein.names"))
#Inspect proteinsFrancDf protdata object.
proteinsFrancDf
## @accession
##
## [1] "WP_003038655"
##
## @annotation
##
## GI.number
## "gi|118497196"
## Protein.names
## "hypothetical protein [Francisella tularensis]"
##
## @data
##
## quant_value Sequence genotype biorep run
## 2 25.11978 AVVVFFDYGCGK 1WT b_1 Intensity.1WT_20_2h_n3_1
## 3 25.34586 DLLSDPATPTVGPQDANK 1WT b_1 Intensity.1WT_20_2h_n3_1
## 4 24.11491 ETNGELTVQDVDNVVK 1WT b_1 Intensity.1WT_20_2h_n3_1
## 5 26.50674 IIAIPDIVK 1WT b_1 Intensity.1WT_20_2h_n3_1
## 6 24.69046 QTNVITDNYQQAAK 1WT b_1 Intensity.1WT_20_2h_n3_1
## 9 24.75196 AVVVFFDYGCGK 1WT b_1 Intensity.1WT_20_2h_n3_2
##
## 96 more rows...
##
## @accession
##
## [1] "WP_003041237"
##
## @annotation
##
## GI.number
## "gi|118498194"
## Protein.names
## "phosphoglucosamine mutase [Francisella tularensis]"
##
## @data
##
## quant_value Sequence genotype biorep run
## 2 25.81732 ADLGISLDGDADR 1WT b_1 Intensity.1WT_20_2h_n3_1
## 3 27.36716 FVIVGQDTR 1WT b_1 Intensity.1WT_20_2h_n3_1
## 4 24.91571 GEVANSTITAEFTQK 1WT b_1 Intensity.1WT_20_2h_n3_1
## 5 27.38539 ILANAIDEYIESIHSR 1WT b_1 Intensity.1WT_20_2h_n3_1
## 7 27.93398 SLATNEAEYLVEK 1WT b_1 Intensity.1WT_20_2h_n3_1
## 8 26.59163 VASDVNDVEK 1WT b_1 Intensity.1WT_20_2h_n3_1
##
## 187 more rows...
##
## @accession
##
## [1] "WP_003038915"
##
## @annotation
##
## GI.number
## "gi|118497331"
## Protein.names
## "glycine--tRNA ligase subunit beta [Francisella tularensis]"
##
## @data
##
## quant_value Sequence genotype biorep
## 1 24.37301 AAANEYELALAYSIEEVAPDLHK 1WT b_1
## 4 22.60222 EEGIAVDIFEAINNTNYDSIK 1WT b_1
## 5 25.34002 EGEPTQVGLGFAK 1WT b_1
## 6 25.13163 EYSYALELLTCLDK 1WT b_1
## 7 24.09977 FHHPQAIVIDNISDYIK 1WT b_1
## 9 25.75712 IIDITLESYK 1WT b_1
## run
## 1 Intensity.1WT_20_2h_n3_1
## 4 Intensity.1WT_20_2h_n3_1
## 5 Intensity.1WT_20_2h_n3_1
## 6 Intensity.1WT_20_2h_n3_1
## 7 Intensity.1WT_20_2h_n3_1
## 9 Intensity.1WT_20_2h_n3_1
##
## 215 more rows...
##
## @accession
##
## [1] "WP_003038264"
##
## @annotation
##
## GI.number
## "gi|118496879"
## Protein.names
## "molecular chaperone HtpG [Francisella tularensis]"
##
## @data
##
## quant_value Sequence genotype biorep run
## 1 27.92076 AAANNPQLEAFK 1WT b_1 Intensity.1WT_20_2h_n3_1
## 3 26.48741 EGQDTIYYITSDSYK 1WT b_1 Intensity.1WT_20_2h_n3_1
## 5 27.33299 EHLDLLEYHVLK 1WT b_1 Intensity.1WT_20_2h_n3_1
## 6 26.37656 EILQHNK 1WT b_1 Intensity.1WT_20_2h_n3_1
## 7 28.11488 ELLPPYLR 1WT b_1 Intensity.1WT_20_2h_n3_1
## 8 28.77850 ELVSNSSDAIEK 1WT b_1 Intensity.1WT_20_2h_n3_1
##
## 338 more rows...
##
## @accession
##
## [1] "WP_003035781"
##
## @annotation
##
## GI.number
## "gi|118497152"
## Protein.names
## "delta-aminolevulinic acid dehydratase [Francisella tularensis]"
##
## @data
##
## quant_value Sequence genotype biorep run
## 1 27.47327 AAASAGLVDEK 1WT b_1 Intensity.1WT_20_2h_n3_1
## 2 24.05872 VILTYTALDVANWLK 1WT b_1 Intensity.1WT_20_2h_n3_1
## 5 23.88890 VILTYTALDVANWLK 1WT b_1 Intensity.1WT_20_2h_n3_2
## 6 25.37749 YASAYYGPFR 1WT b_1 Intensity.1WT_20_2h_n3_2
## 7 27.47269 AAASAGLVDEK 1WT b_1 Intensity.1WT_20_2h_n3_3
## 8 24.93174 VILTYTALDVANWLK 1WT b_1 Intensity.1WT_20_2h_n3_3
##
## 41 more rows...
##
## @accession
##
## [1] "WP_003039212"
##
## @annotation
##
## GI.number
## "gi|118497492"
## Protein.names
## "phosphorylase [Francisella tularensis]"
##
## @data
##
## quant_value Sequence genotype biorep
## 3 22.96510 FPAGLNDVEFVAEHIFQHSK 1WT b_1
## 4 25.10257 LQDLLTFIGK 1WT b_1
## 7 24.55598 VVFSVDYR 1WT b_1
## 8 24.53369 AAASLDLYSYPK 1WT b_1
## 9 24.68029 EVLFLNR 1WT b_1
## 10 23.16038 FPAGLNDVEFVAEHIFQHSK 1WT b_1
## run
## 3 Intensity.1WT_20_2h_n3_1
## 4 Intensity.1WT_20_2h_n3_1
## 7 Intensity.1WT_20_2h_n3_1
## 8 Intensity.1WT_20_2h_n3_2
## 9 Intensity.1WT_20_2h_n3_2
## 10 Intensity.1WT_20_2h_n3_2
##
## 56 more rows...
##
## 1182 more elements...
Note that the proteinsFrancDf
object is the same as the proteinsFranc
object, only with a different order of columns in their data
slot.
library(MSqRob)
library(Biobase)
This example involves the CPTAC Study 6. In this dataset Sigma’s UPS1 standard containing 48 human proteins was spiked into a reference yeast proteome and analyzed on 7 instruments in 5 different laboratories (Paulovich et al., 2010). Raw data files can be downloaded at: https://cptac-data-portal.georgetown.edu/cptac/dataPublic/list?currentPath=%2FPhase_I_Data%2FStudy6&nonav=true
Here, we limited ourselves to the data of LTQ-Orbitrap at site 86, LTQ-Orbitrap O at site 65 and LTQ-Orbitrap W at site 56. The data was searched with MaxQuant version 1.5.2.8. Detailed search settings can be found in Goeminne et al. (2016).
First, specify the location of the peptides.txt MaxQuant output file. By default, MaxQuant creates a “combined” folder in the same folder as the raw files are (or if the raw files are in different directories, in the directory of the first raw file). The peptides.txt file can then be found in “path_to_raw_files/combined/txt/peptides.txt”. Here, we make use of the peptides.txt file that is delivered with the package.
file_peptides_txt <- system.file("extdata/CPTAC", "peptides.txt", package = "MSqRobData")
Import MaxQuant’s peptides.txt file and convert it to an MSnSet
object (see package MSnbase
).
peptidesCPTAC <- import2MSnSet(file_peptides_txt, filetype="MaxQuant", remove_pattern=TRUE)
The “pattern” argument indicates which columns in the peptides.txt file contain peptide intensities. Currently, all columns containing "Intensity\ "
in their column names will be treated as peptide intensities (default for MaxQuant). remove_pattern=TRUE
(default) indicates that the name of the pattern is removed from the column names.
Extract the names of the mass spec runs via the sampleNames
function from the Biobase
package.
runs <- sampleNames(peptidesCPTAC)
Note that as the names again start with a number, an “X” is prepended to make the names syntactically valid. Add a grouping factor for the spike-in condition (i.e. 6A, 6B, 6C, 6D and 6E).
condition <- factor(substr(runs, 2,3))
Add a grouping factor for the lab effect (i.e. LTQ-Orbitrap at site 86, LTQ-Orbitrap O at site 65 and LTQ-Orbitrap W at site 56).
lab <- factor(rep(rep(1:3,each=3),5))
#Alternative:
#lab <- factor((as.numeric(substr(runs, 5,5))-1)%/%3+1)
Create an experimental annotation data frame.
exp_annotation <- data.frame(run=runs, condition=condition, lab=lab)
exp_annotation
## run condition lab
## 1 X6A_1 6A 1
## 2 X6A_2 6A 1
## 3 X6A_3 6A 1
## 4 X6A_4 6A 2
## 5 X6A_5 6A 2
## 6 X6A_6 6A 2
## 7 X6A_7 6A 3
## 8 X6A_8 6A 3
## 9 X6A_9 6A 3
## 10 X6B_1 6B 1
## 11 X6B_2 6B 1
## 12 X6B_3 6B 1
## 13 X6B_4 6B 2
## 14 X6B_5 6B 2
## 15 X6B_6 6B 2
## 16 X6B_7 6B 3
## 17 X6B_8 6B 3
## 18 X6B_9 6B 3
## 19 X6C_1 6C 1
## 20 X6C_2 6C 1
## 21 X6C_3 6C 1
## 22 X6C_4 6C 2
## 23 X6C_5 6C 2
## 24 X6C_6 6C 2
## 25 X6C_7 6C 3
## 26 X6C_8 6C 3
## 27 X6C_9 6C 3
## 28 X6D_1 6D 1
## 29 X6D_2 6D 1
## 30 X6D_3 6D 1
## 31 X6D_4 6D 2
## 32 X6D_5 6D 2
## 33 X6D_6 6D 2
## 34 X6D_7 6D 3
## 35 X6D_8 6D 3
## 36 X6D_9 6D 3
## 37 X6E_1 6E 1
## 38 X6E_2 6E 1
## 39 X6E_3 6E 1
## 40 X6E_4 6E 2
## 41 X6E_5 6E 2
## 42 X6E_6 6E 2
## 43 X6E_7 6E 3
## 44 X6E_8 6E 3
## 45 X6E_9 6E 3
Remember that the exp_annotation
can also be given as an Excel file or a tab delimited file!
If we want to remove proteins that are only identified by modified peptides from the dataset, we also need the proteinGroups.txt file. This file can be found in the same location as the peptides.txt file (“path_to_raw_files/combined/txt/proteinGroups.txt”). Here, we make use of the proteinGroups.txt file that is delivered with the package.
Note: if you don’t want to remove proteins that are only identified by modified peptides, set remove_only_site=FALSE
and leave the file_proteinGroups
argument at its default value (NULL
).
file_proteinGroups <- system.file("extdata/CPTAC", "proteinGroups.txt", package = "MSqRobData")
Important: in the CPTAC dataset, some human UPS proteins are NOT contaminants as these proteins were spiked in on purpose! We only want to remove those contaminants that are not human UPS proteins. We thus need to unmark these proteins as contaminant before preprocessing.
fData(peptidesCPTAC)[grepl("_HUMAN_UPS", fData(peptidesCPTAC)$Proteins),]$Potential.contaminant <- ""
Add to useful_properties
all column names of the fData
slot of the peptidesCPTAC MSnSet
object that you would like to keep (such as gene names, protein names, ontologies, etc.). Other columns will be removed for improved efficiency and memory usage. We certainly want to keep the “Proteins” slot, as this will be our protein (group) identifier. You also want to keep the “Sequence” slot, as peptide-specific effects must be included in our model.
peptidesCPTAC2 <- preprocess_MaxQuant(peptidesCPTAC, accession="Proteins", exp_annotation=exp_annotation, logtransform=TRUE, base=2, normalisation="quantiles", smallestUniqueGroups=TRUE, useful_properties=c("Proteins","Sequence"), filter=c("Potential.contaminant","Reverse"), remove_only_site=TRUE, file_proteinGroups=file_proteinGroups, filter_symbol="+", minIdentified=2)
The following preprocessing steps are executed in this order:
Peptide intensities are log2-transformed (logtransform=TRUE, base=2
).
As a normalization approach, we choose to quantile normalize the peptide intensities (normalisation="quantiles"
). Other options are "sum"
, "max"
, "center.mean"
, "center.median"
, "quantiles.robust"
, "vsn"
or "none"
(see MSnbase::normalize
for more information).
Handling overlapping protein groups (smallestUniqueGroups=TRUE
): in our approach a peptide can map to multiple proteins, as long as there is none of these proteins present in a smaller subgroup. Peptides that map to protein groups for wich also subgroups are present in the dataset are removed from the dataset.
Remove reverse sequences and contaminants (filter=c("Potential.contaminant","Reverse")
).
Remove all proteins that are only identified by peptides carrying a modification site (remove_only_site=TRUE
, file_proteinGroups=file_proteinGroups
).
Remove all superfluous columns in the fData slot, except the “Proteins” and “Sequence” columns (useful_properties=c("Proteins","Sequence")
).
Remove all peptides that are only identified once in the dataset (minIdentified=2
).
Add experimental annotation.
It is possible to do all the preprocessing steps yourself. That way, you can do more advanced preprocessing or change the order of certain preprocessing steps if you like to.
peptidesCPTAC3 <- log(peptidesCPTAC, base=2)
#Alternative:
#peptidesCPTAC3 <- log2(peptidesCPTAC3)
After log2-transformation, we want to replace -Inf
values in the peptide intensities by NA
values as these values originate from zeros in the untransformed intensities
matrix, which means these intensities were actually missing.
exprs <- exprs(peptidesCPTAC3)
exprs[is.infinite(exprs)] <- NA
exprs(peptidesCPTAC3) <- exprs
peptidesCPTAC3 <- normalize(peptidesCPTAC3, "quantiles")
groups2 <- smallestUniqueGroups(fData(peptidesCPTAC3)$Proteins)
sel <- fData(peptidesCPTAC3)$Proteins %in% groups2
peptidesCPTAC3 <- peptidesCPTAC3[sel]
head(fData(peptidesCPTAC3))
## Sequence N.term.cleavage.window C.term.cleavage.window
## 1 AAAAGAGGAGDSGDAVTK EHQHDEQKAAAAGAGG DSGDAVTKIGSEDVKL
## 2 AAAALAGGK QQLSKAAKAAAALAGG AAALAGGKKSKKKWSK
## 3 AAAALAGGKK QQLSKAAKAAAALAGG AALAGGKKSKKKWSKK
## 4 AAADALSDLEIK MPKETPSKAAADALSD ALSDLEIKDSKSNLNK
## 5 AAADALSDLEIKDSK MPKETPSKAAADALSD DLEIKDSKSNLNKELE
## 6 AAAEEFQR RERKEREKAAAEEFQR AAAEEFQRQQELLRQQ
## Amino.acid.before First.amino.acid Second.amino.acid
## 1 K A A
## 2 K A A
## 3 K A A
## 4 K A A
## 5 K A A
## 6 K A A
## Second.last.amino.acid Last.amino.acid Amino.acid.after A.Count R.Count
## 1 T K I 7 0
## 2 G K K 5 0
## 3 K K S 5 0
## 4 I K D 4 0
## 5 S K S 4 0
## 6 Q R Q 3 1
## N.Count D.Count C.Count Q.Count E.Count G.Count H.Count I.Count L.Count
## 1 0 2 0 0 0 5 0 0 0
## 2 0 0 0 0 0 2 0 0 1
## 3 0 0 0 0 0 2 0 0 1
## 4 0 2 0 0 1 0 0 1 2
## 5 0 3 0 0 1 0 0 1 2
## 6 0 0 0 1 2 0 0 0 0
## K.Count M.Count F.Count P.Count S.Count T.Count W.Count Y.Count V.Count
## 1 1 0 0 0 1 1 0 0 1
## 2 1 0 0 0 0 0 0 0 0
## 3 2 0 0 0 0 0 0 0 0
## 4 1 0 0 0 1 0 0 0 0
## 5 2 0 0 0 2 0 0 0 0
## 6 0 0 1 0 0 0 0 0 0
## U.Count Length Missed.cleavages Mass
## 1 0 18 0 1445.6746
## 2 0 9 0 728.4181
## 3 0 10 1 856.5131
## 4 0 12 0 1215.6347
## 5 0 15 1 1545.7886
## 6 0 8 0 920.4352
## Proteins Leading.razor.protein
## 1 sp|P38915|SPT8_YEAST sp|P38915|SPT8_YEAST
## 2 sp|Q3E792|RS25A_YEAST;sp|P0C0T4|RS25B_YEAST sp|Q3E792|RS25A_YEAST
## 3 sp|Q3E792|RS25A_YEAST;sp|P0C0T4|RS25B_YEAST sp|Q3E792|RS25A_YEAST
## 4 sp|P09938|RIR2_YEAST sp|P09938|RIR2_YEAST
## 5 sp|P09938|RIR2_YEAST sp|P09938|RIR2_YEAST
## 6 sp|P53075|SHE10_YEAST sp|P53075|SHE10_YEAST
## Start.position End.position Unique..Groups. Unique..Proteins. Charges
## 1 97 114 yes yes 2
## 2 13 21 yes no 2
## 3 13 22 yes no 2
## 4 9 20 yes yes 2
## 5 9 23 yes yes 3
## 6 538 545 yes yes 2
## PEP Score Identification.type.6A_1 Identification.type.6A_2
## 1 1.1843e-05 82.942 By matching By MS/MS
## 2 7.4562e-06 134.810 By matching By matching
## 3 3.3094e-09 143.730 By matching By matching
## 4 9.1593e-23 182.230 By MS/MS By MS/MS
## 5 1.5319e-04 73.927 By matching By matching
## 6 1.7588e-02 79.474 By matching By matching
## Identification.type.6A_3 Identification.type.6A_4
## 1 By matching By MS/MS
## 2 By matching By MS/MS
## 3 By matching By MS/MS
## 4 By matching By MS/MS
## 5 By matching By MS/MS
## 6 By matching By matching
## Identification.type.6A_5 Identification.type.6A_6
## 1 By matching By matching
## 2 By matching By matching
## 3 By matching By matching
## 4 By MS/MS By MS/MS
## 5 By MS/MS By MS/MS
## 6 By matching By matching
## Identification.type.6A_7 Identification.type.6A_8
## 1 By MS/MS By MS/MS
## 2 By MS/MS By MS/MS
## 3 By MS/MS By MS/MS
## 4 By MS/MS By matching
## 5 By MS/MS By MS/MS
## 6 By matching By matching
## Identification.type.6A_9 Identification.type.6B_1
## 1 By MS/MS By matching
## 2 By MS/MS By MS/MS
## 3 By MS/MS By matching
## 4 By MS/MS By MS/MS
## 5 By MS/MS By matching
## 6 By matching By MS/MS
## Identification.type.6B_2 Identification.type.6B_3
## 1 By matching By matching
## 2 By matching By matching
## 3 By MS/MS By MS/MS
## 4 By MS/MS By matching
## 5 By matching By matching
## 6 By matching By matching
## Identification.type.6B_4 Identification.type.6B_5
## 1 By matching By matching
## 2 By matching By matching
## 3 By matching By matching
## 4 By MS/MS By MS/MS
## 5 By MS/MS By MS/MS
## 6 By MS/MS By matching
## Identification.type.6B_6 Identification.type.6B_7
## 1 By matching By matching
## 2 By matching By MS/MS
## 3 By matching By MS/MS
## 4 By MS/MS By MS/MS
## 5 By MS/MS By MS/MS
## 6 By matching By matching
## Identification.type.6B_8 Identification.type.6B_9
## 1 By MS/MS By MS/MS
## 2 By MS/MS By MS/MS
## 3 By MS/MS By MS/MS
## 4 By matching By matching
## 5 By MS/MS By MS/MS
## 6 By matching By matching
## Identification.type.6C_1 Identification.type.6C_2
## 1 By matching By matching
## 2 By matching By MS/MS
## 3 By matching By MS/MS
## 4 By MS/MS By matching
## 5 By matching By matching
## 6 By matching By matching
## Identification.type.6C_3 Identification.type.6C_4
## 1 By matching By matching
## 2 By matching By MS/MS
## 3 By matching By MS/MS
## 4 By MS/MS By MS/MS
## 5 By matching By MS/MS
## 6 By matching By matching
## Identification.type.6C_5 Identification.type.6C_6
## 1 By MS/MS By matching
## 2 By matching By matching
## 3 By matching By matching
## 4 By MS/MS By MS/MS
## 5 By MS/MS By MS/MS
## 6 By matching By matching
## Identification.type.6C_7 Identification.type.6C_8
## 1 By MS/MS By matching
## 2 By MS/MS By MS/MS
## 3 By MS/MS By MS/MS
## 4 By matching By MS/MS
## 5 By MS/MS By MS/MS
## 6 By matching By matching
## Identification.type.6C_9 Identification.type.6D_1
## 1 By matching By matching
## 2 By MS/MS By matching
## 3 By MS/MS By matching
## 4 By MS/MS By MS/MS
## 5 By MS/MS By MS/MS
## 6 By matching By matching
## Identification.type.6D_2 Identification.type.6D_3
## 1 By matching By matching
## 2 By matching By matching
## 3 By matching By matching
## 4 By matching By matching
## 5 By MS/MS By matching
## 6 By matching By matching
## Identification.type.6D_4 Identification.type.6D_5
## 1 By matching By matching
## 2 By matching By matching
## 3 By MS/MS By matching
## 4 By MS/MS By MS/MS
## 5 By MS/MS By MS/MS
## 6 By matching By matching
## Identification.type.6D_6 Identification.type.6D_7
## 1 By MS/MS By matching
## 2 By matching By MS/MS
## 3 By matching By MS/MS
## 4 By MS/MS By matching
## 5 By matching By MS/MS
## 6 By matching By matching
## Identification.type.6D_8 Identification.type.6D_9
## 1 By matching By matching
## 2 By MS/MS By MS/MS
## 3 By MS/MS By MS/MS
## 4 By MS/MS By MS/MS
## 5 By MS/MS By MS/MS
## 6 By matching By matching
## Identification.type.6E_1 Identification.type.6E_2
## 1 By matching By matching
## 2 By matching By matching
## 3 By matching By matching
## 4 By MS/MS By MS/MS
## 5 By MS/MS By MS/MS
## 6 By matching By matching
## Identification.type.6E_3 Identification.type.6E_4
## 1 By matching By matching
## 2 By matching By MS/MS
## 3 By matching By matching
## 4 By matching By MS/MS
## 5 By MS/MS By matching
## 6 By matching By matching
## Identification.type.6E_5 Identification.type.6E_6
## 1 By matching By matching
## 2 By matching By matching
## 3 By matching By matching
## 4 By MS/MS By matching
## 5 By MS/MS By MS/MS
## 6 By matching By matching
## Identification.type.6E_7 Identification.type.6E_8
## 1 By matching By matching
## 2 By MS/MS By MS/MS
## 3 By MS/MS By MS/MS
## 4 By MS/MS By MS/MS
## 5 By matching By MS/MS
## 6 By matching By matching
## Identification.type.6E_9 Experiment.6A_1 Experiment.6A_2 Experiment.6A_3
## 1 By matching NA 1 NA
## 2 By MS/MS NA 1 2
## 3 By MS/MS NA 1 NA
## 4 By MS/MS 1 1 1
## 5 By MS/MS 1 1 NA
## 6 By matching NA NA NA
## Experiment.6A_4 Experiment.6A_5 Experiment.6A_6 Experiment.6A_7
## 1 1 1 1 1
## 2 1 1 1 2
## 3 1 NA 1 1
## 4 1 1 1 1
## 5 1 1 1 1
## 6 NA 1 NA NA
## Experiment.6A_8 Experiment.6A_9 Experiment.6B_1 Experiment.6B_2
## 1 1 1 NA NA
## 2 1 1 1 1
## 3 1 1 NA 1
## 4 1 1 1 1
## 5 1 1 NA 1
## 6 NA 1 1 NA
## Experiment.6B_3 Experiment.6B_4 Experiment.6B_5 Experiment.6B_6
## 1 NA NA 1 1
## 2 1 2 1 NA
## 3 1 1 NA NA
## 4 1 1 1 1
## 5 NA 1 1 1
## 6 NA 1 NA NA
## Experiment.6B_7 Experiment.6B_8 Experiment.6B_9 Experiment.6C_1
## 1 NA 1 1 NA
## 2 2 1 2 NA
## 3 1 1 1 NA
## 4 1 1 1 1
## 5 1 1 1 1
## 6 NA NA 1 NA
## Experiment.6C_2 Experiment.6C_3 Experiment.6C_4 Experiment.6C_5
## 1 NA NA 1 1
## 2 1 2 2 NA
## 3 1 NA 1 NA
## 4 1 1 1 1
## 5 1 1 1 1
## 6 NA NA 1 NA
## Experiment.6C_6 Experiment.6C_7 Experiment.6C_8 Experiment.6C_9
## 1 1 1 1 1
## 2 NA 2 1 1
## 3 NA 1 1 1
## 4 1 1 1 1
## 5 1 1 1 1
## 6 1 NA NA NA
## Experiment.6D_1 Experiment.6D_2 Experiment.6D_3 Experiment.6D_4
## 1 NA NA NA 1
## 2 NA 1 1 1
## 3 NA NA NA 1
## 4 1 1 1 1
## 5 1 1 1 1
## 6 NA NA NA 1
## Experiment.6D_5 Experiment.6D_6 Experiment.6D_7 Experiment.6D_8
## 1 1 1 1 NA
## 2 1 NA 2 1
## 3 NA NA 1 1
## 4 1 1 1 1
## 5 1 1 1 1
## 6 NA 1 NA NA
## Experiment.6D_9 Experiment.6E_1 Experiment.6E_2 Experiment.6E_3
## 1 NA NA 1 NA
## 2 2 NA 1 2
## 3 1 NA NA NA
## 4 1 1 1 1
## 5 1 1 1 1
## 6 1 NA NA NA
## Experiment.6E_4 Experiment.6E_5 Experiment.6E_6 Experiment.6E_7
## 1 NA 1 1 NA
## 2 2 1 NA 2
## 3 1 NA NA 1
## 4 1 1 1 1
## 5 1 1 1 NA
## 6 NA NA NA NA
## Experiment.6E_8 Experiment.6E_9 Intensity Reverse Potential.contaminant
## 1 NA NA 1190800
## 2 2 1 280990000
## 3 1 1 33360000
## 4 1 1 54622000
## 5 1 1 18910000
## 6 NA NA 1158600
## id Protein.group.IDs Mod..peptide.IDs
## 1 0 859 0
## 2 1 230 1
## 3 2 230 2
## 4 3 229 3
## 5 4 229 4
## 6 5 1143 5
## Evidence.IDs
## 1 0;1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23
## 2 24;25;26;27;28;29;30;31;32;33;34;35;36;37;38;39;40;41;42;43;44;45;46;47;48;49;50;51;52;53;54;55;56;57;58;59;60;61;62;63;64;65;66;67;68;69;70;71;72;73
## 3 74;75;76;77;78;79;80;81;82;83;84;85;86;87;88;89;90;91;92;93;94;95;96;97;98
## 4 99;100;101;102;103;104;105;106;107;108;109;110;111;112;113;114;115;116;117;118;119;120;121;122;123;124;125;126;127;128;129;130;131;132;133;134;135;136;137;138;139;140;141;142;143
## 5 144;145;146;147;148;149;150;151;152;153;154;155;156;157;158;159;160;161;162;163;164;165;166;167;168;169;170;171;172;173;174;175;176;177;178;179;180;181;182;183;184
## 6 185;186;187;188;189;190;191;192;193;194
## MS.MS.IDs
## 1 0;1;2;3;4;5;6;7;8;9
## 2 10;11;12;13;14;15;16;17;18;19;20;21;22;23;24;25;26;27;28;29
## 3 30;31;32;33;34;35;36;37;38;39;40;41;42;43;44;45;46;47;48;49;50
## 4 51;52;53;54;55;56;57;58;59;60;61;62;63;64;65;66;67;68;69;70;71;72;73;74;75;76;77;78;79;80;81;82;83;84
## 5 85;86;87;88;89;90;91;92;93;94;95;96;97;98;99;100;101;102;103;104;105;106;107;108;109;110;111;112;113;114;115;116
## 6 117;118
## Best.MS.MS Oxidation..M..site.IDs MS.MS.Count
## 1 0 10
## 2 21 18
## 3 31 21
## 4 72 29
## 5 94 32
## 6 117 2
#Remove reverse sequences
sel <- fData(peptidesCPTAC3)$Reverse != "+"
peptidesCPTAC3 <- peptidesCPTAC3[sel]
head(exprs(peptidesCPTAC3))
## X6A_1 X6A_2 X6A_3 X6A_4 X6A_5 X6A_6 X6A_7 X6A_8
## 1 NA 15.89625 NA 15.74427 16.11690 16.26360 NA NA
## 2 NA 23.33130 23.66245 23.91300 15.60841 17.66008 21.89950 20.94838
## 3 NA 21.62554 NA 22.05381 NA 15.02926 20.68958 20.00483
## 4 19.64640 20.09650 20.04574 20.32064 20.75817 20.45584 19.63882 20.01568
## 5 15.85773 15.59100 NA 19.33941 19.55917 19.62396 18.94303 19.31408
## 6 NA NA NA NA 17.42813 NA NA NA
## X6A_9 X6B_1 X6B_2 X6B_3 X6B_4 X6B_5 X6B_6 X6B_7
## 1 16.40301 NA NA NA NA 15.60504 16.04583 NA
## 2 20.96813 23.43180 23.26411 23.64645 24.20048 14.91464 NA 22.20164
## 3 19.81333 NA 21.71777 19.37231 16.85073 NA NA 20.36880
## 4 19.98197 19.62953 19.95218 20.13390 20.04605 20.30895 20.39173 19.42857
## 5 18.96642 NA 16.09153 NA 19.20895 19.09520 19.49627 19.01121
## 6 16.00602 NA NA NA 17.54346 NA NA NA
## X6B_8 X6B_9 X6C_1 X6C_2 X6C_3 X6C_4 X6C_5 X6C_6
## 1 15.53808 NA NA NA NA 16.16971 16.09331 15.76673
## 2 20.62686 21.25133 NA 23.52247 24.08292 23.82946 NA NA
## 3 20.02875 19.62006 NA 21.60703 NA 20.82764 NA NA
## 4 19.97507 20.02002 20.13560 19.58226 20.02870 19.84447 20.53751 20.65843
## 5 19.23466 19.15420 16.65312 16.28174 15.45126 18.94057 19.12564 19.46605
## 6 NA 16.52722 NA NA NA 17.69237 NA 17.58594
## X6C_7 X6C_8 X6C_9 X6D_1 X6D_2 X6D_3 X6D_4 X6D_5
## 1 NA 15.68773 15.35170 NA NA NA 15.96516 16.06759
## 2 22.30765 20.49290 21.00310 NA 23.52274 23.29155 24.05100 16.88207
## 3 20.33127 19.95157 19.64324 NA NA NA 22.07140 NA
## 4 19.49678 20.41416 20.01655 20.46055 20.63712 20.54149 20.12157 20.54021
## 5 18.68434 19.24157 19.08266 18.35102 18.22513 18.08876 19.04524 19.98076
## 6 NA NA NA NA NA NA 18.17092 NA
## X6D_6 X6D_7 X6D_8 X6D_9 X6E_1 X6E_2 X6E_3 X6E_4
## 1 16.38325 16.60706 NA NA NA 15.48811 NA NA
## 2 NA 22.09701 20.51658 21.57285 NA 22.82601 22.99603 23.72723
## 3 NA 20.36570 19.82115 19.79300 NA NA NA 15.44656
## 4 20.53321 19.90789 19.95061 19.64517 20.26774 19.74733 19.81515 19.80937
## 5 19.65537 18.40360 19.38517 18.73883 18.10882 17.61519 18.09828 18.94790
## 6 15.59841 NA NA 16.25820 NA NA NA NA
## X6E_5 X6E_6 X6E_7 X6E_8 X6E_9
## 1 15.48788 15.44796 NA NA NA
## 2 13.78541 NA 21.64825 19.69567 20.88645
## 3 NA NA 20.17699 18.78196 19.33404
## 4 19.62211 19.86977 19.80026 19.96206 19.76630
## 5 18.87684 19.06988 NA 18.53583 18.69385
## 6 NA NA NA NA NA
#Remove contaminants
sel <- fData(peptidesCPTAC3)$Potential.contaminant != "+"
peptidesCPTAC3 <- peptidesCPTAC3[sel]
head(exprs(peptidesCPTAC3))
## X6A_1 X6A_2 X6A_3 X6A_4 X6A_5 X6A_6 X6A_7 X6A_8
## 1 NA 15.89625 NA 15.74427 16.11690 16.26360 NA NA
## 2 NA 23.33130 23.66245 23.91300 15.60841 17.66008 21.89950 20.94838
## 3 NA 21.62554 NA 22.05381 NA 15.02926 20.68958 20.00483
## 4 19.64640 20.09650 20.04574 20.32064 20.75817 20.45584 19.63882 20.01568
## 5 15.85773 15.59100 NA 19.33941 19.55917 19.62396 18.94303 19.31408
## 6 NA NA NA NA 17.42813 NA NA NA
## X6A_9 X6B_1 X6B_2 X6B_3 X6B_4 X6B_5 X6B_6 X6B_7
## 1 16.40301 NA NA NA NA 15.60504 16.04583 NA
## 2 20.96813 23.43180 23.26411 23.64645 24.20048 14.91464 NA 22.20164
## 3 19.81333 NA 21.71777 19.37231 16.85073 NA NA 20.36880
## 4 19.98197 19.62953 19.95218 20.13390 20.04605 20.30895 20.39173 19.42857
## 5 18.96642 NA 16.09153 NA 19.20895 19.09520 19.49627 19.01121
## 6 16.00602 NA NA NA 17.54346 NA NA NA
## X6B_8 X6B_9 X6C_1 X6C_2 X6C_3 X6C_4 X6C_5 X6C_6
## 1 15.53808 NA NA NA NA 16.16971 16.09331 15.76673
## 2 20.62686 21.25133 NA 23.52247 24.08292 23.82946 NA NA
## 3 20.02875 19.62006 NA 21.60703 NA 20.82764 NA NA
## 4 19.97507 20.02002 20.13560 19.58226 20.02870 19.84447 20.53751 20.65843
## 5 19.23466 19.15420 16.65312 16.28174 15.45126 18.94057 19.12564 19.46605
## 6 NA 16.52722 NA NA NA 17.69237 NA 17.58594
## X6C_7 X6C_8 X6C_9 X6D_1 X6D_2 X6D_3 X6D_4 X6D_5
## 1 NA 15.68773 15.35170 NA NA NA 15.96516 16.06759
## 2 22.30765 20.49290 21.00310 NA 23.52274 23.29155 24.05100 16.88207
## 3 20.33127 19.95157 19.64324 NA NA NA 22.07140 NA
## 4 19.49678 20.41416 20.01655 20.46055 20.63712 20.54149 20.12157 20.54021
## 5 18.68434 19.24157 19.08266 18.35102 18.22513 18.08876 19.04524 19.98076
## 6 NA NA NA NA NA NA 18.17092 NA
## X6D_6 X6D_7 X6D_8 X6D_9 X6E_1 X6E_2 X6E_3 X6E_4
## 1 16.38325 16.60706 NA NA NA 15.48811 NA NA
## 2 NA 22.09701 20.51658 21.57285 NA 22.82601 22.99603 23.72723
## 3 NA 20.36570 19.82115 19.79300 NA NA NA 15.44656
## 4 20.53321 19.90789 19.95061 19.64517 20.26774 19.74733 19.81515 19.80937
## 5 19.65537 18.40360 19.38517 18.73883 18.10882 17.61519 18.09828 18.94790
## 6 15.59841 NA NA 16.25820 NA NA NA NA
## X6E_5 X6E_6 X6E_7 X6E_8 X6E_9
## 1 15.48788 15.44796 NA NA NA
## 2 13.78541 NA 21.64825 19.69567 20.88645
## 3 NA NA 20.17699 18.78196 19.33404
## 4 19.62211 19.86977 19.80026 19.96206 19.76630
## 5 18.87684 19.06988 NA 18.53583 18.69385
## 6 NA NA NA NA NA
Small check:
length(unique(fData(peptidesCPTAC3)$Proteins[grepl('_HUMAN_UPS',fData(peptidesCPTAC3)$Proteins)]))
## [1] 46
46 out of a total 48 spiked-in human UPS proteins are present in this dataset with at least one peptide sequence identified.
proteinGroups <- read.table(file_proteinGroups, sep="\t", header=TRUE, quote="", comment.char = "")
only_site <- proteinGroups$Only.identified.by.site
only_site[is.na(only_site)] <- ""
removed_proteins <- proteinGroups$Protein.IDs[only_site=="+"]
sel <- !(as.character(Biobase::fData(peptidesCPTAC3)[,"Proteins"]) %in% as.character(removed_proteins))
peptidesCPTAC3 <- peptidesCPTAC3[sel]
fData(peptidesCPTAC3) <- fData(peptidesCPTAC3)[,c("Proteins","Sequence")] #, "ups"
minIdentified <- 2
keepers <- rowSums(!is.na(exprs(peptidesCPTAC3)))>=minIdentified
peptidesCPTAC3 <- peptidesCPTAC3[keepers]
head(exprs(peptidesCPTAC3))
## X6A_1 X6A_2 X6A_3 X6A_4 X6A_5 X6A_6 X6A_7 X6A_8
## 1 NA 15.89625 NA 15.74427 16.11690 16.26360 NA NA
## 2 NA 23.33130 23.66245 23.91300 15.60841 17.66008 21.89950 20.94838
## 3 NA 21.62554 NA 22.05381 NA 15.02926 20.68958 20.00483
## 4 19.64640 20.09650 20.04574 20.32064 20.75817 20.45584 19.63882 20.01568
## 5 15.85773 15.59100 NA 19.33941 19.55917 19.62396 18.94303 19.31408
## 6 NA NA NA NA 17.42813 NA NA NA
## X6A_9 X6B_1 X6B_2 X6B_3 X6B_4 X6B_5 X6B_6 X6B_7
## 1 16.40301 NA NA NA NA 15.60504 16.04583 NA
## 2 20.96813 23.43180 23.26411 23.64645 24.20048 14.91464 NA 22.20164
## 3 19.81333 NA 21.71777 19.37231 16.85073 NA NA 20.36880
## 4 19.98197 19.62953 19.95218 20.13390 20.04605 20.30895 20.39173 19.42857
## 5 18.96642 NA 16.09153 NA 19.20895 19.09520 19.49627 19.01121
## 6 16.00602 NA NA NA 17.54346 NA NA NA
## X6B_8 X6B_9 X6C_1 X6C_2 X6C_3 X6C_4 X6C_5 X6C_6
## 1 15.53808 NA NA NA NA 16.16971 16.09331 15.76673
## 2 20.62686 21.25133 NA 23.52247 24.08292 23.82946 NA NA
## 3 20.02875 19.62006 NA 21.60703 NA 20.82764 NA NA
## 4 19.97507 20.02002 20.13560 19.58226 20.02870 19.84447 20.53751 20.65843
## 5 19.23466 19.15420 16.65312 16.28174 15.45126 18.94057 19.12564 19.46605
## 6 NA 16.52722 NA NA NA 17.69237 NA 17.58594
## X6C_7 X6C_8 X6C_9 X6D_1 X6D_2 X6D_3 X6D_4 X6D_5
## 1 NA 15.68773 15.35170 NA NA NA 15.96516 16.06759
## 2 22.30765 20.49290 21.00310 NA 23.52274 23.29155 24.05100 16.88207
## 3 20.33127 19.95157 19.64324 NA NA NA 22.07140 NA
## 4 19.49678 20.41416 20.01655 20.46055 20.63712 20.54149 20.12157 20.54021
## 5 18.68434 19.24157 19.08266 18.35102 18.22513 18.08876 19.04524 19.98076
## 6 NA NA NA NA NA NA 18.17092 NA
## X6D_6 X6D_7 X6D_8 X6D_9 X6E_1 X6E_2 X6E_3 X6E_4
## 1 16.38325 16.60706 NA NA NA 15.48811 NA NA
## 2 NA 22.09701 20.51658 21.57285 NA 22.82601 22.99603 23.72723
## 3 NA 20.36570 19.82115 19.79300 NA NA NA 15.44656
## 4 20.53321 19.90789 19.95061 19.64517 20.26774 19.74733 19.81515 19.80937
## 5 19.65537 18.40360 19.38517 18.73883 18.10882 17.61519 18.09828 18.94790
## 6 15.59841 NA NA 16.25820 NA NA NA NA
## X6E_5 X6E_6 X6E_7 X6E_8 X6E_9
## 1 15.48788 15.44796 NA NA NA
## 2 13.78541 NA 21.64825 19.69567 20.88645
## 3 NA NA 20.17699 18.78196 19.33404
## 4 19.62211 19.86977 19.80026 19.96206 19.76630
## 5 18.87684 19.06988 NA 18.53583 18.69385
## 6 NA NA NA NA NA
#Remove unused levels
fData(peptidesCPTAC3) <- droplevels(fData(peptidesCPTAC3))
rownames(exp_annotation) <- exp_annotation[,"run"]
peptidesCPTAC3 <- MSnbase::MSnSet(exprs=exprs(peptidesCPTAC3), fData=fData(peptidesCPTAC3), pData=exp_annotation)
Check whether our manually preprocessed peptidesCPTAC3
is equal to peptidesCPTAC2
.
all(exprs(peptidesCPTAC2)==exprs(peptidesCPTAC3), na.rm=TRUE)
## [1] TRUE
all(fData(peptidesCPTAC2)==fData(peptidesCPTAC3), na.rm=TRUE)
## [1] TRUE
all(pData(peptidesCPTAC2)==pData(peptidesCPTAC3), na.rm=TRUE)
## [1] TRUE
Caution: this step might take some time! (about 5 minutes on our system). The accession element denotes by which column the individual peptide observations should be grouped.
system.time(proteinsCPTAC <- MSnSet2protdata(peptidesCPTAC2, accession="Proteins"))
## user system elapsed
## 9.899 0.300 10.223
When your goal is to analyze the CPTAC data in the recommended way (i.e. a robust ridge model with variance shrinkage and M estimation), your model should be constructed look as follows:
system.time(modelCPTAC_RR <- fit.model(protdata=proteinsCPTAC, response="quant_value", fixed=c("condition","lab"), random=c("Sequence","run"), shrinkage.fixed=NULL, weights="Huber", squeezeVar=TRUE))
## user system elapsed
## 416.557 1.065 418.827
Caution: model fitting takes some time with large datasets! On our computer, fitting the robust ridge model took about 13 - 18 minutes.
Saving results is completely analogous to the Francisella example.
In this section, we fit different types of models to the CPTAC data, analogous to what has been done in Goeminne et al. (2016) in order to compare their performances. We compare an ordinary least squares (OLS) model with and without M estimation (Huber weights) to a ridge model with and without M estimation. As we also want to assess the impact of the empirical Bayes variance estimation, we set squeezeVar=FALSE
so that no variance squeezing is performed yet.
In order to do an objective comparison between different approaches, we chose here not to include the “run” effect, because this leads to massive overfitting in the ordinary least squares approaches. This effect will mostly be equal to zero anyways because each repeat in the CPTAC dataset is in fact a technical repeat and therefore the variability in peptide intensities between samples will be very small. When doing a standard robust ridge analysis on true biological data, we advise however to include the “run” effect as a random effect. As the simultanous shrinkage of fixed effects was not yet implemented at the time, we provide “lab” as a random effect in order to get results that are as close as possible to those of Goeminne et al. (2016).
Fit an OLS model (shrinkage.fixed=c(0,0,0,0)
) without M estimation (weights=NULL
).
system.time(modelCPTAC_Lm <- fit.model(protdata=proteinsCPTAC, response="quant_value", fixed=c("condition","Sequence","lab"), random=NULL, shrinkage.fixed=c(0,0,0,0), weights=NULL, squeezeVar=FALSE))
## user system elapsed
## 3.038 0.058 3.103
Fit an OLS model (shrinkage.fixed=c(0,0,0,0)
) with M estimation (weights=Huber
).
system.time(modelCPTAC_Lm_M <- fit.model(protdata=proteinsCPTAC, response="quant_value", fixed=c("condition","Sequence","lab"), random=NULL, shrinkage.fixed=c(0,0,0,0), weights="Huber", squeezeVar=FALSE))
## user system elapsed
## 43.976 0.863 45.278
Fit a ridge model (shrinkage.fixed=c(0,1)
) without M estimation (weights=NULL
).
system.time(modelCPTAC_Ridge <- fit.model(protdata=proteinsCPTAC, response="quant_value", fixed="condition", random=c("Sequence","lab"), shrinkage.fixed=c(0,1), weights=NULL, squeezeVar=FALSE))
## user system elapsed
## 79.554 0.212 79.982
Fit a ridge model (shrinkage.fixed=c(0,1)
) with M estimation (weights=Huber
).
system.time(modelCPTAC_Ridge_M <- fit.model(protdata=proteinsCPTAC, response="quant_value", fixed="condition", random=c("Sequence","lab"), shrinkage.fixed=c(0,1), weights="Huber", squeezeVar=FALSE))
## user system elapsed
## 307.672 0.765 309.228
As you noticed, an OLS model can be fit by setting shrinkage.fixed=c(0,0,0,0)
. This indicates that neither the intercept and none of our three fixed effects should be shrunken. The ridge model is here specified by setting shrinkage.fixed=c(0,1)
. This indicates that the intercept should not be shrunken, but the fixed effect "condition"
should be shrunken. Alternatively, one could also set shrinkage.fixed=NULL
as the shrinkage of all fixed effects except for the intercept is the default behaviour of MSqRob.
attr(getModels(modelCPTAC_RR[1]),"MSqRob_levels")
## [1] "(Intercept)" "condition6A"
## [3] "condition6B" "condition6C"
## [5] "condition6D" "condition6E"
## [7] "lab1" "lab2"
## [9] "lab3" "runX6A_2"
## [11] "runX6A_4" "runX6A_5"
## [13] "runX6A_6" "runX6A_9"
## [15] "runX6B_5" "runX6B_6"
## [17] "runX6B_8" "runX6C_4"
## [19] "runX6C_5" "runX6C_6"
## [21] "runX6C_8" "runX6C_9"
## [23] "runX6D_4" "runX6D_5"
## [25] "runX6D_6" "runX6D_7"
## [27] "runX6E_2" "runX6E_5"
## [29] "runX6E_6" "SequenceAAAAGAGGAGDSGDAVTK"
MSqRob always tries to fit a model, but some models are overparameterized because too many parameters are fit on too few observations. These models have convergence problems and can be removed from the data prior to estimating p-values. This is only relevant for models that perform shrinkage or use any kind of random effect.
Extract the data as a list from the protdata object proteinsCPTAC
.
data <- getData(proteinsCPTAC)
convergence <- lapply(data, function(x){return(tryCatch(lme4::lFormula(formula("quant_value~1+(1|condition)+(1|lab)+(1|Sequence)"),x,control = lme4::lmerControl()), error=function(e){
return(NA)
}))})
Get the indices of these overparameterized models.
na_indices <- which(is.na(convergence))
See which proteins you removed.
getAccessions(proteinsCPTAC)[na_indices]
## [1] sp|P38915|SPT8_YEAST
## [2] sp|P10622|RLA3_YEAST
## [3] sp|P33421|SDH3_YEAST
## [4] sp|P33328|SNC2_YEAST
## [5] sp|P38555|YPT31_YEAST
## [6] sp|P36139|PET10_YEAST
## [7] sp|Q12517|DCP1_YEAST
## [8] sp|P39516|RS14B_YEAST
## [9] sp|Q12078|SMF3_YEAST
## [10] sp|P38885|AIM46_YEAST
## [11] sp|P40548|SNL1_YEAST
## [12] sp|P48363|PFD3_YEAST
## [13] sp|P14065|GCY1_YEAST
## [14] sp|P46970|NMD5_YEAST
## [15] sp|P23201|SPA2_YEAST
## [16] sp|P05743|RL26A_YEAST
## [17] sp|P0CH09|RL402_YEAST;sp|P0CH08|RL401_YEAST
## [18] sp|P38256|YBU6_YEAST
## [19] sp|Q03880|PKR1_YEAST
## [20] sp|P38747|OTU2_YEAST
## [21] sp|O43137|YB085_YEAST
## [22] sp|P40212|RL13B_YEAST
## [23] sp|Q04177|UTP5_YEAST
## [24] sp|P24031|PPA3_YEAST;sp|P00635|PPA5_YEAST
## [25] sp|P32504|CBF3A_YEAST
## [26] sp|P38145|RISA_YEAST
## [27] sp|P32803|EMP24_YEAST
## [28] sp|Q12040|YO283_YEAST
## [29] sp|Q12099|FAL1_YEAST
## [30] sp|Q04175|SXM1_YEAST
## [31] sp|Q07914|TIM14_YEAST
## [32] sp|P30665|MCM4_YEAST
## [33] sp|P06704|CDC31_YEAST
## [34] sp|P40956|GTS1_YEAST
## [35] sp|P53304|SLI1_YEAST
## [36] sp|Q04409|EMI2_YEAST
## [37] sp|P54858|DOS2_YEAST
## [38] sp|P50106|RPA14_YEAST
## [39] sp|P53064|RTF1_YEAST
## [40] sp|P11746|MCM1_YEAST
## [41] sp|P08456|PSS_YEAST
## [42] sp|Q03525|TMA23_YEAST
## [43] sp|Q12156|AIM7_YEAST
## [44] sp|P47771|ALDH2_YEAST
## [45] sp|P17898|CPT1_YEAST
## [46] sp|P32500|NDC1_YEAST
## [47] sp|P32910|RPC6_YEAST
## [48] sp|P39998|EDC3_YEAST
## [49] sp|Q03529|SCS7_YEAST
## [50] sp|Q06449|PIN3_YEAST
## [51] sp|P25618|CWH43_YEAST
## [52] sp|P53093|YIP4_YEAST
## [53] sp|P52893|ALAM_YEAST
## [54] sp|Q02753|RL21A_YEAST
## [55] sp|Q12672|RL21B_YEAST
## [56] sp|P0CT04|IPB2_YEAST
## [57] sp|P20436|RPAB3_YEAST
## [58] sp|P89105|CTR9_YEAST
## [59] sp|P05744|RL33A_YEAST
## [60] sp|P41056|RL33B_YEAST
## [61] sp|Q07897|CMS1_YEAST
## [62] sp|P48234|NOL10_YEAST
## [63] sp|P17442|PHO81_YEAST
## [64] sp|Q06132|SGD1_YEAST
## [65] sp|Q07451|YET3_YEAST
## [66] sp|Q08925|MRN1_YEAST
## [67] sp|O13516|RS9A_YEAST
## [68] sp|P38063|KPR4_YEAST
## [69] sp|P47019|ALB1_YEAST
## [70] sp|P46654|RSSA2_YEAST
## [71] sp|Q06580|MLC2_YEAST
## [72] sp|P38066|RIB1_YEAST
## [73] sp|P41939|IDHC_YEAST;sp|P53982|IDHH_YEAST
## [74] sp|P47172|YJ9I_YEAST
## [75] sp|P40541|SCC3_YEAST
## [76] sp|P18852|GBG_YEAST
## [77] sp|Q07540|FRDA_YEAST
## [78] sp|A5Z2X5|YP010_YEAST
## [79] sp|Q07915|RLP24_YEAST
## [80] sp|P56628|RL22B_YEAST
## [81] sp|P38697|IMDH2_YEAST
## [82] sp|P52490|UBC13_YEAST
## [83] sp|P53250|TWF1_YEAST
## [84] sp|Q99385|VCX1_YEAST
## [85] sp|P33448|TOM6_YEAST
## [86] sp|Q04773|YMW4_YEAST
## [87] sp|Q12273|YO11B_YEAST
## [88] sp|Q03434|YM12B_YEAST;sp|Q12141|YG11B_YEAST;sp|P47100|YJ12B_YEAST;sp|P47099|YJ12A_YEAST
## [89] sp|P80967|TOM5_YEAST
## [90] sp|P40030|ERG28_YEAST
## [91] sp|Q08746|RRS1_YEAST
## [92] sp|P11076|ARF1_YEAST
## [93] sp|P25642|IMG2_YEAST
## [94] sp|P38759|VPS29_YEAST
## [95] sp|Q04119|PPN1_YEAST
## [96] sp|Q12690|RL13A_YEAST
## [97] sp|P53859|CSL4_YEAST
## [98] sp|P39726|GCSH_YEAST
## [99] sp|Q01590|SED5_YEAST
## [100] sp|P51862|ROM2_YEAST
## [101] sp|P38760|MIP6_YEAST
## [102] sp|P40005|PFD2_YEAST
## [103] sp|Q06497|ANT1_YEAST
## [104] sp|Q3E7Y3|RS22B_YEAST
## [105] sp|P32835|GSP1_YEAST
## [106] sp|P32905|RSSA1_YEAST
## [107] sp|P11632|NHP6A_YEAST
## [108] sp|P11633|NHP6B_YEAST
## [109] sp|P39002|LCF3_YEAST
## [110] sp|P53507|TOM7_YEAST
## [111] sp|Q12284|ERV2_YEAST
## [112] sp|P32770|NRP1_YEAST
## [113] sp|P32857|PTM1_YEAST
## [114] sp|P32477|GSH1_YEAST
## [115] sp|P47138|DPH4_YEAST
## [116] sp|Q3E834|YOS1_YEAST
## [117] sp|Q05123|ARP9_YEAST
## [118] sp|P07275|PUT2_YEAST
## [119] sp|P0C2I9|YP12B_YEAST;sp|Q99337|YN12B_YEAST;sp|Q12501|YO22B_YEAST;sp|Q12337|YG21B_YEAST;sp|Q12113|YO21B_YEAST;sp|Q03494|YD22B_YEAST;sp|P25384|YC21B_YEAST;sp|P0CX64|YG22B_YEAST;sp|P0CX63|YF21B_YEAST;sp|P0C2J3|YL21B_YEAST;sp|Q12491|YB21B_YEAST;sp|Q12472|YD21B_YEAST
## [120] sp|P25627|BUD23_YEAST
## [121] sp|Q06991|PUN1_YEAST
## [122] sp|P38616|YGP1_YEAST
## [123] sp|Q03178|PIR1_YEAST
## [124] sp|Q07746|YD242_YEAST
## [125] sp|Q3E747|YL363_YEAST
## [126] sp|Q12030|TAF10_YEAST
## [127] sp|P40087|DDI1_YEAST
## [128] sp|O94742|SEM1_YEAST
## [129] sp|P28496|LAC1_YEAST
## [130] sp|P36516|RM03_YEAST
## [131] sp|P53121|FLC3_YEAST
## [132] sp|Q12213|RL7B_YEAST
## [133] sp|P05737|RL7A_YEAST
## [134] sp|Q06508|LOA1_YEAST
## [135] sp|P33416|HSP78_YEAST
## [136] sp|P25582|SPB1_YEAST
## [137] sp|P53847|DSL1_YEAST
## [138] sp|Q07505|DLHH_YEAST
## [139] sp|Q12160|YP063_YEAST
## [140] sp|P32349|RPC3_YEAST
## [141] sp|P36135|UTH1_YEAST
## [142] sp|P32478|HS150_YEAST
## [143] sp|Q12255|NYV1_YEAST
## [144] sp|Q03305|RMT2_YEAST
## [145] sp|P05747|RL29_YEAST
## [146] sp|P18851|GBB_YEAST
## [147] sp|P53885|MDG1_YEAST
## [148] sp|P46956|PHO86_YEAST
## [149] P69905ups|HBA_HUMAN_UPS
## [150] sp|P20049|TYR1_YEAST
## [151] sp|Q06543|GPN3_YEAST
## [152] sp|Q3E833|PCC1_YEAST
## [153] sp|P25039|EFGM_YEAST
## [154] sp|P81449|ATPJ_YEAST
## [155] sp|P53890|BNI5_YEAST
## [156] sp|P06100|NOT2_YEAST
## [157] sp|Q03020|ISU1_YEAST
## [158] sp|P25358|ELO2_YEAST
## [159] sp|P17558|RTPT_YEAST
## [160] sp|P38985|SRP14_YEAST
## [161] sp|P34909|NOT4_YEAST
## [162] sp|P13574|STE12_YEAST
## [163] sp|P22214|SEC22_YEAST
## [164] sp|Q08004|BUD20_YEAST
## [165] sp|P53173|ERV14_YEAST
## [166] sp|P40043|RGI1_YEAST
## [167] sp|Q08951|AP3D_YEAST
## [168] sp|P39938|RS26A_YEAST
## [169] P41159ups|LEP_HUMAN_UPS
## [170] sp|P34248|YKK0_YEAST
## [171] sp|P46964|OST2_YEAST
## [172] sp|P32529|RPA12_YEAST
## [173] sp|P39013|END3_YEAST
## [174] sp|P46990|RL17B_YEAST
## [175] sp|P05740|RL17A_YEAST
## [176] sp|P53868|ALG9_YEAST
## [177] sp|P12630|BAR1_YEAST
## [178] sp|P56508|SNA2_YEAST
## 1515 Levels: O00762ups|UBE2C_HUMAN_UPS ... sp|Q99385|VCX1_YEAST
Both UPS proteins in this list are only identified by one peptide.
getData(proteinsCPTAC["P69905ups|HBA_HUMAN_UPS"])
## quant_value Sequence run condition lab
## 13 16.34396 VGAHAGEYGAEALER X6B_4 6B 2
## 20 18.98716 VGAHAGEYGAEALER X6C_2 6C 1
## 22 16.89253 VGAHAGEYGAEALER X6C_4 6C 2
## 23 18.10633 VGAHAGEYGAEALER X6C_5 6C 2
## 24 17.52955 VGAHAGEYGAEALER X6C_6 6C 2
## 31 18.97273 VGAHAGEYGAEALER X6D_4 6D 2
## 32 16.95212 VGAHAGEYGAEALER X6D_5 6D 2
## 33 19.29081 VGAHAGEYGAEALER X6D_6 6D 2
## 35 16.09004 VGAHAGEYGAEALER X6D_8 6D 3
## 37 17.82644 VGAHAGEYGAEALER X6E_1 6E 1
## 40 20.41722 VGAHAGEYGAEALER X6E_4 6E 2
## 41 17.83335 VGAHAGEYGAEALER X6E_5 6E 2
getData(proteinsCPTAC["P41159ups|LEP_HUMAN_UPS"])
## quant_value Sequence run condition lab
## 31 15.32290 VTGLDFIPGLHPILTLSK X6D_4 6D 2
## 32 15.85747 VTGLDFIPGLHPILTLSK X6D_5 6D 2
## 40 19.09555 VTGLDFIPGLHPILTLSK X6E_4 6E 2
## 41 20.80418 VTGLDFIPGLHPILTLSK X6E_5 6E 2
## 42 20.65861 VTGLDFIPGLHPILTLSK X6E_6 6E 2
## 43 16.17743 VTGLDFIPGLHPILTLSK X6E_7 6E 3
## 44 17.42877 VTGLDFIPGLHPILTLSK X6E_8 6E 3
## 45 18.18318 VTGLDFIPGLHPILTLSK X6E_9 6E 3
Remove the overparameterized models from the protLM object modelCPTAC_RR
and from the data.
modelCPTAC_RR_cleaned <- modelCPTAC_RR[-na_indices]
data <- data[-na_indices]
Construct the contrast matrix L for inferring on the fold changes of interest.
L <- makeContrast(contrasts=c("condition6B-condition6A","condition6C-condition6A","condition6D-condition6A","condition6E-condition6A","condition6C-condition6B","condition6D-condition6B","condition6E-condition6B","condition6D-condition6C","condition6E-condition6C","condition6E-condition6D"),
levels=c("condition6A","condition6B","condition6C","condition6D","condition6E"))
Make the DA rankings for the ordinary least squares model (no ridge, no squeezed variances, no M estimation).
#modelCPTAC_Lm <- squeezePars(modelCPTAC_Lm, squeezeVar=FALSE)
lm_normal <- test.protLMcontrast(modelCPTAC_Lm, L)
lm_normal <- prot.p.adjust(lm_normal)
lm_normal <- prot.signif(lm_normal)
Make the DA rankings for the ordinary least squares model with squeezed variances (no ridge, no M estimation).
modelCPTAC_Lm_Sq <- squeezePars(modelCPTAC_Lm, squeezeVar=TRUE)
lm_Sq <- test.protLMcontrast(modelCPTAC_Lm_Sq, L)
lm_Sq <- prot.p.adjust(lm_Sq)
lm_Sq <- prot.signif(lm_Sq)
Make the DA rankings for the ordinary least squares model with squeezed variances and M estimation (no ridge).
modelCPTAC_Lm_M_Sq <- squeezePars(modelCPTAC_Lm_M, squeezeVar=TRUE)
## Warning: 2 very small variances detected, have been offset away from zero
lm_SqM <- test.protLMcontrast(modelCPTAC_Lm_M_Sq, L)
lm_SqM <- prot.p.adjust(lm_SqM)
lm_SqM <- prot.signif(lm_SqM)
Make the DA rankings for the ridge regression model (no squeezed variances, no M estimation).
#modelCPTAC_Ridge <- squeezePars(modelCPTAC_Ridge, squeezeVar=FALSE)
ridge <- test.protLMcontrast(modelCPTAC_Ridge, L)
ridge <- prot.p.adjust(ridge)
ridge <- prot.signif(ridge)
Make the DA rankings for the ridge regression model with squeezed variances (no M estimation).
modelCPTAC_Ridge_Sq <- squeezePars(modelCPTAC_Ridge, squeezeVar=TRUE)
ridgeSq <- test.protLMcontrast(modelCPTAC_Ridge_Sq, L)
ridgeSq <- prot.p.adjust(ridgeSq)
ridgeSq <- prot.signif(ridgeSq)
Make the DA rankings for the ridge regression model with squeezed variances and M estimation. This is our robust ridge (RR) approach (the default MSqRob approach, although we included “lab” here as a random effect and we did not include a “run” effect).
modelCPTAC_Ridge_M_Sq <- squeezePars(modelCPTAC_Ridge_M, squeezeVar=TRUE)
ridgeSqM <- test.protLMcontrast(modelCPTAC_Ridge_M, L)
ridgeSqM <- prot.p.adjust(ridgeSqM)
ridgeSqM <- prot.signif(ridgeSqM)
Our standard approach is to calculate the degrees of freedom via the trace of the Hat matrix. One could as well use custom degrees of freedom, for example in order to work more conservatively. Indeed, if we consider the peptides as pseudo-replicates, our experimental units are the different spike-in conditions in each lab. The number of degrees of freedom is then the number of combinations lab x condition minus the number of parameters estimated for lab (i.e. the number of labs minus 1) minus the number of parameters estimated for condition (i.e. the number of conditions minus 1) minus 1 for the intercept. We show an example for the modelCPTAC_RR
model.
custom_dfs <- sapply(data, function(x){return(length(unique(paste0(x$condition,x$lab)))-(length(unique(x$condition))-1)-(length(unique(x$lab))-1)-1)})
custom_dfs
## [1] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
## [35] 1 8 8 8 8 8 3 7 8 8 8 8 7 8 5 8 8 8 8 4 8 8 8 8 8 8 8 8 8 8 7 8 8 8
## [69] 8 8 8 8 8 4 8 8 8 8 8 8 8 4 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
## [103] 8 8 8 8 8 5 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 7 8 8 8 8 7 8 8 4 8 8 8 8
## [137] 8 8 8 8 8 8 8 8 8 8 8 8 8 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 5 8 8 8
## [171] 8 8 8 8 8 6 8 8 8 8 7 7 8 8 8 8 8 7 8 8 8 4 8 8 8 7 8 8 8 8 8 8 8 8
## [205] 8 8 8 8 8 8 8 2 8 8 8 8 8 8 8 8 8 8 8 8 6 8 8 8 8 4 8 8 8 8 8 8 8 8
## [239] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 4 8 8 8 8 8 7 8 8 8 8 8 8 8 5 8
## [273] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 5 8 8 8 8 8 8 8 8 8 8 1 8 8 8 8 7 8
## [307] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 1 8 8 8 6 4 7 8 8 6 8 8 6 8 7 8 5 5
## [341] 8 8 8 8 8 8 8 8 1 8 8 8 7 2 8 8 8 8 8 8 5 8 6 8 8 8 7 8 8 8 7 5 8 8
## [375] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 7 5 8 8 8 8 8 8 8 8 8 8 2 8 8 8 8 3 8 8
## [409] 3 8 8 8 8 8 8 8 7 3 8 7 0 8 8 1 5 8 7 8 8 8 6 8 8 8 8 8 8 8 8 8 3 8
## [443] 8 8 8 8 8 6 4 4 8 8 8 8 8 8 8 8 8 8 8 8 4 8 2 8 8 8 8 8 8 8 8 8 8 3
## [477] 8 8 8 5 8 8 8 7 8 8 8 5 8 4 8 8 8 8 8 5 5 8 5 8 8 8 8 8 5 4 5 8 8 8
## [511] 8 7 7 7 8 8 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 4 8 8 8 8 7 8 7 8 8
## [545] 8 7 8 8 8 8 8 6 8 8 8 7 8 8 8 8 8 8 8 8 4 8 8 7 8 8 8 8 8 8 8 8 8 4
## [579] 4 5 7 8 8 8 8 8 7 8 8 8 8 2 8 8 4 5 8 8 8 8 5 5 8 8 4 8 8 8 8 8 8 8
## [613] 8 7 4 8 8 8 8 8 7 8 8 7 8 8 8 8 8 8 8 6 7 8 5 8 8 8 8 8 6 8 2 8 8 8
## [647] 8 7 8 8 8 8 8 8 7 8 7 8 5 8 8 8 8 8 7 8 8 7 8 8 8 8 8 2 2 8 7 8 7 4
## [681] 8 8 4 8 4 8 8 8 8 8 8 5 8 8 8 8 8 8 8 8 8 8 8 7 8 8 8 8 8 8 8 8 7 8
## [715] 8 7 8 8 8 6 8 8 8 8 8 8 8 8 0 8 6 7 6 8 8 8 8 8 8 4 4 8 8 4 8 8 7 3
## [749] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 4 8 8 8 7 8 8 8 8 8 6 8
## [783] 8 8 8 8 8 8 8 8 7 8 8 8 7 8 6 8 8 8 5 8 8 2 5 8 8 8 6 8 8 8 6 8 8 8
## [817] 8 8 8 8 5 8 8 8 8 8 5 5 8 7 3 8 8 8 7 8 8 8 8 8 5 8 2 8 8 8 7 8 8 0
## [851] 6 8 8 3 8 8 6 4 8 5 7 8 8 8 8 8 7 7 5 8 8 7 8 8 8 8 8 7 8 8 6 8 8 8
## [885] 4 7 8 8 8 8 8 8 8 8 8 1 8 6 8 8 8 8 8 8 7 7 8 8 8 8 8 8 8 8 7 8 7 8
## [919] 8 8 8 8 8 8 8 7 5 8 7 8 4 8 8 5 7 6 8 8 8 5 8 8 8 8 8 8 2 8 8 4 8 8
## [953] 8 8 8 8 6 8 8 8 6 7 8 8 8 7 8 2 8 5 8 7 8 8 8 7 8 8 1 8 5 8 7 4 7 5
## [987] 8 8 8 3 8 8 8 7 8 8 8 5 8 8 8 6 8 1 5 8 8 1 8 4 7 8 8 4 8 8 7 8 8 8
## [1021] 5 1 8 6 4 5 8 8 8 6 5 8 8 8 8 8 8 8 8 5 8 6 8 8 7 7 8 8 2 8 4 8 8 8
## [1055] 8 8 8 8 4 8 8 6 8 8 8 8 8 4 8 8 8 8 4 4 8 8 3 8 8 8 4 8 3 8 8 8 4 2
## [1089] 3 8 7 3 8 7 8 6 4 7 8 8 6 6 7 4 6 7 4 8 7 5 8 4 4 4 4 8 8 8 8 5 8 4
## [1123] 8 6 8 6 8 6 8 8 7 8 7 8 8 6 8 7 4 2 8 8 6 8 2 6 8 8 8 7 8 6 8 8 8 3
## [1157] 8 6 8 8 8 3 3 8 8 8 8 8 8 8 7 7 8 6 8 7 8 8 8 5 8 8 8 8 8 4 7 8 8 4
## [1191] 8 7 4 4 4 4 8 8 4 8 2 7 1 4 8 4 0 6 8 4 5 7 6 3 4 8 8 0 8 0 6 5 8 8
## [1225] 0 1 8 4 8 4 4 8 7 8 4 4 5 8 4 4 5 5 8 8 8 1 8 8 5 7 6 4 8 8 8 7 8 6
## [1259] 6 8 1 2 7 3 6 6 6 8 7 8 8 6 8 3 4 8 8 5 7 7 8 8 2 8 6 4 8 7 7 7 4 4
## [1293] 5 4 5 4 5 1 6 8 5 3 5 8 8 5 6 8 3 8 7 3 8 8 8 8 3 3 8 8 8 1 8 8 8 5
## [1327] 8 5 7 8 4 8 4 6 7 3 8
ridgeSqM_custom <- test.protLMcontrast(modelCPTAC_RR, L, custom_dfs = custom_dfs)
ridgeSqM_custom <- prot.p.adjust(ridgeSqM_custom)
ridgeSqM_custom <- prot.signif(ridgeSqM_custom)
In order to be able to make diagnostic plots and tables, load the required functions from our GitHub page.
source("https://raw.githubusercontent.com/statOmics/MSqRob/master/vignettes/plotfunctions.R")
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
Extra: add a column containing TRUE if the corresponding protein is a spiked-in UPS1 protein and FALSE if it is a yeast protein. For this, we use the addTPcol
function. pattern
indicates for the pattern in the protein name that points to a true positive and TPcolname
is the name to be given to the added column.
lm_normal <- addTPcol(lm_normal, pattern="UPS", TPcolname="ups")
lm_Sq <- addTPcol(lm_Sq, pattern="UPS", TPcolname="ups")
lm_SqM <- addTPcol(lm_SqM, pattern="UPS", TPcolname="ups")
ridge <- addTPcol(ridge, pattern="UPS", TPcolname="ups")
ridgeSq <- addTPcol(ridgeSq, pattern="UPS", TPcolname="ups")
ridgeSqM <- addTPcol(ridgeSqM, pattern="UPS", TPcolname="ups")
The makeSummary
function allows to evaluate the performance of all methods on the CPTAC dataset. For this, we need to create a vector that contains the theoretical log2 fold changes of the spiked-in UPS1 proteins.
upratio <- c(1.565597,3.137504,4.744161,6.321928,1.571906,
3.178564,4.756331,1.60665,3.18442,1.577767)
Create the summary objects.
summary_lm <- makeSummary(lm_normal, upratio, TPcol="ups")
summary_lmSq <- makeSummary(lm_Sq, upratio, TPcol="ups")
summary_lmSqM <- makeSummary(lm_SqM, upratio, TPcol="ups")
summary_ridge <- makeSummary(ridge, upratio, TPcol="ups")
summary_ridgeSq <- makeSummary(ridgeSq, upratio, TPcol="ups")
summary_ridgeSqM <- makeSummary(ridgeSqM, upratio, TPcol="ups")
Results
summary_lm
## bias0 bias1 sd0 sd1
## condition6B-condition6A -0.049856737 -0.26374480 0.2178967 0.5458871
## condition6C-condition6A -0.066602648 -0.35999656 0.2263513 0.6987931
## condition6D-condition6A -0.046040848 -0.33193686 0.3014827 0.8405903
## condition6E-condition6A -0.130322615 -0.46343147 0.3826877 0.8998965
## condition6C-condition6B -0.015086174 -0.02559164 0.1893331 0.3421951
## condition6D-condition6B 0.003024942 -0.01200667 0.3426266 0.4939676
## condition6E-condition6B -0.083155730 -0.15369149 0.4391523 0.7370612
## condition6D-condition6C 0.019111062 0.01573994 0.3271477 0.2814700
## condition6E-condition6C -0.065348662 -0.11069860 0.4329431 0.5299769
## condition6E-condition6D -0.082512359 -0.05807474 0.2894723 0.5719607
## mad0 mad1 RMSE0 RMSE1 TP5proc
## condition6B-condition6A 0.1414074 0.3276234 0.2234547 0.5999277 28
## condition6C-condition6A 0.1525443 0.4102281 0.2358719 0.7782683 38
## condition6D-condition6A 0.1965858 0.5441178 0.3048755 0.8939292 39
## condition6E-condition6A 0.2473029 0.4309940 0.4041443 1.0021662 40
## condition6C-condition6B 0.1241324 0.2474419 0.1898682 0.3392508 39
## condition6D-condition6B 0.2323222 0.2924419 0.3425220 0.4884697 42
## condition6E-condition6B 0.2829725 0.3493636 0.4468065 0.7446700 43
## condition6D-condition6C 0.2207689 0.1905404 0.3275930 0.2787697 42
## condition6E-condition6C 0.2857494 0.2645359 0.4376989 0.5356193 44
## condition6E-condition6D 0.1747342 0.1914513 0.3009065 0.5686827 44
## FP5proc TN5proc FN5proc
## condition6B-condition6A 1 1451 11
## condition6C-condition6A 5 1447 2
## condition6D-condition6A 40 1413 1
## condition6E-condition6A 292 1154 0
## condition6C-condition6B 1 1452 5
## condition6D-condition6B 56 1397 2
## condition6E-condition6B 308 1136 1
## condition6D-condition6C 46 1407 3
## condition6E-condition6C 262 1182 1
## condition6E-condition6D 30 1420 2
summary_lmSq
## bias0 bias1 sd0 sd1
## condition6B-condition6A -0.049856737 -0.26374480 0.2178967 0.5458871
## condition6C-condition6A -0.066602648 -0.35999656 0.2263513 0.6987931
## condition6D-condition6A -0.046040848 -0.33193686 0.3014827 0.8405903
## condition6E-condition6A -0.130322615 -0.46343147 0.3826877 0.8998965
## condition6C-condition6B -0.015086174 -0.02559164 0.1893331 0.3421951
## condition6D-condition6B 0.003024942 -0.01200667 0.3426266 0.4939676
## condition6E-condition6B -0.083155730 -0.15369149 0.4391523 0.7370612
## condition6D-condition6C 0.019111062 0.01573994 0.3271477 0.2814700
## condition6E-condition6C -0.065348662 -0.11069860 0.4329431 0.5299769
## condition6E-condition6D -0.082512359 -0.05807474 0.2894723 0.5719607
## mad0 mad1 RMSE0 RMSE1 TP5proc
## condition6B-condition6A 0.1414074 0.3276234 0.2234547 0.5999277 28
## condition6C-condition6A 0.1525443 0.4102281 0.2358719 0.7782683 38
## condition6D-condition6A 0.1965858 0.5441178 0.3048755 0.8939292 39
## condition6E-condition6A 0.2473029 0.4309940 0.4041443 1.0021662 40
## condition6C-condition6B 0.1241324 0.2474419 0.1898682 0.3392508 39
## condition6D-condition6B 0.2323222 0.2924419 0.3425220 0.4884697 42
## condition6E-condition6B 0.2829725 0.3493636 0.4468065 0.7446700 43
## condition6D-condition6C 0.2207689 0.1905404 0.3275930 0.2787697 43
## condition6E-condition6C 0.2857494 0.2645359 0.4376989 0.5356193 44
## condition6E-condition6D 0.1747342 0.1914513 0.3009065 0.5686827 44
## FP5proc TN5proc FN5proc
## condition6B-condition6A 1 1451 11
## condition6C-condition6A 4 1448 2
## condition6D-condition6A 41 1412 1
## condition6E-condition6A 279 1167 0
## condition6C-condition6B 0 1453 5
## condition6D-condition6B 53 1400 2
## condition6E-condition6B 299 1145 1
## condition6D-condition6C 45 1408 2
## condition6E-condition6C 255 1189 1
## condition6E-condition6D 26 1424 2
summary_lmSqM
## bias0 bias1 sd0 sd1
## condition6B-condition6A -0.051102312 -0.22068964 0.2061248 0.6441525
## condition6C-condition6A -0.076178729 -0.22020142 0.2146940 0.7579452
## condition6D-condition6A -0.070823770 -0.16107843 0.2750647 0.8758196
## condition6E-condition6A -0.178240910 -0.27070587 0.3621340 0.9359432
## condition6C-condition6B -0.023217613 0.06777870 0.1705309 0.2873883
## condition6D-condition6B -0.020434384 0.10905664 0.3149625 0.4290169
## condition6E-condition6B -0.130350435 -0.02728519 0.4244712 0.7205215
## condition6D-condition6C 0.004014907 0.04371379 0.3020534 0.2401626
## condition6E-condition6C -0.104370563 -0.07846443 0.4237050 0.5577448
## condition6E-condition6D -0.106267679 -0.04940243 0.2782478 0.6298645
## mad0 mad1 RMSE0 RMSE1 TP5proc
## condition6B-condition6A 0.1226133 0.2620543 0.2122961 0.6730505 32
## condition6C-condition6A 0.1311001 0.4165869 0.2277388 0.7801330 38
## condition6D-condition6A 0.1616557 0.5340900 0.2839445 0.8796759 39
## condition6E-condition6A 0.2146158 0.3733051 0.4035098 0.9630013 40
## condition6C-condition6B 0.1073873 0.1729820 0.1720460 0.2920769 41
## condition6D-condition6B 0.1985114 0.2100466 0.3155165 0.4379107 43
## condition6E-condition6B 0.2674615 0.2322210 0.4438944 0.7128091 43
## condition6D-condition6C 0.1944814 0.1533858 0.3019761 0.2414689 43
## condition6E-condition6C 0.2610014 0.1585251 0.4362279 0.5570665 44
## condition6E-condition6D 0.1681527 0.1582347 0.2977604 0.6249363 44
## FP5proc TN5proc FN5proc
## condition6B-condition6A 6 1446 7
## condition6C-condition6A 17 1435 2
## condition6D-condition6A 96 1356 1
## condition6E-condition6A 576 870 0
## condition6C-condition6B 2 1451 3
## condition6D-condition6B 114 1339 1
## condition6E-condition6B 542 902 1
## condition6D-condition6C 79 1374 2
## condition6E-condition6C 480 964 1
## condition6E-condition6D 180 1270 2
summary_ridge
## bias0 bias1 sd0 sd1
## condition6B-condition6A -0.022821204 -0.28651011 0.1285001 0.5510196
## condition6C-condition6A -0.031517517 -0.40657348 0.1363416 0.7167196
## condition6D-condition6A -0.025406053 -0.40840787 0.1898219 0.8732607
## condition6E-condition6A -0.058529496 -0.56970162 0.2660547 0.9688576
## condition6C-condition6B -0.007310000 -0.08452214 0.1225707 0.4156455
## condition6D-condition6B -0.004071671 -0.11608482 0.2272722 0.6692130
## condition6E-condition6B -0.039083209 -0.29294585 0.3178031 0.9618865
## condition6D-condition6C 0.003774039 -0.03025730 0.2192206 0.3371939
## condition6E-condition6C -0.030789519 -0.19193587 0.3110181 0.6221030
## condition6E-condition6D -0.033137803 -0.09782654 0.1959141 0.5637166
## mad0 mad1 RMSE0 RMSE1 TP5proc
## condition6B-condition6A 0.02986928 0.2967893 0.1304674 0.6147564 28
## condition6C-condition6A 0.04068911 0.4511841 0.1398914 0.8161782 38
## condition6D-condition6A 0.05174847 0.5286015 0.1914499 0.9541051 39
## condition6E-condition6A 0.09720540 0.4315930 0.2723269 1.1134531 40
## condition6C-condition6B 0.02228200 0.3069350 0.1227465 0.4194982 37
## condition6D-condition6B 0.05551943 0.2833159 0.2272306 0.6716721 42
## condition6E-condition6B 0.09570841 0.3440744 0.3200883 0.9949950 43
## condition6D-condition6C 0.04909078 0.1920756 0.2191778 0.3347963 42
## condition6E-condition6C 0.09226521 0.2562150 0.3124314 0.6443999 43
## condition6E-condition6D 0.05357493 0.1897926 0.1986304 0.5660727 42
## FP5proc TN5proc FN5proc
## condition6B-condition6A 0 1456 11
## condition6C-condition6A 1 1454 2
## condition6D-condition6A 27 1429 1
## condition6E-condition6A 178 1271 0
## condition6C-condition6B 0 1456 7
## condition6D-condition6B 31 1425 2
## condition6E-condition6B 190 1257 1
## condition6D-condition6C 28 1428 3
## condition6E-condition6C 148 1299 2
## condition6E-condition6D 17 1437 4
summary_ridgeSq
## bias0 bias1 sd0 sd1
## condition6B-condition6A -0.022821204 -0.28651011 0.1285001 0.5510196
## condition6C-condition6A -0.031517517 -0.40657348 0.1363416 0.7167196
## condition6D-condition6A -0.025406053 -0.40840787 0.1898219 0.8732607
## condition6E-condition6A -0.058529496 -0.56970162 0.2660547 0.9688576
## condition6C-condition6B -0.007310000 -0.08452214 0.1225707 0.4156455
## condition6D-condition6B -0.004071671 -0.11608482 0.2272722 0.6692130
## condition6E-condition6B -0.039083209 -0.29294585 0.3178031 0.9618865
## condition6D-condition6C 0.003774039 -0.03025730 0.2192206 0.3371939
## condition6E-condition6C -0.030789519 -0.19193587 0.3110181 0.6221030
## condition6E-condition6D -0.033137803 -0.09782654 0.1959141 0.5637166
## mad0 mad1 RMSE0 RMSE1 TP5proc
## condition6B-condition6A 0.02986928 0.2967893 0.1304674 0.6147564 28
## condition6C-condition6A 0.04068911 0.4511841 0.1398914 0.8161782 38
## condition6D-condition6A 0.05174847 0.5286015 0.1914499 0.9541051 39
## condition6E-condition6A 0.09720540 0.4315930 0.2723269 1.1134531 40
## condition6C-condition6B 0.02228200 0.3069350 0.1227465 0.4194982 38
## condition6D-condition6B 0.05551943 0.2833159 0.2272306 0.6716721 42
## condition6E-condition6B 0.09570841 0.3440744 0.3200883 0.9949950 43
## condition6D-condition6C 0.04909078 0.1920756 0.2191778 0.3347963 42
## condition6E-condition6C 0.09226521 0.2562150 0.3124314 0.6443999 43
## condition6E-condition6D 0.05357493 0.1897926 0.1986304 0.5660727 43
## FP5proc TN5proc FN5proc
## condition6B-condition6A 0 1456 11
## condition6C-condition6A 1 1454 2
## condition6D-condition6A 28 1428 1
## condition6E-condition6A 175 1274 0
## condition6C-condition6B 0 1456 6
## condition6D-condition6B 30 1426 2
## condition6E-condition6B 182 1265 1
## condition6D-condition6C 27 1429 3
## condition6E-condition6C 152 1295 2
## condition6E-condition6D 16 1438 3
summary_ridgeSqM
## bias0 bias1 sd0 sd1
## condition6B-condition6A -0.021546334 -0.221222067 0.11027082 0.5890038
## condition6C-condition6A -0.033838468 -0.247990270 0.13424914 0.7634001
## condition6D-condition6A -0.039822113 -0.208430810 0.17090693 0.8892894
## condition6E-condition6A -0.089816147 -0.342742366 0.25703729 0.9904412
## condition6C-condition6B -0.010325975 0.009247665 0.09530226 0.3901085
## condition6D-condition6B -0.019616744 0.007723056 0.21193367 0.6578738
## condition6E-condition6B -0.071722048 -0.139601184 0.30795780 0.9434899
## condition6D-condition6C -0.008452197 0.001254246 0.20423518 0.3253406
## condition6E-condition6C -0.059762726 -0.133145228 0.30770773 0.6064819
## condition6E-condition6D -0.050029492 -0.071096844 0.19257281 0.5499110
## mad0 mad1 RMSE0 RMSE1 TP5proc
## condition6B-condition6A 0.02580259 0.2811846 0.11231895 0.6220685 32
## condition6C-condition6A 0.03648585 0.4242615 0.13840335 0.7935423 38
## condition6D-condition6A 0.05080025 0.5296974 0.17542781 0.9025010 39
## condition6E-condition6A 0.09455430 0.3857245 0.27219389 1.0363020 39
## condition6C-condition6B 0.02085484 0.2004131 0.09582749 0.3857609 41
## condition6D-condition6B 0.05118631 0.2270338 0.21276713 0.6504009 42
## condition6E-condition6B 0.09630768 0.2854367 0.31609574 0.9430962 43
## condition6D-condition6C 0.04454148 0.1574384 0.20433992 0.3217079 43
## condition6E-condition6C 0.08489230 0.1408196 0.31335315 0.6143078 44
## condition6E-condition6D 0.05284178 0.1725075 0.19890131 0.5485279 44
## FP5proc TN5proc FN5proc
## condition6B-condition6A 2 1454 7
## condition6C-condition6A 4 1451 2
## condition6D-condition6A 45 1411 1
## condition6E-condition6A 351 1098 1
## condition6C-condition6B 1 1455 3
## condition6D-condition6B 52 1404 2
## condition6E-condition6B 343 1104 1
## condition6D-condition6C 39 1417 2
## condition6E-condition6C 286 1161 1
## condition6E-condition6D 74 1380 2
colors <- c(1,2,3,4,5,6)
resultobjectlist <- list(lm_normal,lm_Sq,lm_SqM,ridge,ridgeSq,ridgeSqM)
truth <- lm_normal[[1]][,"ups"]
names(truth) <- rownames(lm_normal[[1]])
summary_objectlist <- lapply(resultobjectlist, function(x){return(makeSummary(x, upratio))})
plotROC(resultobjectlist, summary_objectlist=summary_objectlist, proteins=proteinsCPTAC, truth=truth, colors=colors, plotSVG=FALSE)
## $AUC
## [,1] [,2] [,3] [,4] [,5]
## condition6B-condition6A 0.9239222 0.9255424 0.9715589 0.9612919 0.9624542
## condition6C-condition6A 0.9895704 0.9907045 0.9731271 0.9941409 0.9948282
## condition6D-condition6A 0.9956044 0.9961882 0.9897493 0.9979224 0.9982658
## condition6E-condition6A 0.9957557 0.9962905 0.9913043 0.9975155 0.9979296
## condition6C-condition6B 0.9904158 0.9928509 0.9952391 0.9812531 0.9815497
## condition6D-condition6B 0.9871379 0.9901349 0.9926480 0.9802853 0.9807380
## condition6E-condition6B 0.9847333 0.9872620 0.9839323 0.9811522 0.9816234
## condition6D-condition6C 0.9746642 0.9769994 0.9839133 0.9730464 0.9735653
## condition6E-condition6C 0.9794671 0.9814943 0.9767642 0.9782078 0.9789757
## condition6E-condition6D 0.9711441 0.9725943 0.9763471 0.9757341 0.9766013
## [,6]
## condition6B-condition6A 0.9851367
## condition6C-condition6A 0.9841409
## condition6D-condition6A 0.9939732
## condition6E-condition6A 0.9929607
## condition6C-condition6B 0.9827672
## condition6D-condition6B 0.9818619
## condition6E-condition6B 0.9811208
## condition6D-condition6C 0.9747405
## condition6E-condition6C 0.9781924
## condition6E-condition6D 0.9751809
##
## $pAUC
## [,1] [,2] [,3] [,4]
## condition6B-condition6A 0.08252325 0.08310440 0.09002536 0.08475979
## condition6C-condition6A 0.09580756 0.09616838 0.09625430 0.09659794
## condition6D-condition6A 0.09725275 0.09783654 0.09708104 0.09751030
## condition6E-condition6A 0.09720497 0.09773982 0.09680814 0.09689441
## condition6C-condition6B 0.09496753 0.09563874 0.09710602 0.09640360
## condition6D-condition6B 0.09543581 0.09581044 0.09688749 0.09598214
## condition6E-condition6B 0.09584092 0.09618647 0.09632783 0.09588804
## condition6D-condition6C 0.09336081 0.09365079 0.09455128 0.09368132
## condition6E-condition6C 0.09501651 0.09503187 0.09506258 0.09501651
## condition6E-condition6D 0.09435739 0.09474613 0.09507506 0.09468632
## [,5] [,6]
## condition6B-condition6A 0.08505917 0.09131093
## condition6C-condition6A 0.09676976 0.09673540
## condition6D-condition6A 0.09785371 0.09708104
## condition6E-condition6A 0.09730849 0.09680814
## condition6C-condition6B 0.09659091 0.09665335
## condition6D-condition6B 0.09615385 0.09630994
## condition6E-condition6B 0.09623359 0.09494566
## condition6D-condition6C 0.09386447 0.09467338
## condition6E-condition6C 0.09500115 0.09500115
## condition6E-condition6D 0.09486574 0.09443215
Argentini, A., Goeminne, L. J. E., Verheggen, K., Hulstaert, N., Staes, A., Clement, L. and Martens, L. (2016) moFF: a robust and automated approach to extract peptide ion intensities. Nature Methods 13, pp 964–966.
Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. (2003) A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Bias and Variance. Bioinformatics 19(2), pp 185-193. http://bmbolstad.com/misc/normalize/normalize.html
Goeminne, L. J. E., Argentini, A., Martens, L., and Clement, L. (2015) Summarization vs Peptide-Based Models in Label-Free Quantitative Proteomics: Performance, Pitfalls, and Data Analysis Guidelines. Journal of Proteome Research 14(6), pp 2457-2465.
Goeminne, L. J. E., Gevaert, K., and Clement, L. (2016) Peptide-level Robust Ridge Regression Improves Estimation, Sensitivity, and Specificity in Data-dependent Quantitative Label-free Shotgun Proteomics. Molecular & Cellular Proteomics 15(2), pp 657-668.
Griss, J., Jones, A.R., Sachsenberg, T., Walzer, M., Gatto, L., Hartler, J., Thallinger, G.G., Salek, R.M., Steinbeck, C., Neuhauser, N., Cox, J., Neumann, S., Fan, J., Reisinger, F., Xu, Q.W., Del Toro, N., Pérez-Riverol, Y., Ghali, F., Bandeira, N., Xenarios, I., Kohlbacher, O., Vizcaíno, J.A. & Hermjakob, H. (2014) The mzTab data exchange format: communicating mass-spectrometry-based proteomics and metabolomics experimental results to a wider audience. Molecular and Cellular Proteomics 13(10), pp 2765-75.
Navarro P., Kuharev J., Gillet, L. C., Bernhardt, O. M., MacLean, B., Röst, H. L., Tate, S. A., Tsou, C., Reiter, L., Distler, U., Rosenberger, G., Perez-Riverol, Y., Nesvizhskii, A. I., Aebersold, R., and Tenzer, S. (2016) A multicenter study benchmarks software tools for label-free proteome quantification. Nature Biotechnology.
Paulovich AG, Billheimer D, Ham A-JL, et al. (2010) Interlaboratory Study Characterizing a Yeast Performance Standard for Benchmarking LC-MS Platform Performance. Molecular & Cellular Proteomic 9(2), pp 242-254.
Ramond, E., Gesbert, G., Guerrera, I. C., Chhuon, C., Dupuis, M., Rigard, M., Henry, T., Barel, M., and Charbit, A. (2015) Importance of host cell arginine uptake in Francisella phagosomal escape and ribosomal protein amounts. Molecular & Cellular Proteomics 14, pp 870–881.