MSqRob Vignette

Ludger Goeminne, Kris Gevaert and Lieven Clement

2018-11-23

A first notice

This vignette might be a bit overwhelming for a first-time user.

You might want to skim through this vignette and then take a look at the same two examples stripped to their essential analyses. These can be found at https://github.com/statOmics/MSqRobData/blob/master/inst/extdata/Francisella/analysis_Francisella.Rmd and https://github.com/statOmics/MSqRobData/blob/master/inst/extdata/CPTAC/analysis_CPTAC.Rmd for the Francisella and CPTAC example respectively.

Note that we mainly focus on MaxQuant output, but that any type type of search engine output can be used with MSqRob, see “1.2. Importing other data types” for more info.

If anything is unclear, please do not hesitate to contact me at ludger.goeminne@vib-ugent.be or to open an issue on GitHub. Your feedback is highly welcomed!

Introduction

This vignette will guide you through the main features of the MSqRob package. We will highlight two examples in order to help you familiarize with its main features.

The MSqRob package is designed for the analysis of differential abundance in label-free mass spectrometry-based shotgun proteomics experiments. These experiments typically output large spectral files (e.g. “.mzML” files) that need to be searched for identifications by specialized peak detection and peptide identification software such as Andromeda (used in the MaxQuant software package) and Mascot Distiller. These algorithms output lists of putative peptides accompanied by their corresponding overall intensity estimates.

Often, researchers further summarize these peptide intensities to protein intensities. However, as we have shown before, this summarization-based approaches tend to perform suboptimal compared to peptide-based models, wherein the peptide intensities are directly provided as input to the statistical model used for differential quantification (Goeminne et al., 2015). Nonetheless, most of these models still suffer from overfitting issues.

This package implements the robust ridge approach that prevents overfitting in sparse data but also accurately fits highly abundant proteins as described in Goeminne et al. (2016).

MSqRob is completely free and open source, but please make a reference to Goeminne et al. (2016) when using our package in any kind of publication.

Downloading, installing and loading MSqRob

Installing the package

On Windows, make sure that RTools is installed. Go to: https://cran.r-project.org/bin/windows/Rtools/ to download RTools. A user guide on how to install RTools on Windows can be found at: https://github.com/stan-dev/rstan/wiki/Install-Rtools-for-Windows. Errors in MSqRob on Windows related to unable to zip the results Excel file might be related to errors in configuring RTools.

First, we need to install the package devtools:

#Change to eval=TRUE to execute this piece of code.
install.packages("devtools", repos = "http://cran.us.r-project.org")
library(devtools)

Then, we call this to install MSqRob from GitHub:

#Change to eval=TRUE to execute this piece of code.
devtools::install_github("statOmics/MSqRob")
library(MSqRob)

Note that the latest version of MSqRob can always be found at: https://github.com/statOmics/MSqRob.

The latest version of this vignette can be found at: https://github.com/statOmics/MSqRob/blob/master/vignettes/MSqRob.Rmd.

Loading MSqRob

Loading the package is done via the library command, as with any other package.

library(MSqRob)
## Loading required package: lme4
## Loading required package: Matrix

Additionally, we load the packages Biobase, MSnbase and limma, as we will use a few of their functions in this vignette.

library(Biobase)
## Loading required package: BiocGenerics
## Loading required package: parallel
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:Matrix':
## 
##     colMeans, colSums, rowMeans, rowSums, which
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     Filter, Find, Map, Position, Reduce, anyDuplicated, append,
##     as.data.frame, basename, cbind, colMeans, colSums, colnames,
##     dirname, do.call, duplicated, eval, evalq, get, grep, grepl,
##     intersect, is.unsorted, lapply, lengths, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind,
##     rowMeans, rowSums, rownames, sapply, setdiff, sort, table,
##     tapply, union, unique, unsplit, which, which.max, which.min
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
library(MSnbase)
## Loading required package: mzR
## Loading required package: Rcpp
## Warning in fun(libname, pkgname): mzR has been built against a different Rcpp version (0.12.16)
## than is installed on your system (0.12.17). This might lead to errors
## when loading mzR. If you encounter such issues, please send a report,
## including the output of sessionInfo() to the Bioc support forum at 
## https://support.bioconductor.org/. For details see also
## https://github.com/sneumann/mzR/wiki/mzR-Rcpp-compiler-linker-issue.
## Loading required package: BiocParallel
## Loading required package: ProtGenerics
## 
## This is MSnbase version 2.6.1 
##   Visit https://lgatto.github.io/MSnbase/ to get started.
## 
## Attaching package: 'MSnbase'
## The following object is masked from 'package:stats':
## 
##     smooth
## The following object is masked from 'package:base':
## 
##     trimws
library(limma)
## 
## Attaching package: 'limma'
## The following object is masked from 'package:BiocGenerics':
## 
##     plotMA

The MSqRobData package

R packages uploaded on CRAN have a maximum size limit of 5 MB. As proteomics datasets are typically several hunderds of GB in size and even an output file of a peptide search can be tens of MBs in size, we uploaded our example data in a separate data package called MSqRobData. If you want to run the examples in this vignette, you also need the data package. This can also be downloaded from GitHub in the following way:

#Change to eval=TRUE to execute this piece of code.
devtools::install_github("statOmics/MSqRobData")
library(MSqRobData)

The Shiny App

If you want to work with the Shiny graphical user interface (MaxQuant output only), use the following command:

#Change to eval=TRUE to execute this piece of code.
shiny::runApp(system.file('App-MSqRob', package='MSqRob'))

Francisella tularensis dataset (design with pseudoreplication)

The bacterium Francisella tularensis is a facultative intracellular parasite of macrophages and must import host cell arginine in order to be able to multiply inside the host cell. Ramond et al. (2015) investigated the effect of gene deletion of a newly identified arginine transporter called ArgP. Their data is deposited in the PRIDE repository at http://www.ebi.ac.uk/pride/archive/projects/PXD001584. We used the authors’ supplied peptides.txt file which is the result of preprocessing with MaxQuant version 1.4.1.2. As the authors used now outdated GI numbers, we modified this file to incorporate NCBI accessions and protein names (September 25, 2016).

1. Importing data

MSqRob data input can come from any kind of source, as long as feature-level or peptide-level data is available. I will here focus on a MaxQuant example, but in fact, output from any search engine can be used.

1.1. Importing MaxQuant data

First, specify the location of the peptides.txt MaxQuant output file. By default, MaxQuant creates a “combined” folder in the same folder as the raw files are (or if the raw files are in different directories, in the directory of the first raw file). The peptides.txt file can then be found in “path_to_raw_files/combined/txt/peptides.txt”. Just provide the path to where the file is located on your computer. The system.file function is used here only to access the example tab-delimited peptides.txt file that is delivered with our package.

file_peptides_txt <- system.file("extdata/Francisella", "peptides.txt", package = "MSqRobData")
file_peptides_txt
## [1] "/Library/Frameworks/R.framework/Versions/3.5/Resources/library/MSqRobData/extdata/Francisella/peptides.txt"

Import the data using the import2MSnSet function. The pattern argument should contain a text string that is present in all column names that contain peptide intensities but not in the other column names. This defaults to "Intensity\ " for MaxQuant data. The default setting remove_pattern=TRUE then removes the pattern from these column names so that the names equal the names that were entered in MaxQuant’s “Experiment” column when conducting the search. In this particular Francisella experiment, our column names start with a number (either “1” or “3”), which is not a valid column name in R. Therefore, an “X” is prepended to these names.

peptidesFranc <- import2MSnSet(file_peptides_txt, filetype = "MaxQuant", remove_pattern = TRUE)

import2MSnSet creates an MSnSet object with 3 slots:

If you want to assess whether importing happened in a correct way, you can always check the exprs, fData and pData slots.

The exprs slot should contain all the peptide intensities grouped per column.

head(exprs(peptidesFranc))
##   X1WT_20_2h_n3_1 X1WT_20_2h_n3_2 X1WT_20_2h_n3_3 X1WT_20_2h_n4_1
## 1               0               0         9290400        92996000
## 2               0        20679000               0               0
## 3        28853000        28091000        30436000        19476000
## 4       339830000       420390000       393930000       235060000
## 5       240280000               0       241890000       222250000
## 6               0        31069000        31448000               0
##   X1WT_20_2h_n4_2 X1WT_20_2h_n4_3 X1WT_20_2h_n5_1 X1WT_20_2h_n5_2
## 1               0        98059000               0        80803000
## 2               0               0               0        17358000
## 3        22050000        20687000        36588000        28654000
## 4       381090000       360270000       255710000       311370000
## 5       189930000               0       160440000       147230000
## 6        27721000        28657000               0               0
##   X1WT_20_2h_n5_3 X3D8_20_2h_n3_1 X3D8_20_2h_n3_2 X3D8_20_2h_n3_3
## 1               0               0        95433000               0
## 2        13841000        12278000        10712000               0
## 3         7009100        14174000         8548000         7859200
## 4       271440000       225140000       289880000       282500000
## 5       143340000       135910000       185740000       168620000
## 6               0               0               0               0
##   X3D8_20_2h_n4_1 X3D8_20_2h_n4_2 X3D8_20_2h_n4_3 X3D8_20_2h_n5_1
## 1               0               0               0               0
## 2         9500700        11111000         8723500         8125200
## 3        14871000        11690000         4470500               0
## 4       209490000       223610000       219700000       169120000
## 5       160240000       158000000               0        92542000
## 6               0        18253000               0               0
##   X3D8_20_2h_n5_2 X3D8_20_2h_n5_3
## 1               0               0
## 2               0               0
## 3         4048100         3422300
## 4       178150000       170400000
## 5       102350000       100090000
## 6               0               0

The fData slot contains all other columns.

head(fData(peptidesFranc))
##                  Sequence Amino.acid.before First.amino.acid
## 1               AAAEELDTR                 K                A
## 2           AAAGFVITASHNK                 R                A
## 3 AAANEYELALAYSIEEVAPDLHK                 K                A
## 4            AAANNPQLEAFK                 K                A
## 5             AAASAGLVDEK                 K                A
## 6            AAASLDLYSYPK                 K                A
##   Second.amino.acid Second.last.amino.acid Last.amino.acid
## 1                 A                      T               R
## 2                 A                      N               K
## 3                 A                      H               K
## 4                 A                      F               K
## 5                 A                      E               K
## 6                 A                      P               K
##   Amino.acid.after A.Count R.Count N.Count D.Count C.Count Q.Count E.Count
## 1                K       3       1       0       1       0       0       2
## 2                F       4       0       1       0       0       0       0
## 3                Y       6       0       1       1       0       0       4
## 4                K       4       0       2       0       0       1       1
## 5                A       4       0       0       1       0       0       1
## 6                V       3       0       0       1       0       0       0
##   G.Count H.Count I.Count L.Count K.Count M.Count F.Count P.Count S.Count
## 1       0       0       0       1       0       0       0       0       0
## 2       1       1       1       0       1       0       1       0       1
## 3       0       1       1       3       1       0       0       1       1
## 4       0       0       0       1       1       0       1       1       0
## 5       1       0       0       1       1       0       0       0       1
## 6       0       0       0       2       1       0       0       1       2
##   T.Count W.Count Y.Count V.Count U.Count Length Missed.cleavages
## 1       1       0       0       0       0      9                0
## 2       1       0       0       1       0     13                0
## 3       0       0       2       1       0     23                0
## 4       0       0       0       0       0     12                0
## 5       0       0       0       1       0     11                0
## 6       0       0       2       0       0     12                0
##        Mass     Proteins    GI.number Leading.razor.protein
## 1  974.4669 WP_003038655 gi|118497196          gi|118497196
## 2 1285.6779 WP_003041237 gi|118498194          gi|118498194
## 3 2516.2435 WP_003038915 gi|118497331          gi|118497331
## 4 1272.6463 WP_003038264 gi|118496879          gi|118496879
## 5 1030.5295 WP_003035781 gi|118497152          gi|118497152
## 6 1297.6554 WP_003039212 gi|118497492          gi|118497492
##                                                    Protein.names
## 1                  hypothetical protein [Francisella tularensis]
## 2             phosphoglucosamine mutase [Francisella tularensis]
## 3     glycine--tRNA ligase subunit beta [Francisella tularensis]
## 4              molecular chaperone HtpG [Francisella tularensis]
## 5 delta-aminolevulinic acid dehydratase [Francisella tularensis]
## 6                         phosphorylase [Francisella tularensis]
##   Unique..Groups. Unique..Proteins. Charges        PEP  Score
## 1             yes               yes     1,2 1.2531e-04  59.35
## 2             yes               yes       2 5.3020e-05 45.825
## 3             yes               yes       3 1.0517e-45 84.327
## 4             yes               yes       2 8.7549e-24 108.56
## 5             yes               yes       2 9.5062e-05 67.385
## 6             yes               yes       2 1.2563e-02 31.675
##   Experiment.1WT_20_2h_n3_1 Experiment.1WT_20_2h_n3_2
## 1                        NA                        NA
## 2                        NA                         1
## 3                         1                         1
## 4                         1                         1
## 5                         1                        NA
## 6                        NA                         1
##   Experiment.1WT_20_2h_n3_3 Experiment.1WT_20_2h_n4_1
## 1                         1                         1
## 2                        NA                        NA
## 3                         2                         1
## 4                         1                         1
## 5                         1                         1
## 6                         1                        NA
##   Experiment.1WT_20_2h_n4_2 Experiment.1WT_20_2h_n4_3
## 1                        NA                         1
## 2                        NA                        NA
## 3                         1                         1
## 4                         1                         1
## 5                         1                        NA
## 6                         1                         1
##   Experiment.1WT_20_2h_n5_1 Experiment.1WT_20_2h_n5_2
## 1                        NA                         1
## 2                        NA                         1
## 3                         2                         2
## 4                         1                         1
## 5                         1                         1
## 6                        NA                        NA
##   Experiment.1WT_20_2h_n5_3 Experiment.3D8_20_2h_n3_1
## 1                        NA                        NA
## 2                         1                         1
## 3                         1                         1
## 4                         1                         1
## 5                         1                         1
## 6                        NA                        NA
##   Experiment.3D8_20_2h_n3_2 Experiment.3D8_20_2h_n3_3
## 1                         1                        NA
## 2                         1                        NA
## 3                         2                         1
## 4                         1                         1
## 5                         1                         1
## 6                        NA                        NA
##   Experiment.3D8_20_2h_n4_1 Experiment.3D8_20_2h_n4_2
## 1                        NA                        NA
## 2                         1                         1
## 3                         1                         1
## 4                         1                         1
## 5                         1                         1
## 6                        NA                         1
##   Experiment.3D8_20_2h_n4_3 Experiment.3D8_20_2h_n5_1
## 1                        NA                        NA
## 2                         1                         1
## 3                         1                        NA
## 4                         1                         1
## 5                        NA                         1
## 6                        NA                        NA
##   Experiment.3D8_20_2h_n5_2 Experiment.3D8_20_2h_n5_3  Intensity Reverse
## 1                        NA                        NA 1.3295e+09        
## 2                        NA                        NA 1.6719e+08        
## 3                         1                         1 5.5675e+08        
## 4                         1                         1 1.4830e+10        
## 5                         1                         1 4.8209e+09        
## 6                        NA                        NA 1.3715e+08        
##   Contaminant id Protein.group.IDs Mod..peptide.IDs
## 1              0               419                0
## 2              1              1150                1
## 3              2               513                2
## 4              3               199                3
## 5              4               388                4
## 6              5               624                5
##                                                                                                                                                               Evidence.IDs
## 1                                                                                                                           0;1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18
## 2                                                                                                                             19;20;21;22;23;24;25;26;27;28;29;30;31;32;33
## 3                                         34;35;36;37;38;39;40;41;42;43;44;45;46;47;48;49;50;51;52;53;54;55;56;57;58;59;60;61;62;63;64;65;66;67;68;69;70;71;72;73;74;75;76
## 4 77;78;79;80;81;82;83;84;85;86;87;88;89;90;91;92;93;94;95;96;97;98;99;100;101;102;103;104;105;106;107;108;109;110;111;112;113;114;115;116;117;118;119;120;121;122;123;124
## 5                                              125;126;127;128;129;130;131;132;133;134;135;136;137;138;139;140;141;142;143;144;145;146;147;148;149;150;151;152;153;154;155
## 6                                                                                                                                                      156;157;158;159;160
##                                                                                                                                                                                                                                                                                                                                                         MS.MS.IDs
## 1                                                                                                                                                                                                                                                                                                                  0;1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18
## 2                                                                                                                                                                                                                                                                                                                    19;20;21;22;23;24;25;26;27;28;29;30;31;32;33
## 3                                                                                                                                                                                                                       34;35;36;37;38;39;40;41;42;43;44;45;46;47;48;49;50;51;52;53;54;55;56;57;58;59;60;61;62;63;64;65;66;67;68;69;70;71;72;73;74;75;76;77;78;79
## 4 80;81;82;83;84;85;86;87;88;89;90;91;92;93;94;95;96;97;98;99;100;101;102;103;104;105;106;107;108;109;110;111;112;113;114;115;116;117;118;119;120;121;122;123;124;125;126;127;128;129;130;131;132;133;134;135;136;137;138;139;140;141;142;143;144;145;146;147;148;149;150;151;152;153;154;155;156;157;158;159;160;161;162;163;164;165;166;167;168;169;170;171;172
## 5                                                                                                                                                                                                                                     173;174;175;176;177;178;179;180;181;182;183;184;185;186;187;188;189;190;191;192;193;194;195;196;197;198;199;200;201;202;203
## 6                                                                                                                                                                                                                                                                                                                                             204;205;206;207;208
##   Best.MS.MS Oxidation..M..site.IDs
## 1         15                       
## 2         26                       
## 3         39                       
## 4        159                       
## 5        191                       
## 6        205

The pData slot will contain the experimental annotation. Right now, it has 18 rows (equal to the number of mass spec runs) and 0 columns (as the experimental annotation is not yet loaded).

pData(peptidesFranc)
## data frame with 0 columns and 18 rows

1.2. Importing other data types

The import2MSnSet has built-in support for moFF (Argentini et al., 2016), mzTab (Griss et al., 2014) and Progenesis (Nonlinear Dynamics, Newcastle upon Tyne, U.K.) output. You should then simply set filetype to "moFF", "mzTab" or "Progenesis" respectively.

If you are using yet another data type, you can use MSqRob’s read2MSnSet function to import your data. Hereby, you have to specify (1) your file, (2) a regular expression pattern that indicates the columns with the peptide intensities or simply the indices of these columns and (3) the separator (e.g. "\t", ",", …).

2. Preprocessing

2.1. Create an experimental annotation data frame.

Now, we are going to create a data frame that contains the experimental annotation. This data frame should contain all explanatory variables that groups different runs together. This is important in order to be able to include factors on which you will do inference (such as treatment effects or effects over time, or, in our case: the effect of the genetic knock-out) as well as other factors that might have an influence (such as biological repeats, technical repeats and potential confounders such as batch effects).

Note that the “experimental annotation” can also be given as an Excel file or a tab delimited file! This file should have the column names as a header (i.e. the first row in the file) and the same structure as the exp_annotation data frame we create below. If you have your experimental annotation in a file, just provide the filepath to this annotation file in the preprocess_MaxQuant function.

Also note that exactly one column in the experimental annotation should contain the names of the mass spec runs. These names should be equal to the names you entered in the “Experiment” column in MaxQuant (i.e. the column names of the exprs slot of the peptidesFranc MSnSet object). In our case, we will call this column “run” (see below). By default, MSqRob guesses this column by taking the only column of which its elements correspond to the column names of the exprs slot. If there is no such a column, or there are multiple such columns, MSqRob throws an error.

In this example, we create the experimental annotation data frame ourselves.

Extract the names of the mass spec runs. The mass spec run names are equal to the column names of the exprs slot of the peptidesFranc MSnSet object. Other columns in the experimental annotation data frame will typically be derived from this column.

runs <- sampleNames(peptidesFranc)
runs
##  [1] "X1WT_20_2h_n3_1" "X1WT_20_2h_n3_2" "X1WT_20_2h_n3_3"
##  [4] "X1WT_20_2h_n4_1" "X1WT_20_2h_n4_2" "X1WT_20_2h_n4_3"
##  [7] "X1WT_20_2h_n5_1" "X1WT_20_2h_n5_2" "X1WT_20_2h_n5_3"
## [10] "X3D8_20_2h_n3_1" "X3D8_20_2h_n3_2" "X3D8_20_2h_n3_3"
## [13] "X3D8_20_2h_n4_1" "X3D8_20_2h_n4_2" "X3D8_20_2h_n4_3"
## [16] "X3D8_20_2h_n5_1" "X3D8_20_2h_n5_2" "X3D8_20_2h_n5_3"
colnames(exprs(peptidesFranc))
##  [1] "X1WT_20_2h_n3_1" "X1WT_20_2h_n3_2" "X1WT_20_2h_n3_3"
##  [4] "X1WT_20_2h_n4_1" "X1WT_20_2h_n4_2" "X1WT_20_2h_n4_3"
##  [7] "X1WT_20_2h_n5_1" "X1WT_20_2h_n5_2" "X1WT_20_2h_n5_3"
## [10] "X3D8_20_2h_n3_1" "X3D8_20_2h_n3_2" "X3D8_20_2h_n3_3"
## [13] "X3D8_20_2h_n4_1" "X3D8_20_2h_n4_2" "X3D8_20_2h_n4_3"
## [16] "X3D8_20_2h_n5_1" "X3D8_20_2h_n5_2" "X3D8_20_2h_n5_3"

Note again that the sample names have a prepended “X” because a syntactically valid name in R cannot start with a number.

Add a factor for the genetic knock-out, which we will call “genotype” (i.e. wild type (WT) or knock-out (KO)).

#Create the "genotype" vector
genotype <- factor(c(rep("WT",9),rep("KO",9)), levels=c("WT","KO"))
genotype
##  [1] WT WT WT WT WT WT WT WT WT KO KO KO KO KO KO KO KO KO
## Levels: WT KO

Add a factor for each of the 6 biological replicates (i.e. 3 biological repeats for each genotype).

#Create a factor for the biological replicates
n <- nchar(runs)
biorep <- as.factor(paste0("b_",rep(1:6,each=3)))
biorep
##  [1] b_1 b_1 b_1 b_2 b_2 b_2 b_3 b_3 b_3 b_4 b_4 b_4 b_5 b_5 b_5 b_6 b_6
## [18] b_6
## Levels: b_1 b_2 b_3 b_4 b_5 b_6
#Alternative: extract the biological repeat based on the mass spec run names.
# biorep <- as.factor(paste0("b_",factor(as.numeric(factor(paste0(substr(runs, 1,1),substr(runs, n-2,n-2))))+c(rep(0,9),rep(3,9)))))

Each of the 18 technical repeats are here represented by the 18 mass spec runs.

Create an experimental annotation data frame.

exp_annotation <- data.frame(run=runs, genotype=genotype, biorep=biorep)
exp_annotation
##                run genotype biorep
## 1  X1WT_20_2h_n3_1       WT    b_1
## 2  X1WT_20_2h_n3_2       WT    b_1
## 3  X1WT_20_2h_n3_3       WT    b_1
## 4  X1WT_20_2h_n4_1       WT    b_2
## 5  X1WT_20_2h_n4_2       WT    b_2
## 6  X1WT_20_2h_n4_3       WT    b_2
## 7  X1WT_20_2h_n5_1       WT    b_3
## 8  X1WT_20_2h_n5_2       WT    b_3
## 9  X1WT_20_2h_n5_3       WT    b_3
## 10 X3D8_20_2h_n3_1       KO    b_4
## 11 X3D8_20_2h_n3_2       KO    b_4
## 12 X3D8_20_2h_n3_3       KO    b_4
## 13 X3D8_20_2h_n4_1       KO    b_5
## 14 X3D8_20_2h_n4_2       KO    b_5
## 15 X3D8_20_2h_n4_3       KO    b_5
## 16 X3D8_20_2h_n5_1       KO    b_6
## 17 X3D8_20_2h_n5_2       KO    b_6
## 18 X3D8_20_2h_n5_3       KO    b_6

Note that you can also import the experimental annotation from a tab-delimited or Excel file. As an example, we show you here how you can import data from an experimental annotation file delivered with our package. First, provide the path to where the data is saved.

file_exp_annotation <- system.file("extdata/Francisella", "label-free_Francisella_annotation.xlsx", package = "MSqRobData")
file_exp_annotation
## [1] "/Library/Frameworks/R.framework/Versions/3.5/Resources/library/MSqRobData/extdata/Francisella/label-free_Francisella_annotation.xlsx"

Then, import the data using the read.xlsx function from the openxlsx package and convert the columns of the data frame to factors.

#Import Excel file
exp_annotation_xlsx <- openxlsx::read.xlsx(file_exp_annotation)
#Convert columns of data frame to factors
exp_annotation_xlsx <- as.data.frame(unclass(exp_annotation_xlsx))
exp_annotation_xlsx
##               run genotype biorep
## 1  1WT_20_2h_n3_1       WT    b_1
## 2  1WT_20_2h_n3_2       WT    b_1
## 3  1WT_20_2h_n3_3       WT    b_1
## 4  1WT_20_2h_n4_1       WT    b_2
## 5  1WT_20_2h_n4_2       WT    b_2
## 6  1WT_20_2h_n4_3       WT    b_2
## 7  1WT_20_2h_n5_1       WT    b_3
## 8  1WT_20_2h_n5_2       WT    b_3
## 9  1WT_20_2h_n5_3       WT    b_3
## 10 3D8_20_2h_n3_1       KO    b_4
## 11 3D8_20_2h_n3_2       KO    b_4
## 12 3D8_20_2h_n3_3       KO    b_4
## 13 3D8_20_2h_n4_1       KO    b_5
## 14 3D8_20_2h_n4_2       KO    b_5
## 15 3D8_20_2h_n4_3       KO    b_5
## 16 3D8_20_2h_n5_1       KO    b_6
## 17 3D8_20_2h_n5_2       KO    b_6
## 18 3D8_20_2h_n5_3       KO    b_6

Note that here, there is no “X” prepended to the run names. However, names will be changed automatically to syntactically valid names by MSqRob during preprocessing.

2.2. Preprocess the data using the preprocess_MaxQuant function.

The identification of peptides that carry one or more chemical modifications is often not as reliable as the identification of unmodified peptides. Therefore, a common step during the preprocessing of proteomics data consists of removing the proteins of which all identified peptides have one or more modification sites. If we want to remove these proteins that are only identified by modified peptides from the dataset, we also need the proteinGroups.txt file. This file can be found in the same location as the peptides.txt file (“path_to_raw_files/combined/txt/proteinGroups.txt”). Here, we make use of the proteinGroups.txt file that is delivered with the package.

Note: if you don’t want to remove proteins that are only identified by modified peptides, set the remove_only_site argument of the preprocess_MaxQuant function to FALSE and leave the file_proteinGroups argument at its default value NULL.

Define the path to the proteinGroups.txt file.

file_proteinGroups <- system.file("extdata/Francisella", "proteinGroups.txt", package = "MSqRobData")
file_proteinGroups
## [1] "/Library/Frameworks/R.framework/Versions/3.5/Resources/library/MSqRobData/extdata/Francisella/proteinGroups.txt"

Add to the useful_properties argument all column names of the fData slot of the peptidesFranc MSnSet object that you would like to keep (such as gene names, protein names, ontologies, etc.). Other columns will be removed for improved efficiency and memory usage. Here, we would like to keep the GI number and the protein names. These 2 columns are present in our peptides.txt file, but note that you can use basic R manipulations to add other columns to the fData slot if you want to. We certainly want to keep the “Sequence” slot, as peptide-specific effects must be included in our model.

peptidesFranc2 <- preprocess_MaxQuant(peptidesFranc, accession="Proteins", exp_annotation=exp_annotation, logtransform=TRUE, base=2, normalisation="quantiles", smallestUniqueGroups=TRUE, useful_properties=c("GI number","Protein.names","Sequence"), filter=c("Contaminant","Reverse"), remove_only_site=TRUE, file_proteinGroups=file_proteinGroups,  filter_symbol="+", minIdentified=2)

preprocess_MaxQuant internally executes the following preprocessing steps in this order:

  1. Peptide intensities are (log2-)transformed (logtransform=TRUE, base=2).

  2. As a normalization approach, we choose to quantile normalize the peptide intensities (normalisation="quantiles"). Other options are "sum", "max", "center.mean", "center.median", "quantiles.robust", "vsn" or "none" (see MSnbase::normalize for more information).

  3. Handling overlapping protein groups (smallestUniqueGroups=TRUE): in our approach a peptide can map to multiple accessions (accession="Proteins"), as long as there is none of these proteins present in a smaller subgroup. Peptides that map to protein groups for wich also subgroups are present in the dataset are removed from the dataset. (Proteins in a protein group are separated by a ;).

  4. Remove contaminants and reverse sequences (filter=c("Contaminant","Reverse"), filter_symbol="+"): each row with the filter symbol (“+” in our case) in the columns “Contaminant” and “Reverse” will be removed from the data. Note that the “Contaminant” column in older MaxQuant peptides.txt files can also be called “Potential.contaminant”.

  5. Remove all proteins that are only identified by peptides carrying a modification site (remove_only_site=TRUE, file_proteinGroups=file_proteinGroups).

  6. Remove all superfluous columns in the fData slot, except the “Proteins”, “GI.number”, “Protein.names” and “Sequence” columns (useful_properties=c("GI.number","Protein.names","Sequence")). Note that the “Proteins” column does not have to be provided: the accession column is always retained.

  7. Remove all peptides that are only identified once in the dataset (minIdentified=2).

  8. Add experimental annotation (exp_annotation=exp_annotation).

We can now make density plots to inspect the effect of the preprocessing on the distribution of our log2-transformed peptide intensities.

3. Make diagnostic plots

Here we take a closer look at how you can make the diagnostic plots from the Shiny app. We will plot both the density plots and the MDS plot.

#Make sure the order of the rows in your exp_annotation is the same as the columns of the exprs slot in your peptides object.
all(exp_annotation[,"run"]==colnames(exprs(peptidesFranc)))
## [1] TRUE
#Color by biological repeat
colors <- as.numeric(exp_annotation[,"biorep"])

### Density plot before preprocessing ###

densAll <- apply(log2(exprs(peptidesFranc)),2,density,na.rm=TRUE)

#Automatically determine the plot window size
ymax=max(vapply(densAll,function(d) max(d$y),1))
rangematrix <- vapply(densAll,function(d) range(d$x, na.rm=TRUE), c(1,1)) #no longer
xlim=range(rangematrix,na.rm=TRUE)
ylim=c(0,ymax)

plot(densAll[[1]],col=colors[1],xlim=xlim,ylim=ylim, las=1, frame.plot=FALSE, main="Density before preprocessing")
for (i in 2:length(densAll)) lines(densAll[[i]],col=colors[i])

######

### Density plot after preprocessing ###

densAll <- apply(log2(exprs(peptidesFranc2)),2,density,na.rm=TRUE)

#Automatically determine the plot window size
ymax=max(vapply(densAll,function(d) max(d$y),1))
rangematrix <- vapply(densAll,function(d) range(d$x, na.rm=TRUE), c(1,1)) #no longer
xlim=range(rangematrix,na.rm=TRUE)
ylim=c(0,ymax)

plot(densAll[[1]],col=colors[1],xlim=xlim,ylim=ylim, las=1, frame.plot=FALSE, main="Density after preprocessing")
for (i in 2:length(densAll)) lines(densAll[[i]],col=colors[i])

######

We can also construct an MDS plot. This plot will cluster similar runs together.

plotMDS(exprs(peptidesFranc2), col=colors, las=1)

4. Convert the MSnSet object peptidesFranc2 into a protdata object proteinsFranc.

The accession argument denotes by which column the individual peptide observations should be grouped. The annotations argument indicates which columns in peptidesFranc2 are annotations linked to the accessions (in our case: the GI number and the protein names). These annotations will be added to a separate annotation slot.

proteinsFranc <- MSnSet2protdata(peptidesFranc2, accession="Proteins", annotations = c("GI.number", "Protein.names"))

Advanced: add information on the design after creating your protdata object

Imagine you forgot a factor in your experimental annotation. Should you redo the whole preprocessing part? No, often, you can still add this factor thanks to the addVarFromVar function. We will now show how we can add an extra (redundant) factor called techrep.

Create a factor for each of the 18 technical replicates (each sample was analyzed in triplicate). This factor is completely redundant, as the “run” column in the experimental annotation already has 18 different levels and is thus already equal to the factor for the technical replicates.

techrep <- as.factor(paste0("t_",1:18))
names(techrep) <- levels(exp_annotation$run)
techrep
## X1WT_20_2h_n3_1 X1WT_20_2h_n3_2 X1WT_20_2h_n3_3 X1WT_20_2h_n4_1 
##             t_1             t_2             t_3             t_4 
## X1WT_20_2h_n4_2 X1WT_20_2h_n4_3 X1WT_20_2h_n5_1 X1WT_20_2h_n5_2 
##             t_5             t_6             t_7             t_8 
## X1WT_20_2h_n5_3 X3D8_20_2h_n3_1 X3D8_20_2h_n3_2 X3D8_20_2h_n3_3 
##             t_9            t_10            t_11            t_12 
## X3D8_20_2h_n4_1 X3D8_20_2h_n4_2 X3D8_20_2h_n4_3 X3D8_20_2h_n5_1 
##            t_13            t_14            t_15            t_16 
## X3D8_20_2h_n5_2 X3D8_20_2h_n5_3 
##            t_17            t_18 
## 18 Levels: t_1 t_10 t_11 t_12 t_13 t_14 t_15 t_16 t_17 t_18 t_2 ... t_9
proteinsFranc2 <- addVarFromVar(proteinsFranc, basecol="run", name="techrep", vector=techrep)
proteinsFranc2
## @accession 
## 
## [1] "WP_003038655"
## 
##  @annotation 
## 
##                                       GI.number 
##                                  "gi|118497196" 
##                                   Protein.names 
## "hypothetical protein [Francisella tularensis]" 
## 
##  @data 
## 
##     quant_value           Sequence             run genotype biorep techrep
## 1.1    25.11978       AVVVFFDYGCGK X1WT_20_2h_n3_1       WT    b_1     t_1
## 1.2    25.34586 DLLSDPATPTVGPQDANK X1WT_20_2h_n3_1       WT    b_1     t_1
## 1.3    24.11491   ETNGELTVQDVDNVVK X1WT_20_2h_n3_1       WT    b_1     t_1
## 1.4    26.50674          IIAIPDIVK X1WT_20_2h_n3_1       WT    b_1     t_1
## 1.5    24.69046     QTNVITDNYQQAAK X1WT_20_2h_n3_1       WT    b_1     t_1
## 2.1    24.75196       AVVVFFDYGCGK X1WT_20_2h_n3_2       WT    b_1     t_2
## 
##  96 more rows... 
## 
## @accession 
## 
## [1] "WP_003041237"
## 
##  @annotation 
## 
##                                            GI.number 
##                                       "gi|118498194" 
##                                        Protein.names 
## "phosphoglucosamine mutase [Francisella tularensis]" 
## 
##  @data 
## 
##     quant_value         Sequence             run genotype biorep techrep
## 1.1    25.81732    ADLGISLDGDADR X1WT_20_2h_n3_1       WT    b_1     t_1
## 1.2    27.36716        FVIVGQDTR X1WT_20_2h_n3_1       WT    b_1     t_1
## 1.3    24.91571  GEVANSTITAEFTQK X1WT_20_2h_n3_1       WT    b_1     t_1
## 1.4    27.38539 ILANAIDEYIESIHSR X1WT_20_2h_n3_1       WT    b_1     t_1
## 1.6    27.93398    SLATNEAEYLVEK X1WT_20_2h_n3_1       WT    b_1     t_1
## 1.7    26.59163       VASDVNDVEK X1WT_20_2h_n3_1       WT    b_1     t_1
## 
##  187 more rows... 
## 
## @accession 
## 
## [1] "WP_003038915"
## 
##  @annotation 
## 
##                                                    GI.number 
##                                               "gi|118497331" 
##                                                Protein.names 
## "glycine--tRNA ligase subunit beta [Francisella tularensis]" 
## 
##  @data 
## 
##     quant_value                Sequence             run genotype biorep
## 1      24.37301 AAANEYELALAYSIEEVAPDLHK X1WT_20_2h_n3_1       WT    b_1
## 1.3    22.60222   EEGIAVDIFEAINNTNYDSIK X1WT_20_2h_n3_1       WT    b_1
## 1.4    25.34002           EGEPTQVGLGFAK X1WT_20_2h_n3_1       WT    b_1
## 1.5    25.13163          EYSYALELLTCLDK X1WT_20_2h_n3_1       WT    b_1
## 1.6    24.09977       FHHPQAIVIDNISDYIK X1WT_20_2h_n3_1       WT    b_1
## 1.8    25.75712              IIDITLESYK X1WT_20_2h_n3_1       WT    b_1
##     techrep
## 1       t_1
## 1.3     t_1
## 1.4     t_1
## 1.5     t_1
## 1.6     t_1
## 1.8     t_1
## 
##  215 more rows... 
## 
## @accession 
## 
## [1] "WP_003038264"
## 
##  @annotation 
## 
##                                           GI.number 
##                                      "gi|118496879" 
##                                       Protein.names 
## "molecular chaperone HtpG [Francisella tularensis]" 
## 
##  @data 
## 
##     quant_value        Sequence             run genotype biorep techrep
## 1      27.92076    AAANNPQLEAFK X1WT_20_2h_n3_1       WT    b_1     t_1
## 1.2    26.48741 EGQDTIYYITSDSYK X1WT_20_2h_n3_1       WT    b_1     t_1
## 1.4    27.33299    EHLDLLEYHVLK X1WT_20_2h_n3_1       WT    b_1     t_1
## 1.5    26.37656         EILQHNK X1WT_20_2h_n3_1       WT    b_1     t_1
## 1.6    28.11488        ELLPPYLR X1WT_20_2h_n3_1       WT    b_1     t_1
## 1.7    28.77850    ELVSNSSDAIEK X1WT_20_2h_n3_1       WT    b_1     t_1
## 
##  338 more rows... 
## 
## @accession 
## 
## [1] "WP_003035781"
## 
##  @annotation 
## 
##                                                        GI.number 
##                                                   "gi|118497152" 
##                                                    Protein.names 
## "delta-aminolevulinic acid dehydratase [Francisella tularensis]" 
## 
##  @data 
## 
##     quant_value        Sequence             run genotype biorep techrep
## 1      27.47327     AAASAGLVDEK X1WT_20_2h_n3_1       WT    b_1     t_1
## 1.1    24.05872 VILTYTALDVANWLK X1WT_20_2h_n3_1       WT    b_1     t_1
## 2.1    23.88890 VILTYTALDVANWLK X1WT_20_2h_n3_2       WT    b_1     t_2
## 2.2    25.37749      YASAYYGPFR X1WT_20_2h_n3_2       WT    b_1     t_2
## 3      27.47269     AAASAGLVDEK X1WT_20_2h_n3_3       WT    b_1     t_3
## 3.1    24.93174 VILTYTALDVANWLK X1WT_20_2h_n3_3       WT    b_1     t_3
## 
##  41 more rows... 
## 
## @accession 
## 
## [1] "WP_003039212"
## 
##  @annotation 
## 
##                                GI.number 
##                           "gi|118497492" 
##                            Protein.names 
## "phosphorylase [Francisella tularensis]" 
## 
##  @data 
## 
##     quant_value             Sequence             run genotype biorep
## 1.2    22.96510 FPAGLNDVEFVAEHIFQHSK X1WT_20_2h_n3_1       WT    b_1
## 1.3    25.10257           LQDLLTFIGK X1WT_20_2h_n3_1       WT    b_1
## 1.6    24.55598             VVFSVDYR X1WT_20_2h_n3_1       WT    b_1
## 2      24.53369         AAASLDLYSYPK X1WT_20_2h_n3_2       WT    b_1
## 2.1    24.68029              EVLFLNR X1WT_20_2h_n3_2       WT    b_1
## 2.2    23.16038 FPAGLNDVEFVAEHIFQHSK X1WT_20_2h_n3_2       WT    b_1
##     techrep
## 1.2     t_1
## 1.3     t_1
## 1.6     t_1
## 2       t_2
## 2.1     t_2
## 2.2     t_2
## 
##  56 more rows... 
## 
## 1182 more elements...

5. Inspect the data for a specific protein via a detailed protein plot

Here, we illustrate how you can reproduce the detail plots from the Shiny app. We will take a closer look at the protein “WP_003039212”.

protein_name <- "WP_003039212"
data <- getData(proteinsFranc[protein_name])

#We plot the peptide sequences as different point symbols
points <- as.numeric(droplevels(as.factor(data$Sequence)))
#We don't want to use repeated plot symbols or invisible symbols
pch_vals <- c(0:25,32:127)
#Repeat pch_vals if there would be more than 122 unique levels
pch_vals <- rep_len(pch_vals, length(unique(points)))
pch_vals <- pch_vals[points]

#We color by mass spec run
colors <- grDevices::colorRampPalette(RColorBrewer::brewer.pal(8,"Spectral"))(length(unique(droplevels(as.factor(data$run)))))
colors <- colors[as.numeric(droplevels(as.factor(data$run)))]

#We make a different boxplot for each genotype
boxplot(data$quant_value~as.factor(data$genotype), outline=FALSE, ylim=c(min(data$quant_value)-0.2,max(data$quant_value)+0.2), ylab="preprocessed peptide intensity", xlab="", main=paste0("Detail plot for protein ", protein_name), las=2, frame.plot=FALSE, frame=FALSE, col="grey", pars=list(boxcol="white")) 

#Add the data as separate points     
points(jitter((as.numeric(data$genotype)), factor=2),data$quant_value, col=colors, pch=pch_vals)

6. Fit the model

Fit the robust ridge models for each protein. The fit.model function will fit a model to each protein in the protdata object proteinsFranc. The default setting reg="ridge" indicates that ridge regression should be used. The default setting weights="Huber" indicates that an M estimation algorithm should be used that assigns Huber weights to each peptide observation based on their residuals.

Caution: fitting the models takes some time (less than 15 minutes for the Francisella tularensis dataset on our system).

system.time(protLMFranc <- fit.model(proteinsFranc, response="quant_value", fixed=c("genotype"),  random=c("biorep","run","Sequence"), add.intercept=TRUE, weights="Huber"))
##    user  system elapsed 
## 384.990   1.433 396.099

The system.time function was only added to output the duration of the model fitting. If you are not interested in this, you can of course omit it.

5.1. Inspect the model

protLMFranc is a protLM object.

class(protLMFranc)
## [1] "protLM"
## attr(,"package")
## [1] "MSqRob"

Printing out a protLM object returns an overview of the first 6 accessions with their annotations and their corresponding models.

protLMFranc
## @accession 
## 
## [1] "WP_003038655"
## 
##  @annotation 
## 
##                                       GI.number 
##                                  "gi|118497196" 
##                                   Protein.names 
## "hypothetical protein [Francisella tularensis]" 
## 
##  @model 
## 
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 66.4301
## Random effects:
##  Groups       Name        Std.Dev.
##  run          (Intercept) 0.0000  
##  Sequence     (Intercept) 0.6682  
##  biorep       (Intercept) 0.2247  
##  ridgeGroup.1 (Intercept) 1.0983  
##  Residual                 0.2524  
## Number of obs: 102, groups:  
## run, 18; Sequence, 7; biorep, 6; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob  
##             -260  
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
## 
## @accession 
## 
## [1] "WP_003041237"
## 
##  @annotation 
## 
##                                            GI.number 
##                                       "gi|118498194" 
##                                        Protein.names 
## "phosphoglucosamine mutase [Francisella tularensis]" 
## 
##  @model 
## 
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 136.4625
## Random effects:
##  Groups       Name        Std.Dev.
##  run          (Intercept) 0.00000 
##  Sequence     (Intercept) 1.28720 
##  biorep       (Intercept) 0.09322 
##  ridgeGroup.1 (Intercept) 0.00000 
##  Residual                 0.27463 
## Number of obs: 193, groups:  
## run, 18; Sequence, 12; biorep, 6; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob  
##           -370.9  
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
## 
## @accession 
## 
## [1] "WP_003038915"
## 
##  @annotation 
## 
##                                                    GI.number 
##                                               "gi|118497331" 
##                                                Protein.names 
## "glycine--tRNA ligase subunit beta [Francisella tularensis]" 
## 
##  @model 
## 
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 340.3599
## Random effects:
##  Groups       Name        Std.Dev.
##  Sequence     (Intercept) 1.07373 
##  run          (Intercept) 0.00000 
##  biorep       (Intercept) 0.05502 
##  ridgeGroup.1 (Intercept) 2.11989 
##  Residual                 0.40125 
## Number of obs: 221, groups:  
## Sequence, 18; run, 18; biorep, 6; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob  
##           -365.9  
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
## 
## @accession 
## 
## [1] "WP_003038264"
## 
##  @annotation 
## 
##                                           GI.number 
##                                      "gi|118496879" 
##                                       Protein.names 
## "molecular chaperone HtpG [Francisella tularensis]" 
## 
##  @model 
## 
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 378.4025
## Random effects:
##  Groups       Name        Std.Dev.
##  Sequence     (Intercept) 1.02426 
##  run          (Intercept) 0.09019 
##  biorep       (Intercept) 0.11985 
##  ridgeGroup.1 (Intercept) 0.38438 
##  Residual                 0.31793 
## Number of obs: 344, groups:  
## Sequence, 23; run, 18; biorep, 6; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob  
##           -489.8  
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
## 
## @accession 
## 
## [1] "WP_003035781"
## 
##  @annotation 
## 
##                                                        GI.number 
##                                                   "gi|118497152" 
##                                                    Protein.names 
## "delta-aminolevulinic acid dehydratase [Francisella tularensis]" 
## 
##  @model 
## 
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 57.1314
## Random effects:
##  Groups       Name        Std.Dev.
##  run          (Intercept) 0.0000  
##  biorep       (Intercept) 0.2059  
##  Sequence     (Intercept) 1.9048  
##  ridgeGroup.1 (Intercept) 0.5694  
##  Residual                 0.3522  
## Number of obs: 47, groups:  
## run, 18; biorep, 6; Sequence, 3; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob  
##           -173.3  
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
## 
## @accession 
## 
## [1] "WP_003039212"
## 
##  @annotation 
## 
##                                GI.number 
##                           "gi|118497492" 
##                            Protein.names 
## "phosphorylase [Francisella tularensis]" 
## 
##  @model 
## 
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 9.2867
## Random effects:
##  Groups       Name        Std.Dev.
##  run          (Intercept) 0.0000  
##  Sequence     (Intercept) 0.6635  
##  biorep       (Intercept) 0.0000  
##  ridgeGroup.1 (Intercept) 0.3159  
##  Residual                 0.2026  
## Number of obs: 62, groups:  
## run, 18; Sequence, 7; biorep, 6; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob  
##           -192.6  
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
## 
## 1182 more elements...

The numbers in the Std.Dev. column give the estimated standard deviations of the random effects. Effects starting with ridgeGroup indicate shrunken fixed effects (we make use of the random effects framework of the lm4 package in order to estimate Ridge penalties). However, as the fixed effects in the ridgeGroups are orthogonalized using the Gramm-Schmidt procedure, these numbers are not very informative.

A protLM object has three slots:

  1. an accession slot: a vector containing all the accessions
  2. a model slot: a list of all fitted models (one for each accession)
  3. an annotation slot: data frame of which each row (one for each accession) corresponds to the annotations of a different accession.

These slots can be accessed using the getAccessions, getModels and getAnnotations functions respectively. Here we give the accessions, models and annotations for the first 5 models.

getAccessions(protLMFranc)[1:5]
## [1] WP_003038655 WP_003041237 WP_003038915 WP_003038264 WP_003035781
## 1188 Levels: WP_003013731 WP_003013860 WP_003013909 ... WP_011733734
getModels(protLMFranc)[1:5]
## [[1]]
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 66.4301
## Random effects:
##  Groups       Name        Std.Dev.
##  run          (Intercept) 0.0000  
##  Sequence     (Intercept) 0.6682  
##  biorep       (Intercept) 0.2247  
##  ridgeGroup.1 (Intercept) 1.0983  
##  Residual                 0.2524  
## Number of obs: 102, groups:  
## run, 18; Sequence, 7; biorep, 6; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob  
##             -260  
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
## 
## [[2]]
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 136.4625
## Random effects:
##  Groups       Name        Std.Dev.
##  run          (Intercept) 0.00000 
##  Sequence     (Intercept) 1.28720 
##  biorep       (Intercept) 0.09322 
##  ridgeGroup.1 (Intercept) 0.00000 
##  Residual                 0.27463 
## Number of obs: 193, groups:  
## run, 18; Sequence, 12; biorep, 6; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob  
##           -370.9  
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
## 
## [[3]]
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 340.3599
## Random effects:
##  Groups       Name        Std.Dev.
##  Sequence     (Intercept) 1.07373 
##  run          (Intercept) 0.00000 
##  biorep       (Intercept) 0.05502 
##  ridgeGroup.1 (Intercept) 2.11989 
##  Residual                 0.40125 
## Number of obs: 221, groups:  
## Sequence, 18; run, 18; biorep, 6; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob  
##           -365.9  
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
## 
## [[4]]
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 378.4025
## Random effects:
##  Groups       Name        Std.Dev.
##  Sequence     (Intercept) 1.02426 
##  run          (Intercept) 0.09019 
##  biorep       (Intercept) 0.11985 
##  ridgeGroup.1 (Intercept) 0.38438 
##  Residual                 0.31793 
## Number of obs: 344, groups:  
## Sequence, 23; run, 18; biorep, 6; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob  
##           -489.8  
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
## 
## [[5]]
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 57.1314
## Random effects:
##  Groups       Name        Std.Dev.
##  run          (Intercept) 0.0000  
##  biorep       (Intercept) 0.2059  
##  Sequence     (Intercept) 1.9048  
##  ridgeGroup.1 (Intercept) 0.5694  
##  Residual                 0.3522  
## Number of obs: 47, groups:  
## run, 18; biorep, 6; Sequence, 3; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob  
##           -173.3  
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
getAnnotations(protLMFranc)[1:5,]
##      GI.number     
## [1,] "gi|118497196"
## [2,] "gi|118498194"
## [3,] "gi|118497331"
## [4,] "gi|118496879"
## [5,] "gi|118497152"
##      Protein.names                                                   
## [1,] "hypothetical protein [Francisella tularensis]"                 
## [2,] "phosphoglucosamine mutase [Francisella tularensis]"            
## [3,] "glycine--tRNA ligase subunit beta [Francisella tularensis]"    
## [4,] "molecular chaperone HtpG [Francisella tularensis]"             
## [5,] "delta-aminolevulinic acid dehydratase [Francisella tularensis]"

protLM objects can be subsetted using either their numeric index or their corresponding accession.

Retain only the part corresponding to model 24 and model 50.

protLMFranc[c(24,50)]
## @accession 
## 
## [1] "WP_003035135"
## 
##  @annotation 
## 
##                                       GI.number 
##                                  "gi|118498204" 
##                                   Protein.names 
## "superoxide dismutase [Francisella tularensis]" 
## 
##  @model 
## 
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 130.6537
## Random effects:
##  Groups       Name        Std.Dev.
##  run          (Intercept) 0.00000 
##  Sequence     (Intercept) 1.48360 
##  biorep       (Intercept) 0.05107 
##  ridgeGroup.1 (Intercept) 1.40062 
##  Residual                 0.29509 
## Number of obs: 148, groups:  
## run, 18; Sequence, 10; biorep, 6; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob  
##           -344.8  
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
## 
## @accession 
## 
## [1] "WP_003041024"
## 
##  @annotation 
## 
##                                 GI.number 
##                            "gi|118496740" 
##                             Protein.names 
## "acetate kinase [Francisella tularensis]" 
## 
##  @model 
## 
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 26.4739
## Random effects:
##  Groups       Name        Std.Dev.
##  run          (Intercept) 0.0000  
##  biorep       (Intercept) 0.2982  
##  Sequence     (Intercept) 0.3214  
##  ridgeGroup.1 (Intercept) 0.0000  
##  Residual                 0.2708  
## Number of obs: 36, groups:  
## run, 16; biorep, 6; Sequence, 4; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob  
##           -147.4  
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling

Inspect the protLM object corresponding to accession "WP_003033643".

protLMFranc["WP_003033643"]
## @accession 
## 
## [1] "WP_003033643"
## 
##  @annotation 
## 
##                                  GI.number 
##                             "gi|118497379" 
##                              Protein.names 
## "isochorismatase [Francisella tularensis]" 
## 
##  @model 
## 
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 5.8471
## Random effects:
##  Groups       Name        Std.Dev.
##  run          (Intercept) 0.0000  
##  biorep       (Intercept) 0.1829  
##  Sequence     (Intercept) 1.1305  
##  ridgeGroup.1 (Intercept) 0.3079  
##  Residual                 0.9620  
## Number of obs: 3, groups:  run, 3; biorep, 2; Sequence, 1; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob  
##           -36.64  
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling

Say we want to inspect cell division protein "WP_003040227". We can then select protein "WP_003040227" from the protLM object. We can extract the model via the getModels function. simplify=TRUE is the default setting and indicates that if you select one model, you only want this model and not a list with as only element this model. Note that if you select multiple models, you automatically receive a list of models.

modelWP_003040227 <- getModels(protLMFranc["WP_003040227"], simplify=TRUE)
modelWP_003040227
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 35.3401
## Random effects:
##  Groups       Name        Std.Dev.
##  run          (Intercept) 0.0000  
##  biorep       (Intercept) 0.5312  
##  Sequence     (Intercept) 0.0000  
##  ridgeGroup.1 (Intercept) 0.8636  
##  Residual                 0.4931  
## Number of obs: 20, groups:  
## run, 15; biorep, 5; Sequence, 2; ridgeGroup.1, 1
## Fixed Effects:
## Intercept_MSqRob  
##           -111.7  
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling

The betaBVcovDf function extracts a list from a model containing the following four elements:

  1. beta: the estimated parameters: both fixed effect sizes and BLUPs (best linear unbiased predictors) for the random effects
  2. vcov: the variance-covariance matrix
  3. df: the residual degrees of freedom
  4. sigma: the residual standard deviation
  5. df_exp: if the number of experimental units would be specified: a more conservative estimate for the degrees of freedom based on the number of experimental units minus the degrees of freedom lost due to between-treatment effects. Otherwise: NULL.
betaBVcovDf <- getBetaVcovDf(modelWP_003040227)
betaBVcovDf$beta
##                              [,1]
## (Intercept)          2.478692e+01
## runX1WT_20_2h_n4_1  -3.268340e-19
## runX1WT_20_2h_n4_2   6.312569e-19
## runX1WT_20_2h_n4_3  -5.514836e-19
## runX1WT_20_2h_n5_1  -2.653938e-19
## runX1WT_20_2h_n5_2   1.300820e-19
## runX1WT_20_2h_n5_3   3.002867e-20
## runX3D8_20_2h_n3_1   6.312569e-19
## runX3D8_20_2h_n3_2   5.345092e-20
## runX3D8_20_2h_n3_3  -7.547956e-20
## runX3D8_20_2h_n4_1   3.677317e-19
## runX3D8_20_2h_n4_2   1.418812e-19
## runX3D8_20_2h_n4_3  -3.821573e-19
## runX3D8_20_2h_n5_1  -3.498709e-19
## runX3D8_20_2h_n5_2  -1.035900e-19
## runX3D8_20_2h_n5_3   6.912076e-20
## biorepb_2           -2.866945e-01
## biorepb_3           -1.221728e-01
## biorepb_4            7.069615e-01
## biorepb_5            1.479023e-01
## biorepb_6           -4.459965e-01
## SequenceMVFTFYNNLK   6.400946e-19
## SequenceYTYIYQIASFR -6.400946e-19
## genotypeKO           2.573084e-01
betaBVcovDf$vcov
##                       (Intercept) runX1WT_20_2h_n4_1 runX1WT_20_2h_n4_2
## (Intercept)          4.644336e-01      -1.133292e-19      -7.590798e-20
## runX1WT_20_2h_n4_1  -1.133292e-19       1.000000e-18       2.081844e-37
## runX1WT_20_2h_n4_2  -7.590798e-20       2.081844e-37       1.000000e-18
## runX1WT_20_2h_n4_3  -1.133292e-19       3.108154e-37       2.081844e-37
## runX1WT_20_2h_n5_1  -1.036390e-19       2.528956e-38       1.693897e-38
## runX1WT_20_2h_n5_2  -1.036390e-19       2.528956e-38       1.693897e-38
## runX1WT_20_2h_n5_3  -1.036390e-19       2.528956e-38       1.693897e-38
## runX3D8_20_2h_n3_1  -1.868609e-20       4.559704e-39       3.054093e-39
## runX3D8_20_2h_n3_2  -4.832685e-20       1.179252e-38       7.898639e-39
## runX3D8_20_2h_n3_3  -4.832685e-20       1.179252e-38       7.898639e-39
## runX3D8_20_2h_n4_1  -2.678135e-20       6.535074e-39       4.377198e-39
## runX3D8_20_2h_n4_2  -5.356270e-20       1.307015e-38       8.754397e-39
## runX3D8_20_2h_n4_3  -5.356270e-20       1.307015e-38       8.754397e-39
## runX3D8_20_2h_n5_1  -4.575672e-20       1.116537e-38       7.478572e-39
## runX3D8_20_2h_n5_2  -4.575672e-20       1.116537e-38       7.478572e-39
## runX3D8_20_2h_n5_3  -4.575672e-20       1.116537e-38       7.478572e-39
## biorepb_2           -3.511045e-01      -1.974862e-19      -1.322764e-19
## biorepb_3           -3.607947e-01       8.803963e-20       5.896902e-20
## biorepb_4           -1.338428e-01       3.265976e-20       2.187553e-20
## biorepb_5           -1.553883e-01       3.791720e-20       2.539697e-20
## biorepb_6           -1.592912e-01       3.886959e-20       2.603489e-20
## SequenceMVFTFYNNLK  -5.668853e-19      -1.056869e-37      -7.078916e-38
## SequenceYTYIYQIASFR -4.331147e-19       1.056869e-37       7.078916e-38
## genotypeKO          -2.822640e-01       6.887691e-20       4.613381e-20
##                     runX1WT_20_2h_n4_3 runX1WT_20_2h_n5_1
## (Intercept)              -1.133292e-19      -1.036390e-19
## runX1WT_20_2h_n4_1        3.108154e-37       2.528956e-38
## runX1WT_20_2h_n4_2        2.081844e-37       1.693897e-38
## runX1WT_20_2h_n4_3        1.000000e-18       2.528956e-38
## runX1WT_20_2h_n5_1        2.528956e-38       1.000000e-18
## runX1WT_20_2h_n5_2        2.528956e-38       2.820767e-37
## runX1WT_20_2h_n5_3        2.528956e-38       2.820767e-37
## runX3D8_20_2h_n3_1        4.559704e-39       4.169826e-39
## runX3D8_20_2h_n3_2        1.179252e-38       1.078420e-38
## runX3D8_20_2h_n3_3        1.179252e-38       1.078420e-38
## runX3D8_20_2h_n4_1        6.535074e-39       5.976293e-39
## runX3D8_20_2h_n4_2        1.307015e-38       1.195259e-38
## runX3D8_20_2h_n4_3        1.307015e-38       1.195259e-38
## runX3D8_20_2h_n5_1        1.116537e-38       1.021067e-38
## runX3D8_20_2h_n5_2        1.116537e-38       1.021067e-38
## runX3D8_20_2h_n5_3        1.116537e-38       1.021067e-38
## biorepb_2                -1.974862e-19       7.834942e-20
## biorepb_3                 8.803963e-20      -1.784378e-19
## biorepb_4                 3.265976e-20       2.986719e-20
## biorepb_5                 3.791720e-20       3.467509e-20
## biorepb_6                 3.886959e-20       3.554605e-20
## SequenceMVFTFYNNLK       -1.056869e-37       1.265012e-37
## SequenceYTYIYQIASFR       1.056869e-37      -1.265012e-37
## genotypeKO                6.887691e-20       6.298759e-20
##                     runX1WT_20_2h_n5_2 runX1WT_20_2h_n5_3
## (Intercept)              -1.036390e-19      -1.036390e-19
## runX1WT_20_2h_n4_1        2.528956e-38       2.528956e-38
## runX1WT_20_2h_n4_2        1.693897e-38       1.693897e-38
## runX1WT_20_2h_n4_3        2.528956e-38       2.528956e-38
## runX1WT_20_2h_n5_1        2.820767e-37       2.820767e-37
## runX1WT_20_2h_n5_2        1.000000e-18       2.820767e-37
## runX1WT_20_2h_n5_3        2.820767e-37       1.000000e-18
## runX3D8_20_2h_n3_1        4.169826e-39       4.169826e-39
## runX3D8_20_2h_n3_2        1.078420e-38       1.078420e-38
## runX3D8_20_2h_n3_3        1.078420e-38       1.078420e-38
## runX3D8_20_2h_n4_1        5.976293e-39       5.976293e-39
## runX3D8_20_2h_n4_2        1.195259e-38       1.195259e-38
## runX3D8_20_2h_n4_3        1.195259e-38       1.195259e-38
## runX3D8_20_2h_n5_1        1.021067e-38       1.021067e-38
## runX3D8_20_2h_n5_2        1.021067e-38       1.021067e-38
## runX3D8_20_2h_n5_3        1.021067e-38       1.021067e-38
## biorepb_2                 7.834942e-20       7.834942e-20
## biorepb_3                -1.784378e-19      -1.784378e-19
## biorepb_4                 2.986719e-20       2.986719e-20
## biorepb_5                 3.467509e-20       3.467509e-20
## biorepb_6                 3.554605e-20       3.554605e-20
## SequenceMVFTFYNNLK        1.265012e-37       1.265012e-37
## SequenceYTYIYQIASFR      -1.265012e-37      -1.265012e-37
## genotypeKO                6.298759e-20       6.298759e-20
##                     runX3D8_20_2h_n3_1 runX3D8_20_2h_n3_2
## (Intercept)              -1.868609e-20      -4.832685e-20
## runX1WT_20_2h_n4_1        4.559704e-39       1.179252e-38
## runX1WT_20_2h_n4_2        3.054093e-39       7.898639e-39
## runX1WT_20_2h_n4_3        4.559704e-39       1.179252e-38
## runX1WT_20_2h_n5_1        4.169826e-39       1.078420e-38
## runX1WT_20_2h_n5_2        4.169826e-39       1.078420e-38
## runX1WT_20_2h_n5_3        4.169826e-39       1.078420e-38
## runX3D8_20_2h_n3_1        1.000000e-18       1.287695e-37
## runX3D8_20_2h_n3_2        1.287695e-37       1.000000e-18
## runX3D8_20_2h_n3_3        1.287695e-37       3.330298e-37
## runX3D8_20_2h_n4_1        5.397077e-39       1.395817e-38
## runX3D8_20_2h_n4_2        1.079415e-38       2.791634e-38
## runX3D8_20_2h_n4_3        1.079415e-38       2.791634e-38
## runX3D8_20_2h_n5_1        9.221064e-39       2.384795e-38
## runX3D8_20_2h_n5_2        9.221064e-39       2.384795e-38
## runX3D8_20_2h_n5_3        9.221064e-39       2.384795e-38
## biorepb_2                 1.412639e-20       3.653433e-20
## biorepb_3                 1.451627e-20       3.754265e-20
## biorepb_4                -9.205803e-20      -2.380848e-19
## biorepb_5                 3.131442e-20       8.098680e-20
## biorepb_6                 3.210096e-20       8.302100e-20
## SequenceMVFTFYNNLK       -3.713523e-38      -9.604086e-38
## SequenceYTYIYQIASFR       3.713523e-38       9.604086e-38
## genotypeKO               -1.802540e-20      -4.661813e-20
##                     runX3D8_20_2h_n3_3 runX3D8_20_2h_n4_1
## (Intercept)              -4.832685e-20      -2.678135e-20
## runX1WT_20_2h_n4_1        1.179252e-38       6.535074e-39
## runX1WT_20_2h_n4_2        7.898639e-39       4.377198e-39
## runX1WT_20_2h_n4_3        1.179252e-38       6.535074e-39
## runX1WT_20_2h_n5_1        1.078420e-38       5.976293e-39
## runX1WT_20_2h_n5_2        1.078420e-38       5.976293e-39
## runX1WT_20_2h_n5_3        1.078420e-38       5.976293e-39
## runX3D8_20_2h_n3_1        1.287695e-37       5.397077e-39
## runX3D8_20_2h_n3_2        3.330298e-37       1.395817e-38
## runX3D8_20_2h_n3_3        1.000000e-18       1.395817e-38
## runX3D8_20_2h_n4_1        1.395817e-38       1.000000e-18
## runX3D8_20_2h_n4_2        2.791634e-38       3.566651e-37
## runX3D8_20_2h_n4_3        2.791634e-38       3.566651e-37
## runX3D8_20_2h_n5_1        2.384795e-38       1.321585e-38
## runX3D8_20_2h_n5_2        2.384795e-38       1.321585e-38
## runX3D8_20_2h_n5_3        2.384795e-38       1.321585e-38
## biorepb_2                 3.653433e-20       2.024628e-20
## biorepb_3                 3.754265e-20       2.080506e-20
## biorepb_4                -2.380848e-19       3.865760e-20
## biorepb_5                 8.098680e-20      -1.257168e-19
## biorepb_6                 8.302100e-20       4.600785e-20
## SequenceMVFTFYNNLK       -9.604086e-38      -3.944178e-37
## SequenceYTYIYQIASFR       9.604086e-38       3.944178e-37
## genotypeKO               -4.661813e-20      -2.583443e-20
##                     runX3D8_20_2h_n4_2 runX3D8_20_2h_n4_3
## (Intercept)              -5.356270e-20      -5.356270e-20
## runX1WT_20_2h_n4_1        1.307015e-38       1.307015e-38
## runX1WT_20_2h_n4_2        8.754397e-39       8.754397e-39
## runX1WT_20_2h_n4_3        1.307015e-38       1.307015e-38
## runX1WT_20_2h_n5_1        1.195259e-38       1.195259e-38
## runX1WT_20_2h_n5_2        1.195259e-38       1.195259e-38
## runX1WT_20_2h_n5_3        1.195259e-38       1.195259e-38
## runX3D8_20_2h_n3_1        1.079415e-38       1.079415e-38
## runX3D8_20_2h_n3_2        2.791634e-38       2.791634e-38
## runX3D8_20_2h_n3_3        2.791634e-38       2.791634e-38
## runX3D8_20_2h_n4_1        3.566651e-37       3.566651e-37
## runX3D8_20_2h_n4_2        1.000000e-18       7.133303e-37
## runX3D8_20_2h_n4_3        7.133303e-37       1.000000e-18
## runX3D8_20_2h_n5_1        2.643169e-38       2.643169e-38
## runX3D8_20_2h_n5_2        2.643169e-38       2.643169e-38
## runX3D8_20_2h_n5_3        2.643169e-38       2.643169e-38
## biorepb_2                 4.049255e-20       4.049255e-20
## biorepb_3                 4.161011e-20       4.161011e-20
## biorepb_4                 7.731521e-20       7.731521e-20
## biorepb_5                -2.514336e-19      -2.514336e-19
## biorepb_6                 9.201570e-20       9.201570e-20
## SequenceMVFTFYNNLK        2.111644e-37       2.111644e-37
## SequenceYTYIYQIASFR      -2.111644e-37      -2.111644e-37
## genotypeKO               -5.166885e-20      -5.166885e-20
##                     runX3D8_20_2h_n5_1 runX3D8_20_2h_n5_2
## (Intercept)              -4.575672e-20      -4.575672e-20
## runX1WT_20_2h_n4_1        1.116537e-38       1.116537e-38
## runX1WT_20_2h_n4_2        7.478572e-39       7.478572e-39
## runX1WT_20_2h_n4_3        1.116537e-38       1.116537e-38
## runX1WT_20_2h_n5_1        1.021067e-38       1.021067e-38
## runX1WT_20_2h_n5_2        1.021067e-38       1.021067e-38
## runX1WT_20_2h_n5_3        1.021067e-38       1.021067e-38
## runX3D8_20_2h_n3_1        9.221064e-39       9.221064e-39
## runX3D8_20_2h_n3_2        2.384795e-38       2.384795e-38
## runX3D8_20_2h_n3_3        2.384795e-38       2.384795e-38
## runX3D8_20_2h_n4_1        1.321585e-38       1.321585e-38
## runX3D8_20_2h_n4_2        2.643169e-38       2.643169e-38
## runX3D8_20_2h_n4_3        2.643169e-38       2.643169e-38
## runX3D8_20_2h_n5_1        1.000000e-18       6.055208e-37
## runX3D8_20_2h_n5_2        6.055208e-37       1.000000e-18
## runX3D8_20_2h_n5_3        6.055208e-37       6.055208e-37
## biorepb_2                 3.459136e-20       3.459136e-20
## biorepb_3                 3.554605e-20       3.554605e-20
## biorepb_4                 6.604765e-20       6.604765e-20
## biorepb_5                 7.667975e-20       7.667975e-20
## biorepb_6                -2.128648e-19      -2.128648e-19
## SequenceMVFTFYNNLK        3.465504e-38       3.465504e-38
## SequenceYTYIYQIASFR      -3.465504e-38      -3.465504e-38
## genotypeKO               -4.413888e-20      -4.413888e-20
##                     runX3D8_20_2h_n5_3     biorepb_2     biorepb_3
## (Intercept)              -4.575672e-20 -3.511045e-01 -3.607947e-01
## runX1WT_20_2h_n4_1        1.116537e-38 -1.974862e-19  8.803963e-20
## runX1WT_20_2h_n4_2        7.478572e-39 -1.322764e-19  5.896902e-20
## runX1WT_20_2h_n4_3        1.116537e-38 -1.974862e-19  8.803963e-20
## runX1WT_20_2h_n5_1        1.021067e-38  7.834942e-20 -1.784378e-19
## runX1WT_20_2h_n5_2        1.021067e-38  7.834942e-20 -1.784378e-19
## runX1WT_20_2h_n5_3        1.021067e-38  7.834942e-20 -1.784378e-19
## runX3D8_20_2h_n3_1        9.221064e-39  1.412639e-20  1.451627e-20
## runX3D8_20_2h_n3_2        2.384795e-38  3.653433e-20  3.754265e-20
## runX3D8_20_2h_n3_3        2.384795e-38  3.653433e-20  3.754265e-20
## runX3D8_20_2h_n4_1        1.321585e-38  2.024628e-20  2.080506e-20
## runX3D8_20_2h_n4_2        2.643169e-38  4.049255e-20  4.161011e-20
## runX3D8_20_2h_n4_3        2.643169e-38  4.049255e-20  4.161011e-20
## runX3D8_20_2h_n5_1        6.055208e-37  3.459136e-20  3.554605e-20
## runX3D8_20_2h_n5_2        6.055208e-37  3.459136e-20  3.554605e-20
## runX3D8_20_2h_n5_3        1.000000e-18  3.459136e-20  3.554605e-20
## biorepb_2                 3.459136e-20  5.485906e-01  2.727550e-01
## biorepb_3                 3.554605e-20  2.727550e-01  5.392324e-01
## biorepb_4                 6.604765e-20  1.011830e-01  1.039756e-01
## biorepb_5                 7.667975e-20  1.174711e-01  1.207132e-01
## biorepb_6                -2.128648e-19  1.204216e-01  1.237452e-01
## SequenceMVFTFYNNLK        3.465504e-38 -3.274279e-19  4.403841e-19
## SequenceYTYIYQIASFR      -3.465504e-38  3.274279e-19 -4.403841e-19
## genotypeKO               -4.413888e-20  2.133871e-01  2.192765e-01
##                         biorepb_4     biorepb_5     biorepb_6
## (Intercept)         -1.338428e-01 -1.553883e-01 -1.592912e-01
## runX1WT_20_2h_n4_1   3.265976e-20  3.791720e-20  3.886959e-20
## runX1WT_20_2h_n4_2   2.187553e-20  2.539697e-20  2.603489e-20
## runX1WT_20_2h_n4_3   3.265976e-20  3.791720e-20  3.886959e-20
## runX1WT_20_2h_n5_1   2.986719e-20  3.467509e-20  3.554605e-20
## runX1WT_20_2h_n5_2   2.986719e-20  3.467509e-20  3.554605e-20
## runX1WT_20_2h_n5_3   2.986719e-20  3.467509e-20  3.554605e-20
## runX3D8_20_2h_n3_1  -9.205803e-20  3.131442e-20  3.210096e-20
## runX3D8_20_2h_n3_2  -2.380848e-19  8.098680e-20  8.302100e-20
## runX3D8_20_2h_n3_3  -2.380848e-19  8.098680e-20  8.302100e-20
## runX3D8_20_2h_n4_1   3.865760e-20 -1.257168e-19  4.600785e-20
## runX3D8_20_2h_n4_2   7.731521e-20 -2.514336e-19  9.201570e-20
## runX3D8_20_2h_n4_3   7.731521e-20 -2.514336e-19  9.201570e-20
## runX3D8_20_2h_n5_1   6.604765e-20  7.667975e-20 -2.128648e-19
## runX3D8_20_2h_n5_2   6.604765e-20  7.667975e-20 -2.128648e-19
## runX3D8_20_2h_n5_3   6.604765e-20  7.667975e-20 -2.128648e-19
## biorepb_2            1.011830e-01  1.174711e-01  1.204216e-01
## biorepb_3            1.039756e-01  1.207132e-01  1.237452e-01
## biorepb_4            5.010379e-01  2.242956e-01  2.299293e-01
## biorepb_5            2.242956e-01  4.309991e-01  2.669425e-01
## biorepb_6            2.299293e-01  2.669425e-01  4.193827e-01
## SequenceMVFTFYNNLK  -2.659883e-19  3.238866e-20  1.206434e-19
## SequenceYTYIYQIASFR  2.659883e-19 -3.238866e-20 -1.206434e-19
## genotypeKO          -1.291104e-01 -1.498941e-01 -1.536591e-01
##                     SequenceMVFTFYNNLK SequenceYTYIYQIASFR    genotypeKO
## (Intercept)              -5.668853e-19       -4.331147e-19 -2.822640e-01
## runX1WT_20_2h_n4_1       -1.056869e-37        1.056869e-37  6.887691e-20
## runX1WT_20_2h_n4_2       -7.078916e-38        7.078916e-38  4.613381e-20
## runX1WT_20_2h_n4_3       -1.056869e-37        1.056869e-37  6.887691e-20
## runX1WT_20_2h_n5_1        1.265012e-37       -1.265012e-37  6.298759e-20
## runX1WT_20_2h_n5_2        1.265012e-37       -1.265012e-37  6.298759e-20
## runX1WT_20_2h_n5_3        1.265012e-37       -1.265012e-37  6.298759e-20
## runX3D8_20_2h_n3_1       -3.713523e-38        3.713523e-38 -1.802540e-20
## runX3D8_20_2h_n3_2       -9.604086e-38        9.604086e-38 -4.661813e-20
## runX3D8_20_2h_n3_3       -9.604086e-38        9.604086e-38 -4.661813e-20
## runX3D8_20_2h_n4_1       -3.944178e-37        3.944178e-37 -2.583443e-20
## runX3D8_20_2h_n4_2        2.111644e-37       -2.111644e-37 -5.166885e-20
## runX3D8_20_2h_n4_3        2.111644e-37       -2.111644e-37 -5.166885e-20
## runX3D8_20_2h_n5_1        3.465504e-38       -3.465504e-38 -4.413888e-20
## runX3D8_20_2h_n5_2        3.465504e-38       -3.465504e-38 -4.413888e-20
## runX3D8_20_2h_n5_3        3.465504e-38       -3.465504e-38 -4.413888e-20
## biorepb_2                -3.274279e-19        3.274279e-19  2.133871e-01
## biorepb_3                 4.403841e-19       -4.403841e-19  2.192765e-01
## biorepb_4                -2.659883e-19        2.659883e-19 -1.291104e-01
## biorepb_5                 3.238866e-20       -3.238866e-20 -1.498941e-01
## biorepb_6                 1.206434e-19       -1.206434e-19 -1.536591e-01
## SequenceMVFTFYNNLK        1.000000e-18        3.142651e-36 -7.108561e-20
## SequenceYTYIYQIASFR       3.142651e-36        1.000000e-18  7.108561e-20
## genotypeKO               -7.108561e-20        7.108561e-20  4.579926e-01
betaBVcovDf$df
## [1] 18.69393
betaBVcovDf$df_exp
## NULL
betaBVcovDf$sigma
## [1] 0.4618902

The getBetaVcovDfList function can be applied on (a subset of) the protLM object protLMFranc. This returns a list containing one or more lists similar to the betaBVcovDf list.

betaVcovDfList <- getBetaVcovDfList(protLMFranc[1:2])
#Inspect the structure of the list without printing it out compeletely:
str(betaVcovDfList)
## List of 2
##  $ WP_003038655:List of 6
##   ..$ beta   : num [1:33, 1] 2.57e+01 -9.42e-20 1.96e-19 -5.17e-19 -1.58e-19 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:33] "(Intercept)" "runX1WT_20_2h_n3_1" "runX1WT_20_2h_n3_2" "runX1WT_20_2h_n3_3" ...
##   .. .. ..$ : NULL
##   ..$ vcov   : num [1:33, 1:33] 1.23 -7.08e-20 -1.04e-19 -8.46e-20 -9.38e-20 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:33] "(Intercept)" "runX1WT_20_2h_n3_1" "runX1WT_20_2h_n3_2" "runX1WT_20_2h_n3_3" ...
##   .. .. ..$ : chr [1:33] "(Intercept)" "runX1WT_20_2h_n3_1" "runX1WT_20_2h_n3_2" "runX1WT_20_2h_n3_3" ...
##   ..$ df     : num 93.4
##   ..$ sigma  : num 0.252
##   ..$ df_exp : NULL
##   ..$ df_pars: Named num [1:33] 1.00 3.19e-18 3.70e-18 3.48e-18 3.31e-18 ...
##   .. ..- attr(*, "names")= chr [1:33] "(Intercept)" "runX1WT_20_2h_n3_1" "runX1WT_20_2h_n3_2" "runX1WT_20_2h_n3_3" ...
##  $ WP_003041237:List of 6
##   ..$ beta   : num [1:38, 1] 2.67e+01 -6.09e-19 -2.51e-20 3.31e-19 -1.80e-18 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:38] "(Intercept)" "runX1WT_20_2h_n3_1" "runX1WT_20_2h_n3_2" "runX1WT_20_2h_n3_3" ...
##   .. .. ..$ : NULL
##   ..$ vcov   : num [1:38, 1:38] 1.86 -5.44e-20 -6.04e-20 -5.10e-20 -3.68e-20 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:38] "(Intercept)" "runX1WT_20_2h_n3_1" "runX1WT_20_2h_n3_2" "runX1WT_20_2h_n3_3" ...
##   .. .. ..$ : chr [1:38] "(Intercept)" "runX1WT_20_2h_n3_1" "runX1WT_20_2h_n3_2" "runX1WT_20_2h_n3_3" ...
##   ..$ df     : num 180
##   ..$ sigma  : num 0.274
##   ..$ df_exp : NULL
##   ..$ df_pars: Named num [1:38] 1.00 7.14e-18 7.28e-18 6.98e-18 5.14e-18 ...
##   .. ..- attr(*, "names")= chr [1:38] "(Intercept)" "runX1WT_20_2h_n3_1" "runX1WT_20_2h_n3_2" "runX1WT_20_2h_n3_3" ...

6.2 Save your results

You can save your proteinsFranc and protLMFranc objects to a .RDatas object that can be loaded in R or in the Shiny graphical user interface. If you want your .RDatas object to be loaded into the Shiny App, you should also specify the levelOptions, random and plot2DependentVars objects. These objects are settings that are saved automatically when using the MSqRob Shiny App. In this example, I save the .RDatas object to my desktop.

#Change to eval=TRUE to execute this piece of code.

result_files_new <- list()
result_files_new$proteins <- proteinsFranc
result_files_new$models <- protLMFranc

### This part is only needed if you want to load your .RDatas file in the MSqRob Shiny App ###

#Should contain all contrast options for the fixed effects (see also 5. Statistical inference)
result_files_new$levelOptions <- c("genotypeWT","genotypeKO")
#Should contain all random effects
result_files_new$random <- c("Sequence","biorep","run")
#Should contain a list with all fixed and random effects
result_files_new$plot2DependentVars <- as.list(c("genotype","Sequence","biorep","run"))

#############################################################################################

saves_MSqRob(result_files, file="/Users/lgoeminn/Desktop/project_Francisella.RDatas") #Change file path to your desired save location!

If you want to load your saved file, first specify the folder where it is saved.

#Change to eval=TRUE to execute this piece of code.
result_path <- "/Users/lgoeminn/Desktop/project_Francisella.RDatas"

To inspect the different objects in the file, you can use the inspect_loads_MSqRob function.

#Change to eval=TRUE to execute this piece of code.
inspect_loads_MSqRob(result_path)

To load this file and assign new protdata and protLM object to it, execute the following piece of code.

#Change to eval=TRUE to execute this piece of code.

#To load the "project_Francisella.RDatas" file
result_files <- loads_MSqRob(result_path)

#assign a new protdata object to the saved proteins
proteinsFranc_result <- result_files$proteins
#assign a new protLM object to the saved models
protLMFranc_result <- result_files$models

7. Statistical inference

We are interested in which proteins differ significantly in abundance between ArgP mutant and wild type Francisella tularensis. Via the "MSqRob_levels" attribute, you can check which factor levels are present in each model. Pick a model for which there are no missing levels for the factors for which you want to test contrasts and inspect the names. Note that these names are always the combination of the parameter name and the factor level.

attr(modelWP_003040227,"MSqRob_levels")
##  [1] "(Intercept)"         "genotypeWT"          "genotypeKO"         
##  [4] "runX1WT_20_2h_n4_1"  "runX1WT_20_2h_n4_2"  "runX1WT_20_2h_n4_3" 
##  [7] "runX1WT_20_2h_n5_1"  "runX1WT_20_2h_n5_2"  "runX1WT_20_2h_n5_3" 
## [10] "runX3D8_20_2h_n3_1"  "runX3D8_20_2h_n3_2"  "runX3D8_20_2h_n3_3" 
## [13] "runX3D8_20_2h_n4_1"  "runX3D8_20_2h_n4_2"  "runX3D8_20_2h_n4_3" 
## [16] "runX3D8_20_2h_n5_1"  "runX3D8_20_2h_n5_2"  "runX3D8_20_2h_n5_3" 
## [19] "biorepb_2"           "biorepb_3"           "biorepb_4"          
## [22] "biorepb_5"           "biorepb_6"           "SequenceMVFTFYNNLK" 
## [25] "SequenceYTYIYQIASFR"

Here, we notice that the mutant is encoded as "genotypeKO" and the wild type as "genotypeWT". We construct a contrast matrix L for inferring on the fold change of interest. We are interested in the log2 fold change between mutant and wild type. As all intensities are log2-transformed, we should take the difference between "genotypeKO" and "genotypeWT" (because of the following property of logarithms: log2(a/b)=log2(a)-log2(b)).

L <- makeContrast(contrasts=c("genotypeKO-genotypeWT"),
                  levels=c("genotypeWT","genotypeKO"))
L
##             Contrasts
## Levels       genotypeKO-genotypeWT
##   genotypeWT                    -1
##   genotypeKO                     1

Imagine we would want to test the average log2 protein intensity between mutant and wild type, then L would be constructed as follows:

L2 <- makeContrast(contrasts=c("genotypeKO/2+genotypeWT/2"),
                  levels=c("genotypeWT","genotypeKO"))
L2
##             Contrasts
## Levels       genotypeKO/2+genotypeWT/2
##   genotypeWT                       0.5
##   genotypeKO                       0.5

Important notification: do not use the estimated parameters beta to determine the factor levels! You will notice for example that the level "genotypeWT" is missing in the column names of the betas. This is because "genotypeWT" is the reference class, giving the parameter corresponding to "genotypeKO" the interpretation of the effect of "genotypeKO" relative to "genotypeWT". It is important however to include "genotypeWT" in the contrast matrix because for proteins where factor levels are missing, these parameters can get different interpretations. The makeContrast function takes this into account and will return NA when a contrast cannot be properly estimated.

betaBVcovDf$beta
##                              [,1]
## (Intercept)          2.478692e+01
## runX1WT_20_2h_n4_1  -3.268340e-19
## runX1WT_20_2h_n4_2   6.312569e-19
## runX1WT_20_2h_n4_3  -5.514836e-19
## runX1WT_20_2h_n5_1  -2.653938e-19
## runX1WT_20_2h_n5_2   1.300820e-19
## runX1WT_20_2h_n5_3   3.002867e-20
## runX3D8_20_2h_n3_1   6.312569e-19
## runX3D8_20_2h_n3_2   5.345092e-20
## runX3D8_20_2h_n3_3  -7.547956e-20
## runX3D8_20_2h_n4_1   3.677317e-19
## runX3D8_20_2h_n4_2   1.418812e-19
## runX3D8_20_2h_n4_3  -3.821573e-19
## runX3D8_20_2h_n5_1  -3.498709e-19
## runX3D8_20_2h_n5_2  -1.035900e-19
## runX3D8_20_2h_n5_3   6.912076e-20
## biorepb_2           -2.866945e-01
## biorepb_3           -1.221728e-01
## biorepb_4            7.069615e-01
## biorepb_5            1.479023e-01
## biorepb_6           -4.459965e-01
## SequenceMVFTFYNNLK   6.400946e-19
## SequenceYTYIYQIASFR -6.400946e-19
## genotypeKO           2.573084e-01

Build a data frame containing estimate, standard error, degrees of freedom, T-value and p-value for each protein in the protLM object protLMFranc, by using the test.protLMcontrast function.

resultFranc <- test.protLMcontrast(protLMFranc, L)

The prot.p.adjust function adds a new column "qval" to the data frame based on an existing "pval" column. The new column contains adjusted p-values using one of the methods in the p.adjust.methods vector.

resultFranc <- prot.p.adjust(resultFranc, method="fdr")

The prot.signif function needs a matrix (or a list of matrices) containing a column named “pval” and a column named "qval". The "pval" column will be used to sort the matrix. The function generates a new column "signif" containing 1 for all values of "qval" with a value lower than the specified FDR level (default: 0.05) and 0 otherwise.

resultFranc <- prot.signif(resultFranc, level = 0.05)

Inspect the first 6 significant proteins:

head(resultFranc,6)
##                 GI.number
## WP_003040894 gi|118496676
## WP_003026016 gi|118497407
## WP_003023392 gi|118497654
## WP_003040849 gi|118496659
## WP_003020026 gi|118496882
## WP_003038350 gi|118496950
##                                                                                           Protein.names
## WP_003040894                       3-isopropylmalate dehydratase large subunit [Francisella tularensis]
## WP_003026016                                                   biotin synthase [Francisella tularensis]
## WP_003023392 tRNA (N6-isopentenyl adenosine(37)-C2)-methylthiotransferase MiaB [Francisella tularensis]
## WP_003040849                                              hypothetical protein [Francisella tularensis]
## WP_003020026                                                    CTP synthetase [Francisella tularensis]
## WP_003038350                                              hypothetical protein [Francisella tularensis]
##                estimate         se        df      Tval         pval
## WP_003040894 -0.9057261 0.05555518 121.14412 -16.30318 2.457599e-32
## WP_003026016 -0.9617749 0.07478739 133.04725 -12.86012 4.225036e-25
## WP_003023392 -1.3965407 0.08530562  44.43458 -16.37103 1.941754e-20
## WP_003040849 -1.1356799 0.09816434 101.51013 -11.56917 3.023500e-20
## WP_003020026 -0.5163597 0.04825840 150.49382 -10.69989 3.184831e-20
## WP_003038350  0.8107740 0.06684619  72.40727  12.12895 4.127913e-19
##                      qval signif
## WP_003040894 2.831154e-29   TRUE
## WP_003026016 2.433621e-22   TRUE
## WP_003023392 7.337851e-18   TRUE
## WP_003040849 7.337851e-18   TRUE
## WP_003020026 7.337851e-18   TRUE
## WP_003038350 7.925594e-17   TRUE

Advanced: importing as a data frame

It is also possible to import your data as a simple data frame and make use of some custom preprocessing pipeline. Here, we will show the same preprocessing pipeline for the Francisella dataset as outlined above, but note that adding, removing or modifying steps in the preprocessing pipeline is now very simple as the data is in data frame format.

First, we import the Francisella dataset as a data frame via the base R function read.table.

pepFrancDf <- read.table(file = file_peptides_txt, sep = "\t", header = TRUE, quote="", comment.char = "")

Of course, other filetypes can also be imported. Say for example your input file is a comma separated file stored in My Documents on a Windows computer, you could use for example:

#Change to eval=TRUE to execute this piece of code.
pepFrancDf <- read.table("C:/Users/Ludger/Documents/peptides.csv", sep = ",", dec=".", header = TRUE, quote="", comment.char = "")

The data frame pepFrancD is in wide format.

head(pepFrancDf)
##                  Sequence Amino.acid.before First.amino.acid
## 1               AAAEELDTR                 K                A
## 2           AAAGFVITASHNK                 R                A
## 3 AAANEYELALAYSIEEVAPDLHK                 K                A
## 4            AAANNPQLEAFK                 K                A
## 5             AAASAGLVDEK                 K                A
## 6            AAASLDLYSYPK                 K                A
##   Second.amino.acid Second.last.amino.acid Last.amino.acid
## 1                 A                      T               R
## 2                 A                      N               K
## 3                 A                      H               K
## 4                 A                      F               K
## 5                 A                      E               K
## 6                 A                      P               K
##   Amino.acid.after A.Count R.Count N.Count D.Count C.Count Q.Count E.Count
## 1                K       3       1       0       1       0       0       2
## 2                F       4       0       1       0       0       0       0
## 3                Y       6       0       1       1       0       0       4
## 4                K       4       0       2       0       0       1       1
## 5                A       4       0       0       1       0       0       1
## 6                V       3       0       0       1       0       0       0
##   G.Count H.Count I.Count L.Count K.Count M.Count F.Count P.Count S.Count
## 1       0       0       0       1       0       0       0       0       0
## 2       1       1       1       0       1       0       1       0       1
## 3       0       1       1       3       1       0       0       1       1
## 4       0       0       0       1       1       0       1       1       0
## 5       1       0       0       1       1       0       0       0       1
## 6       0       0       0       2       1       0       0       1       2
##   T.Count W.Count Y.Count V.Count U.Count Length Missed.cleavages
## 1       1       0       0       0       0      9                0
## 2       1       0       0       1       0     13                0
## 3       0       0       2       1       0     23                0
## 4       0       0       0       0       0     12                0
## 5       0       0       0       1       0     11                0
## 6       0       0       2       0       0     12                0
##        Mass     Proteins    GI.number Leading.razor.protein
## 1  974.4669 WP_003038655 gi|118497196          gi|118497196
## 2 1285.6779 WP_003041237 gi|118498194          gi|118498194
## 3 2516.2435 WP_003038915 gi|118497331          gi|118497331
## 4 1272.6463 WP_003038264 gi|118496879          gi|118496879
## 5 1030.5295 WP_003035781 gi|118497152          gi|118497152
## 6 1297.6554 WP_003039212 gi|118497492          gi|118497492
##                                                    Protein.names
## 1                  hypothetical protein [Francisella tularensis]
## 2             phosphoglucosamine mutase [Francisella tularensis]
## 3     glycine--tRNA ligase subunit beta [Francisella tularensis]
## 4              molecular chaperone HtpG [Francisella tularensis]
## 5 delta-aminolevulinic acid dehydratase [Francisella tularensis]
## 6                         phosphorylase [Francisella tularensis]
##   Unique..Groups. Unique..Proteins. Charges        PEP  Score
## 1             yes               yes     1,2 1.2531e-04  59.35
## 2             yes               yes       2 5.3020e-05 45.825
## 3             yes               yes       3 1.0517e-45 84.327
## 4             yes               yes       2 8.7549e-24 108.56
## 5             yes               yes       2 9.5062e-05 67.385
## 6             yes               yes       2 1.2563e-02 31.675
##   Experiment.1WT_20_2h_n3_1 Experiment.1WT_20_2h_n3_2
## 1                        NA                        NA
## 2                        NA                         1
## 3                         1                         1
## 4                         1                         1
## 5                         1                        NA
## 6                        NA                         1
##   Experiment.1WT_20_2h_n3_3 Experiment.1WT_20_2h_n4_1
## 1                         1                         1
## 2                        NA                        NA
## 3                         2                         1
## 4                         1                         1
## 5                         1                         1
## 6                         1                        NA
##   Experiment.1WT_20_2h_n4_2 Experiment.1WT_20_2h_n4_3
## 1                        NA                         1
## 2                        NA                        NA
## 3                         1                         1
## 4                         1                         1
## 5                         1                        NA
## 6                         1                         1
##   Experiment.1WT_20_2h_n5_1 Experiment.1WT_20_2h_n5_2
## 1                        NA                         1
## 2                        NA                         1
## 3                         2                         2
## 4                         1                         1
## 5                         1                         1
## 6                        NA                        NA
##   Experiment.1WT_20_2h_n5_3 Experiment.3D8_20_2h_n3_1
## 1                        NA                        NA
## 2                         1                         1
## 3                         1                         1
## 4                         1                         1
## 5                         1                         1
## 6                        NA                        NA
##   Experiment.3D8_20_2h_n3_2 Experiment.3D8_20_2h_n3_3
## 1                         1                        NA
## 2                         1                        NA
## 3                         2                         1
## 4                         1                         1
## 5                         1                         1
## 6                        NA                        NA
##   Experiment.3D8_20_2h_n4_1 Experiment.3D8_20_2h_n4_2
## 1                        NA                        NA
## 2                         1                         1
## 3                         1                         1
## 4                         1                         1
## 5                         1                         1
## 6                        NA                         1
##   Experiment.3D8_20_2h_n4_3 Experiment.3D8_20_2h_n5_1
## 1                        NA                        NA
## 2                         1                         1
## 3                         1                        NA
## 4                         1                         1
## 5                        NA                         1
## 6                        NA                        NA
##   Experiment.3D8_20_2h_n5_2 Experiment.3D8_20_2h_n5_3  Intensity
## 1                        NA                        NA 1.3295e+09
## 2                        NA                        NA 1.6719e+08
## 3                         1                         1 5.5675e+08
## 4                         1                         1 1.4830e+10
## 5                         1                         1 4.8209e+09
## 6                        NA                        NA 1.3715e+08
##   Intensity.1WT_20_2h_n3_1 Intensity.1WT_20_2h_n3_2
## 1                        0                        0
## 2                        0                 20679000
## 3                 28853000                 28091000
## 4                339830000                420390000
## 5                240280000                        0
## 6                        0                 31069000
##   Intensity.1WT_20_2h_n3_3 Intensity.1WT_20_2h_n4_1
## 1                  9290400                 92996000
## 2                        0                        0
## 3                 30436000                 19476000
## 4                393930000                235060000
## 5                241890000                222250000
## 6                 31448000                        0
##   Intensity.1WT_20_2h_n4_2 Intensity.1WT_20_2h_n4_3
## 1                        0                 98059000
## 2                        0                        0
## 3                 22050000                 20687000
## 4                381090000                360270000
## 5                189930000                        0
## 6                 27721000                 28657000
##   Intensity.1WT_20_2h_n5_1 Intensity.1WT_20_2h_n5_2
## 1                        0                 80803000
## 2                        0                 17358000
## 3                 36588000                 28654000
## 4                255710000                311370000
## 5                160440000                147230000
## 6                        0                        0
##   Intensity.1WT_20_2h_n5_3 Intensity.3D8_20_2h_n3_1
## 1                        0                        0
## 2                 13841000                 12278000
## 3                  7009100                 14174000
## 4                271440000                225140000
## 5                143340000                135910000
## 6                        0                        0
##   Intensity.3D8_20_2h_n3_2 Intensity.3D8_20_2h_n3_3
## 1                 95433000                        0
## 2                 10712000                        0
## 3                  8548000                  7859200
## 4                289880000                282500000
## 5                185740000                168620000
## 6                        0                        0
##   Intensity.3D8_20_2h_n4_1 Intensity.3D8_20_2h_n4_2
## 1                        0                        0
## 2                  9500700                 11111000
## 3                 14871000                 11690000
## 4                209490000                223610000
## 5                160240000                158000000
## 6                        0                 18253000
##   Intensity.3D8_20_2h_n4_3 Intensity.3D8_20_2h_n5_1
## 1                        0                        0
## 2                  8723500                  8125200
## 3                  4470500                        0
## 4                219700000                169120000
## 5                        0                 92542000
## 6                        0                        0
##   Intensity.3D8_20_2h_n5_2 Intensity.3D8_20_2h_n5_3 Reverse Contaminant id
## 1                        0                        0                      0
## 2                        0                        0                      1
## 3                  4048100                  3422300                      2
## 4                178150000                170400000                      3
## 5                102350000                100090000                      4
## 6                        0                        0                      5
##   Protein.group.IDs Mod..peptide.IDs
## 1               419                0
## 2              1150                1
## 3               513                2
## 4               199                3
## 5               388                4
## 6               624                5
##                                                                                                                                                               Evidence.IDs
## 1                                                                                                                           0;1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18
## 2                                                                                                                             19;20;21;22;23;24;25;26;27;28;29;30;31;32;33
## 3                                         34;35;36;37;38;39;40;41;42;43;44;45;46;47;48;49;50;51;52;53;54;55;56;57;58;59;60;61;62;63;64;65;66;67;68;69;70;71;72;73;74;75;76
## 4 77;78;79;80;81;82;83;84;85;86;87;88;89;90;91;92;93;94;95;96;97;98;99;100;101;102;103;104;105;106;107;108;109;110;111;112;113;114;115;116;117;118;119;120;121;122;123;124
## 5                                              125;126;127;128;129;130;131;132;133;134;135;136;137;138;139;140;141;142;143;144;145;146;147;148;149;150;151;152;153;154;155
## 6                                                                                                                                                      156;157;158;159;160
##                                                                                                                                                                                                                                                                                                                                                         MS.MS.IDs
## 1                                                                                                                                                                                                                                                                                                                  0;1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18
## 2                                                                                                                                                                                                                                                                                                                    19;20;21;22;23;24;25;26;27;28;29;30;31;32;33
## 3                                                                                                                                                                                                                       34;35;36;37;38;39;40;41;42;43;44;45;46;47;48;49;50;51;52;53;54;55;56;57;58;59;60;61;62;63;64;65;66;67;68;69;70;71;72;73;74;75;76;77;78;79
## 4 80;81;82;83;84;85;86;87;88;89;90;91;92;93;94;95;96;97;98;99;100;101;102;103;104;105;106;107;108;109;110;111;112;113;114;115;116;117;118;119;120;121;122;123;124;125;126;127;128;129;130;131;132;133;134;135;136;137;138;139;140;141;142;143;144;145;146;147;148;149;150;151;152;153;154;155;156;157;158;159;160;161;162;163;164;165;166;167;168;169;170;171;172
## 5                                                                                                                                                                                                                                     173;174;175;176;177;178;179;180;181;182;183;184;185;186;187;188;189;190;191;192;193;194;195;196;197;198;199;200;201;202;203
## 6                                                                                                                                                                                                                                                                                                                                             204;205;206;207;208
##   Best.MS.MS Oxidation..M..site.IDs
## 1         15                       
## 2         26                       
## 3         39                       
## 4        159                       
## 5        191                       
## 6        205

If we want to mimick the MaxQuant preprocessing pipeline, we need to add a factor that indicates whether a protein is only identified by modified peptides. This requires some basic R programming. The information about which proteins are only identified by modified peptides can be found in the proteinGroups.txt file. We again make use of the proteinGroups.txt file that is delivered with the package.

#Import the proteinGroups.txt file delivered with the MSqRobData package 
file_proteinGroups <- system.file("extdata/Francisella", "proteinGroups.txt", package = "MSqRobData")

  proteinGroups <- read.table(file_proteinGroups, sep="\t", header=TRUE, quote="", comment.char = "")
  
  #Extract the column that indicates which proteins are only identified by a modification site
  only_site <- proteinGroups$Only.identified.by.site
  
  #If there are no such proteins (as is the case here), the column will be completely empty and R will import this by default as "NA". However, we want the an empty value instead of "NA" to keep this column consistent with the "Contaminant" and "Reverse" columns.
  only_site[is.na(only_site)] <- ""
  #Select the protein accessions that are only identified by one or more modified peptides
  filter_symbol <- "+"
  removed_proteins <- proteinGroups$Protein.IDs[only_site==filter_symbol]

  #Create a logical variable "rem" that holds "TRUE" if a row in the pepFrancDf data frame should be removed and FALSE otherwise.
  accession <- "Proteins"
  rem <- as.character(pepFrancDf[,accession]) %in% as.character(removed_proteins)
  
  #Create a new column "only_site" in "pepFrancDf" that indicates with a "+" which rows should be removed.
  pepFrancDf$only_site <- ""
  pepFrancDf$only_site[rem] <- "+"

Note that in this particular case, the previous chunck of code was superfluous, as there are no proteins in the dataset that are only identified by modified peptides.

We can easily preprocess the data frame using the preprocess_wide function (for data in “long” format, use the preprocess_long function). This function uses our standard preprocessing pipeline on data frames in “wide” format (i.e. the data frame has one observation row per subject with each measurement present as a different variable).

We already made the exp_annotation object before (see “2.1. Create an experimental annotation data frame.”), so we will not repeat this code again here. Importantly, however, contrary to the exprs slot in the peptidesFranc2 object created before, the names of all intensity columns in data frame pepFrancDf start with “Intensity.”.

This is because the default setting remove_pattern=TRUE in the import2MSnSet function removes the prefix "Intensity\ " from the columns containing the intensities as this allows users that are unfamiliar with R to just copy and paste the names they provided in MaxQuant’s “Experiment” column as an input. Here, we just imported the peptides.txt file as it is. We thus create a new experimental annotation data frame called exp_annotation2 with run names starting with “Intensity.” (a space is not syntactically valid in R, so it is automatically converted to a dot).

runs2 <- colnames(pepFrancDf)[grepl("Intensity.",colnames(pepFrancDf))]
runs2
##  [1] "Intensity.1WT_20_2h_n3_1" "Intensity.1WT_20_2h_n3_2"
##  [3] "Intensity.1WT_20_2h_n3_3" "Intensity.1WT_20_2h_n4_1"
##  [5] "Intensity.1WT_20_2h_n4_2" "Intensity.1WT_20_2h_n4_3"
##  [7] "Intensity.1WT_20_2h_n5_1" "Intensity.1WT_20_2h_n5_2"
##  [9] "Intensity.1WT_20_2h_n5_3" "Intensity.3D8_20_2h_n3_1"
## [11] "Intensity.3D8_20_2h_n3_2" "Intensity.3D8_20_2h_n3_3"
## [13] "Intensity.3D8_20_2h_n4_1" "Intensity.3D8_20_2h_n4_2"
## [15] "Intensity.3D8_20_2h_n4_3" "Intensity.3D8_20_2h_n5_1"
## [17] "Intensity.3D8_20_2h_n5_2" "Intensity.3D8_20_2h_n5_3"
#Extract the relevant part of the mass spec run names that points to the genotype.
genotype2 <- factor(substr(runs2, 11,13))
genotype2
##  [1] 1WT 1WT 1WT 1WT 1WT 1WT 1WT 1WT 1WT 3D8 3D8 3D8 3D8 3D8 3D8 3D8 3D8
## [18] 3D8
## Levels: 1WT 3D8
n <- nchar(runs2)
biorep2 <- as.factor(paste0("b_",factor(as.numeric(factor(paste0(substr(runs2, 10,10),substr(runs2, n-2,n-2))))+c(rep(0,9),rep(3,9)))))
biorep
##  [1] b_1 b_1 b_1 b_2 b_2 b_2 b_3 b_3 b_3 b_4 b_4 b_4 b_5 b_5 b_5 b_6 b_6
## [18] b_6
## Levels: b_1 b_2 b_3 b_4 b_5 b_6
exp_annotation2 <- data.frame(run=runs2, genotype=genotype2, biorep=biorep2)
exp_annotation2
##                         run genotype biorep
## 1  Intensity.1WT_20_2h_n3_1      1WT    b_1
## 2  Intensity.1WT_20_2h_n3_2      1WT    b_1
## 3  Intensity.1WT_20_2h_n3_3      1WT    b_1
## 4  Intensity.1WT_20_2h_n4_1      1WT    b_2
## 5  Intensity.1WT_20_2h_n4_2      1WT    b_2
## 6  Intensity.1WT_20_2h_n4_3      1WT    b_2
## 7  Intensity.1WT_20_2h_n5_1      1WT    b_3
## 8  Intensity.1WT_20_2h_n5_2      1WT    b_3
## 9  Intensity.1WT_20_2h_n5_3      1WT    b_3
## 10 Intensity.3D8_20_2h_n3_1      3D8    b_4
## 11 Intensity.3D8_20_2h_n3_2      3D8    b_4
## 12 Intensity.3D8_20_2h_n3_3      3D8    b_4
## 13 Intensity.3D8_20_2h_n4_1      3D8    b_5
## 14 Intensity.3D8_20_2h_n4_2      3D8    b_5
## 15 Intensity.3D8_20_2h_n4_3      3D8    b_5
## 16 Intensity.3D8_20_2h_n5_1      3D8    b_6
## 17 Intensity.3D8_20_2h_n5_2      3D8    b_6
## 18 Intensity.3D8_20_2h_n5_3      3D8    b_6

The preprocess_wide function allows for the same preprocessing as the preprocess_MaxQuant function on a regular data frame. If you inspect this function, you will notice that the same 8 preprocessing steps are executed. Remember that you can do any preprocessing you like using basic R data frame manipulations. The split argument indicates which character string is used to separate the accession groups. The exp_annotation argument, like before, contains the experimental annotation data frame or a character string indicating the filepath where the experimental annotation is saved as an Excel file or a tab-delimited file. The quant_cols argument contains a vector of names or a vector of numeric indices pointing to the columns containing the quantitative values of interest (mostly intensities or areas under the curve). If quant_cols contains only one character string, each column containing this pattern will be regarded as a column containing the quantitative values of interest. aggr_by indicates the column by which the data should be aggregated. Here, the data is already aggregated to the peptide level (i.e. over different charge states and modification statuses). We need no further aggregation, thus we provide the “Sequence” column (the level to which aggregation has already been done). aggr_function will be superfluous here, but if further aggregation would have been necessary, raw intensities would be summed up (aggr_function="sum"). The other parameters are similar to the preprocess_MaxQuant function.

pepFrancDf2 <- preprocess_wide(pepFrancDf, accession="Proteins", split=";", exp_annotation=exp_annotation2, quant_cols="Intensity.", aggr_by="Sequence", aggr_function="sum", logtransform=TRUE, base=2, normalisation="quantiles", smallestUniqueGroups=TRUE, useful_properties=c("GI.number","Protein.names","Sequence"), filter=c("Contaminant","Reverse","only_site"), filter_symbol="+", minIdentified=2)

Note that the experimental annotation is now present as an attribute of the pepFrancDf2 data frame.

attr(pepFrancDf2,"MSqRob_exp_annotation")
##                         run genotype biorep
## 1  Intensity.1WT_20_2h_n3_1      1WT    b_1
## 2  Intensity.1WT_20_2h_n3_2      1WT    b_1
## 3  Intensity.1WT_20_2h_n3_3      1WT    b_1
## 4  Intensity.1WT_20_2h_n4_1      1WT    b_2
## 5  Intensity.1WT_20_2h_n4_2      1WT    b_2
## 6  Intensity.1WT_20_2h_n4_3      1WT    b_2
## 7  Intensity.1WT_20_2h_n5_1      1WT    b_3
## 8  Intensity.1WT_20_2h_n5_2      1WT    b_3
## 9  Intensity.1WT_20_2h_n5_3      1WT    b_3
## 10 Intensity.3D8_20_2h_n3_1      3D8    b_4
## 11 Intensity.3D8_20_2h_n3_2      3D8    b_4
## 12 Intensity.3D8_20_2h_n3_3      3D8    b_4
## 13 Intensity.3D8_20_2h_n4_1      3D8    b_5
## 14 Intensity.3D8_20_2h_n4_2      3D8    b_5
## 15 Intensity.3D8_20_2h_n4_3      3D8    b_5
## 16 Intensity.3D8_20_2h_n5_1      3D8    b_6
## 17 Intensity.3D8_20_2h_n5_2      3D8    b_6
## 18 Intensity.3D8_20_2h_n5_3      3D8    b_6

After custom preprocessing, you can transform the data to a protdata object via the df2protdata function.

The acc_col argument in the df2protdata function indicates the column index or column name of the accessions (i.e. the protein (group) identifiers). The quant_cols argument is a vector of indices pointing to all columns containing peptide intensities. In the example below, the column named “Proteins” will be added to the accession slot while all columns with indices in the quant_cols vector will be added as a matrix to the intensities slot. quant_name indicates the name that will be given to the column containing the (preprocessed) intensities and run_name indicates the name that will be given to the column containing the mass spec run names in the new protdata object. Like before, we keep “GI.number” and “Protein.names” as annotations.

#Select columns with intensities.
quant_cols <- which(grepl("Intensity.", colnames(pepFrancDf2)))
quant_cols
##  [1]  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
#Call df2protdata function
proteinsFrancDf <- df2protdata(pepFrancDf2, acc_col="Proteins", quant_cols=quant_cols, quant_name="quant_value", run_name="run", annotations=c("GI.number","Protein.names"))
#Inspect proteinsFrancDf protdata object.
proteinsFrancDf
## @accession 
## 
## [1] "WP_003038655"
## 
##  @annotation 
## 
##                                       GI.number 
##                                  "gi|118497196" 
##                                   Protein.names 
## "hypothetical protein [Francisella tularensis]" 
## 
##  @data 
## 
##   quant_value           Sequence genotype biorep                      run
## 2    25.11978       AVVVFFDYGCGK      1WT    b_1 Intensity.1WT_20_2h_n3_1
## 3    25.34586 DLLSDPATPTVGPQDANK      1WT    b_1 Intensity.1WT_20_2h_n3_1
## 4    24.11491   ETNGELTVQDVDNVVK      1WT    b_1 Intensity.1WT_20_2h_n3_1
## 5    26.50674          IIAIPDIVK      1WT    b_1 Intensity.1WT_20_2h_n3_1
## 6    24.69046     QTNVITDNYQQAAK      1WT    b_1 Intensity.1WT_20_2h_n3_1
## 9    24.75196       AVVVFFDYGCGK      1WT    b_1 Intensity.1WT_20_2h_n3_2
## 
##  96 more rows... 
## 
## @accession 
## 
## [1] "WP_003041237"
## 
##  @annotation 
## 
##                                            GI.number 
##                                       "gi|118498194" 
##                                        Protein.names 
## "phosphoglucosamine mutase [Francisella tularensis]" 
## 
##  @data 
## 
##   quant_value         Sequence genotype biorep                      run
## 2    25.81732    ADLGISLDGDADR      1WT    b_1 Intensity.1WT_20_2h_n3_1
## 3    27.36716        FVIVGQDTR      1WT    b_1 Intensity.1WT_20_2h_n3_1
## 4    24.91571  GEVANSTITAEFTQK      1WT    b_1 Intensity.1WT_20_2h_n3_1
## 5    27.38539 ILANAIDEYIESIHSR      1WT    b_1 Intensity.1WT_20_2h_n3_1
## 7    27.93398    SLATNEAEYLVEK      1WT    b_1 Intensity.1WT_20_2h_n3_1
## 8    26.59163       VASDVNDVEK      1WT    b_1 Intensity.1WT_20_2h_n3_1
## 
##  187 more rows... 
## 
## @accession 
## 
## [1] "WP_003038915"
## 
##  @annotation 
## 
##                                                    GI.number 
##                                               "gi|118497331" 
##                                                Protein.names 
## "glycine--tRNA ligase subunit beta [Francisella tularensis]" 
## 
##  @data 
## 
##   quant_value                Sequence genotype biorep
## 1    24.37301 AAANEYELALAYSIEEVAPDLHK      1WT    b_1
## 4    22.60222   EEGIAVDIFEAINNTNYDSIK      1WT    b_1
## 5    25.34002           EGEPTQVGLGFAK      1WT    b_1
## 6    25.13163          EYSYALELLTCLDK      1WT    b_1
## 7    24.09977       FHHPQAIVIDNISDYIK      1WT    b_1
## 9    25.75712              IIDITLESYK      1WT    b_1
##                        run
## 1 Intensity.1WT_20_2h_n3_1
## 4 Intensity.1WT_20_2h_n3_1
## 5 Intensity.1WT_20_2h_n3_1
## 6 Intensity.1WT_20_2h_n3_1
## 7 Intensity.1WT_20_2h_n3_1
## 9 Intensity.1WT_20_2h_n3_1
## 
##  215 more rows... 
## 
## @accession 
## 
## [1] "WP_003038264"
## 
##  @annotation 
## 
##                                           GI.number 
##                                      "gi|118496879" 
##                                       Protein.names 
## "molecular chaperone HtpG [Francisella tularensis]" 
## 
##  @data 
## 
##   quant_value        Sequence genotype biorep                      run
## 1    27.92076    AAANNPQLEAFK      1WT    b_1 Intensity.1WT_20_2h_n3_1
## 3    26.48741 EGQDTIYYITSDSYK      1WT    b_1 Intensity.1WT_20_2h_n3_1
## 5    27.33299    EHLDLLEYHVLK      1WT    b_1 Intensity.1WT_20_2h_n3_1
## 6    26.37656         EILQHNK      1WT    b_1 Intensity.1WT_20_2h_n3_1
## 7    28.11488        ELLPPYLR      1WT    b_1 Intensity.1WT_20_2h_n3_1
## 8    28.77850    ELVSNSSDAIEK      1WT    b_1 Intensity.1WT_20_2h_n3_1
## 
##  338 more rows... 
## 
## @accession 
## 
## [1] "WP_003035781"
## 
##  @annotation 
## 
##                                                        GI.number 
##                                                   "gi|118497152" 
##                                                    Protein.names 
## "delta-aminolevulinic acid dehydratase [Francisella tularensis]" 
## 
##  @data 
## 
##   quant_value        Sequence genotype biorep                      run
## 1    27.47327     AAASAGLVDEK      1WT    b_1 Intensity.1WT_20_2h_n3_1
## 2    24.05872 VILTYTALDVANWLK      1WT    b_1 Intensity.1WT_20_2h_n3_1
## 5    23.88890 VILTYTALDVANWLK      1WT    b_1 Intensity.1WT_20_2h_n3_2
## 6    25.37749      YASAYYGPFR      1WT    b_1 Intensity.1WT_20_2h_n3_2
## 7    27.47269     AAASAGLVDEK      1WT    b_1 Intensity.1WT_20_2h_n3_3
## 8    24.93174 VILTYTALDVANWLK      1WT    b_1 Intensity.1WT_20_2h_n3_3
## 
##  41 more rows... 
## 
## @accession 
## 
## [1] "WP_003039212"
## 
##  @annotation 
## 
##                                GI.number 
##                           "gi|118497492" 
##                            Protein.names 
## "phosphorylase [Francisella tularensis]" 
## 
##  @data 
## 
##    quant_value             Sequence genotype biorep
## 3     22.96510 FPAGLNDVEFVAEHIFQHSK      1WT    b_1
## 4     25.10257           LQDLLTFIGK      1WT    b_1
## 7     24.55598             VVFSVDYR      1WT    b_1
## 8     24.53369         AAASLDLYSYPK      1WT    b_1
## 9     24.68029              EVLFLNR      1WT    b_1
## 10    23.16038 FPAGLNDVEFVAEHIFQHSK      1WT    b_1
##                         run
## 3  Intensity.1WT_20_2h_n3_1
## 4  Intensity.1WT_20_2h_n3_1
## 7  Intensity.1WT_20_2h_n3_1
## 8  Intensity.1WT_20_2h_n3_2
## 9  Intensity.1WT_20_2h_n3_2
## 10 Intensity.1WT_20_2h_n3_2
## 
##  56 more rows... 
## 
## 1182 more elements...

Note that the proteinsFrancDf object is the same as the proteinsFranc object, only with a different order of columns in their data slot.

CPTAC dataset (randomized complete block design)

library(MSqRob)
library(Biobase)

This example involves the CPTAC Study 6. In this dataset Sigma’s UPS1 standard containing 48 human proteins was spiked into a reference yeast proteome and analyzed on 7 instruments in 5 different laboratories (Paulovich et al., 2010). Raw data files can be downloaded at: https://cptac-data-portal.georgetown.edu/cptac/dataPublic/list?currentPath=%2FPhase_I_Data%2FStudy6&nonav=true

Here, we limited ourselves to the data of LTQ-Orbitrap at site 86, LTQ-Orbitrap O at site 65 and LTQ-Orbitrap W at site 56. The data was searched with MaxQuant version 1.5.2.8. Detailed search settings can be found in Goeminne et al. (2016).

1. Import MaxQuant’s peptides.txt file

First, specify the location of the peptides.txt MaxQuant output file. By default, MaxQuant creates a “combined” folder in the same folder as the raw files are (or if the raw files are in different directories, in the directory of the first raw file). The peptides.txt file can then be found in “path_to_raw_files/combined/txt/peptides.txt”. Here, we make use of the peptides.txt file that is delivered with the package.

file_peptides_txt <- system.file("extdata/CPTAC", "peptides.txt", package = "MSqRobData")

Import MaxQuant’s peptides.txt file and convert it to an MSnSet object (see package MSnbase).

peptidesCPTAC <- import2MSnSet(file_peptides_txt, filetype="MaxQuant", remove_pattern=TRUE)

The “pattern” argument indicates which columns in the peptides.txt file contain peptide intensities. Currently, all columns containing "Intensity\ " in their column names will be treated as peptide intensities (default for MaxQuant). remove_pattern=TRUE (default) indicates that the name of the pattern is removed from the column names.

2. Preprocessing

2.1. Add information on the design: create an experimental annotation data frame.

Extract the names of the mass spec runs via the sampleNames function from the Biobase package.

runs <- sampleNames(peptidesCPTAC)

Note that as the names again start with a number, an “X” is prepended to make the names syntactically valid. Add a grouping factor for the spike-in condition (i.e. 6A, 6B, 6C, 6D and 6E).

condition <- factor(substr(runs, 2,3))

Add a grouping factor for the lab effect (i.e. LTQ-Orbitrap at site 86, LTQ-Orbitrap O at site 65 and LTQ-Orbitrap W at site 56).

lab <- factor(rep(rep(1:3,each=3),5))
#Alternative:
#lab <- factor((as.numeric(substr(runs, 5,5))-1)%/%3+1)

Create an experimental annotation data frame.

exp_annotation <- data.frame(run=runs, condition=condition, lab=lab)
exp_annotation
##      run condition lab
## 1  X6A_1        6A   1
## 2  X6A_2        6A   1
## 3  X6A_3        6A   1
## 4  X6A_4        6A   2
## 5  X6A_5        6A   2
## 6  X6A_6        6A   2
## 7  X6A_7        6A   3
## 8  X6A_8        6A   3
## 9  X6A_9        6A   3
## 10 X6B_1        6B   1
## 11 X6B_2        6B   1
## 12 X6B_3        6B   1
## 13 X6B_4        6B   2
## 14 X6B_5        6B   2
## 15 X6B_6        6B   2
## 16 X6B_7        6B   3
## 17 X6B_8        6B   3
## 18 X6B_9        6B   3
## 19 X6C_1        6C   1
## 20 X6C_2        6C   1
## 21 X6C_3        6C   1
## 22 X6C_4        6C   2
## 23 X6C_5        6C   2
## 24 X6C_6        6C   2
## 25 X6C_7        6C   3
## 26 X6C_8        6C   3
## 27 X6C_9        6C   3
## 28 X6D_1        6D   1
## 29 X6D_2        6D   1
## 30 X6D_3        6D   1
## 31 X6D_4        6D   2
## 32 X6D_5        6D   2
## 33 X6D_6        6D   2
## 34 X6D_7        6D   3
## 35 X6D_8        6D   3
## 36 X6D_9        6D   3
## 37 X6E_1        6E   1
## 38 X6E_2        6E   1
## 39 X6E_3        6E   1
## 40 X6E_4        6E   2
## 41 X6E_5        6E   2
## 42 X6E_6        6E   2
## 43 X6E_7        6E   3
## 44 X6E_8        6E   3
## 45 X6E_9        6E   3

Remember that the exp_annotation can also be given as an Excel file or a tab delimited file!

2.2. Preprocess the data using the preprocess_MaxQuant function.

If we want to remove proteins that are only identified by modified peptides from the dataset, we also need the proteinGroups.txt file. This file can be found in the same location as the peptides.txt file (“path_to_raw_files/combined/txt/proteinGroups.txt”). Here, we make use of the proteinGroups.txt file that is delivered with the package.

Note: if you don’t want to remove proteins that are only identified by modified peptides, set remove_only_site=FALSE and leave the file_proteinGroups argument at its default value (NULL).

file_proteinGroups <- system.file("extdata/CPTAC", "proteinGroups.txt", package = "MSqRobData")

Important: in the CPTAC dataset, some human UPS proteins are NOT contaminants as these proteins were spiked in on purpose! We only want to remove those contaminants that are not human UPS proteins. We thus need to unmark these proteins as contaminant before preprocessing.

fData(peptidesCPTAC)[grepl("_HUMAN_UPS", fData(peptidesCPTAC)$Proteins),]$Potential.contaminant <- ""

Add to useful_properties all column names of the fData slot of the peptidesCPTAC MSnSet object that you would like to keep (such as gene names, protein names, ontologies, etc.). Other columns will be removed for improved efficiency and memory usage. We certainly want to keep the “Proteins” slot, as this will be our protein (group) identifier. You also want to keep the “Sequence” slot, as peptide-specific effects must be included in our model.

peptidesCPTAC2 <- preprocess_MaxQuant(peptidesCPTAC, accession="Proteins", exp_annotation=exp_annotation, logtransform=TRUE, base=2, normalisation="quantiles", smallestUniqueGroups=TRUE, useful_properties=c("Proteins","Sequence"), filter=c("Potential.contaminant","Reverse"), remove_only_site=TRUE, file_proteinGroups=file_proteinGroups, filter_symbol="+", minIdentified=2)

The following preprocessing steps are executed in this order:

  1. Peptide intensities are log2-transformed (logtransform=TRUE, base=2).

  2. As a normalization approach, we choose to quantile normalize the peptide intensities (normalisation="quantiles"). Other options are "sum", "max", "center.mean", "center.median", "quantiles.robust", "vsn" or "none" (see MSnbase::normalize for more information).

  3. Handling overlapping protein groups (smallestUniqueGroups=TRUE): in our approach a peptide can map to multiple proteins, as long as there is none of these proteins present in a smaller subgroup. Peptides that map to protein groups for wich also subgroups are present in the dataset are removed from the dataset.

  4. Remove reverse sequences and contaminants (filter=c("Potential.contaminant","Reverse")).

  5. Remove all proteins that are only identified by peptides carrying a modification site (remove_only_site=TRUE, file_proteinGroups=file_proteinGroups).

  6. Remove all superfluous columns in the fData slot, except the “Proteins” and “Sequence” columns (useful_properties=c("Proteins","Sequence")).

  7. Remove all peptides that are only identified once in the dataset (minIdentified=2).

  8. Add experimental annotation.

2.3. Advanced alternative: do the preprocessing yourself.

It is possible to do all the preprocessing steps yourself. That way, you can do more advanced preprocessing or change the order of certain preprocessing steps if you like to.

  1. Log2-transform the peptide intensities.
peptidesCPTAC3 <- log(peptidesCPTAC, base=2)
#Alternative: 
#peptidesCPTAC3 <- log2(peptidesCPTAC3)

After log2-transformation, we want to replace -Inf values in the peptide intensities by NA values as these values originate from zeros in the untransformed intensities matrix, which means these intensities were actually missing.

exprs <- exprs(peptidesCPTAC3)
exprs[is.infinite(exprs)] <- NA
exprs(peptidesCPTAC3) <- exprs
  1. As a normalization approach, we choose to quantile normalize the peptide intensities.
peptidesCPTAC3 <- normalize(peptidesCPTAC3, "quantiles")
  1. Handling overlapping protein groups: in our approach a peptide can map to multiple proteins, as long as there is none of these proteins present in a smaller subgroup.
groups2 <- smallestUniqueGroups(fData(peptidesCPTAC3)$Proteins)
sel <- fData(peptidesCPTAC3)$Proteins %in% groups2
peptidesCPTAC3 <- peptidesCPTAC3[sel]
head(fData(peptidesCPTAC3))
##             Sequence N.term.cleavage.window C.term.cleavage.window
## 1 AAAAGAGGAGDSGDAVTK       EHQHDEQKAAAAGAGG       DSGDAVTKIGSEDVKL
## 2          AAAALAGGK       QQLSKAAKAAAALAGG       AAALAGGKKSKKKWSK
## 3         AAAALAGGKK       QQLSKAAKAAAALAGG       AALAGGKKSKKKWSKK
## 4       AAADALSDLEIK       MPKETPSKAAADALSD       ALSDLEIKDSKSNLNK
## 5    AAADALSDLEIKDSK       MPKETPSKAAADALSD       DLEIKDSKSNLNKELE
## 6           AAAEEFQR       RERKEREKAAAEEFQR       AAAEEFQRQQELLRQQ
##   Amino.acid.before First.amino.acid Second.amino.acid
## 1                 K                A                 A
## 2                 K                A                 A
## 3                 K                A                 A
## 4                 K                A                 A
## 5                 K                A                 A
## 6                 K                A                 A
##   Second.last.amino.acid Last.amino.acid Amino.acid.after A.Count R.Count
## 1                      T               K                I       7       0
## 2                      G               K                K       5       0
## 3                      K               K                S       5       0
## 4                      I               K                D       4       0
## 5                      S               K                S       4       0
## 6                      Q               R                Q       3       1
##   N.Count D.Count C.Count Q.Count E.Count G.Count H.Count I.Count L.Count
## 1       0       2       0       0       0       5       0       0       0
## 2       0       0       0       0       0       2       0       0       1
## 3       0       0       0       0       0       2       0       0       1
## 4       0       2       0       0       1       0       0       1       2
## 5       0       3       0       0       1       0       0       1       2
## 6       0       0       0       1       2       0       0       0       0
##   K.Count M.Count F.Count P.Count S.Count T.Count W.Count Y.Count V.Count
## 1       1       0       0       0       1       1       0       0       1
## 2       1       0       0       0       0       0       0       0       0
## 3       2       0       0       0       0       0       0       0       0
## 4       1       0       0       0       1       0       0       0       0
## 5       2       0       0       0       2       0       0       0       0
## 6       0       0       1       0       0       0       0       0       0
##   U.Count Length Missed.cleavages      Mass
## 1       0     18                0 1445.6746
## 2       0      9                0  728.4181
## 3       0     10                1  856.5131
## 4       0     12                0 1215.6347
## 5       0     15                1 1545.7886
## 6       0      8                0  920.4352
##                                      Proteins Leading.razor.protein
## 1                        sp|P38915|SPT8_YEAST  sp|P38915|SPT8_YEAST
## 2 sp|Q3E792|RS25A_YEAST;sp|P0C0T4|RS25B_YEAST sp|Q3E792|RS25A_YEAST
## 3 sp|Q3E792|RS25A_YEAST;sp|P0C0T4|RS25B_YEAST sp|Q3E792|RS25A_YEAST
## 4                        sp|P09938|RIR2_YEAST  sp|P09938|RIR2_YEAST
## 5                        sp|P09938|RIR2_YEAST  sp|P09938|RIR2_YEAST
## 6                       sp|P53075|SHE10_YEAST sp|P53075|SHE10_YEAST
##   Start.position End.position Unique..Groups. Unique..Proteins. Charges
## 1             97          114             yes               yes       2
## 2             13           21             yes                no       2
## 3             13           22             yes                no       2
## 4              9           20             yes               yes       2
## 5              9           23             yes               yes       3
## 6            538          545             yes               yes       2
##          PEP   Score Identification.type.6A_1 Identification.type.6A_2
## 1 1.1843e-05  82.942              By matching                 By MS/MS
## 2 7.4562e-06 134.810              By matching              By matching
## 3 3.3094e-09 143.730              By matching              By matching
## 4 9.1593e-23 182.230                 By MS/MS                 By MS/MS
## 5 1.5319e-04  73.927              By matching              By matching
## 6 1.7588e-02  79.474              By matching              By matching
##   Identification.type.6A_3 Identification.type.6A_4
## 1              By matching                 By MS/MS
## 2              By matching                 By MS/MS
## 3              By matching                 By MS/MS
## 4              By matching                 By MS/MS
## 5              By matching                 By MS/MS
## 6              By matching              By matching
##   Identification.type.6A_5 Identification.type.6A_6
## 1              By matching              By matching
## 2              By matching              By matching
## 3              By matching              By matching
## 4                 By MS/MS                 By MS/MS
## 5                 By MS/MS                 By MS/MS
## 6              By matching              By matching
##   Identification.type.6A_7 Identification.type.6A_8
## 1                 By MS/MS                 By MS/MS
## 2                 By MS/MS                 By MS/MS
## 3                 By MS/MS                 By MS/MS
## 4                 By MS/MS              By matching
## 5                 By MS/MS                 By MS/MS
## 6              By matching              By matching
##   Identification.type.6A_9 Identification.type.6B_1
## 1                 By MS/MS              By matching
## 2                 By MS/MS                 By MS/MS
## 3                 By MS/MS              By matching
## 4                 By MS/MS                 By MS/MS
## 5                 By MS/MS              By matching
## 6              By matching                 By MS/MS
##   Identification.type.6B_2 Identification.type.6B_3
## 1              By matching              By matching
## 2              By matching              By matching
## 3                 By MS/MS                 By MS/MS
## 4                 By MS/MS              By matching
## 5              By matching              By matching
## 6              By matching              By matching
##   Identification.type.6B_4 Identification.type.6B_5
## 1              By matching              By matching
## 2              By matching              By matching
## 3              By matching              By matching
## 4                 By MS/MS                 By MS/MS
## 5                 By MS/MS                 By MS/MS
## 6                 By MS/MS              By matching
##   Identification.type.6B_6 Identification.type.6B_7
## 1              By matching              By matching
## 2              By matching                 By MS/MS
## 3              By matching                 By MS/MS
## 4                 By MS/MS                 By MS/MS
## 5                 By MS/MS                 By MS/MS
## 6              By matching              By matching
##   Identification.type.6B_8 Identification.type.6B_9
## 1                 By MS/MS                 By MS/MS
## 2                 By MS/MS                 By MS/MS
## 3                 By MS/MS                 By MS/MS
## 4              By matching              By matching
## 5                 By MS/MS                 By MS/MS
## 6              By matching              By matching
##   Identification.type.6C_1 Identification.type.6C_2
## 1              By matching              By matching
## 2              By matching                 By MS/MS
## 3              By matching                 By MS/MS
## 4                 By MS/MS              By matching
## 5              By matching              By matching
## 6              By matching              By matching
##   Identification.type.6C_3 Identification.type.6C_4
## 1              By matching              By matching
## 2              By matching                 By MS/MS
## 3              By matching                 By MS/MS
## 4                 By MS/MS                 By MS/MS
## 5              By matching                 By MS/MS
## 6              By matching              By matching
##   Identification.type.6C_5 Identification.type.6C_6
## 1                 By MS/MS              By matching
## 2              By matching              By matching
## 3              By matching              By matching
## 4                 By MS/MS                 By MS/MS
## 5                 By MS/MS                 By MS/MS
## 6              By matching              By matching
##   Identification.type.6C_7 Identification.type.6C_8
## 1                 By MS/MS              By matching
## 2                 By MS/MS                 By MS/MS
## 3                 By MS/MS                 By MS/MS
## 4              By matching                 By MS/MS
## 5                 By MS/MS                 By MS/MS
## 6              By matching              By matching
##   Identification.type.6C_9 Identification.type.6D_1
## 1              By matching              By matching
## 2                 By MS/MS              By matching
## 3                 By MS/MS              By matching
## 4                 By MS/MS                 By MS/MS
## 5                 By MS/MS                 By MS/MS
## 6              By matching              By matching
##   Identification.type.6D_2 Identification.type.6D_3
## 1              By matching              By matching
## 2              By matching              By matching
## 3              By matching              By matching
## 4              By matching              By matching
## 5                 By MS/MS              By matching
## 6              By matching              By matching
##   Identification.type.6D_4 Identification.type.6D_5
## 1              By matching              By matching
## 2              By matching              By matching
## 3                 By MS/MS              By matching
## 4                 By MS/MS                 By MS/MS
## 5                 By MS/MS                 By MS/MS
## 6              By matching              By matching
##   Identification.type.6D_6 Identification.type.6D_7
## 1                 By MS/MS              By matching
## 2              By matching                 By MS/MS
## 3              By matching                 By MS/MS
## 4                 By MS/MS              By matching
## 5              By matching                 By MS/MS
## 6              By matching              By matching
##   Identification.type.6D_8 Identification.type.6D_9
## 1              By matching              By matching
## 2                 By MS/MS                 By MS/MS
## 3                 By MS/MS                 By MS/MS
## 4                 By MS/MS                 By MS/MS
## 5                 By MS/MS                 By MS/MS
## 6              By matching              By matching
##   Identification.type.6E_1 Identification.type.6E_2
## 1              By matching              By matching
## 2              By matching              By matching
## 3              By matching              By matching
## 4                 By MS/MS                 By MS/MS
## 5                 By MS/MS                 By MS/MS
## 6              By matching              By matching
##   Identification.type.6E_3 Identification.type.6E_4
## 1              By matching              By matching
## 2              By matching                 By MS/MS
## 3              By matching              By matching
## 4              By matching                 By MS/MS
## 5                 By MS/MS              By matching
## 6              By matching              By matching
##   Identification.type.6E_5 Identification.type.6E_6
## 1              By matching              By matching
## 2              By matching              By matching
## 3              By matching              By matching
## 4                 By MS/MS              By matching
## 5                 By MS/MS                 By MS/MS
## 6              By matching              By matching
##   Identification.type.6E_7 Identification.type.6E_8
## 1              By matching              By matching
## 2                 By MS/MS                 By MS/MS
## 3                 By MS/MS                 By MS/MS
## 4                 By MS/MS                 By MS/MS
## 5              By matching                 By MS/MS
## 6              By matching              By matching
##   Identification.type.6E_9 Experiment.6A_1 Experiment.6A_2 Experiment.6A_3
## 1              By matching              NA               1              NA
## 2                 By MS/MS              NA               1               2
## 3                 By MS/MS              NA               1              NA
## 4                 By MS/MS               1               1               1
## 5                 By MS/MS               1               1              NA
## 6              By matching              NA              NA              NA
##   Experiment.6A_4 Experiment.6A_5 Experiment.6A_6 Experiment.6A_7
## 1               1               1               1               1
## 2               1               1               1               2
## 3               1              NA               1               1
## 4               1               1               1               1
## 5               1               1               1               1
## 6              NA               1              NA              NA
##   Experiment.6A_8 Experiment.6A_9 Experiment.6B_1 Experiment.6B_2
## 1               1               1              NA              NA
## 2               1               1               1               1
## 3               1               1              NA               1
## 4               1               1               1               1
## 5               1               1              NA               1
## 6              NA               1               1              NA
##   Experiment.6B_3 Experiment.6B_4 Experiment.6B_5 Experiment.6B_6
## 1              NA              NA               1               1
## 2               1               2               1              NA
## 3               1               1              NA              NA
## 4               1               1               1               1
## 5              NA               1               1               1
## 6              NA               1              NA              NA
##   Experiment.6B_7 Experiment.6B_8 Experiment.6B_9 Experiment.6C_1
## 1              NA               1               1              NA
## 2               2               1               2              NA
## 3               1               1               1              NA
## 4               1               1               1               1
## 5               1               1               1               1
## 6              NA              NA               1              NA
##   Experiment.6C_2 Experiment.6C_3 Experiment.6C_4 Experiment.6C_5
## 1              NA              NA               1               1
## 2               1               2               2              NA
## 3               1              NA               1              NA
## 4               1               1               1               1
## 5               1               1               1               1
## 6              NA              NA               1              NA
##   Experiment.6C_6 Experiment.6C_7 Experiment.6C_8 Experiment.6C_9
## 1               1               1               1               1
## 2              NA               2               1               1
## 3              NA               1               1               1
## 4               1               1               1               1
## 5               1               1               1               1
## 6               1              NA              NA              NA
##   Experiment.6D_1 Experiment.6D_2 Experiment.6D_3 Experiment.6D_4
## 1              NA              NA              NA               1
## 2              NA               1               1               1
## 3              NA              NA              NA               1
## 4               1               1               1               1
## 5               1               1               1               1
## 6              NA              NA              NA               1
##   Experiment.6D_5 Experiment.6D_6 Experiment.6D_7 Experiment.6D_8
## 1               1               1               1              NA
## 2               1              NA               2               1
## 3              NA              NA               1               1
## 4               1               1               1               1
## 5               1               1               1               1
## 6              NA               1              NA              NA
##   Experiment.6D_9 Experiment.6E_1 Experiment.6E_2 Experiment.6E_3
## 1              NA              NA               1              NA
## 2               2              NA               1               2
## 3               1              NA              NA              NA
## 4               1               1               1               1
## 5               1               1               1               1
## 6               1              NA              NA              NA
##   Experiment.6E_4 Experiment.6E_5 Experiment.6E_6 Experiment.6E_7
## 1              NA               1               1              NA
## 2               2               1              NA               2
## 3               1              NA              NA               1
## 4               1               1               1               1
## 5               1               1               1              NA
## 6              NA              NA              NA              NA
##   Experiment.6E_8 Experiment.6E_9 Intensity Reverse Potential.contaminant
## 1              NA              NA   1190800                              
## 2               2               1 280990000                              
## 3               1               1  33360000                              
## 4               1               1  54622000                              
## 5               1               1  18910000                              
## 6              NA              NA   1158600                              
##   id Protein.group.IDs Mod..peptide.IDs
## 1  0               859                0
## 2  1               230                1
## 3  2               230                2
## 4  3               229                3
## 5  4               229                4
## 6  5              1143                5
##                                                                                                                                                                         Evidence.IDs
## 1                                                                                                                      0;1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23
## 2                              24;25;26;27;28;29;30;31;32;33;34;35;36;37;38;39;40;41;42;43;44;45;46;47;48;49;50;51;52;53;54;55;56;57;58;59;60;61;62;63;64;65;66;67;68;69;70;71;72;73
## 3                                                                                                         74;75;76;77;78;79;80;81;82;83;84;85;86;87;88;89;90;91;92;93;94;95;96;97;98
## 4 99;100;101;102;103;104;105;106;107;108;109;110;111;112;113;114;115;116;117;118;119;120;121;122;123;124;125;126;127;128;129;130;131;132;133;134;135;136;137;138;139;140;141;142;143
## 5                144;145;146;147;148;149;150;151;152;153;154;155;156;157;158;159;160;161;162;163;164;165;166;167;168;169;170;171;172;173;174;175;176;177;178;179;180;181;182;183;184
## 6                                                                                                                                            185;186;187;188;189;190;191;192;193;194
##                                                                                                          MS.MS.IDs
## 1                                                                                              0;1;2;3;4;5;6;7;8;9
## 2                                                      10;11;12;13;14;15;16;17;18;19;20;21;22;23;24;25;26;27;28;29
## 3                                                   30;31;32;33;34;35;36;37;38;39;40;41;42;43;44;45;46;47;48;49;50
## 4            51;52;53;54;55;56;57;58;59;60;61;62;63;64;65;66;67;68;69;70;71;72;73;74;75;76;77;78;79;80;81;82;83;84
## 5 85;86;87;88;89;90;91;92;93;94;95;96;97;98;99;100;101;102;103;104;105;106;107;108;109;110;111;112;113;114;115;116
## 6                                                                                                          117;118
##   Best.MS.MS Oxidation..M..site.IDs MS.MS.Count
## 1          0                                 10
## 2         21                                 18
## 3         31                                 21
## 4         72                                 29
## 5         94                                 32
## 6        117                                  2
  1. Remove reverse sequences and contaminants.
#Remove reverse sequences
sel <- fData(peptidesCPTAC3)$Reverse != "+"
peptidesCPTAC3 <- peptidesCPTAC3[sel]
head(exprs(peptidesCPTAC3))
##      X6A_1    X6A_2    X6A_3    X6A_4    X6A_5    X6A_6    X6A_7    X6A_8
## 1       NA 15.89625       NA 15.74427 16.11690 16.26360       NA       NA
## 2       NA 23.33130 23.66245 23.91300 15.60841 17.66008 21.89950 20.94838
## 3       NA 21.62554       NA 22.05381       NA 15.02926 20.68958 20.00483
## 4 19.64640 20.09650 20.04574 20.32064 20.75817 20.45584 19.63882 20.01568
## 5 15.85773 15.59100       NA 19.33941 19.55917 19.62396 18.94303 19.31408
## 6       NA       NA       NA       NA 17.42813       NA       NA       NA
##      X6A_9    X6B_1    X6B_2    X6B_3    X6B_4    X6B_5    X6B_6    X6B_7
## 1 16.40301       NA       NA       NA       NA 15.60504 16.04583       NA
## 2 20.96813 23.43180 23.26411 23.64645 24.20048 14.91464       NA 22.20164
## 3 19.81333       NA 21.71777 19.37231 16.85073       NA       NA 20.36880
## 4 19.98197 19.62953 19.95218 20.13390 20.04605 20.30895 20.39173 19.42857
## 5 18.96642       NA 16.09153       NA 19.20895 19.09520 19.49627 19.01121
## 6 16.00602       NA       NA       NA 17.54346       NA       NA       NA
##      X6B_8    X6B_9    X6C_1    X6C_2    X6C_3    X6C_4    X6C_5    X6C_6
## 1 15.53808       NA       NA       NA       NA 16.16971 16.09331 15.76673
## 2 20.62686 21.25133       NA 23.52247 24.08292 23.82946       NA       NA
## 3 20.02875 19.62006       NA 21.60703       NA 20.82764       NA       NA
## 4 19.97507 20.02002 20.13560 19.58226 20.02870 19.84447 20.53751 20.65843
## 5 19.23466 19.15420 16.65312 16.28174 15.45126 18.94057 19.12564 19.46605
## 6       NA 16.52722       NA       NA       NA 17.69237       NA 17.58594
##      X6C_7    X6C_8    X6C_9    X6D_1    X6D_2    X6D_3    X6D_4    X6D_5
## 1       NA 15.68773 15.35170       NA       NA       NA 15.96516 16.06759
## 2 22.30765 20.49290 21.00310       NA 23.52274 23.29155 24.05100 16.88207
## 3 20.33127 19.95157 19.64324       NA       NA       NA 22.07140       NA
## 4 19.49678 20.41416 20.01655 20.46055 20.63712 20.54149 20.12157 20.54021
## 5 18.68434 19.24157 19.08266 18.35102 18.22513 18.08876 19.04524 19.98076
## 6       NA       NA       NA       NA       NA       NA 18.17092       NA
##      X6D_6    X6D_7    X6D_8    X6D_9    X6E_1    X6E_2    X6E_3    X6E_4
## 1 16.38325 16.60706       NA       NA       NA 15.48811       NA       NA
## 2       NA 22.09701 20.51658 21.57285       NA 22.82601 22.99603 23.72723
## 3       NA 20.36570 19.82115 19.79300       NA       NA       NA 15.44656
## 4 20.53321 19.90789 19.95061 19.64517 20.26774 19.74733 19.81515 19.80937
## 5 19.65537 18.40360 19.38517 18.73883 18.10882 17.61519 18.09828 18.94790
## 6 15.59841       NA       NA 16.25820       NA       NA       NA       NA
##      X6E_5    X6E_6    X6E_7    X6E_8    X6E_9
## 1 15.48788 15.44796       NA       NA       NA
## 2 13.78541       NA 21.64825 19.69567 20.88645
## 3       NA       NA 20.17699 18.78196 19.33404
## 4 19.62211 19.86977 19.80026 19.96206 19.76630
## 5 18.87684 19.06988       NA 18.53583 18.69385
## 6       NA       NA       NA       NA       NA
#Remove contaminants
sel <- fData(peptidesCPTAC3)$Potential.contaminant != "+"
peptidesCPTAC3 <- peptidesCPTAC3[sel]
head(exprs(peptidesCPTAC3))
##      X6A_1    X6A_2    X6A_3    X6A_4    X6A_5    X6A_6    X6A_7    X6A_8
## 1       NA 15.89625       NA 15.74427 16.11690 16.26360       NA       NA
## 2       NA 23.33130 23.66245 23.91300 15.60841 17.66008 21.89950 20.94838
## 3       NA 21.62554       NA 22.05381       NA 15.02926 20.68958 20.00483
## 4 19.64640 20.09650 20.04574 20.32064 20.75817 20.45584 19.63882 20.01568
## 5 15.85773 15.59100       NA 19.33941 19.55917 19.62396 18.94303 19.31408
## 6       NA       NA       NA       NA 17.42813       NA       NA       NA
##      X6A_9    X6B_1    X6B_2    X6B_3    X6B_4    X6B_5    X6B_6    X6B_7
## 1 16.40301       NA       NA       NA       NA 15.60504 16.04583       NA
## 2 20.96813 23.43180 23.26411 23.64645 24.20048 14.91464       NA 22.20164
## 3 19.81333       NA 21.71777 19.37231 16.85073       NA       NA 20.36880
## 4 19.98197 19.62953 19.95218 20.13390 20.04605 20.30895 20.39173 19.42857
## 5 18.96642       NA 16.09153       NA 19.20895 19.09520 19.49627 19.01121
## 6 16.00602       NA       NA       NA 17.54346       NA       NA       NA
##      X6B_8    X6B_9    X6C_1    X6C_2    X6C_3    X6C_4    X6C_5    X6C_6
## 1 15.53808       NA       NA       NA       NA 16.16971 16.09331 15.76673
## 2 20.62686 21.25133       NA 23.52247 24.08292 23.82946       NA       NA
## 3 20.02875 19.62006       NA 21.60703       NA 20.82764       NA       NA
## 4 19.97507 20.02002 20.13560 19.58226 20.02870 19.84447 20.53751 20.65843
## 5 19.23466 19.15420 16.65312 16.28174 15.45126 18.94057 19.12564 19.46605
## 6       NA 16.52722       NA       NA       NA 17.69237       NA 17.58594
##      X6C_7    X6C_8    X6C_9    X6D_1    X6D_2    X6D_3    X6D_4    X6D_5
## 1       NA 15.68773 15.35170       NA       NA       NA 15.96516 16.06759
## 2 22.30765 20.49290 21.00310       NA 23.52274 23.29155 24.05100 16.88207
## 3 20.33127 19.95157 19.64324       NA       NA       NA 22.07140       NA
## 4 19.49678 20.41416 20.01655 20.46055 20.63712 20.54149 20.12157 20.54021
## 5 18.68434 19.24157 19.08266 18.35102 18.22513 18.08876 19.04524 19.98076
## 6       NA       NA       NA       NA       NA       NA 18.17092       NA
##      X6D_6    X6D_7    X6D_8    X6D_9    X6E_1    X6E_2    X6E_3    X6E_4
## 1 16.38325 16.60706       NA       NA       NA 15.48811       NA       NA
## 2       NA 22.09701 20.51658 21.57285       NA 22.82601 22.99603 23.72723
## 3       NA 20.36570 19.82115 19.79300       NA       NA       NA 15.44656
## 4 20.53321 19.90789 19.95061 19.64517 20.26774 19.74733 19.81515 19.80937
## 5 19.65537 18.40360 19.38517 18.73883 18.10882 17.61519 18.09828 18.94790
## 6 15.59841       NA       NA 16.25820       NA       NA       NA       NA
##      X6E_5    X6E_6    X6E_7    X6E_8    X6E_9
## 1 15.48788 15.44796       NA       NA       NA
## 2 13.78541       NA 21.64825 19.69567 20.88645
## 3       NA       NA 20.17699 18.78196 19.33404
## 4 19.62211 19.86977 19.80026 19.96206 19.76630
## 5 18.87684 19.06988       NA 18.53583 18.69385
## 6       NA       NA       NA       NA       NA

Small check:

length(unique(fData(peptidesCPTAC3)$Proteins[grepl('_HUMAN_UPS',fData(peptidesCPTAC3)$Proteins)]))
## [1] 46

46 out of a total 48 spiked-in human UPS proteins are present in this dataset with at least one peptide sequence identified.

  1. Remove proteins that are only identified by modified peptides.
  proteinGroups <- read.table(file_proteinGroups, sep="\t", header=TRUE, quote="", comment.char = "")
  only_site <- proteinGroups$Only.identified.by.site
  only_site[is.na(only_site)] <- ""
  removed_proteins <- proteinGroups$Protein.IDs[only_site=="+"]

  sel <- !(as.character(Biobase::fData(peptidesCPTAC3)[,"Proteins"]) %in% as.character(removed_proteins))
  peptidesCPTAC3 <- peptidesCPTAC3[sel]
  1. Retain only those properties in the properties slot that are useful for our further analysis.
fData(peptidesCPTAC3) <- fData(peptidesCPTAC3)[,c("Proteins","Sequence")] #, "ups"
  1. How many times shoud a peptide be identified in order to keep it in the dataset? We require at least 2 identifications of a peptide sequence, as with 1 identification, the models will be perfectly confounded.
minIdentified <- 2
keepers <- rowSums(!is.na(exprs(peptidesCPTAC3)))>=minIdentified
peptidesCPTAC3 <- peptidesCPTAC3[keepers]
head(exprs(peptidesCPTAC3))
##      X6A_1    X6A_2    X6A_3    X6A_4    X6A_5    X6A_6    X6A_7    X6A_8
## 1       NA 15.89625       NA 15.74427 16.11690 16.26360       NA       NA
## 2       NA 23.33130 23.66245 23.91300 15.60841 17.66008 21.89950 20.94838
## 3       NA 21.62554       NA 22.05381       NA 15.02926 20.68958 20.00483
## 4 19.64640 20.09650 20.04574 20.32064 20.75817 20.45584 19.63882 20.01568
## 5 15.85773 15.59100       NA 19.33941 19.55917 19.62396 18.94303 19.31408
## 6       NA       NA       NA       NA 17.42813       NA       NA       NA
##      X6A_9    X6B_1    X6B_2    X6B_3    X6B_4    X6B_5    X6B_6    X6B_7
## 1 16.40301       NA       NA       NA       NA 15.60504 16.04583       NA
## 2 20.96813 23.43180 23.26411 23.64645 24.20048 14.91464       NA 22.20164
## 3 19.81333       NA 21.71777 19.37231 16.85073       NA       NA 20.36880
## 4 19.98197 19.62953 19.95218 20.13390 20.04605 20.30895 20.39173 19.42857
## 5 18.96642       NA 16.09153       NA 19.20895 19.09520 19.49627 19.01121
## 6 16.00602       NA       NA       NA 17.54346       NA       NA       NA
##      X6B_8    X6B_9    X6C_1    X6C_2    X6C_3    X6C_4    X6C_5    X6C_6
## 1 15.53808       NA       NA       NA       NA 16.16971 16.09331 15.76673
## 2 20.62686 21.25133       NA 23.52247 24.08292 23.82946       NA       NA
## 3 20.02875 19.62006       NA 21.60703       NA 20.82764       NA       NA
## 4 19.97507 20.02002 20.13560 19.58226 20.02870 19.84447 20.53751 20.65843
## 5 19.23466 19.15420 16.65312 16.28174 15.45126 18.94057 19.12564 19.46605
## 6       NA 16.52722       NA       NA       NA 17.69237       NA 17.58594
##      X6C_7    X6C_8    X6C_9    X6D_1    X6D_2    X6D_3    X6D_4    X6D_5
## 1       NA 15.68773 15.35170       NA       NA       NA 15.96516 16.06759
## 2 22.30765 20.49290 21.00310       NA 23.52274 23.29155 24.05100 16.88207
## 3 20.33127 19.95157 19.64324       NA       NA       NA 22.07140       NA
## 4 19.49678 20.41416 20.01655 20.46055 20.63712 20.54149 20.12157 20.54021
## 5 18.68434 19.24157 19.08266 18.35102 18.22513 18.08876 19.04524 19.98076
## 6       NA       NA       NA       NA       NA       NA 18.17092       NA
##      X6D_6    X6D_7    X6D_8    X6D_9    X6E_1    X6E_2    X6E_3    X6E_4
## 1 16.38325 16.60706       NA       NA       NA 15.48811       NA       NA
## 2       NA 22.09701 20.51658 21.57285       NA 22.82601 22.99603 23.72723
## 3       NA 20.36570 19.82115 19.79300       NA       NA       NA 15.44656
## 4 20.53321 19.90789 19.95061 19.64517 20.26774 19.74733 19.81515 19.80937
## 5 19.65537 18.40360 19.38517 18.73883 18.10882 17.61519 18.09828 18.94790
## 6 15.59841       NA       NA 16.25820       NA       NA       NA       NA
##      X6E_5    X6E_6    X6E_7    X6E_8    X6E_9
## 1 15.48788 15.44796       NA       NA       NA
## 2 13.78541       NA 21.64825 19.69567 20.88645
## 3       NA       NA 20.17699 18.78196 19.33404
## 4 19.62211 19.86977 19.80026 19.96206 19.76630
## 5 18.87684 19.06988       NA 18.53583 18.69385
## 6       NA       NA       NA       NA       NA
#Remove unused levels
fData(peptidesCPTAC3) <- droplevels(fData(peptidesCPTAC3))
  1. Add the experimental design to the MSnSet object.
rownames(exp_annotation) <- exp_annotation[,"run"]
peptidesCPTAC3 <- MSnbase::MSnSet(exprs=exprs(peptidesCPTAC3), fData=fData(peptidesCPTAC3), pData=exp_annotation)

Check whether our manually preprocessed peptidesCPTAC3 is equal to peptidesCPTAC2.

all(exprs(peptidesCPTAC2)==exprs(peptidesCPTAC3), na.rm=TRUE)
## [1] TRUE
all(fData(peptidesCPTAC2)==fData(peptidesCPTAC3), na.rm=TRUE)
## [1] TRUE
all(pData(peptidesCPTAC2)==pData(peptidesCPTAC3), na.rm=TRUE)
## [1] TRUE

3. Convert the MSnSet object peptidesCPTAC2 into a protdata object proteinsCPTAC.

Caution: this step might take some time! (about 5 minutes on our system). The accession element denotes by which column the individual peptide observations should be grouped.

system.time(proteinsCPTAC <- MSnSet2protdata(peptidesCPTAC2, accession="Proteins"))
##    user  system elapsed 
##   9.899   0.300  10.223

4. Main Analysis

4.1. Fit the model

When your goal is to analyze the CPTAC data in the recommended way (i.e. a robust ridge model with variance shrinkage and M estimation), your model should be constructed look as follows:

system.time(modelCPTAC_RR <- fit.model(protdata=proteinsCPTAC, response="quant_value", fixed=c("condition","lab"), random=c("Sequence","run"), shrinkage.fixed=NULL, weights="Huber", squeezeVar=TRUE))
##    user  system elapsed 
## 416.557   1.065 418.827

Caution: model fitting takes some time with large datasets! On our computer, fitting the robust ridge model took about 13 - 18 minutes.

4.2. Save your results

Saving results is completely analogous to the Francisella example.

4.3. Comparing different types of models

In this section, we fit different types of models to the CPTAC data, analogous to what has been done in Goeminne et al. (2016) in order to compare their performances. We compare an ordinary least squares (OLS) model with and without M estimation (Huber weights) to a ridge model with and without M estimation. As we also want to assess the impact of the empirical Bayes variance estimation, we set squeezeVar=FALSE so that no variance squeezing is performed yet.

In order to do an objective comparison between different approaches, we chose here not to include the “run” effect, because this leads to massive overfitting in the ordinary least squares approaches. This effect will mostly be equal to zero anyways because each repeat in the CPTAC dataset is in fact a technical repeat and therefore the variability in peptide intensities between samples will be very small. When doing a standard robust ridge analysis on true biological data, we advise however to include the “run” effect as a random effect. As the simultanous shrinkage of fixed effects was not yet implemented at the time, we provide “lab” as a random effect in order to get results that are as close as possible to those of Goeminne et al. (2016).

Fit an OLS model (shrinkage.fixed=c(0,0,0,0)) without M estimation (weights=NULL).

system.time(modelCPTAC_Lm <- fit.model(protdata=proteinsCPTAC, response="quant_value", fixed=c("condition","Sequence","lab"), random=NULL, shrinkage.fixed=c(0,0,0,0), weights=NULL, squeezeVar=FALSE))
##    user  system elapsed 
##   3.038   0.058   3.103

Fit an OLS model (shrinkage.fixed=c(0,0,0,0)) with M estimation (weights=Huber).

system.time(modelCPTAC_Lm_M <- fit.model(protdata=proteinsCPTAC, response="quant_value", fixed=c("condition","Sequence","lab"), random=NULL, shrinkage.fixed=c(0,0,0,0), weights="Huber", squeezeVar=FALSE))
##    user  system elapsed 
##  43.976   0.863  45.278

Fit a ridge model (shrinkage.fixed=c(0,1)) without M estimation (weights=NULL).

system.time(modelCPTAC_Ridge <- fit.model(protdata=proteinsCPTAC, response="quant_value", fixed="condition", random=c("Sequence","lab"), shrinkage.fixed=c(0,1), weights=NULL, squeezeVar=FALSE))
##    user  system elapsed 
##  79.554   0.212  79.982

Fit a ridge model (shrinkage.fixed=c(0,1)) with M estimation (weights=Huber).

system.time(modelCPTAC_Ridge_M <- fit.model(protdata=proteinsCPTAC, response="quant_value", fixed="condition", random=c("Sequence","lab"), shrinkage.fixed=c(0,1), weights="Huber", squeezeVar=FALSE))
##    user  system elapsed 
## 307.672   0.765 309.228

As you noticed, an OLS model can be fit by setting shrinkage.fixed=c(0,0,0,0). This indicates that neither the intercept and none of our three fixed effects should be shrunken. The ridge model is here specified by setting shrinkage.fixed=c(0,1). This indicates that the intercept should not be shrunken, but the fixed effect "condition" should be shrunken. Alternatively, one could also set shrinkage.fixed=NULL as the shrinkage of all fixed effects except for the intercept is the default behaviour of MSqRob.

4.4. Inspect the models

attr(getModels(modelCPTAC_RR[1]),"MSqRob_levels")
##  [1] "(Intercept)"                "condition6A"               
##  [3] "condition6B"                "condition6C"               
##  [5] "condition6D"                "condition6E"               
##  [7] "lab1"                       "lab2"                      
##  [9] "lab3"                       "runX6A_2"                  
## [11] "runX6A_4"                   "runX6A_5"                  
## [13] "runX6A_6"                   "runX6A_9"                  
## [15] "runX6B_5"                   "runX6B_6"                  
## [17] "runX6B_8"                   "runX6C_4"                  
## [19] "runX6C_5"                   "runX6C_6"                  
## [21] "runX6C_8"                   "runX6C_9"                  
## [23] "runX6D_4"                   "runX6D_5"                  
## [25] "runX6D_6"                   "runX6D_7"                  
## [27] "runX6E_2"                   "runX6E_5"                  
## [29] "runX6E_6"                   "SequenceAAAAGAGGAGDSGDAVTK"

MSqRob always tries to fit a model, but some models are overparameterized because too many parameters are fit on too few observations. These models have convergence problems and can be removed from the data prior to estimating p-values. This is only relevant for models that perform shrinkage or use any kind of random effect.

Extract the data as a list from the protdata object proteinsCPTAC.

data <- getData(proteinsCPTAC)

4.5. Avanced and optional: check which models would fail to converge by building a linear mixed effects model.

convergence <- lapply(data, function(x){return(tryCatch(lme4::lFormula(formula("quant_value~1+(1|condition)+(1|lab)+(1|Sequence)"),x,control = lme4::lmerControl()), error=function(e){
  return(NA)
}))})

Get the indices of these overparameterized models.

na_indices <- which(is.na(convergence))

See which proteins you removed.

getAccessions(proteinsCPTAC)[na_indices]
##   [1] sp|P38915|SPT8_YEAST                                                                                                                                                                                                                                                   
##   [2] sp|P10622|RLA3_YEAST                                                                                                                                                                                                                                                   
##   [3] sp|P33421|SDH3_YEAST                                                                                                                                                                                                                                                   
##   [4] sp|P33328|SNC2_YEAST                                                                                                                                                                                                                                                   
##   [5] sp|P38555|YPT31_YEAST                                                                                                                                                                                                                                                  
##   [6] sp|P36139|PET10_YEAST                                                                                                                                                                                                                                                  
##   [7] sp|Q12517|DCP1_YEAST                                                                                                                                                                                                                                                   
##   [8] sp|P39516|RS14B_YEAST                                                                                                                                                                                                                                                  
##   [9] sp|Q12078|SMF3_YEAST                                                                                                                                                                                                                                                   
##  [10] sp|P38885|AIM46_YEAST                                                                                                                                                                                                                                                  
##  [11] sp|P40548|SNL1_YEAST                                                                                                                                                                                                                                                   
##  [12] sp|P48363|PFD3_YEAST                                                                                                                                                                                                                                                   
##  [13] sp|P14065|GCY1_YEAST                                                                                                                                                                                                                                                   
##  [14] sp|P46970|NMD5_YEAST                                                                                                                                                                                                                                                   
##  [15] sp|P23201|SPA2_YEAST                                                                                                                                                                                                                                                   
##  [16] sp|P05743|RL26A_YEAST                                                                                                                                                                                                                                                  
##  [17] sp|P0CH09|RL402_YEAST;sp|P0CH08|RL401_YEAST                                                                                                                                                                                                                            
##  [18] sp|P38256|YBU6_YEAST                                                                                                                                                                                                                                                   
##  [19] sp|Q03880|PKR1_YEAST                                                                                                                                                                                                                                                   
##  [20] sp|P38747|OTU2_YEAST                                                                                                                                                                                                                                                   
##  [21] sp|O43137|YB085_YEAST                                                                                                                                                                                                                                                  
##  [22] sp|P40212|RL13B_YEAST                                                                                                                                                                                                                                                  
##  [23] sp|Q04177|UTP5_YEAST                                                                                                                                                                                                                                                   
##  [24] sp|P24031|PPA3_YEAST;sp|P00635|PPA5_YEAST                                                                                                                                                                                                                              
##  [25] sp|P32504|CBF3A_YEAST                                                                                                                                                                                                                                                  
##  [26] sp|P38145|RISA_YEAST                                                                                                                                                                                                                                                   
##  [27] sp|P32803|EMP24_YEAST                                                                                                                                                                                                                                                  
##  [28] sp|Q12040|YO283_YEAST                                                                                                                                                                                                                                                  
##  [29] sp|Q12099|FAL1_YEAST                                                                                                                                                                                                                                                   
##  [30] sp|Q04175|SXM1_YEAST                                                                                                                                                                                                                                                   
##  [31] sp|Q07914|TIM14_YEAST                                                                                                                                                                                                                                                  
##  [32] sp|P30665|MCM4_YEAST                                                                                                                                                                                                                                                   
##  [33] sp|P06704|CDC31_YEAST                                                                                                                                                                                                                                                  
##  [34] sp|P40956|GTS1_YEAST                                                                                                                                                                                                                                                   
##  [35] sp|P53304|SLI1_YEAST                                                                                                                                                                                                                                                   
##  [36] sp|Q04409|EMI2_YEAST                                                                                                                                                                                                                                                   
##  [37] sp|P54858|DOS2_YEAST                                                                                                                                                                                                                                                   
##  [38] sp|P50106|RPA14_YEAST                                                                                                                                                                                                                                                  
##  [39] sp|P53064|RTF1_YEAST                                                                                                                                                                                                                                                   
##  [40] sp|P11746|MCM1_YEAST                                                                                                                                                                                                                                                   
##  [41] sp|P08456|PSS_YEAST                                                                                                                                                                                                                                                    
##  [42] sp|Q03525|TMA23_YEAST                                                                                                                                                                                                                                                  
##  [43] sp|Q12156|AIM7_YEAST                                                                                                                                                                                                                                                   
##  [44] sp|P47771|ALDH2_YEAST                                                                                                                                                                                                                                                  
##  [45] sp|P17898|CPT1_YEAST                                                                                                                                                                                                                                                   
##  [46] sp|P32500|NDC1_YEAST                                                                                                                                                                                                                                                   
##  [47] sp|P32910|RPC6_YEAST                                                                                                                                                                                                                                                   
##  [48] sp|P39998|EDC3_YEAST                                                                                                                                                                                                                                                   
##  [49] sp|Q03529|SCS7_YEAST                                                                                                                                                                                                                                                   
##  [50] sp|Q06449|PIN3_YEAST                                                                                                                                                                                                                                                   
##  [51] sp|P25618|CWH43_YEAST                                                                                                                                                                                                                                                  
##  [52] sp|P53093|YIP4_YEAST                                                                                                                                                                                                                                                   
##  [53] sp|P52893|ALAM_YEAST                                                                                                                                                                                                                                                   
##  [54] sp|Q02753|RL21A_YEAST                                                                                                                                                                                                                                                  
##  [55] sp|Q12672|RL21B_YEAST                                                                                                                                                                                                                                                  
##  [56] sp|P0CT04|IPB2_YEAST                                                                                                                                                                                                                                                   
##  [57] sp|P20436|RPAB3_YEAST                                                                                                                                                                                                                                                  
##  [58] sp|P89105|CTR9_YEAST                                                                                                                                                                                                                                                   
##  [59] sp|P05744|RL33A_YEAST                                                                                                                                                                                                                                                  
##  [60] sp|P41056|RL33B_YEAST                                                                                                                                                                                                                                                  
##  [61] sp|Q07897|CMS1_YEAST                                                                                                                                                                                                                                                   
##  [62] sp|P48234|NOL10_YEAST                                                                                                                                                                                                                                                  
##  [63] sp|P17442|PHO81_YEAST                                                                                                                                                                                                                                                  
##  [64] sp|Q06132|SGD1_YEAST                                                                                                                                                                                                                                                   
##  [65] sp|Q07451|YET3_YEAST                                                                                                                                                                                                                                                   
##  [66] sp|Q08925|MRN1_YEAST                                                                                                                                                                                                                                                   
##  [67] sp|O13516|RS9A_YEAST                                                                                                                                                                                                                                                   
##  [68] sp|P38063|KPR4_YEAST                                                                                                                                                                                                                                                   
##  [69] sp|P47019|ALB1_YEAST                                                                                                                                                                                                                                                   
##  [70] sp|P46654|RSSA2_YEAST                                                                                                                                                                                                                                                  
##  [71] sp|Q06580|MLC2_YEAST                                                                                                                                                                                                                                                   
##  [72] sp|P38066|RIB1_YEAST                                                                                                                                                                                                                                                   
##  [73] sp|P41939|IDHC_YEAST;sp|P53982|IDHH_YEAST                                                                                                                                                                                                                              
##  [74] sp|P47172|YJ9I_YEAST                                                                                                                                                                                                                                                   
##  [75] sp|P40541|SCC3_YEAST                                                                                                                                                                                                                                                   
##  [76] sp|P18852|GBG_YEAST                                                                                                                                                                                                                                                    
##  [77] sp|Q07540|FRDA_YEAST                                                                                                                                                                                                                                                   
##  [78] sp|A5Z2X5|YP010_YEAST                                                                                                                                                                                                                                                  
##  [79] sp|Q07915|RLP24_YEAST                                                                                                                                                                                                                                                  
##  [80] sp|P56628|RL22B_YEAST                                                                                                                                                                                                                                                  
##  [81] sp|P38697|IMDH2_YEAST                                                                                                                                                                                                                                                  
##  [82] sp|P52490|UBC13_YEAST                                                                                                                                                                                                                                                  
##  [83] sp|P53250|TWF1_YEAST                                                                                                                                                                                                                                                   
##  [84] sp|Q99385|VCX1_YEAST                                                                                                                                                                                                                                                   
##  [85] sp|P33448|TOM6_YEAST                                                                                                                                                                                                                                                   
##  [86] sp|Q04773|YMW4_YEAST                                                                                                                                                                                                                                                   
##  [87] sp|Q12273|YO11B_YEAST                                                                                                                                                                                                                                                  
##  [88] sp|Q03434|YM12B_YEAST;sp|Q12141|YG11B_YEAST;sp|P47100|YJ12B_YEAST;sp|P47099|YJ12A_YEAST                                                                                                                                                                                
##  [89] sp|P80967|TOM5_YEAST                                                                                                                                                                                                                                                   
##  [90] sp|P40030|ERG28_YEAST                                                                                                                                                                                                                                                  
##  [91] sp|Q08746|RRS1_YEAST                                                                                                                                                                                                                                                   
##  [92] sp|P11076|ARF1_YEAST                                                                                                                                                                                                                                                   
##  [93] sp|P25642|IMG2_YEAST                                                                                                                                                                                                                                                   
##  [94] sp|P38759|VPS29_YEAST                                                                                                                                                                                                                                                  
##  [95] sp|Q04119|PPN1_YEAST                                                                                                                                                                                                                                                   
##  [96] sp|Q12690|RL13A_YEAST                                                                                                                                                                                                                                                  
##  [97] sp|P53859|CSL4_YEAST                                                                                                                                                                                                                                                   
##  [98] sp|P39726|GCSH_YEAST                                                                                                                                                                                                                                                   
##  [99] sp|Q01590|SED5_YEAST                                                                                                                                                                                                                                                   
## [100] sp|P51862|ROM2_YEAST                                                                                                                                                                                                                                                   
## [101] sp|P38760|MIP6_YEAST                                                                                                                                                                                                                                                   
## [102] sp|P40005|PFD2_YEAST                                                                                                                                                                                                                                                   
## [103] sp|Q06497|ANT1_YEAST                                                                                                                                                                                                                                                   
## [104] sp|Q3E7Y3|RS22B_YEAST                                                                                                                                                                                                                                                  
## [105] sp|P32835|GSP1_YEAST                                                                                                                                                                                                                                                   
## [106] sp|P32905|RSSA1_YEAST                                                                                                                                                                                                                                                  
## [107] sp|P11632|NHP6A_YEAST                                                                                                                                                                                                                                                  
## [108] sp|P11633|NHP6B_YEAST                                                                                                                                                                                                                                                  
## [109] sp|P39002|LCF3_YEAST                                                                                                                                                                                                                                                   
## [110] sp|P53507|TOM7_YEAST                                                                                                                                                                                                                                                   
## [111] sp|Q12284|ERV2_YEAST                                                                                                                                                                                                                                                   
## [112] sp|P32770|NRP1_YEAST                                                                                                                                                                                                                                                   
## [113] sp|P32857|PTM1_YEAST                                                                                                                                                                                                                                                   
## [114] sp|P32477|GSH1_YEAST                                                                                                                                                                                                                                                   
## [115] sp|P47138|DPH4_YEAST                                                                                                                                                                                                                                                   
## [116] sp|Q3E834|YOS1_YEAST                                                                                                                                                                                                                                                   
## [117] sp|Q05123|ARP9_YEAST                                                                                                                                                                                                                                                   
## [118] sp|P07275|PUT2_YEAST                                                                                                                                                                                                                                                   
## [119] sp|P0C2I9|YP12B_YEAST;sp|Q99337|YN12B_YEAST;sp|Q12501|YO22B_YEAST;sp|Q12337|YG21B_YEAST;sp|Q12113|YO21B_YEAST;sp|Q03494|YD22B_YEAST;sp|P25384|YC21B_YEAST;sp|P0CX64|YG22B_YEAST;sp|P0CX63|YF21B_YEAST;sp|P0C2J3|YL21B_YEAST;sp|Q12491|YB21B_YEAST;sp|Q12472|YD21B_YEAST
## [120] sp|P25627|BUD23_YEAST                                                                                                                                                                                                                                                  
## [121] sp|Q06991|PUN1_YEAST                                                                                                                                                                                                                                                   
## [122] sp|P38616|YGP1_YEAST                                                                                                                                                                                                                                                   
## [123] sp|Q03178|PIR1_YEAST                                                                                                                                                                                                                                                   
## [124] sp|Q07746|YD242_YEAST                                                                                                                                                                                                                                                  
## [125] sp|Q3E747|YL363_YEAST                                                                                                                                                                                                                                                  
## [126] sp|Q12030|TAF10_YEAST                                                                                                                                                                                                                                                  
## [127] sp|P40087|DDI1_YEAST                                                                                                                                                                                                                                                   
## [128] sp|O94742|SEM1_YEAST                                                                                                                                                                                                                                                   
## [129] sp|P28496|LAC1_YEAST                                                                                                                                                                                                                                                   
## [130] sp|P36516|RM03_YEAST                                                                                                                                                                                                                                                   
## [131] sp|P53121|FLC3_YEAST                                                                                                                                                                                                                                                   
## [132] sp|Q12213|RL7B_YEAST                                                                                                                                                                                                                                                   
## [133] sp|P05737|RL7A_YEAST                                                                                                                                                                                                                                                   
## [134] sp|Q06508|LOA1_YEAST                                                                                                                                                                                                                                                   
## [135] sp|P33416|HSP78_YEAST                                                                                                                                                                                                                                                  
## [136] sp|P25582|SPB1_YEAST                                                                                                                                                                                                                                                   
## [137] sp|P53847|DSL1_YEAST                                                                                                                                                                                                                                                   
## [138] sp|Q07505|DLHH_YEAST                                                                                                                                                                                                                                                   
## [139] sp|Q12160|YP063_YEAST                                                                                                                                                                                                                                                  
## [140] sp|P32349|RPC3_YEAST                                                                                                                                                                                                                                                   
## [141] sp|P36135|UTH1_YEAST                                                                                                                                                                                                                                                   
## [142] sp|P32478|HS150_YEAST                                                                                                                                                                                                                                                  
## [143] sp|Q12255|NYV1_YEAST                                                                                                                                                                                                                                                   
## [144] sp|Q03305|RMT2_YEAST                                                                                                                                                                                                                                                   
## [145] sp|P05747|RL29_YEAST                                                                                                                                                                                                                                                   
## [146] sp|P18851|GBB_YEAST                                                                                                                                                                                                                                                    
## [147] sp|P53885|MDG1_YEAST                                                                                                                                                                                                                                                   
## [148] sp|P46956|PHO86_YEAST                                                                                                                                                                                                                                                  
## [149] P69905ups|HBA_HUMAN_UPS                                                                                                                                                                                                                                                
## [150] sp|P20049|TYR1_YEAST                                                                                                                                                                                                                                                   
## [151] sp|Q06543|GPN3_YEAST                                                                                                                                                                                                                                                   
## [152] sp|Q3E833|PCC1_YEAST                                                                                                                                                                                                                                                   
## [153] sp|P25039|EFGM_YEAST                                                                                                                                                                                                                                                   
## [154] sp|P81449|ATPJ_YEAST                                                                                                                                                                                                                                                   
## [155] sp|P53890|BNI5_YEAST                                                                                                                                                                                                                                                   
## [156] sp|P06100|NOT2_YEAST                                                                                                                                                                                                                                                   
## [157] sp|Q03020|ISU1_YEAST                                                                                                                                                                                                                                                   
## [158] sp|P25358|ELO2_YEAST                                                                                                                                                                                                                                                   
## [159] sp|P17558|RTPT_YEAST                                                                                                                                                                                                                                                   
## [160] sp|P38985|SRP14_YEAST                                                                                                                                                                                                                                                  
## [161] sp|P34909|NOT4_YEAST                                                                                                                                                                                                                                                   
## [162] sp|P13574|STE12_YEAST                                                                                                                                                                                                                                                  
## [163] sp|P22214|SEC22_YEAST                                                                                                                                                                                                                                                  
## [164] sp|Q08004|BUD20_YEAST                                                                                                                                                                                                                                                  
## [165] sp|P53173|ERV14_YEAST                                                                                                                                                                                                                                                  
## [166] sp|P40043|RGI1_YEAST                                                                                                                                                                                                                                                   
## [167] sp|Q08951|AP3D_YEAST                                                                                                                                                                                                                                                   
## [168] sp|P39938|RS26A_YEAST                                                                                                                                                                                                                                                  
## [169] P41159ups|LEP_HUMAN_UPS                                                                                                                                                                                                                                                
## [170] sp|P34248|YKK0_YEAST                                                                                                                                                                                                                                                   
## [171] sp|P46964|OST2_YEAST                                                                                                                                                                                                                                                   
## [172] sp|P32529|RPA12_YEAST                                                                                                                                                                                                                                                  
## [173] sp|P39013|END3_YEAST                                                                                                                                                                                                                                                   
## [174] sp|P46990|RL17B_YEAST                                                                                                                                                                                                                                                  
## [175] sp|P05740|RL17A_YEAST                                                                                                                                                                                                                                                  
## [176] sp|P53868|ALG9_YEAST                                                                                                                                                                                                                                                   
## [177] sp|P12630|BAR1_YEAST                                                                                                                                                                                                                                                   
## [178] sp|P56508|SNA2_YEAST                                                                                                                                                                                                                                                   
## 1515 Levels: O00762ups|UBE2C_HUMAN_UPS ... sp|Q99385|VCX1_YEAST

Both UPS proteins in this list are only identified by one peptide.

getData(proteinsCPTAC["P69905ups|HBA_HUMAN_UPS"])
##    quant_value        Sequence   run condition lab
## 13    16.34396 VGAHAGEYGAEALER X6B_4        6B   2
## 20    18.98716 VGAHAGEYGAEALER X6C_2        6C   1
## 22    16.89253 VGAHAGEYGAEALER X6C_4        6C   2
## 23    18.10633 VGAHAGEYGAEALER X6C_5        6C   2
## 24    17.52955 VGAHAGEYGAEALER X6C_6        6C   2
## 31    18.97273 VGAHAGEYGAEALER X6D_4        6D   2
## 32    16.95212 VGAHAGEYGAEALER X6D_5        6D   2
## 33    19.29081 VGAHAGEYGAEALER X6D_6        6D   2
## 35    16.09004 VGAHAGEYGAEALER X6D_8        6D   3
## 37    17.82644 VGAHAGEYGAEALER X6E_1        6E   1
## 40    20.41722 VGAHAGEYGAEALER X6E_4        6E   2
## 41    17.83335 VGAHAGEYGAEALER X6E_5        6E   2
getData(proteinsCPTAC["P41159ups|LEP_HUMAN_UPS"])
##    quant_value           Sequence   run condition lab
## 31    15.32290 VTGLDFIPGLHPILTLSK X6D_4        6D   2
## 32    15.85747 VTGLDFIPGLHPILTLSK X6D_5        6D   2
## 40    19.09555 VTGLDFIPGLHPILTLSK X6E_4        6E   2
## 41    20.80418 VTGLDFIPGLHPILTLSK X6E_5        6E   2
## 42    20.65861 VTGLDFIPGLHPILTLSK X6E_6        6E   2
## 43    16.17743 VTGLDFIPGLHPILTLSK X6E_7        6E   3
## 44    17.42877 VTGLDFIPGLHPILTLSK X6E_8        6E   3
## 45    18.18318 VTGLDFIPGLHPILTLSK X6E_9        6E   3

Remove the overparameterized models from the protLM object modelCPTAC_RR and from the data.

modelCPTAC_RR_cleaned <- modelCPTAC_RR[-na_indices]
data <- data[-na_indices]

5. Statistical inference

Construct the contrast matrix L for inferring on the fold changes of interest.

L <- makeContrast(contrasts=c("condition6B-condition6A","condition6C-condition6A","condition6D-condition6A","condition6E-condition6A","condition6C-condition6B","condition6D-condition6B","condition6E-condition6B","condition6D-condition6C","condition6E-condition6C","condition6E-condition6D"),
                  levels=c("condition6A","condition6B","condition6C","condition6D","condition6E"))

Make the DA rankings for the ordinary least squares model (no ridge, no squeezed variances, no M estimation).

#modelCPTAC_Lm <- squeezePars(modelCPTAC_Lm, squeezeVar=FALSE)
lm_normal <- test.protLMcontrast(modelCPTAC_Lm, L)
lm_normal <- prot.p.adjust(lm_normal)
lm_normal <- prot.signif(lm_normal)

Make the DA rankings for the ordinary least squares model with squeezed variances (no ridge, no M estimation).

modelCPTAC_Lm_Sq <- squeezePars(modelCPTAC_Lm, squeezeVar=TRUE)
lm_Sq <- test.protLMcontrast(modelCPTAC_Lm_Sq, L)
lm_Sq <- prot.p.adjust(lm_Sq)
lm_Sq <- prot.signif(lm_Sq)

Make the DA rankings for the ordinary least squares model with squeezed variances and M estimation (no ridge).

modelCPTAC_Lm_M_Sq <- squeezePars(modelCPTAC_Lm_M, squeezeVar=TRUE)
## Warning: 2 very small variances detected, have been offset away from zero
lm_SqM <- test.protLMcontrast(modelCPTAC_Lm_M_Sq, L)
lm_SqM <- prot.p.adjust(lm_SqM)
lm_SqM <- prot.signif(lm_SqM)

Make the DA rankings for the ridge regression model (no squeezed variances, no M estimation).

#modelCPTAC_Ridge <- squeezePars(modelCPTAC_Ridge, squeezeVar=FALSE)
ridge <- test.protLMcontrast(modelCPTAC_Ridge, L)
ridge <- prot.p.adjust(ridge)
ridge <- prot.signif(ridge)

Make the DA rankings for the ridge regression model with squeezed variances (no M estimation).

modelCPTAC_Ridge_Sq <- squeezePars(modelCPTAC_Ridge, squeezeVar=TRUE)
ridgeSq <- test.protLMcontrast(modelCPTAC_Ridge_Sq, L)
ridgeSq <- prot.p.adjust(ridgeSq)
ridgeSq <- prot.signif(ridgeSq)

Make the DA rankings for the ridge regression model with squeezed variances and M estimation. This is our robust ridge (RR) approach (the default MSqRob approach, although we included “lab” here as a random effect and we did not include a “run” effect).

modelCPTAC_Ridge_M_Sq <- squeezePars(modelCPTAC_Ridge_M, squeezeVar=TRUE)
ridgeSqM <- test.protLMcontrast(modelCPTAC_Ridge_M, L)
ridgeSqM <- prot.p.adjust(ridgeSqM)
ridgeSqM <- prot.signif(ridgeSqM)

Our standard approach is to calculate the degrees of freedom via the trace of the Hat matrix. One could as well use custom degrees of freedom, for example in order to work more conservatively. Indeed, if we consider the peptides as pseudo-replicates, our experimental units are the different spike-in conditions in each lab. The number of degrees of freedom is then the number of combinations lab x condition minus the number of parameters estimated for lab (i.e. the number of labs minus 1) minus the number of parameters estimated for condition (i.e. the number of conditions minus 1) minus 1 for the intercept. We show an example for the modelCPTAC_RR model.

custom_dfs <- sapply(data, function(x){return(length(unique(paste0(x$condition,x$lab)))-(length(unique(x$condition))-1)-(length(unique(x$lab))-1)-1)})
custom_dfs
##    [1] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
##   [35] 1 8 8 8 8 8 3 7 8 8 8 8 7 8 5 8 8 8 8 4 8 8 8 8 8 8 8 8 8 8 7 8 8 8
##   [69] 8 8 8 8 8 4 8 8 8 8 8 8 8 4 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
##  [103] 8 8 8 8 8 5 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 7 8 8 8 8 7 8 8 4 8 8 8 8
##  [137] 8 8 8 8 8 8 8 8 8 8 8 8 8 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 5 8 8 8
##  [171] 8 8 8 8 8 6 8 8 8 8 7 7 8 8 8 8 8 7 8 8 8 4 8 8 8 7 8 8 8 8 8 8 8 8
##  [205] 8 8 8 8 8 8 8 2 8 8 8 8 8 8 8 8 8 8 8 8 6 8 8 8 8 4 8 8 8 8 8 8 8 8
##  [239] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 4 8 8 8 8 8 7 8 8 8 8 8 8 8 5 8
##  [273] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 5 8 8 8 8 8 8 8 8 8 8 1 8 8 8 8 7 8
##  [307] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 1 8 8 8 6 4 7 8 8 6 8 8 6 8 7 8 5 5
##  [341] 8 8 8 8 8 8 8 8 1 8 8 8 7 2 8 8 8 8 8 8 5 8 6 8 8 8 7 8 8 8 7 5 8 8
##  [375] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 7 5 8 8 8 8 8 8 8 8 8 8 2 8 8 8 8 3 8 8
##  [409] 3 8 8 8 8 8 8 8 7 3 8 7 0 8 8 1 5 8 7 8 8 8 6 8 8 8 8 8 8 8 8 8 3 8
##  [443] 8 8 8 8 8 6 4 4 8 8 8 8 8 8 8 8 8 8 8 8 4 8 2 8 8 8 8 8 8 8 8 8 8 3
##  [477] 8 8 8 5 8 8 8 7 8 8 8 5 8 4 8 8 8 8 8 5 5 8 5 8 8 8 8 8 5 4 5 8 8 8
##  [511] 8 7 7 7 8 8 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 4 8 8 8 8 7 8 7 8 8
##  [545] 8 7 8 8 8 8 8 6 8 8 8 7 8 8 8 8 8 8 8 8 4 8 8 7 8 8 8 8 8 8 8 8 8 4
##  [579] 4 5 7 8 8 8 8 8 7 8 8 8 8 2 8 8 4 5 8 8 8 8 5 5 8 8 4 8 8 8 8 8 8 8
##  [613] 8 7 4 8 8 8 8 8 7 8 8 7 8 8 8 8 8 8 8 6 7 8 5 8 8 8 8 8 6 8 2 8 8 8
##  [647] 8 7 8 8 8 8 8 8 7 8 7 8 5 8 8 8 8 8 7 8 8 7 8 8 8 8 8 2 2 8 7 8 7 4
##  [681] 8 8 4 8 4 8 8 8 8 8 8 5 8 8 8 8 8 8 8 8 8 8 8 7 8 8 8 8 8 8 8 8 7 8
##  [715] 8 7 8 8 8 6 8 8 8 8 8 8 8 8 0 8 6 7 6 8 8 8 8 8 8 4 4 8 8 4 8 8 7 3
##  [749] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 4 8 8 8 7 8 8 8 8 8 6 8
##  [783] 8 8 8 8 8 8 8 8 7 8 8 8 7 8 6 8 8 8 5 8 8 2 5 8 8 8 6 8 8 8 6 8 8 8
##  [817] 8 8 8 8 5 8 8 8 8 8 5 5 8 7 3 8 8 8 7 8 8 8 8 8 5 8 2 8 8 8 7 8 8 0
##  [851] 6 8 8 3 8 8 6 4 8 5 7 8 8 8 8 8 7 7 5 8 8 7 8 8 8 8 8 7 8 8 6 8 8 8
##  [885] 4 7 8 8 8 8 8 8 8 8 8 1 8 6 8 8 8 8 8 8 7 7 8 8 8 8 8 8 8 8 7 8 7 8
##  [919] 8 8 8 8 8 8 8 7 5 8 7 8 4 8 8 5 7 6 8 8 8 5 8 8 8 8 8 8 2 8 8 4 8 8
##  [953] 8 8 8 8 6 8 8 8 6 7 8 8 8 7 8 2 8 5 8 7 8 8 8 7 8 8 1 8 5 8 7 4 7 5
##  [987] 8 8 8 3 8 8 8 7 8 8 8 5 8 8 8 6 8 1 5 8 8 1 8 4 7 8 8 4 8 8 7 8 8 8
## [1021] 5 1 8 6 4 5 8 8 8 6 5 8 8 8 8 8 8 8 8 5 8 6 8 8 7 7 8 8 2 8 4 8 8 8
## [1055] 8 8 8 8 4 8 8 6 8 8 8 8 8 4 8 8 8 8 4 4 8 8 3 8 8 8 4 8 3 8 8 8 4 2
## [1089] 3 8 7 3 8 7 8 6 4 7 8 8 6 6 7 4 6 7 4 8 7 5 8 4 4 4 4 8 8 8 8 5 8 4
## [1123] 8 6 8 6 8 6 8 8 7 8 7 8 8 6 8 7 4 2 8 8 6 8 2 6 8 8 8 7 8 6 8 8 8 3
## [1157] 8 6 8 8 8 3 3 8 8 8 8 8 8 8 7 7 8 6 8 7 8 8 8 5 8 8 8 8 8 4 7 8 8 4
## [1191] 8 7 4 4 4 4 8 8 4 8 2 7 1 4 8 4 0 6 8 4 5 7 6 3 4 8 8 0 8 0 6 5 8 8
## [1225] 0 1 8 4 8 4 4 8 7 8 4 4 5 8 4 4 5 5 8 8 8 1 8 8 5 7 6 4 8 8 8 7 8 6
## [1259] 6 8 1 2 7 3 6 6 6 8 7 8 8 6 8 3 4 8 8 5 7 7 8 8 2 8 6 4 8 7 7 7 4 4
## [1293] 5 4 5 4 5 1 6 8 5 3 5 8 8 5 6 8 3 8 7 3 8 8 8 8 3 3 8 8 8 1 8 8 8 5
## [1327] 8 5 7 8 4 8 4 6 7 3 8
ridgeSqM_custom <- test.protLMcontrast(modelCPTAC_RR, L, custom_dfs = custom_dfs)
ridgeSqM_custom <- prot.p.adjust(ridgeSqM_custom)
ridgeSqM_custom <- prot.signif(ridgeSqM_custom)

6. Diagnostic plots and tables

In order to be able to make diagnostic plots and tables, load the required functions from our GitHub page.

source("https://raw.githubusercontent.com/statOmics/MSqRob/master/vignettes/plotfunctions.R")
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

6.1. Make diagnostic tables

Extra: add a column containing TRUE if the corresponding protein is a spiked-in UPS1 protein and FALSE if it is a yeast protein. For this, we use the addTPcol function. pattern indicates for the pattern in the protein name that points to a true positive and TPcolname is the name to be given to the added column.

lm_normal <- addTPcol(lm_normal, pattern="UPS", TPcolname="ups")
lm_Sq <- addTPcol(lm_Sq, pattern="UPS", TPcolname="ups")
lm_SqM <- addTPcol(lm_SqM, pattern="UPS", TPcolname="ups")
ridge <- addTPcol(ridge, pattern="UPS", TPcolname="ups")
ridgeSq <- addTPcol(ridgeSq, pattern="UPS", TPcolname="ups")
ridgeSqM <- addTPcol(ridgeSqM, pattern="UPS", TPcolname="ups")

The makeSummary function allows to evaluate the performance of all methods on the CPTAC dataset. For this, we need to create a vector that contains the theoretical log2 fold changes of the spiked-in UPS1 proteins.

upratio <- c(1.565597,3.137504,4.744161,6.321928,1.571906,
             3.178564,4.756331,1.60665,3.18442,1.577767)

Create the summary objects.

summary_lm <- makeSummary(lm_normal, upratio, TPcol="ups")
summary_lmSq <- makeSummary(lm_Sq, upratio, TPcol="ups")
summary_lmSqM <- makeSummary(lm_SqM, upratio, TPcol="ups")
summary_ridge <- makeSummary(ridge, upratio, TPcol="ups")
summary_ridgeSq <- makeSummary(ridgeSq, upratio, TPcol="ups")
summary_ridgeSqM <- makeSummary(ridgeSqM, upratio, TPcol="ups")

Results

summary_lm
##                                bias0       bias1       sd0       sd1
## condition6B-condition6A -0.049856737 -0.26374480 0.2178967 0.5458871
## condition6C-condition6A -0.066602648 -0.35999656 0.2263513 0.6987931
## condition6D-condition6A -0.046040848 -0.33193686 0.3014827 0.8405903
## condition6E-condition6A -0.130322615 -0.46343147 0.3826877 0.8998965
## condition6C-condition6B -0.015086174 -0.02559164 0.1893331 0.3421951
## condition6D-condition6B  0.003024942 -0.01200667 0.3426266 0.4939676
## condition6E-condition6B -0.083155730 -0.15369149 0.4391523 0.7370612
## condition6D-condition6C  0.019111062  0.01573994 0.3271477 0.2814700
## condition6E-condition6C -0.065348662 -0.11069860 0.4329431 0.5299769
## condition6E-condition6D -0.082512359 -0.05807474 0.2894723 0.5719607
##                              mad0      mad1     RMSE0     RMSE1 TP5proc
## condition6B-condition6A 0.1414074 0.3276234 0.2234547 0.5999277      28
## condition6C-condition6A 0.1525443 0.4102281 0.2358719 0.7782683      38
## condition6D-condition6A 0.1965858 0.5441178 0.3048755 0.8939292      39
## condition6E-condition6A 0.2473029 0.4309940 0.4041443 1.0021662      40
## condition6C-condition6B 0.1241324 0.2474419 0.1898682 0.3392508      39
## condition6D-condition6B 0.2323222 0.2924419 0.3425220 0.4884697      42
## condition6E-condition6B 0.2829725 0.3493636 0.4468065 0.7446700      43
## condition6D-condition6C 0.2207689 0.1905404 0.3275930 0.2787697      42
## condition6E-condition6C 0.2857494 0.2645359 0.4376989 0.5356193      44
## condition6E-condition6D 0.1747342 0.1914513 0.3009065 0.5686827      44
##                         FP5proc TN5proc FN5proc
## condition6B-condition6A       1    1451      11
## condition6C-condition6A       5    1447       2
## condition6D-condition6A      40    1413       1
## condition6E-condition6A     292    1154       0
## condition6C-condition6B       1    1452       5
## condition6D-condition6B      56    1397       2
## condition6E-condition6B     308    1136       1
## condition6D-condition6C      46    1407       3
## condition6E-condition6C     262    1182       1
## condition6E-condition6D      30    1420       2
summary_lmSq
##                                bias0       bias1       sd0       sd1
## condition6B-condition6A -0.049856737 -0.26374480 0.2178967 0.5458871
## condition6C-condition6A -0.066602648 -0.35999656 0.2263513 0.6987931
## condition6D-condition6A -0.046040848 -0.33193686 0.3014827 0.8405903
## condition6E-condition6A -0.130322615 -0.46343147 0.3826877 0.8998965
## condition6C-condition6B -0.015086174 -0.02559164 0.1893331 0.3421951
## condition6D-condition6B  0.003024942 -0.01200667 0.3426266 0.4939676
## condition6E-condition6B -0.083155730 -0.15369149 0.4391523 0.7370612
## condition6D-condition6C  0.019111062  0.01573994 0.3271477 0.2814700
## condition6E-condition6C -0.065348662 -0.11069860 0.4329431 0.5299769
## condition6E-condition6D -0.082512359 -0.05807474 0.2894723 0.5719607
##                              mad0      mad1     RMSE0     RMSE1 TP5proc
## condition6B-condition6A 0.1414074 0.3276234 0.2234547 0.5999277      28
## condition6C-condition6A 0.1525443 0.4102281 0.2358719 0.7782683      38
## condition6D-condition6A 0.1965858 0.5441178 0.3048755 0.8939292      39
## condition6E-condition6A 0.2473029 0.4309940 0.4041443 1.0021662      40
## condition6C-condition6B 0.1241324 0.2474419 0.1898682 0.3392508      39
## condition6D-condition6B 0.2323222 0.2924419 0.3425220 0.4884697      42
## condition6E-condition6B 0.2829725 0.3493636 0.4468065 0.7446700      43
## condition6D-condition6C 0.2207689 0.1905404 0.3275930 0.2787697      43
## condition6E-condition6C 0.2857494 0.2645359 0.4376989 0.5356193      44
## condition6E-condition6D 0.1747342 0.1914513 0.3009065 0.5686827      44
##                         FP5proc TN5proc FN5proc
## condition6B-condition6A       1    1451      11
## condition6C-condition6A       4    1448       2
## condition6D-condition6A      41    1412       1
## condition6E-condition6A     279    1167       0
## condition6C-condition6B       0    1453       5
## condition6D-condition6B      53    1400       2
## condition6E-condition6B     299    1145       1
## condition6D-condition6C      45    1408       2
## condition6E-condition6C     255    1189       1
## condition6E-condition6D      26    1424       2
summary_lmSqM
##                                bias0       bias1       sd0       sd1
## condition6B-condition6A -0.051102312 -0.22068964 0.2061248 0.6441525
## condition6C-condition6A -0.076178729 -0.22020142 0.2146940 0.7579452
## condition6D-condition6A -0.070823770 -0.16107843 0.2750647 0.8758196
## condition6E-condition6A -0.178240910 -0.27070587 0.3621340 0.9359432
## condition6C-condition6B -0.023217613  0.06777870 0.1705309 0.2873883
## condition6D-condition6B -0.020434384  0.10905664 0.3149625 0.4290169
## condition6E-condition6B -0.130350435 -0.02728519 0.4244712 0.7205215
## condition6D-condition6C  0.004014907  0.04371379 0.3020534 0.2401626
## condition6E-condition6C -0.104370563 -0.07846443 0.4237050 0.5577448
## condition6E-condition6D -0.106267679 -0.04940243 0.2782478 0.6298645
##                              mad0      mad1     RMSE0     RMSE1 TP5proc
## condition6B-condition6A 0.1226133 0.2620543 0.2122961 0.6730505      32
## condition6C-condition6A 0.1311001 0.4165869 0.2277388 0.7801330      38
## condition6D-condition6A 0.1616557 0.5340900 0.2839445 0.8796759      39
## condition6E-condition6A 0.2146158 0.3733051 0.4035098 0.9630013      40
## condition6C-condition6B 0.1073873 0.1729820 0.1720460 0.2920769      41
## condition6D-condition6B 0.1985114 0.2100466 0.3155165 0.4379107      43
## condition6E-condition6B 0.2674615 0.2322210 0.4438944 0.7128091      43
## condition6D-condition6C 0.1944814 0.1533858 0.3019761 0.2414689      43
## condition6E-condition6C 0.2610014 0.1585251 0.4362279 0.5570665      44
## condition6E-condition6D 0.1681527 0.1582347 0.2977604 0.6249363      44
##                         FP5proc TN5proc FN5proc
## condition6B-condition6A       6    1446       7
## condition6C-condition6A      17    1435       2
## condition6D-condition6A      96    1356       1
## condition6E-condition6A     576     870       0
## condition6C-condition6B       2    1451       3
## condition6D-condition6B     114    1339       1
## condition6E-condition6B     542     902       1
## condition6D-condition6C      79    1374       2
## condition6E-condition6C     480     964       1
## condition6E-condition6D     180    1270       2
summary_ridge
##                                bias0       bias1       sd0       sd1
## condition6B-condition6A -0.022821204 -0.28651011 0.1285001 0.5510196
## condition6C-condition6A -0.031517517 -0.40657348 0.1363416 0.7167196
## condition6D-condition6A -0.025406053 -0.40840787 0.1898219 0.8732607
## condition6E-condition6A -0.058529496 -0.56970162 0.2660547 0.9688576
## condition6C-condition6B -0.007310000 -0.08452214 0.1225707 0.4156455
## condition6D-condition6B -0.004071671 -0.11608482 0.2272722 0.6692130
## condition6E-condition6B -0.039083209 -0.29294585 0.3178031 0.9618865
## condition6D-condition6C  0.003774039 -0.03025730 0.2192206 0.3371939
## condition6E-condition6C -0.030789519 -0.19193587 0.3110181 0.6221030
## condition6E-condition6D -0.033137803 -0.09782654 0.1959141 0.5637166
##                               mad0      mad1     RMSE0     RMSE1 TP5proc
## condition6B-condition6A 0.02986928 0.2967893 0.1304674 0.6147564      28
## condition6C-condition6A 0.04068911 0.4511841 0.1398914 0.8161782      38
## condition6D-condition6A 0.05174847 0.5286015 0.1914499 0.9541051      39
## condition6E-condition6A 0.09720540 0.4315930 0.2723269 1.1134531      40
## condition6C-condition6B 0.02228200 0.3069350 0.1227465 0.4194982      37
## condition6D-condition6B 0.05551943 0.2833159 0.2272306 0.6716721      42
## condition6E-condition6B 0.09570841 0.3440744 0.3200883 0.9949950      43
## condition6D-condition6C 0.04909078 0.1920756 0.2191778 0.3347963      42
## condition6E-condition6C 0.09226521 0.2562150 0.3124314 0.6443999      43
## condition6E-condition6D 0.05357493 0.1897926 0.1986304 0.5660727      42
##                         FP5proc TN5proc FN5proc
## condition6B-condition6A       0    1456      11
## condition6C-condition6A       1    1454       2
## condition6D-condition6A      27    1429       1
## condition6E-condition6A     178    1271       0
## condition6C-condition6B       0    1456       7
## condition6D-condition6B      31    1425       2
## condition6E-condition6B     190    1257       1
## condition6D-condition6C      28    1428       3
## condition6E-condition6C     148    1299       2
## condition6E-condition6D      17    1437       4
summary_ridgeSq
##                                bias0       bias1       sd0       sd1
## condition6B-condition6A -0.022821204 -0.28651011 0.1285001 0.5510196
## condition6C-condition6A -0.031517517 -0.40657348 0.1363416 0.7167196
## condition6D-condition6A -0.025406053 -0.40840787 0.1898219 0.8732607
## condition6E-condition6A -0.058529496 -0.56970162 0.2660547 0.9688576
## condition6C-condition6B -0.007310000 -0.08452214 0.1225707 0.4156455
## condition6D-condition6B -0.004071671 -0.11608482 0.2272722 0.6692130
## condition6E-condition6B -0.039083209 -0.29294585 0.3178031 0.9618865
## condition6D-condition6C  0.003774039 -0.03025730 0.2192206 0.3371939
## condition6E-condition6C -0.030789519 -0.19193587 0.3110181 0.6221030
## condition6E-condition6D -0.033137803 -0.09782654 0.1959141 0.5637166
##                               mad0      mad1     RMSE0     RMSE1 TP5proc
## condition6B-condition6A 0.02986928 0.2967893 0.1304674 0.6147564      28
## condition6C-condition6A 0.04068911 0.4511841 0.1398914 0.8161782      38
## condition6D-condition6A 0.05174847 0.5286015 0.1914499 0.9541051      39
## condition6E-condition6A 0.09720540 0.4315930 0.2723269 1.1134531      40
## condition6C-condition6B 0.02228200 0.3069350 0.1227465 0.4194982      38
## condition6D-condition6B 0.05551943 0.2833159 0.2272306 0.6716721      42
## condition6E-condition6B 0.09570841 0.3440744 0.3200883 0.9949950      43
## condition6D-condition6C 0.04909078 0.1920756 0.2191778 0.3347963      42
## condition6E-condition6C 0.09226521 0.2562150 0.3124314 0.6443999      43
## condition6E-condition6D 0.05357493 0.1897926 0.1986304 0.5660727      43
##                         FP5proc TN5proc FN5proc
## condition6B-condition6A       0    1456      11
## condition6C-condition6A       1    1454       2
## condition6D-condition6A      28    1428       1
## condition6E-condition6A     175    1274       0
## condition6C-condition6B       0    1456       6
## condition6D-condition6B      30    1426       2
## condition6E-condition6B     182    1265       1
## condition6D-condition6C      27    1429       3
## condition6E-condition6C     152    1295       2
## condition6E-condition6D      16    1438       3
summary_ridgeSqM
##                                bias0        bias1        sd0       sd1
## condition6B-condition6A -0.021546334 -0.221222067 0.11027082 0.5890038
## condition6C-condition6A -0.033838468 -0.247990270 0.13424914 0.7634001
## condition6D-condition6A -0.039822113 -0.208430810 0.17090693 0.8892894
## condition6E-condition6A -0.089816147 -0.342742366 0.25703729 0.9904412
## condition6C-condition6B -0.010325975  0.009247665 0.09530226 0.3901085
## condition6D-condition6B -0.019616744  0.007723056 0.21193367 0.6578738
## condition6E-condition6B -0.071722048 -0.139601184 0.30795780 0.9434899
## condition6D-condition6C -0.008452197  0.001254246 0.20423518 0.3253406
## condition6E-condition6C -0.059762726 -0.133145228 0.30770773 0.6064819
## condition6E-condition6D -0.050029492 -0.071096844 0.19257281 0.5499110
##                               mad0      mad1      RMSE0     RMSE1 TP5proc
## condition6B-condition6A 0.02580259 0.2811846 0.11231895 0.6220685      32
## condition6C-condition6A 0.03648585 0.4242615 0.13840335 0.7935423      38
## condition6D-condition6A 0.05080025 0.5296974 0.17542781 0.9025010      39
## condition6E-condition6A 0.09455430 0.3857245 0.27219389 1.0363020      39
## condition6C-condition6B 0.02085484 0.2004131 0.09582749 0.3857609      41
## condition6D-condition6B 0.05118631 0.2270338 0.21276713 0.6504009      42
## condition6E-condition6B 0.09630768 0.2854367 0.31609574 0.9430962      43
## condition6D-condition6C 0.04454148 0.1574384 0.20433992 0.3217079      43
## condition6E-condition6C 0.08489230 0.1408196 0.31335315 0.6143078      44
## condition6E-condition6D 0.05284178 0.1725075 0.19890131 0.5485279      44
##                         FP5proc TN5proc FN5proc
## condition6B-condition6A       2    1454       7
## condition6C-condition6A       4    1451       2
## condition6D-condition6A      45    1411       1
## condition6E-condition6A     351    1098       1
## condition6C-condition6B       1    1455       3
## condition6D-condition6B      52    1404       2
## condition6E-condition6B     343    1104       1
## condition6D-condition6C      39    1417       2
## condition6E-condition6C     286    1161       1
## condition6E-condition6D      74    1380       2

6.2. Make ROC curves

colors <- c(1,2,3,4,5,6)
resultobjectlist <- list(lm_normal,lm_Sq,lm_SqM,ridge,ridgeSq,ridgeSqM)

truth <- lm_normal[[1]][,"ups"]
names(truth) <- rownames(lm_normal[[1]])
summary_objectlist <- lapply(resultobjectlist, function(x){return(makeSummary(x, upratio))})

plotROC(resultobjectlist, summary_objectlist=summary_objectlist, proteins=proteinsCPTAC, truth=truth, colors=colors, plotSVG=FALSE)

## $AUC
##                              [,1]      [,2]      [,3]      [,4]      [,5]
## condition6B-condition6A 0.9239222 0.9255424 0.9715589 0.9612919 0.9624542
## condition6C-condition6A 0.9895704 0.9907045 0.9731271 0.9941409 0.9948282
## condition6D-condition6A 0.9956044 0.9961882 0.9897493 0.9979224 0.9982658
## condition6E-condition6A 0.9957557 0.9962905 0.9913043 0.9975155 0.9979296
## condition6C-condition6B 0.9904158 0.9928509 0.9952391 0.9812531 0.9815497
## condition6D-condition6B 0.9871379 0.9901349 0.9926480 0.9802853 0.9807380
## condition6E-condition6B 0.9847333 0.9872620 0.9839323 0.9811522 0.9816234
## condition6D-condition6C 0.9746642 0.9769994 0.9839133 0.9730464 0.9735653
## condition6E-condition6C 0.9794671 0.9814943 0.9767642 0.9782078 0.9789757
## condition6E-condition6D 0.9711441 0.9725943 0.9763471 0.9757341 0.9766013
##                              [,6]
## condition6B-condition6A 0.9851367
## condition6C-condition6A 0.9841409
## condition6D-condition6A 0.9939732
## condition6E-condition6A 0.9929607
## condition6C-condition6B 0.9827672
## condition6D-condition6B 0.9818619
## condition6E-condition6B 0.9811208
## condition6D-condition6C 0.9747405
## condition6E-condition6C 0.9781924
## condition6E-condition6D 0.9751809
## 
## $pAUC
##                               [,1]       [,2]       [,3]       [,4]
## condition6B-condition6A 0.08252325 0.08310440 0.09002536 0.08475979
## condition6C-condition6A 0.09580756 0.09616838 0.09625430 0.09659794
## condition6D-condition6A 0.09725275 0.09783654 0.09708104 0.09751030
## condition6E-condition6A 0.09720497 0.09773982 0.09680814 0.09689441
## condition6C-condition6B 0.09496753 0.09563874 0.09710602 0.09640360
## condition6D-condition6B 0.09543581 0.09581044 0.09688749 0.09598214
## condition6E-condition6B 0.09584092 0.09618647 0.09632783 0.09588804
## condition6D-condition6C 0.09336081 0.09365079 0.09455128 0.09368132
## condition6E-condition6C 0.09501651 0.09503187 0.09506258 0.09501651
## condition6E-condition6D 0.09435739 0.09474613 0.09507506 0.09468632
##                               [,5]       [,6]
## condition6B-condition6A 0.08505917 0.09131093
## condition6C-condition6A 0.09676976 0.09673540
## condition6D-condition6A 0.09785371 0.09708104
## condition6E-condition6A 0.09730849 0.09680814
## condition6C-condition6B 0.09659091 0.09665335
## condition6D-condition6B 0.09615385 0.09630994
## condition6E-condition6B 0.09623359 0.09494566
## condition6D-condition6C 0.09386447 0.09467338
## condition6E-condition6C 0.09500115 0.09500115
## condition6E-condition6D 0.09486574 0.09443215

References

Argentini, A., Goeminne, L. J. E., Verheggen, K., Hulstaert, N., Staes, A., Clement, L. and Martens, L. (2016) moFF: a robust and automated approach to extract peptide ion intensities. Nature Methods 13, pp 964–966.

Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. (2003) A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Bias and Variance. Bioinformatics 19(2), pp 185-193. http://bmbolstad.com/misc/normalize/normalize.html

Goeminne, L. J. E., Argentini, A., Martens, L., and Clement, L. (2015) Summarization vs Peptide-Based Models in Label-Free Quantitative Proteomics: Performance, Pitfalls, and Data Analysis Guidelines. Journal of Proteome Research 14(6), pp 2457-2465.

Goeminne, L. J. E., Gevaert, K., and Clement, L. (2016) Peptide-level Robust Ridge Regression Improves Estimation, Sensitivity, and Specificity in Data-dependent Quantitative Label-free Shotgun Proteomics. Molecular & Cellular Proteomics 15(2), pp 657-668.

Griss, J., Jones, A.R., Sachsenberg, T., Walzer, M., Gatto, L., Hartler, J., Thallinger, G.G., Salek, R.M., Steinbeck, C., Neuhauser, N., Cox, J., Neumann, S., Fan, J., Reisinger, F., Xu, Q.W., Del Toro, N., Pérez-Riverol, Y., Ghali, F., Bandeira, N., Xenarios, I., Kohlbacher, O., Vizcaíno, J.A. & Hermjakob, H. (2014) The mzTab data exchange format: communicating mass-spectrometry-based proteomics and metabolomics experimental results to a wider audience. Molecular and Cellular Proteomics 13(10), pp 2765-75.

Navarro P., Kuharev J., Gillet, L. C., Bernhardt, O. M., MacLean, B., Röst, H. L., Tate, S. A., Tsou, C., Reiter, L., Distler, U., Rosenberger, G., Perez-Riverol, Y., Nesvizhskii, A. I., Aebersold, R., and Tenzer, S. (2016) A multicenter study benchmarks software tools for label-free proteome quantification. Nature Biotechnology.

Paulovich AG, Billheimer D, Ham A-JL, et al. (2010) Interlaboratory Study Characterizing a Yeast Performance Standard for Benchmarking LC-MS Platform Performance. Molecular & Cellular Proteomic 9(2), pp 242-254.

Ramond, E., Gesbert, G., Guerrera, I. C., Chhuon, C., Dupuis, M., Rigard, M., Henry, T., Barel, M., and Charbit, A. (2015) Importance of host cell arginine uptake in Francisella phagosomal escape and ribosomal protein amounts. Molecular & Cellular Proteomics 14, pp 870–881.