Bernhard Kluger, Christoph Bueschl, Nora KN. Neumann, Maria Doppler, Gerhard G. Thallinger, Rudolf Krska, Rainer Schuhmacher
Stable isotopic labelling-assisted untargeted metabolic profiling reveals novel conjugates of the mycotoxin deoxynivalenol in wheat
Kluger et al. (2013) Anal and Bioanal Chem. 405(15):5031-6. DOI:10.1007/s00216-012-6483-8
A novel stable isotope labelling assisted workflow for improved untargeted LC–HRMS based metabolomics research
Bueschl, Kluger et al. (2014) Metabolomics. 10(4):754-69 DOI:10.1007/s11306-013-0611-0
Automated LC-HRMS(/MS) approach for the annotation of fragment ions derived from stable isotope labeling-assisted untargeted metabolomics
Neumann et al. (2014) Analytical Chemistry. 86(15):7320-7 DOI: 10.1021/ac501358z
Untargeted profiling of tracer derived metabolites using stable isotopic labeling and fast polarity switching LC-ESI-HRMS
Kluger, Bueschl et al. (2014) Analytical Chemistry. 86(23):11533-7 DOI: 10.1021/ac503290j
MetExtract II: A software suite for stable isotope assisted untargeted metabolomics
Kluger et al. (year) journal. DOI: DOI:
Follow the instructions of the setup. It will also ask to install and configure R automatically.
Note: Only required if not performed during the automated setup.
Required R-packages will automatically be installed upon the first use of MetExtract II
An environment variable named 'R_HOME' may cause problems with MetExtract II. Please rename this variable when using MetExtract II and undo the renaming after using MetExtract II.
For sample data please refer to the publication DOI: TODO and the homepage of MetExtract II at https://metabolomics-ifa.boku.ac.at/metextractII
Stable isotope labeling (SIL) assisted and LC-HRMS based experiments have become increasingly popular in targeted and untargeted metabolomics research. Applications include fluxomics approaches, the discovery of truly biology-derived metabolites, improved annotation of unknown metabolites, relative and absolute quantification using internal standardisation, and precursor metabolism studies, among many others. Despite the method's many possible applications, only a few data processing tools are available as yet.
For example, Chokkathukalam et al. (2013) published the R-package mzMatch-ISO, which creates a comprehensive summary for all metabolites detected in a SIL experiment, and Huang et al. (2014) presented X13CMS, which is an extension to the popular XCMS package (Tautenhahn et al. (2008)) and aims at detecting 13C-labeled compounds besides non-labeled metabolites in SIL assisted, untargeted metabolomics experiments. Other tools are Allocator (Kessler et al. (2014)), HiTIME (Lemming et al. (2015)) and NTFD (Hiller et al. (2013)) among others.
The presented software package is designed for the comprehensive detection of truly biology-derived metabolites or tracer-derived biotransformation products in liquid chromatography high resolution mass spectrometry (LC-HRMS) full scan data. The software requires mixtures of uniformly or partially labeled and non-labeled (i.e. isotopically unmodified) biological samples. Such mixtures form specific and highly unique isotopolog patterns for all metabolites or biotransformation products of a studied tracer substance (see figure 1). Each such detected substance is annotated with the total number of labeled atoms. Moreover, the software also supports LC-HRMS/MS data and facilitates spectral cleaning and improved annotation of unknown metabolite ions and their fragments.
The presented software suite consists of 3 modules designed for different workflows:
The following notation and terms are used in MetExtract II:
MetExtract II requires sample material of native and labeled samples to be concurrently analyzed in the same LC-HRMS(/MS) run. The labeled samples may either be uniformly labeled (i.e. all atoms of the labeling element are replaced with the labeling isotope) or partially labeled (only a constant part of a metabolite is labeled with the labeling isotope). Moreover, the isotopic enrichment with the labeling isotope must be equal in all atoms that may be labeled. The enrichment with the used labeling-isotope must be very high (above 98%) and the isotope patterns of the native and the labeled metabolite forms must be clearly separated. Figure 1 shows different native and 13C-labeled molecules of the metabolite deoxynivalenol-GSH (different isotopologs are not shown).
An MS signal or ion signal corresponds to a centroided peak in a mass spectrum and consists of an m/z value and an abundance value. Additionally, for each MS signal its scan number, retention time and ionization polarity are known.
A feature denotes a chromatographic peak of a single isotopolog. It has a defined m/z window and a retention time window. All MS signals present within these windows belong to that feature and thus the chromatographic peak. Different isotopologs of an ion are not convoluted to one feature but are represented as individual features.
Native metabolites mixed with labeled metabolites (e.g. 13C) form typical isotope patterns in LC-HRMS(/MS) data. As 13C is suggested as the main labeling element the isotope patterns are exemplified with 13C-labeling:
Any non-labeled substance consisting of carbon atoms may show a descending isotope pattern towards higher m/z values (grey peaks in figure 2). The ratios of these different isotopolog signals depend on the total number of carbon atoms present in the substance. In MetExtract II these isotopologs are termed as follows:
Any labeled part of a metabolite (either fully labeled or partially labeled) shows an ascending isotope pattern towards higher m/z values in LC-HRMS full scan data. The ratios of these different isotopolog signals depend on the total number of labeled carbon atoms present in the substance as well as the enrichment with 13C. These isotopologs of a labeled metabolite are termed as follows:
In a uniformly labeled metabolite all carbon atoms are replaced with 13C. Labeling of specific carbon atom positions only is not allowed.
A biotransformation product of a tracer substance may contain carbon atoms that originate from the labeled tracer and others that originate from a part conjugated to the tracer. These conjugated moieties always come from the biological system under investigation and are not labeled. Thus, such a metabolite consists of a labeled part and one or more non-labeled parts. As a result the isotope pattern of the partially 13C-labeled biotransformation product is a combination of two isotope patterns and is defined as follows:
In addition to these isotopologs, mixed forms such as M'-2+1 or M'-1+1, which would have the same m/z value as M'-1 and M', are possible for partially labeled biotransformation products. However, the relative abundances of such isotopologs are usually very low and thus they can be ignored for the purpose presented here.
The isotopolog ratios are calculated relative to the principal isotopolog of either a native (M) or a partially or uniformly labeled (M') metabolite ion. Such ratios are stated as e.g. M'-1/M'. Brackets are left out for the sake of simplicity.
Two or three isotope patterns originating from non-labeled and uniformly/partially labeled metabolite ions as depicted in figures 2b and 2c are called a feature pair. Each feature pair consists of the monoisotopic non-labeled feature, the uniformly or partially labeled feature as well as their isotopologs. All participating features possess a highly similar chromatographic peak shape. Each feature pair is annotated with the calculated number of labeled atoms.
A feature group represents all feature pairs detected for the same metabolite. Since the individual monoisotopic non-labeled features of a particular metabolite (e.g. adducts, in-source fragments) are generated in the ion source of the mass spectrometer after the chromatographic separation, a retention time window and the Pearson correlation coefficients are used to convolute different feature pairs into feature groups. Each feature group represents an individual metabolite
MetExtract is designed to work with centroid LC-HRMS(/MS) data and does not support profile mode data. It requires the natural and labeled compounds to be analyzed as mixtures present in the same sample and does not support the evaluation of data obtained from separate, successively measured non-labeled and labeled samples. It does not support fluxomics experiments or other non-uniformly labeled metabolites as it requires a high isotopic enrichment with the isotope used for labeling
MetExtract requires the LC-HRMS(/MS) data in the mzXML (Pedrioli et al. 2004 DOI:10.1038/nbt1031) or mzML format (Martens et al. 2011 DOI:10.1074/mcp.R110.000133). Some instrument manufacturers provide tools for exporting their proprietary formats to mzXML or mzML; for others open source software can be used for the conversion.
Irrespective of the file format, MetExtract requires centroid LC-HRMS(/MS) data since it does not support profile mode data.
ReAdW, a command line tool, is capable of converting Thermo Fisher Raw files (Orbitrap format) into mzXML. It can be downloaded from http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW.
The option -c is used to centroid profile mode data using the Thermo Fisher centroiding algorithm. The Xcalibur software suite from Thermo Fisher Scientific must be installed and accessible for ReAdW.
The program msConvert is a part of the ProteoWizard software package and has a graphical user interface, multi core support, and is capable of importing different MS vendor data formats. It can be downloaded from http://proteowizard.sourceforge.net/.
Before the conversion is started, add the "peakPicking" filter (options 1-), select the "mzXML" output format and discard the "scanHeader" filter. Set the 'Binary Encoding Precision' to 32-bit as MetExtract II currently does not support 64-bit encoding. If MS/MS experiments are also recorded in the respective measurement files, do not use the "msLevel" filter. Compression of mzXML files is not yet supported by MetExtract II.
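The same conversion settings can also be scripted with the command line version of msConvert. The following is only a sketch: it assumes that the `msconvert` executable is on the system path, that the file and folder names (`sample_01.raw`, `converted`) are placeholders, and that the older filter syntax `peakPicking true 1-` is accepted by the installed ProteoWizard version. No compression flag is passed since compressed mzXML files are not supported.

```python
from pathlib import Path


def msconvert_command(raw_file, out_dir):
    """Build an msconvert call that mirrors the GUI settings described above."""
    return [
        "msconvert", str(Path(raw_file)),
        "--mzXML",                           # mzXML output format
        "--32",                              # 32-bit binary encoding precision
        "--filter", "peakPicking true 1-",   # centroid all MS levels (assumed syntax)
        "-o", str(Path(out_dir)),            # output directory
    ]


cmd = msconvert_command("sample_01.raw", "converted")
print(" ".join(cmd))

# To actually run the conversion (requires an installed ProteoWizard):
# import subprocess
# subprocess.run(cmd, check=True)
```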
An open source viewer for converted mzXML files is TOPPView from the OpenMS package. It can be downloaded from http://www.openms.de/ (Sturm et al. (2008)). The viewer allows easy navigation among the LC-HRMS(/MS) files and also provides a convenient way to evaluate m/z value differences or estimate labeling-enrichment parameters.
Upon MetExtract II start, a dialog displaying the three MetExtract modules will be shown. Select the respective module (AllExtract, TracExtract, or FragExtract) by clicking on the image. A new window with the selected module will be shown.
Once a module has been started, it tries to load the default settings saved in the configuration file <PathToExecuteable>/defaultSettings.ini. These settings may be overwritten by the user to reflect the most common settings for the used LC-HRMS(/MS) instrument.
The graphical user interfaces of MetExtract II modules are divided into the following three sections:
AllExtract and TracExtract allow the user to briefly describe the performed and analyzed metabolomics experiment. This information is not used for any processing step but is stored with every data processing result. It can be entered in the Experiment, Operator, Experiment ID, and Comments controls.
AllExtract and TracExtract are designed to work with centroid full scan LC-HRMS(/MS) measurement files. Such files are loaded in groups, each representing a certain biological condition (e.g. treatment or control). After a group has been defined, MetExtract will automatically try parsing the specified LC-HRMS(/MS) files to check if they are valid (see section "File validation").
To define a new group click on the button "Add group". A dialog will appear where the group name and the respective LC-HRMS(/MS) files need to be specified. With a click on "Add file(s)" the user can select the mzXML files for this group. Files already loaded can be selected and removed with the "Remove selected" button.
During the process of adding a group the user can specify how often a particular feature has to be found within this group to be accepted for further data processing (Option "Minimum found"). If "omit features" is checked, all features not fulfilling the "Minimum found" option will be rejected for the final data matrix.
When the dialog is closed, the newly defined group is displayed in the "Groups" list of the input tab. Moreover, MetExtract will try to parse all LC-HRMS(/MS) files of this new group to check whether they can be read and match the remaining groups and LC-HRMS(/MS) files.
See figure 3 for an example of the group dialog with two mzXML files added to the respective group.
Filenames of mzXML files should only contain letters (a-z, A-Z; upper and lower case), numbers (0-9) or the underscore character ('_'). Files with other characters in their names will not be imported. A dialog will be displayed prompting the user to resolve these issues.
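File names can be checked before they are loaded. The following minimal sketch assumes that the rule applies to the file name without its extension and that the allowed special character is the underscore; the helper name `has_valid_name` is hypothetical and not part of MetExtract II.

```python
import re
from pathlib import Path

# Letters, digits and underscores only (assumed interpretation of the rule above)
_VALID_STEM = re.compile(r"[A-Za-z0-9_]+")


def has_valid_name(path):
    """Return True if the file name (without extension) uses only allowed characters."""
    return _VALID_STEM.fullmatch(Path(path).stem) is not None


for name in ["10_CM_DON.mzXML", "10 CM-DON.mzXML"]:
    print(name, has_valid_name(name))
```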
To edit a previously defined group, double click on its name in the "Groups" list in the input tab. Files can be added to or deleted from the group. Additionally, parameter settings such as "Minimum found" and "Omit features" as well as the group name can be changed.
To delete an already defined group, select it and click on "Delete group".
If an experiment consists of large groups (10 or more files) and/or many different groups (10 or more groups), the file validation step after each group manipulation (creation, deletion or modification of a group) may take a while. To deactivate this step, check the "Don't update" box. MetExtract will then not update and validate the LC-HRMS(/MS) files after any group operation. With a click on "Update" this verification step can be initiated at any time and should be done once all groups have been defined/changed.
The file validation has to be performed before data processing is started. It checks whether all specified input files are valid according to the mzXML schema and can be imported by MetExtract. If one or several files are incorrectly formatted or cannot be read, MetExtract will display an error. Additionally, all MS scan types will be extracted and displayed in the drop-down menus "Scan event(s) Positive" and "Scan event(s) Negative". The data processing can only be started in the "Calculation page" if all measurement files share at least one common scan event.
MetExtract supports quasi-parallel processing of fast polarity switching ionization. For this, select the respective scan event (m/z range, resolution) once for the positive and once for the negative ionization mode. MetExtract will analyze the two modes independently and convolute the individual feature pairs from both modes.
Note: Processing of LC-HRMS(/MS) data obtained from different MS scan types is not supported. All input files have to share at least one scan type. For example, LC-HRMS(/MS) data from an experiment using polarity switching should not be processed conjointly with data obtained from a positive-mode only LC-HRMS(/MS) analysis. Moreover, it is not supported to process data from different LC-HRMS instruments at the same time.
The ion modes of LC-HRMS scans in all loaded mzXML files will be displayed in two combo boxes each showing either those scan events of the positive or negative ionization mode (figure 4). Only such scan events present in all loaded mzXML files will be shown. LC-HRMS/MS scan events will not be included.
mzXML files with either positive or negative ionization mode can also be loaded but not be processed in parallel.
A defined experiment (i.e. description and LC-HRMS(/MS) data files) may be saved or loaded using the buttons 'Save groups' and 'Load groups'. The path information for the LC-HRMS(/MS) files will be saved relative to the specified group file. This allows copying the group file and associated mzXML files easily to different locations or PCs.
FragExtract will try to automatically match corresponding MS/MS scan events of the native and the uniformly 13C-labeled precursors. This match is based on the precursor m/z values of the two MS/MS scan events, whose difference must be a multiple of 1.00335, as well as on a similar retention time of both scan events.
This automatic match will fail, if two MS/MS targets share the same m/z values and scan events but elute at a different retention time. In such a case right click on the name of the target with the correct match and select 'Duplicate'. The target will be duplicated.
When using the automated MS/MS scan event match of FragExtract, the m/z values of the precursor ions recorded by the instrument will automatically be used as the m/z values of the generated targets. Depending on the instrument, the mass deviation of these values may be quite high and could result in problems during calculation. Make sure that these values are correct or correct them accordingly. Moreover, the start and stop retention times of the targets should be narrowed down, as these are initially the start and stop times of the MS/MS scan events for the respective target.
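The matching criterion for native and labeled precursors described above can be sketched as follows. This is not the FragExtract implementation: the function names, the ppm tolerance and the retention time tolerance are assumptions chosen for illustration, and the example uses the protonated DON ion (15 carbon atoms).

```python
C13_C12_DIFF = 1.00335  # m/z difference between 13C and 12C (singly charged)


def labeled_carbons(mz_native, mz_labeled, charge=1, max_ppm=5.0):
    """Return the number of carbons if the m/z difference is a multiple of 1.00335/z, else None."""
    n = round((mz_labeled - mz_native) * charge / C13_C12_DIFF)
    if n < 1:
        return None
    expected = mz_native + n * C13_C12_DIFF / charge
    ppm = abs(mz_labeled - expected) / expected * 1e6
    return n if ppm <= max_ppm else None


def is_match(native, labeled, max_rt_diff=0.1):
    """native/labeled: dicts with precursor 'mz' and retention time 'rt' in minutes."""
    similar_rt = abs(native["rt"] - labeled["rt"]) <= max_rt_diff
    return similar_rt and labeled_carbons(native["mz"], labeled["mz"]) is not None


native = {"mz": 297.1333, "rt": 5.42}
labeled = {"mz": 312.18355, "rt": 5.43}
print(labeled_carbons(native["mz"], labeled["mz"]), is_match(native, labeled))
```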
For each MS/MS target to be analyzed with FragExtract, the following information needs to be provided:
FragExtract does not perform chromatographic peak picking and therefore requires the "Min. Rt [min]" and the "Max. Rt [min]" of its targets to be specified. Only the most abundant scan inside this retention time window will be used for analysis. If several chromatographic peaks are present for the same precursor ion m/z value, several MS/MS targets with different target names and retention times must be specified.
MetExtract supports drag'n'drop. To create a new group with one or several LC-HRMS(/MS) measurement files, simply select the files in an explorer window and drag'n'drop them anywhere onto the module window. A new group dialog will automatically pop up with the respective LC-HRMS(/MS) files.
An already created group file (file extension .grp) or settings file (file extension .ini) may also be dropped onto MetExtract windows. In this case the measurement compilation or the saved settings will be automatically loaded.
The process page is used for specifying all parameters required for data extraction and feature pair bracketing among different LC-HRMS(/MS) files. The page is divided in the following three sections which also correspond to the general workflow of the data processing pipeline:
Settings may be loaded or saved (e.g. different settings for Orbitrap and Q-Tof instruments). To save the current settings to a file (extension .ini) go to File->Save Settings and select a settings file in the file dialog. To load previously defined settings either go to File->Load Settings and select a settings file or drag'n'drop a respective settings file anywhere onto MetExtract.
Processing many files at the same time (e.g. more than 50 samples analyzed with an LTQ Orbitrap instrument) may cause a memory error. This is a bug in the current implementation and will be resolved in a future version of the software.
If this error occurs, please consider performing the steps "Individual files processing" and "Re-integration during multiple files processing" sequentially for subsets of the files (e.g. fewer than 50 at a time). The "Bracketing of feature pairs" step must, however, be performed with all LC-HRMS files together.
Alternatively, the Command line parameters may also be used for automated processing.
To perform feature extraction for each file separately, the checkbox entitled "Individual files processing" has to be checked. MetExtract will process each file separately and extract corresponding 12C and 13C features and feature pairs. Also, all parameter settings used for this feature pair extraction are located in the "Individual files processing" section of the "Process tab page".
Most elements consist of more than one isotope. For example, carbon has the two stable isotopes 12C and 13C. Their relative abundances in natural environments are 98.93% and 1.07%, respectively. The radioactive isotopes of carbon can be neglected in this respect as their relative abundances are too low. When a molecule consisting of at least one carbon atom is recorded with LC-HRMS, its isotopolog distribution will be visible. The relative abundances of this isotopolog distribution depend on the number of carbon atoms in the respective molecule as well as the abundances of the two stable isotopes. The following formula is used to calculate the theoretical isotopolog distribution for carbon relative to the principal isotopolog (a is the total number of carbon atoms in the substance, s is the number of 13C substitutions (i.e. M+s) and p is the relative abundance of 12C (e.g. 0.9893 for native molecules)):
`f(a,s,p)= (p^(a-s)*(1-p)^s*({::}_s^a))/(p^a)`
This formula can be rewritten and used for determining the enrichment of the labeling-isotope in SIL experiments. The enrichment of 13C is calculated using a highly abundant feature pair in the recorded LC-HRMS(/MS) data. For this feature pair the number of carbon atoms (a) can directly be calculated from the m/z difference between M and M'. The ratio (r) of [M'-s]/[M'] is further required as well as the number of exchanges (s), for which usually 1 is chosen. With the rewritten formula the enrichment of 13C in the labeled isotopologs is determined:
`p= (({::}_s^a)^(1/s))/(({::}_s^a)^(1/s)+r^(1/s))`
Calculating the isotopic enrichment in the labeled metabolite form is exemplified with a uniformly 13C-labeled metabolite shown in figure 5. It has 22 carbon atoms in total and is fully 13C-labeled. Using the ratio M+1/M (rn=0.23) the native 12C enrichment is calculated to be approximately 99.0% (22/(22+0.23)), which is in good agreement with the natural 12C abundance of 98.93%. Likewise, using the ratio M'-1/M' (rl=0.38) the 13C enrichment in the uniformly labeled metabolite is determined to be 98.3%.
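Both formulas can be evaluated with a few lines of Python. The sketch below is a direct transcription of the two expressions given above; the function names are chosen for illustration only, and the input values are the ones from the example with 22 carbon atoms.

```python
from math import comb


def isotopolog_ratio(a, s, p):
    """Abundance of the isotopolog with s substitutions relative to the principal isotopolog.

    a: number of carbon atoms, p: relative abundance of the principal isotope.
    Equivalent to p^(a-s) * (1-p)^s * C(a, s) / p^a.
    """
    return comb(a, s) * ((1 - p) / p) ** s


def enrichment(a, s, r):
    """Solve the ratio formula for p, given the observed ratio r."""
    root = comb(a, s) ** (1 / s)
    return root / (root + r ** (1 / s))


print(round(enrichment(22, 1, 0.23), 4))  # native form, 12C abundance
print(round(enrichment(22, 1, 0.38), 4))  # labeled form, 13C enrichment
```

Inserting the calculated enrichment back into `isotopolog_ratio` returns the observed ratio, which is a quick consistency check for the rearranged formula.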
The most informative labeling element in SIL-assisted metabolomics approaches is carbon as it is a constituent of virtually any metabolite. Labelling with 13C isotopes will result in the typical SIL associated pattern shown in figure 2. Other elements (e.g. N, S, P) will result in very different isotopolog patterns (see figure 6). For this experimental setup a different extraction method may be used which deduces the feature pairs not based on the isotopic patterns of the labeling element but rather on identical natural carbon isotopic distributions. This option is only available if carbon is not used as the labeling element.
To define a tracer under investigation, press the button "Tracer setup" in the labeling-section. A new window will open where the used tracers and labeling associated parameters can be specified. Figure 7 shows the tracer configuration dialog with DON (deoxynivalenol) as a configured tracer.
Each row in the dialog represents one tracer. For each tracer the following parameters have to be specified:
To detect putative native and labeled metabolite ions and fragments, MetExtract tests a certain range of possible carbon atoms. This range needs to be specified by the user:
Full metabolome labeling
The range can be arbitrary but should include neither a too low atom count (e.g. less than 3) nor a too high atom count (e.g. more than 65 if the enrichment with 13C is approximately 98.5%).
Tracer-fate studies
For tracer metabolisation experiments the maximum number of labeled atoms can roughly be chosen as twice the number of labeled atoms present in the tracer. For example, since DON has 15 carbon atoms, the search for metabolisation products can be restricted to a maximum of 30 carbon atoms, which will include metabolites containing two DON tracer units.
The following parameter settings are located in the section MZ picking. These settings are related to the mass accuracy and expected isotopolog ratio accuracy of the used mass spectrometer.
The time interval of the chromatogram during which MetExtract inspects mass spectra and EICs is specified in the input fields "Start (min)" and "End (min)". Only mass spectra and EIC peaks between these time points of the LC-HRMS run are considered during data processing.
An optional intensity threshold can be specified in the field "Intensity threshold". All centroided MS signals above this intensity threshold are considered putative MS signals of monoisotopic ions with natural isotopic composition or putative MS signals of a labeled ion. The required isotopolog signals may be below this threshold and will still be recognised by MetExtract.
The input field "Max. number of charges" specifies the max. allowed charge state of an ion to be considered for data processing by MetExtract. An ion's charge state is determined using its carbon isotope pattern.
The field "Max. mass deviation (+/- ppm)" specifies the maximum tolerated mass deviation of M' as well as M+1 and M'-1 in ppm, relative to the calculated m/z value of the observed MS signal M. E.g. in case of 13C labeling M' = M + n * 1.00335 / z.
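The relation M' = M + n * 1.00335 / z and the ppm criterion can be illustrated with a short calculation. This is a sketch with hypothetical function names; the observed m/z values and the tolerance of 5 ppm are example inputs, not defaults of MetExtract II.

```python
C13_C12_DIFF = 1.00335  # m/z difference between 13C and 12C


def expected_labeled_mz(mz_native, n_carbons, charge=1):
    """m/z of M' calculated from the observed m/z of M."""
    return mz_native + n_carbons * C13_C12_DIFF / charge


def ppm_error(observed, expected):
    """Signed mass deviation in ppm."""
    return (observed - expected) / expected * 1e6


def within_tolerance(observed, expected, max_ppm):
    return abs(ppm_error(observed, expected)) <= max_ppm


expected = expected_labeled_mz(297.1333, 15)        # DON [M+H]+, 15 carbon atoms
print(round(expected, 5))
print(within_tolerance(312.1840, expected, 5.0))    # about 1.4 ppm off
print(within_tolerance(312.1900, expected, 5.0))    # about 21 ppm off
```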
The parameters "Isotopologs N" and "Isotopologs L" specify how many isotopologs of the natural and the labeled isotopologs must be present and are verified in the respective patterns (e.g. a value of 2 for "Isotopologs N" means that the isotopologs M and M+1 must be present for the native ion form). If "1" is used, MetExtract will only check for and use the two corresponding MS signals M and M' for confirmation.
The parameters "Intensity abundance error" for "Non-labeled ion (±)" and "Labelled ion (±)" specify the maximum relative deviation between the expected and the observed ion intensity ratios of M+y relative to M and M'-x relative to M'. A parameter setting of e.g. 0.2 corresponds to a maximum tolerated relative isotopic abundance error of 20% compared to the theoretically expected one. In case of a tracer-fate experiment, the ratios of M'-y to M' are corrected by the factor M'+y to M' to account for possible moieties conjugated to the studied tracer substance.
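How such a relative abundance check could look is sketched below for the non-labeled ion. The function names are hypothetical, the theoretical ratio is taken from the isotopolog formula of this documentation, and the correction for tracer-fate experiments is not included.

```python
from math import comb


def theoretical_ratio(n_atoms, s, purity):
    """Expected abundance of the isotopolog with s substitutions relative to the principal one."""
    return comb(n_atoms, s) * ((1 - purity) / purity) ** s


def ratio_ok(observed, expected, max_rel_error):
    """True if the observed ratio deviates by at most max_rel_error (e.g. 0.2 = 20%)."""
    return abs(observed - expected) <= max_rel_error * expected


expected = theoretical_ratio(15, 1, 0.9893)   # M+1/M for 15 carbon atoms
print(round(expected, 4))
print(ratio_ok(0.18, expected, 0.2))          # about 11% deviation
print(ratio_ok(0.25, expected, 0.2))          # about 54% deviation
```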
MetExtract II tries to find valid pairs of M and M' signals. However, it can also happen that M+1, M+2, M'-2, M'-1 signals are incorrectly paired leading to false-positives. To reduce the number of such pairings, MetExtract II only accepts MS signals, if no M-1 or M'+1 signal is detected in the same scan (the latter verification is suspended when TracExtract is used).
The isotope patterns of native and partly or uniformly labeled metabolites have characteristic abundances relative to their monoisotopic or consistently labeled isotopologs. Depending on the abundance of the metabolite in a particular sample, these signals of the respective isotopologs may be present in the LC-HRMS data. But in some cases (e.g. low number of carbon atoms and / or low abundance of the monoisotopic or labeled metabolite) these signals can hardly be observed and thus incorrectly not detected using the strict isotopolog filtering system of MetExtract II (parameters "Isotopologs N" and "Isotopologs L").
To circumvent this problem, the option "Consider isotopolog abundance" may be used. When checked, it will instruct the isotope matching algorithm of MetExtract II to consider the observed signal intensity of putative isotopologs. Only when the calculated signal abundance of an isotopolog is above the specified intensity threshold (parameter "Isotopologs threshold (≥)") the presence of this isotopolog is verified. If the calculated abundance is lower than the set intensity threshold, the respective isotopolog signal is not required to be present in the LC-HRMS data.
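The decision logic of this option can be sketched as follows. The function and argument names are hypothetical and the sketch only mirrors the rule stated above: an isotopolog has to be present only if its calculated abundance reaches the threshold.

```python
def isotopolog_accepted(principal_intensity, expected_ratio, observed_intensity,
                        threshold, max_rel_error):
    """Check one isotopolog with the "Consider isotopolog abundance" rule.

    observed_intensity is None if no matching MS signal was found.
    """
    calculated = principal_intensity * expected_ratio
    if calculated < threshold:
        return True    # too low to be observable, presence is not required
    if observed_intensity is None:
        return False   # should be visible but is missing
    return abs(observed_intensity - calculated) <= max_rel_error * calculated


# Weak ion: calculated M+1 abundance (1600) is below the threshold of 5000
print(isotopolog_accepted(1e4, 0.16, None, 5e3, 0.2))
# Abundant ion: calculated M+1 abundance (160000) must be present
print(isotopolog_accepted(1e6, 0.16, None, 5e3, 0.2))
print(isotopolog_accepted(1e6, 0.16, 1.5e5, 5e3, 0.2))
```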
With this option active, the metabolite detection algorithm of MetExtract II is more sensitive. At the same time the number of incorrectly matched signals is also increased, which leads to a higher number of false-positives. Thus, this option should be used with caution and the generated results should be carefully reviewed before further analysis of the dataset. To counteract these problems, the following settings may be adapted:
Additionally, in many LC-HRMS systems the chromatographic peak shapes of low abundant ions tend to show increased noise. This complicates the automated feature pair grouping mechanisms of MetExtract II resulting in not correctly convoluted feature groups.
See figure 9 for an illustration and table 1 for a comparison of the results for the provided datasets DON_in_Wheat.grp and DON_in_Wheat_LAC.grp
Processing settings | Total metabolites | Unique metabolites | False-positively detected feature pairs | Processing time |
---|---|---|---|---|
DON_in_Wheat.grp / 10_CM_DON.mzXML | 8 | 0 | 0 | 2.1 min |
DON_in_Wheat_LAC.grp / 10_CM_DON.mzXML | 12 | 4 | 20 | 8.2 min |
Table 1: Comparison of the number of metabolites detected in the datasets DON_in_Wheat.grp and DON_in_Wheat_LAC.grp
Detected MS signal pairs consisting of M and M' signals in individual scans are clustered using hierarchical clustering with Euclidean distance and average linkage. The parameters found in the section "MZ clustering" are used for this step.
The parameter "Clustering ppm (±)" specifies the maximum ppm deviation allowed within an MS signal cluster. Any cluster exceeding this threshold is split into two or more subclusters that fulfil this parameter.
The field "Min. spectra (≥)" is used to discard those subclusters that consist of fewer MS signals than specified with this parameter. This is especially helpful to discard low abundant Fourier transform artefacts.
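The effect of the two clustering parameters can be illustrated with a simplified, one-dimensional sketch. It is not the MetExtract II implementation: instead of building a full dendrogram and splitting it afterwards, neighbouring clusters are merged by average linkage only as long as the merged cluster stays within the ppm limit. For sorted, non-overlapping clusters of m/z values the average linkage distance equals the difference of the cluster means.

```python
from statistics import mean


def cluster_mz(mz_values, max_ppm, min_spectra):
    """Group m/z values of signal pairs into clusters no wider than max_ppm."""
    clusters = [[mz] for mz in sorted(mz_values)]
    while len(clusters) > 1:
        best_index, best_distance = None, None
        for i in range(len(clusters) - 1):
            left, right = clusters[i], clusters[i + 1]
            width_ppm = (right[-1] - left[0]) / left[0] * 1e6
            if width_ppm > max_ppm:
                continue  # merging would violate "Clustering ppm"
            distance = mean(right) - mean(left)  # average linkage in 1D
            if best_distance is None or distance < best_distance:
                best_index, best_distance = i, distance
        if best_index is None:
            break  # no further merge is allowed
        merged = clusters[best_index] + clusters[best_index + 1]
        clusters[best_index:best_index + 2] = [merged]
    # "Min. spectra": discard clusters with too few MS signals
    return [c for c in clusters if len(c) >= min_spectra]


signals = [300.0000, 300.0003, 300.0006, 300.0100, 300.0102, 300.0500]
result = cluster_mz(signals, max_ppm=5.0, min_spectra=2)
print(result)
```

In this example the single signal at m/z 300.0500 forms a cluster of its own and is discarded by the minimum number of spectra.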
The next step in the AllExtract and TracExtract modules detects and separates different chromatographic peaks of metabolites having the same m/z value but a different retention time. An example is illustrated in figure 10. The following steps are carried out for each remaining subcluster of MS signals from the previous processing step:
The parameter "EIC width (±)" defines the ppm width that is used for extracting the EICs of M and M' for each remaining subcluster from the previous data processing steps.
An optional EIC smoothing step may be used. This can be activated with the field "Smoothing window". Several options are available. If used, the smoothing window size must also be specified with the parameter "Window size". Note: the option "Window size" is only available when smoothing is used.
The minimum and maximum width of chromatographic peaks may be specified with the parameters "Min. scale" and "Max. scale".
The maximum allowed peak center error is specified with the field "Center error" in the "Peak matching" category. Like the peak scale parameters, this value is given as a number of scans, here between the two peak centers.
The minimum Pearson correlation between two chromatographic peaks of a native and labeled metabolite ion is specified with the parameter "Min. corr (≥)".
Each such detected and verified feature pair is considered a metabolite ion derived from a native and a labeled ion of the same metabolite.
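The peak shape comparison can be illustrated with a plain Pearson correlation of two EIC intensity profiles sampled at the same scans. The intensity values and the threshold of 0.85 are invented for this sketch and are not defaults of MetExtract II.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long intensity profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # flat profile, no meaningful correlation
    return cov / (sd_x * sd_y)


native = [0, 10, 50, 100, 50, 10, 0]      # EIC of M
labeled = [v * 0.8 for v in native]       # EIC of M', same shape, lower intensity
shifted = [0, 0, 10, 50, 100, 50, 10]     # peak eluting one scan later

min_corr = 0.85
print(pearson(native, labeled) >= min_corr)
print(pearson(native, shifted) >= min_corr)
```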
After feature pairs have been detected they are further annotated with putative hetero atoms and convoluted into feature groups each representing a metabolite.
After feature pairs have been detected, MetExtract searches for isotopolog mass peaks originating from elements other than the one used for labeling. Such hetero atoms must consist of at least two main isotopes. Chlorine, for example, consists of the two main isotopes 35Cl and 37Cl. 35Cl, with a relative abundance of approximately 76%, is the more abundant isotope and is expected to be predominantly incorporated into molecules. However, since the less frequent 37Cl has a quite high abundance of approximately 24%, it is recorded in most mass spectra of substances with chlorine atoms.
Isotopes having a positive m/z offset compared to the most abundant isotope of the respective element are searched at M'+isotope_m/z, while for isotopes with a negative m/z offset (e.g. 54Fe) the search for the respective mass peaks is performed at M-isotope_m/z.
Several hetero isotopes show similar m/z value offsets and similar relative isotopolog abundances. Depending on the HRMS device, these may or may not be separated. MetExtract will report all possible hetero isotopologs.
The maximum allowed ratio error for a hetero atom isotopolog is specified with the parameter "Intensity error (±)".
Because hetero atom isotopologs generally give very low-abundance MS signals (e.g. 34S), background noise may be incorrectly interpreted as such signals. Consequently, an isotopolog must be present in several consecutive MS scans to be accepted. The parameter "Min. scans" defines the number of times a hetero atom isotopolog must be detected to be reported.
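How the two parameters "Intensity error (±)" and "Min. scans" could work together is sketched below in Python. This is an assumption-laden sketch: it treats the intensity error as an absolute deviation of the observed isotopolog ratio and requires the matching scans to be consecutive; the function name and the default values are hypothetical.

```python
def hetero_isotopolog_confirmed(observed_ratios, expected_ratio,
                                intensity_error=0.15, min_scans=3):
    """Decide whether a hetero atom isotopolog is reported.

    observed_ratios holds, per MS scan, the intensity ratio of the hetero
    isotopolog to its reference peak, or None if no signal was found.
    The isotopolog is accepted if at least min_scans consecutive scans
    show a ratio within expected_ratio +/- intensity_error.
    """
    run = 0
    longest = 0
    for ratio in observed_ratios:
        if ratio is not None and abs(ratio - expected_ratio) <= intensity_error:
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest >= min_scans


# Expected 37Cl/35Cl ratio for a single chlorine atom, from the natural
# abundances of roughly 24.24% and 75.76%.
expected_cl = 24.24 / 75.76

stable = hetero_isotopolog_confirmed([0.31, 0.33, 0.30, None], expected_cl)
noisy = hetero_isotopolog_confirmed([0.31, None, 0.33, 0.90, 0.30], expected_cl)
```

In the second call the ratio matches in three scans, but never in three consecutive ones, so the signal is treated as noise.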
Hetero atoms are configured by clicking on the "Hetero atoms configuration" button. The following information is required for every possible hetero isotope:
Electrospray ionisation (ESI) may give rise to several ion species for the same metabolite (e.g. adducts, in-source fragments or dimers). To convolute all ion species of a particular metabolite into a feature group, the Pearson correlation coefficient is used to find feature pairs with highly similar retention times and chromatographic peak shapes. The parameter "Min. corr. (±)" in the section "Feature convolution" specifies the minimum Pearson correlation that the chromatographic peaks of the monoisotopic isotopologs of two feature pairs must have in order to be convoluted into one feature group.
Closely co-eluting metabolites may be incorrectly merged into a feature group consisting of different metabolites (see figure 11a and 11b). To split a feature group consisting of incorrectly convoluted feature pairs, hierarchical cluster analysis (HCA) is used. Only subclusters that show a large number of highly correlated feature pairs are not split further. In the example in figure 11b, five metabolites are incorrectly convoluted into a single feature group. Using the parameter 'Min. connections' (relative amount of highly connected feature pairs in a feature group), this feature group was split into 5 feature groups. As shown in figure 11c, the feature pairs in each of these new feature groups show better co-elution.
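The idea behind correlation-based convolution and the 'Min. connections' criterion can be illustrated with the following Python sketch. It is a simplification and not the HCA procedure used by MetExtract: feature pairs are grouped as connected components of a correlation graph, and `connectivity` computes the relative amount of highly correlated feature pairs in a group. All names and the threshold 0.85 are hypothetical.

```python
from itertools import combinations


def convolute(n_pairs, corr, min_corr=0.85):
    """Group feature pairs 0..n_pairs-1 by peak shape correlation.

    corr maps (i, j) with i < j to the Pearson correlation of the two
    chromatographic peaks. Pairs linked by a correlation >= min_corr,
    directly or through other pairs, end up in the same group.
    """
    parent = list(range(n_pairs))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for (i, j), r in corr.items():
        if r >= min_corr:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n_pairs):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def connectivity(group, corr, min_corr=0.85):
    """Fraction of feature pair combinations in a group that correlate well."""
    pairs = list(combinations(sorted(group), 2))
    if not pairs:
        return 1.0
    strong = sum(1 for p in pairs if corr.get(p, 0.0) >= min_corr)
    return strong / len(pairs)


corr = {(0, 1): 0.95, (1, 2): 0.90, (0, 2): 0.40, (2, 3): 0.20}
groups = convolute(4, corr)
```

A group with low connectivity, such as one where only two of three combinations correlate well, would be a candidate for further splitting under a 'Min. connections' style threshold.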
Subsequently, each feature group is annotated with commonly observed adducts and in-source fragments. This is done by comparing the m/z value of M as well as the determined number of labeled atoms between all feature pairs of a feature group. If the difference between two feature pairs cannot be explained by two adducts observed in the respective ionization mode, putative neutral losses of in-source fragments are calculated using the Seven Golden Rules algorithm (Kind and Fiehn 2007). Adducts and elements that shall be used for the generation of neutral losses can be specified in a dialog that is accessed using the button "Relationship configuration".
Biotransformation products of a tracer may have conjugated moieties that have additional atoms of the labeling element. A neutral loss may not reduce the number of labeled atoms for two features but the neutral loss may still contain the heavier isotope of the labeling element. Thus, the number of labeled atoms is used only in AllExtract for the generation of putative neutral losses. In TracExtract the number of labeled atoms is ignored.
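The adduct comparison between two feature pairs of a feature group can be sketched as follows. This Python example is an illustration under several assumptions: only four common singly charged positive-mode adducts are considered, the 5 ppm tolerance is arbitrary, and the requirement of an equal number of labeled atoms corresponds to the AllExtract behaviour described above. The second m/z value in the example is hypothetical.

```python
# m/z shifts of common singly charged positive-mode adducts relative to
# the neutral monoisotopic mass M (electron mass already accounted for).
ADDUCTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
}


def explain_by_adducts(mz_a, labeled_atoms_a, mz_b, labeled_atoms_b, ppm=5.0):
    """List adduct assignments that explain two ions of one metabolite.

    Returns (adduct_a, adduct_b) tuples for which both ions lead to the
    same neutral mass within the ppm tolerance. Adducts without atoms of
    the labeling element do not change the number of labeled atoms, so
    differing counts rule out an adduct relationship.
    """
    if labeled_atoms_a != labeled_atoms_b:
        return []
    hits = []
    for name_a, shift_a in ADDUCTS.items():
        for name_b, shift_b in ADDUCTS.items():
            if name_a == name_b:
                continue
            mass_a = mz_a - shift_a
            mass_b = mz_b - shift_b
            if abs(mass_a - mass_b) <= mass_a * ppm / 1e6:
                hits.append((name_a, name_b))
    return hits


hits = explain_by_adducts(297.1333, 15, 319.1152, 15)
```

If no adduct combination explains the m/z difference, the difference would instead be passed on to the neutral loss calculation.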
Results of the individual files/measurements may be saved in various formats. Each output file is saved as <FileName>.<extension>
The detected feature pairs and convoluted feature groups are stored in a matrix format. Each row represents a feature pair having the following information:
For each input file a PDF file may be created which contains the same information as the TSV output file. Each detected feature pair is additionally visualised (mass spectra, EICs of M and M') on a PDF page. Moreover, all chromatographic peaks of M for all convoluted feature groups are illustrated.
This option allows saving the detected feature pairs in a new mzXML file. This mzXML file will only contain MS signals of feature pairs detected with MetExtract. All other signals that do not originate from native and labeled substances and their ions are not saved. Depending on the selected options (M, M+1, M'-1, M'), isotopologs of the detected feature pairs may also be included.
Saving the results to a new mzXML file may only work for certain input data. If this option does not work with a certain instrument, please contact the authors. Currently, this export functionality has only been tested with the following instrument(s):
The section "Multiple files annotation" provides parameters for matching extracted feature pairs in all processed LC-HRMS data files.
The matching is performed using the m/z value and retention time of M as well as the determined number of labeled atoms. The field "Max. m/z width (± ppm)" defines the maximum allowed difference in m/z of M for two feature pairs detected in different LC-HRMS files to be matched. The matching is performed using HCA. Subclusters whose difference between the highest and lowest m/z value of M is below this ppm limit are kept. It is recommended to set this parameter to a multiple (2-3) of the parameter "Mass deviation (± ppm)". Subsequently, all chromatographic peaks detected within such an m/z cluster are clustered again using their retention time. All peaks in a cluster narrower than "Time window (min)" are considered to belong to the same chromatographic peak.
To correct for slight retention time shifts, an optional alignment step may be performed. This alignment is realised with the R package ptw (parametric time warping). By default, no alignment is performed.
To fill up the created data matrix, a semi-targeted re-integration step is performed after bracketing of feature pairs. This step searches for chromatographic peaks that were not detected in some of the analyzed LC-HRMS files but were successfully detected in others. Most often, low-abundance features are missed because the required isotopolog MS signals are of even lower abundance.
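The two-stage matching (first by m/z within a ppm width, then by retention time) can be sketched in Python. Note that this sketch replaces the HCA used by MetExtract with a simple greedy one-dimensional clustering, so it only illustrates the role of the two parameters. The function names, the tuple layout and the default values are assumptions.

```python
def _split_by_rt(cluster, rt_window):
    """Split one m/z cluster into groups no wider than rt_window minutes."""
    groups, current = [], []
    for feat in sorted(cluster, key=lambda f: f[2]):
        if current and feat[2] - current[0][2] > rt_window:
            groups.append(current)
            current = []
        current.append(feat)
    if current:
        groups.append(current)
    return groups


def bracket(features, max_ppm=15.0, rt_window=0.2):
    """Match feature pairs across files (greedy stand-in for HCA).

    features is a list of (file, mz_of_M, rt_min, labeled_atoms) tuples.
    Feature pairs are matched only if they have the same number of
    labeled atoms, lie within max_ppm of the lowest m/z in their cluster
    and within rt_window minutes of the earliest peak in their group.
    """
    result, mz_cluster = [], []
    for feat in sorted(features, key=lambda f: (f[3], f[1])):
        if mz_cluster:
            first = mz_cluster[0]
            too_wide = feat[1] - first[1] > first[1] * max_ppm / 1e6
            if feat[3] != first[3] or too_wide:
                result.extend(_split_by_rt(mz_cluster, rt_window))
                mz_cluster = []
        mz_cluster.append(feat)
    if mz_cluster:
        result.extend(_split_by_rt(mz_cluster, rt_window))
    return result


features = [
    ("A", 297.1333, 5.02, 15),
    ("B", 297.1336, 5.05, 15),
    ("C", 297.1331, 7.40, 15),  # same m/z, different retention time
    ("A", 319.1152, 5.03, 15),
]
matched = bracket(features)
```

Each resulting group corresponds to one row of the data matrix, with one column per LC-HRMS file.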
The output of feature pair bracketing is a two-dimensional data matrix consisting of all detected feature pairs and their abundances in the different LC-HRMS files. Moreover, each detected feature pair is annotated with:
The IDs of feature pairs as well as of feature groups (<FileName>) are unique within a certain LC-HRMS analysis. They are not matched across files and thus cannot be compared.
To start the data processing, press the "Start" button located at the bottom right of the calculate page. MetExtract will ask to save the current file/group compilation as well as the specified settings for data processing. Once started, a progress dialog will display the status and the current operation per file.
Since many operations of MetExtract can be parallelized (e.g. feature pair detection in individual LC-HRMS(/MS) data files), multi-core systems and parallel computation are supported. By default, MetExtract uses one core less than is available on the executing PC so that the computer is not blocked for office work. To use all available cores, uncheck the box "Keep one core unused".
During the processing of the specified LC-HRMS files, a dialog will be shown. It shows the overall progress (first label and progress bar) across all files. Then, for each CPU core, a separate label and progress bar are shown, which illustrate the status of the currently processed files. At the bottom of the dialog, a tabular overview shows which files have already been processed successfully or are currently being processed. Figure 12 shows an example of this dialog with 4 experimental groups and 7 parallel file processings.
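The default core selection described above corresponds to the following small Python sketch. The function name `worker_count` and its parameter are hypothetical; only the rule "all cores minus one, unless the box is unchecked" is taken from the text.

```python
import multiprocessing


def worker_count(keep_one_core_unused=True):
    """Number of parallel workers for file processing.

    With keep_one_core_unused=True one core is left free for other work,
    but at least one worker is always used.
    """
    total = multiprocessing.cpu_count()
    if keep_one_core_unused:
        return max(1, total - 1)
    return total


default_workers = worker_count()
all_workers = worker_count(keep_one_core_unused=False)
```

On a single-core machine both settings result in one worker, since processing cannot run with zero workers.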
Results of data processing can be visualized in the third tab named "Sample results". Select a processed file from the drop-down list on the left top of the page. The results of the selected file will be presented in the list underneath. Once a result is clicked on, the headers of the list will change accordingly. This list is divided into the following five categories:
To view the raw-mzXML file in an external viewer use the button "open mzXML file externally". It will be opened with the default application for mzXML files.
The category "MZs" shows all MS signals from all scans of the original data file that showed the mirror-symmetric isotopic patterns and fulfilled all parameter settings for this processing step. The only plot available for this result category is the main plot. It shows a scatter plot of all native MS signals detected. If a specific ion signal is selected, it will be coloured.
The category "MZ bins" shows all clusters remaining after MS signal clustering. Only the main plot is available for a graphical illustration of these bins. A selected bin will be colour highlighted among all extracted MZ bins.
The category "Feature pairs" shows detected feature pairs. For a feature pair two plots are available. The main plot shows the chromatographic peaks of the native and labeled ion forms. The labeled metabolite ions are plotted with negative intensities, while the native metabolite ions are plotted with positive intensities. The second plot for each feature is its mass spectrum (see figure 13).
The category "Feature groups" shows convoluted feature groups. For each feature group three plots are available. The main plot shows an overlay of all feature pairs convoluted into the selected feature group. Intensities of differently abundant feature pairs are normalised to 1 if the field Normalise is selected. The second plot illustrates the correlation and similarity of all feature pairs in this feature group. The third plot depicts an annotated MS scan for this feature group (see figure 14).
MetExtract offers the possibility to filter all results for certain criteria. The filter can be entered in the text field "Filter" in the "Sample results" tab. It is applied to each result category. The entered text is first split into chunks using the space character. Each i-th chunk is a filter for the i-th column in the results tab.
EXAMPLE 1: If the user wants to search for feature pairs with an m/z value of 297.1333 for M, the text "297.1333" is entered in the field "Filter". MetExtract will then only show results having the text 297.1333 in the first column. If the search shall be further restricted to 15 carbon atoms, the user must enter "297.1333 15". This will also search the second column for the text 15. Because this is a text search, it shows all feature pairs whose number of carbon atoms contains "15", for example 15, 115 or 1500.
EXAMPLE 2: If the user wants to search for 297.1333 +- 1 amu the filter must be set to "297.1333+-1". This will instruct MetExtract to search for m/z values of M within the range of 296.1333-298.1333 amu. It is also possible to search within a specific ppm window. The filter must then be set to "297.1333+-5ppm".
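The filter behaviour of the two examples can be reproduced with a short Python sketch. It is a reconstruction from the description above and not MetExtract's own parser: the function names, the regular expression and the handling of non-numeric cells are assumptions.

```python
import re

# Matches "<value>+-<tolerance>" with an optional "ppm" suffix.
_RANGE = re.compile(r"^([0-9.]+)\+-([0-9.]+)(ppm)?$")


def chunk_matches(chunk, cell):
    """Test one filter chunk against one table cell."""
    m = _RANGE.match(chunk)
    if m is None:
        # Plain text search, so "15" also matches "115" and "1500".
        return chunk in str(cell)
    try:
        center = float(m.group(1))
        tolerance = float(m.group(2))
        value = float(cell)
    except ValueError:
        return False
    if m.group(3):  # tolerance given in ppm of the center value
        tolerance = center * tolerance / 1e6
    return abs(value - center) <= tolerance


def row_matches(filter_text, row):
    """The i-th chunk of the filter is applied to the i-th column."""
    chunks = filter_text.split()
    if len(chunks) > len(row):
        return False
    return all(chunk_matches(c, cell) for c, cell in zip(chunks, row))


rows = [("297.1333", "15"), ("297.1340", "115"), ("298.0000", "15")]
exact = [r for r in rows if row_matches("297.1333 15", r)]
window = [r for r in rows if row_matches("297.1333+-1", r)]
ppm = [r for r in rows if row_matches("297.1333+-5ppm", r)]
```

The plain text search is convenient for quick lookups, while the "+-" forms are the safer choice for numeric columns.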
FragExtract shows analyzed LC-HRMS/MS targets and detected fragment peaks in the form of a hierarchical tree. The first level of the tree contains the LC-HRMS/MS targets (for example CouAgm in the green box in figure 15). It includes the name (column "Target name"), m/z value of the native precursor ion (column "MZ"), its total number of labeled carbon atoms (column "Cn"), either user defined or generated sum formulas of the parent (column "Sum formula"), the charge of the precursor ion (column "Charge"), the scan number of the MS/MS scan used for analysis of the native precursor ion (column "Native scan num"), the scan number of the MS/MS scan used for analysis of the U-13C-labeled precursor ion (column "Labelled scan num"), the scan event of the LC-HRMS full scan(s), and the scan events of the LC-HRMS/MS scans of the native and the U-13C-labeled precursor ions.
The next level lists detected MS/MS fragment peaks found for both the native and the U-13C-labeled precursor ions (blue box in figure 15). Only those peaks present in both forms with approximately the same relative intensity and a defined Δm/z value corresponding to the total number of carbon atoms in the respective fragment ion are shown. Each such fragment peak is annotated with its assigned peak number (column "Target name", form "Peak XX"), the m/z value of the fragment peak derived from the monoisotopic precursor ion (column "MZ"), the determined number of carbon atoms for this fragment peak (column "Cn") followed by the calculated m/z value of the U-13C-labeled counterpart of the fragment ion, a generated sum formula for the fragment peak (column "Sum formula"), the adduct of the generated sum formula (column "Charge"), the neutral loss of the fragment with respect to the used precursor ion (column "Native scan num"), and the relative intensity of the fragment peak in the LC-HRMS/MS scan of the native precursor ion (column "Labelled scan num"). If several putative sum formulas could be calculated for one fragment ion, an asterisk ("*") is shown instead of the sum formula (orange box in figure 15, first row). All generated sum formulas are then shown one level beneath the annotated fragment peak (redundant information is not shown).
Results of FragExtract processing can easily be exported in tab-separated table format. To copy the results of all MS/MS targets in an LC-HRMS/MS file, right-click on the top-level item named "MSMS targets" and select "Copy". To copy the results of a single target, right-click on the target name and select "Copy". The copied information can then be transferred to e.g. Excel via the paste operation (Ctrl+V). The table contains information about the precursor ions and the extracted MS/MS peaks as well as generated sum formulas.
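The relation between the Δm/z value of a native and U-13C-labeled fragment peak and its number of carbon atoms can be illustrated with the following Python sketch. The function names and the 10 ppm tolerance are assumptions; the mass difference between 13C and 12C of about 1.0033548 is a physical constant.

```python
C13_C12_DELTA = 1.0033548  # mass difference between 13C and 12C


def labeled_mz(mz_native, n_carbons, charge=1):
    """Expected m/z of the U-13C-labeled counterpart of a native ion."""
    return mz_native + n_carbons * C13_C12_DELTA / charge


def carbon_count(mz_native, mz_labeled, charge=1, max_ppm=10.0):
    """Derive the number of carbon atoms from a native/labeled peak pair.

    Returns None if the observed difference does not correspond to a
    whole number of carbon atoms within max_ppm.
    """
    delta = (mz_labeled - mz_native) * charge
    n = int(round(delta / C13_C12_DELTA))
    if n < 1:
        return None
    expected = labeled_mz(mz_native, n, charge)
    if abs(mz_labeled - expected) > expected * max_ppm / 1e6:
        return None
    return n


n_c = carbon_count(297.1333, 312.1836)
no_match = carbon_count(297.1333, 312.2500)
```

Knowing the carbon count of a fragment strongly reduces the number of sum formulas that have to be considered for it.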
The modules AllExtract and TracExtract can be started from the command line and support batch-wise processing. The following parameters may be used for this:
Note: If directories containing whitespace characters (e.g. spaces) are used, the respective values of the parameters (-g and -l) need to be enclosed in quotation marks.
For example, the command line
> MExtract.exe -m TracExtract -g 'E:\Cdata_15\_1K.grp' -l 'E:\Cdata_15\_1K.grp' -x -s -e
starts a new TracExtract instance (parameter -m TracExtract) and loads the group and settings file 'E:\Cdata_15\_1K.grp'. While parameter -g loads only the groups from the file, parameter -l loads only the settings stored in the respective file. It is not possible to load the settings and the groups with a single parameter. The output is reduced (parameter -x), calculations are started automatically (parameter -s) and once all files have been processed, TracExtract quits (parameter -e).
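For batch-wise processing of several group files, the command line above can be generated from a script. The following Python sketch only uses the parameters shown in the example (-m, -g, -l, -x, -s, -e). The helper names `build_command` and `run_batch` are hypothetical, and the executable name is taken from the example and may need to be adapted to the local installation.

```python
import subprocess


def build_command(module, group_file, settings_file, exe="MExtract.exe"):
    """Build the argument list for one batch run.

    Passing the arguments as a list avoids manual quoting of paths that
    contain whitespace characters.
    """
    return [exe, "-m", module, "-g", group_file, "-l", settings_file,
            "-x", "-s", "-e"]


def run_batch(module, files, exe="MExtract.exe"):
    """Process several group/settings files one after another.

    Each file is used for both the groups (-g) and the settings (-l), as
    in the example above. Returns the exit code of each run.
    """
    return [subprocess.call(build_command(module, f, f, exe)) for f in files]


cmd = build_command("TracExtract",
                    r"E:\Cdata_15\_1K.grp",
                    r"E:\Cdata_15\_1K.grp")
```

Calling `run_batch("TracExtract", [...])` with a list of group files would then start one run per file, each of which quits on its own once processing has finished.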
MetExtract II was implemented using Python 2.6.6 (r266:84297, Aug 24 2010, 18:46:32) [MSC v.1500 32 bit (Intel)] on win32. It uses the following packages:
Additionally, MetExtract II uses R version 2.15.2 (2012-10-26) Platform: i386-w64-mingw32/i386 (32-bit). The following R packages are required (and will be automatically installed if missing):