Bioinformatics Service Line


Analysis of all data acquired through both the Discovery and Targeted Proteomics Service Lines will be performed by the Bioinformatics Service Line located at UAMS. Bioinformatics analysis is included with every sample analysis performed by the resource. The output of this Service Line is publication-quality data tailored to the user’s needs. This provides a unique opportunity for IDeA investigators to have state-of-the-art proteomics data analyzed with cutting-edge bioinformatic approaches, providing data to support NIGMS-relevant research and publications.


Services

The Discovery and Targeted Proteomics Service Lines follow a similar quantitative workflow for data analysis. The workflow consists of a database search with one of several search algorithms against a specific protein database, followed by quality control, normalization, differential expression analysis, pathway and gene set enrichment analysis, visualization, and preparation of data for publication.

Label-free intensity-based quantification: Quantification by precursor intensity is performed using MaxQuant, which determines peptide and protein abundances directly from raw data. This intensity-based quantitation is calculated using the iBAQ algorithm, which sums the intensities of the observed peptides of a protein and divides by the number of peptides from that protein that are predicted to be observable in a mass spectrometry experiment.
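The iBAQ calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not MaxQuant's implementation: the function name `ibaq` and the example values are hypothetical, and the number of theoretically observable peptides is assumed to be supplied by the caller (for example, from an in silico digest).

```python
def ibaq(observed_peptide_intensities, n_theoretical_peptides):
    """Sum of observed peptide intensities divided by the number of
    peptides predicted to be observable for the protein.

    Hypothetical helper for illustration; the theoretical peptide count
    is an input here and is not derived from a sequence.
    """
    if n_theoretical_peptides <= 0:
        raise ValueError("n_theoretical_peptides must be positive")
    return sum(observed_peptide_intensities) / n_theoretical_peptides


# Made-up example: three observed peptides, four theoretically observable.
print(ibaq([2e6, 4e6, 6e6], 4))  # 3000000.0
```

Dividing by the theoretical peptide count keeps large proteins, which yield more peptides, from appearing more abundant simply because of their size.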

Label-free spectral counting quantification: High-resolution tandem mass spectrometric data can be analyzed by spectral counting. Spectral counting is a form of relative quantitation limited to large changes between comparative samples, such as affinity purifications in which specific pulldowns show strong enrichment versus controls. Spectral counts are normalized to account for the total amount of protein in a given sample by calculating the Normalized Spectral Abundance Factor (NSAF).
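As a rough sketch of the NSAF normalization, the snippet below uses the commonly cited definition: each protein's spectral count is divided by its length, and the result is divided by the sum of those ratios over all proteins in the sample. The function name `nsaf`, the dictionary layout, and the example numbers are assumptions made for illustration and do not describe the facility's pipeline.

```python
def nsaf(spectral_counts, protein_lengths):
    """Normalized Spectral Abundance Factor for each protein in one sample.

    spectral_counts and protein_lengths are dicts keyed by protein ID
    (a hypothetical layout chosen for this sketch).
    """
    # Spectral abundance factor: spectral count per unit of protein length.
    saf = {p: spectral_counts[p] / protein_lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra observed in this sample")
    # Normalizing by the sample total makes values comparable across samples.
    return {p: value / total for p, value in saf.items()}


# Made-up example with two proteins.
values = nsaf({"A": 10, "B": 30}, {"A": 100, "B": 600})
print(values)
```

Within a sample the NSAF values sum to 1, so each value can be read as the protein's share of the length-corrected spectral counts.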

TMT Quantification: The basis for a TMT approach is to label up to 11 different biological samples with isobaric TMT tags, which allows the multiplexed samples to be deconvoluted by their indexed tags. Data will be searched in-house through MaxQuant to identify proteins and extract MS3 reporter ion intensities. The data are then processed with our in-house tools, proteiNorm and ProteoViz, for the complete downstream analysis. We can also perform TMT analyses for metaproteomics.

Phosphoproteomics: Both the total protein lysate and the phosphopeptide-enriched lysate are analyzed. This allows the analysis to distinguish changes at the protein level from changes at the PTM level.
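One common way to separate the two levels is to subtract the protein-level log2 fold change from the phosphosite-level log2 fold change. The sketch below assumes that approach; it is not necessarily the method used by the facility, and the function name and intensity values are hypothetical.

```python
import math


def protein_adjusted_log2fc(site_treated, site_control,
                            protein_treated, protein_control):
    """Phosphosite log2 fold change corrected for the protein-level change.

    Assumption for illustration: the correction is a simple subtraction
    of log2 fold changes, one value per condition.
    """
    site_log2fc = math.log2(site_treated / site_control)
    protein_log2fc = math.log2(protein_treated / protein_control)
    return site_log2fc - protein_log2fc


# Made-up example: the site rises 4-fold while the protein rises 2-fold,
# leaving a 2-fold (log2 = 1) change attributable to phosphorylation.
print(protein_adjusted_log2fc(800.0, 200.0, 300.0, 150.0))  # 1.0
```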

Histone PTMs: Histone PTM analysis requires special consideration because the analysis must distinguish changes at the protein level from changes at the PTM level. A relative abundance is calculated for each modification, giving the percentage of the protein that carries it. We also calculate beta and m-values, metrics commonly used in DNA methylation experiments.
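The relative abundance and m-value calculations can be illustrated as follows. This sketch assumes the conventions used for DNA methylation data: the beta value is the modified fraction (between 0 and 1) and the m-value is its logit on a log2 scale, log2(beta / (1 - beta)). The function names, the `eps` clamp, and the example intensities are hypothetical and are not taken from the facility's tools.

```python
import math


def relative_abundance(form_intensities):
    """Fraction of the total peptide signal carried by each modified form.

    form_intensities maps a form label (e.g. "unmod", "K9me1") to its
    MS1 intensity; labels here are made up for the example.
    """
    total = sum(form_intensities.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return {form: value / total for form, value in form_intensities.items()}


def m_value(beta, eps=1e-6):
    """log2(beta / (1 - beta)), with beta clamped away from 0 and 1.

    The eps clamp is an assumption added to avoid taking the log of zero.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


# Made-up intensities for three forms of one histone peptide.
betas = relative_abundance({"unmod": 600.0, "K9me1": 300.0, "K9me2": 100.0})
for form, beta in betas.items():
    print(form, round(100 * beta, 1), round(m_value(beta), 3))
```

Because the relative abundance is a ratio within one peptide, it is unaffected by changes in the total amount of the histone protein between samples.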

Targeted analysis: The total response for each protein is calculated as the geometric mean of the responses of all peptides measured for that protein, typically two. The amount of each protein in the sample, in pmol, is determined by dividing the total response of the protein by the total response of the bovine serum albumin and multiplying by the amount of albumin added. Finally, the concentration, in pmol/µg total protein, is calculated by dividing by the amount of protein taken for analysis.
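The arithmetic above can be written out as a short sketch. It assumes that the total response of the bovine serum albumin is computed the same way as for any other protein (the geometric mean of its peptide responses); the function name, parameter names, and numbers are hypothetical.

```python
from statistics import geometric_mean


def protein_amount(peptide_responses, bsa_peptide_responses,
                   bsa_pmol_added, total_protein_ug):
    """Return (pmol of protein, pmol per ug total protein).

    Assumption for illustration: the albumin response is the geometric
    mean of its own peptide responses, like any other protein.
    """
    protein_response = geometric_mean(peptide_responses)
    bsa_response = geometric_mean(bsa_peptide_responses)
    # Scale by the known amount of albumin added to the sample.
    pmol = protein_response / bsa_response * bsa_pmol_added
    # Express relative to the amount of protein taken for analysis.
    return pmol, pmol / total_protein_ug


# Made-up example: two peptides per protein, 2 pmol albumin, 20 ug sample.
pmol, pmol_per_ug = protein_amount([4.0, 9.0], [16.0, 4.0], 2.0, 20.0)
print(round(pmol, 3), round(pmol_per_ug, 4))
```

In this example the protein response is 6 and the albumin response is 8, which gives about 1.5 pmol, or about 0.075 pmol/µg total protein.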

Statistical and Pathway Analysis: The statistical approach applied to proteomics data depends on the user’s research question and the data distribution (e.g., t-tests, ANOVA, linear models such as Limma, ROTS, non-parametric alternatives, data mining algorithms). Pathway and gene set enrichment analysis (e.g., Ingenuity Pathway Analysis (IPA), EGSEA, PTMSig) will also be employed to discover interactions among the proteins and how they relate to biological pathways. Protein interaction networks can be identified using STRING.

Data deliverables: Proteomics database search results will be delivered as a Scaffold file. The quantitative results will be delivered through a link to the core’s account on shinyapps.io (by RStudio). The user only needs to click on the link to start interacting with their data. Alternatively, users can run the scripts provided in the core’s GitHub repository locally on their own computers. We will also provide tables listing all proteins identified, the raw and normalized intensities and/or counts, and the statistical results, along with publication-quality figures and methods text for manuscripts.


Software Utilized

Major Software: The facility utilizes free open-source software contributed by the scientific community (such as Skyline, MaxQuant, Pantherdb, and STRING) and packages from R/Bioconductor (such as Limma, ComBat, Normalyzer, BoxCox, EGSEA, and PTMSig). Commercial software includes Mascot (Matrix Science), Scaffold Q+S (Proteome Software), PEAKS (Bioinformatics Solutions Inc.), and Ingenuity Pathway Analysis (Qiagen).

proteiNorm: We have developed and now routinely use this tool for quality testing, filtering of peptides/proteins, testing various normalization methods, and performing differential analysis directly from database search results.

ProteoViz: To facilitate the interpretation of statistical results from phosphoproteomics studies, we have developed a set of tools, ProteoViz, for exploration of the processed data. Following data acquisition, the tool performs data preprocessing, differential analysis, and pathway and gene set enrichment analysis, identifies motifs and kinases, and visualizes all of the results in an interactive Shiny dashboard.

PTMViz for histone PTM analysis: For analysis of histone PTMs, we have developed a Shiny dashboard that allows a user to upload histone PTM MS1 intensity values and then perform quality control, normalization, differential expression analysis, and visualization. The histone PTM intensities are converted to relative abundance, beta, and m-values in order to account for changes at the protein level.