GenoSuite: The Complete Guide for Researchers and Clinicians
What GenoSuite is
GenoSuite is an automated proteogenomics pipeline (originally developed by Dhirendra Kumar and colleagues) that integrates mass-spectrometry proteomics data with genome sequences to discover and classify translated peptides and improve genome annotation. It was designed primarily for prokaryotic proteogenomic analysis and bundles multiple open-source peptide-identification engines.
Key features
- Multi-algorithm search: Configured to run OMSSA, X!Tandem, InsPecT and MassWiz (any combination can be used).
- Combined FDRScore integration: Merges results across search engines at the PSM level to control false discovery rates after integration.
- Novel peptide/protein detection: Identifies peptides not present in existing annotations and reports novel proteins or annotation changes.
- Strict protein-level filtering: To limit protein-level FDR, reports only proteins identified by ≥2 peptides, or single-peptide proteins supported by multiple significant PSMs.
- Visualization & genomic context: Visualizes spectral matches and maps novel peptides to genomic coordinates (BED output) for genome-browser inspection.
- Prokaryote-focused tools: Includes utilities (e.g., ORF_mapper) for ORF/gene-prediction inputs (supports GFF) and a Prokaryotic Proteogenomic Tool (PPT) for novel-translation discovery.
- Contaminant list support (newer versions): Ability to add contaminant proteins to the search database to improve specificity.
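The protein-level filtering rule described above can be illustrated with a minimal Python sketch. This is not GenoSuite's own code: the function name `filter_proteins`, the input layout (one `(protein, peptide)` pair per PSM that already passed the PSM-level FDR threshold), and the reading of "multiple" as at least two PSMs are all assumptions made for illustration.

```python
from collections import defaultdict


def filter_proteins(psms, min_peptides=2, min_psms_single=2):
    """Keep proteins with >= min_peptides distinct peptides, or a single
    peptide backed by >= min_psms_single significant PSMs.

    psms: iterable of (protein_id, peptide_sequence) pairs, one per PSM
          that already passed the PSM-level FDR threshold (assumed layout).
    Returns a dict mapping each kept protein to the reason it was kept.
    """
    # Count PSMs per peptide, grouped by protein.
    psm_counts = defaultdict(lambda: defaultdict(int))
    for protein, peptide in psms:
        psm_counts[protein][peptide] += 1

    kept = {}
    for protein, peptides in psm_counts.items():
        if len(peptides) >= min_peptides:
            kept[protein] = "multi-peptide"
        elif max(peptides.values()) >= min_psms_single:
            kept[protein] = "single-peptide, multiple PSMs"
    return kept


example = [
    ("P1", "AAK"), ("P1", "CCR"),   # two distinct peptides
    ("P2", "DDK"), ("P2", "DDK"),   # one peptide, two PSMs
    ("P3", "EEK"),                  # one peptide, one PSM: dropped
]
kept_proteins = filter_proteins(example)
```

In this toy input, `P1` and `P2` are kept and `P3` is dropped. The thresholds are exposed as parameters because the exact cut-offs used by the pipeline are not stated here.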
Typical use cases
- Genome annotation or re-annotation for bacteria.
- Discovery of novel translation events (novel peptides/proteoforms).
- Integrating proteomics evidence to validate or correct gene models.
- Producing browser-ready genomic mappings of peptide evidence for publication or curation.
Inputs and outputs
- Inputs: Tandem mass spectra (raw or converted), a genome sequence (used as a translated database or six-frame translation), gene predictions (GFF), configurable search parameters, and optional contaminant lists.
- Outputs: PSM-level identifications integrated across engines, lists of novel peptides/proteins, spectral visualizations, BED files/genomic mappings, and protein/peptide reports suitable for downstream analysis and citation.
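To make the six-frame translation and BED mapping concrete, here is a small self-contained Python sketch. It is an illustration under stated assumptions, not GenoSuite's implementation: the helper names (`six_frames`, `peptide_to_bed`), the contig name `contig1`, and the choice of a six-column BED line with a score of 0 are all hypothetical. It assumes an uppercase DNA string and exact peptide matching, and uses the standard codon assignments (which the bacterial genetic code shares for amino acids).

```python
from itertools import product

# Standard codon table built in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(genome):
    """Yield (strand, offset, protein) for all six reading frames."""
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for offset in range(3):
            yield strand, offset, translate(seq[offset:])


def peptide_to_bed(genome, peptide, chrom="contig1"):
    """Return BED lines (0-based, half-open) for every exact peptide match."""
    n = len(genome)
    lines = []
    for strand, offset, protein in six_frames(genome):
        pos = protein.find(peptide)
        while pos != -1:
            start = offset + 3 * pos            # coordinates on the strand read
            end = start + 3 * len(peptide)
            if strand == "-":                   # convert to forward coordinates
                start, end = n - end, n - start
            lines.append(f"{chrom}\t{start}\t{end}\t{peptide}\t0\t{strand}")
            pos = protein.find(peptide, pos + 1)
    return lines


genome = "CCATGAAACGTTAGG"   # toy sequence: ATG AAA CGT encodes M-K-R
bed_lines = peptide_to_bed(genome, "MKR")
```

For the toy sequence, the peptide `MKR` maps to positions 2 to 11 on the plus strand. A real pipeline would also need to handle alternative start codons, ambiguous bases, and multi-contig genomes, which this sketch ignores.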
Practical considerations
- Best suited for datasets with high-quality MS/MS spectra and matched genomic sequences.
- Combined multi-engine searches increase coverage but require more compute and careful FDR control (GenoSuite’s combined FDRScore addresses this).
- In publications, it has been demonstrated primarily on prokaryotes.
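As background for the FDR control mentioned above, the following Python sketch shows a generic target-decoy q-value calculation for a single ranked list of PSMs. It is not the combined FDRScore procedure that GenoSuite uses to merge engines; it only illustrates the basic idea of estimating FDR from decoy hits. The function name `q_values`, the `(score, is_decoy)` input layout, and the simple decoys/targets estimator are assumptions.

```python
def q_values(psms):
    """Compute target-decoy q-values for a list of PSMs.

    psms: list of (score, is_decoy) tuples, where a higher score is better.
    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR estimate at each rank: decoys seen so far / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this rank or any lower-scoring rank,
    # which makes the values monotonic down the list.
    qs = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qs)]


toy = [(10, False), (9, False), (8, True), (7, False), (6, False)]
scored = q_values(toy)
accepted = [s for s, is_decoy, q in scored if not is_decoy and q <= 0.01]
```

With the toy list, only the two PSMs ranked above the first decoy pass a 1% q-value threshold. Combining several engines adds the further step of placing their scores on a common scale before thresholding, which is the role the combined FDRScore plays in the pipeline.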