Question

Tool: Just released VEBA 2.0 in Nucleic Acids Research, a modular genome-resolved metagenomics/metatranscriptomics software suite that can handle prokaryotes, (micro)eukaryotes, and viruses


11 months ago

O.rka ▴ 740


I've been developing the VEBA software suite since the beginning of the COVID lockdown. VEBA 1.0 came out in BMC Bioinformatics in 2022, but this new release is much more expansive and reflects the full version I initially envisioned. In total, I've spent hundreds of hours developing, breaking/rebuilding, running case studies, and writing/drafting this work, and I hope it'll be as useful to the community as it has been for my research.

For context, I was analyzing a lot of marine and human microbiomes, running the same workflow over and over manually. At times I would miss a step and have to repeat an analysis, and the bottleneck was often the manual prep between key steps (e.g., aligning reads to assembly then sorting the assembly to be able to run a binning algorithm). I've basically compiled ~8 years of experience into a single modular software suite where I tried to think 2 steps ahead of an analysis when building each pipeline (e.g., I want to build a custom HUMAnN database, oh cool, I already have the UniRef annotations, genome clusters, and taxonomic classifications). This has reduced the time it takes me to run a full metagenomics analysis (e.g., 96 MiSeq samples) from about a month to 2-4 days.

Here's the publication:

Espinoza JL, Phillips A, Prentice MB, Tan GS, Kamath PL, Lloyd KG, Dupont CL. Unveiling the microbial realm with VEBA 2.0: a modular bioinformatics suite for end-to-end genome-resolved prokaryotic, (micro)eukaryotic and viral multi-omics from either short- or long-read sequencing. Nucleic Acids Res. 2024 Jun 22:gkae528. doi: 10.1093/nar/gkae528. PMID: 38909293.

Here's the GitHub:

https://github.com/jolespin/veba

Here are all the walkthroughs:

https://github.com/jolespin/veba/blob/main/walkthroughs/README.md

Here's the YouTube channel with guided walkthroughs:

https://www.youtube.com/@VEBA-Multiomics

Here's the abstract:

The microbiome is a complex community of microorganisms, encompassing prokaryotic (bacterial and archaeal), eukaryotic, and viral entities. This microbial ensemble plays a pivotal role in influencing the health and productivity of diverse ecosystems while shaping the web of life. However, many software suites developed to study microbiomes analyze only the prokaryotic community and provide limited to no support for viruses and microeukaryotes. Previously, we introduced the Viral Eukaryotic Bacterial Archaeal (VEBA) open-source software suite to address this critical gap in microbiome research by extending genome-resolved analysis beyond prokaryotes to encompass the understudied realms of eukaryotes and viruses. Here we present VEBA 2.0 with key updates including a comprehensive clustered microeukaryotic protein database, rapid genome/protein-level clustering, bioprospecting, non-coding/organelle gene modeling, genome-resolved taxonomic/pathway profiling, long-read support, and containerization. We demonstrate VEBA’s versatile application through the analysis of diverse case studies including marine water, Siberian permafrost, and white-tailed deer lung tissues with the latter showcasing how to identify integrated viruses. VEBA represents a crucial advancement in microbiome research, offering a powerful and accessible software suite that bridges the gap between genomics and biotechnological solutions.

Here are some key highlights of VEBA in general:

Iterative consensus prokaryotic binning (unbinned contigs get sent back into the pipeline). Uses CheckM2 for quality assessment.
Eukaryotic gene modeling via MetaEuk using an expansive yet targeted clustered microeukaryotic protein database (e.g., it doesn't contain prokaryotes, vertebrates, land plants, etc.). Uses BUSCO for quality assessment.
Viral/plasmid identification, classification, and quality assessment with geNomad and CheckV.
Biosynthetic module reformats antiSMASH GenBank files into tabular and FASTA formats, then performs clustering in protein and nucleotide space.
Phylogeny module uses concatenated protein alignments to build trees based on custom marker sets.
Clustering is performed at the genome level and protein level to build pangenomes. These clustered results can be used downstream in the annotation, classification, mapping, profiling, etc. modules.
Since it's modular, you can hop in at any point. For example, at the beginning with raw reads, assemblies from somewhere else, genomes from somewhere else, or proteins from somewhere else. Or you can mix and match genomes/proteins recovered from VEBA or ones downloaded elsewhere.
Interoperable with other pipelines (no VEBA-specific binary files).
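To make the "unbinned contigs get sent back into the pipeline" idea concrete, here is a minimal Python sketch of an iterative binning loop. It is not VEBA's implementation: `iterative_binning`, `toy_binner`, and `toy_qc` are hypothetical names, the binner just groups contigs by ID prefix, and the QC rule (total length of at least 1,000 bp) is a stand-in for a real quality check such as CheckM2.

```python
from typing import Callable, Dict, List, Set, Tuple

Contigs = Dict[str, int]  # contig ID -> length in bp


def iterative_binning(
    contigs: Contigs,
    binner: Callable[[Contigs], List[Set[str]]],
    passes_qc: Callable[[Set[str], Contigs], bool],
    max_iterations: int = 10,
) -> Tuple[List[Set[str]], Contigs]:
    """Repeatedly bin whatever is still unbinned; keep only bins that pass QC."""
    unbinned = dict(contigs)
    accepted: List[Set[str]] = []
    for _ in range(max_iterations):
        if not unbinned:
            break
        kept = [b for b in binner(unbinned) if passes_qc(b, unbinned)]
        if not kept:
            break  # stop once a round yields no bin that passes QC
        for bin_ in kept:
            accepted.append(set(bin_))
            for contig_id in bin_:
                # Accepted contigs leave the pool; everything else is re-binned.
                unbinned.pop(contig_id, None)
    return accepted, unbinned


def toy_binner(contigs: Contigs) -> List[Set[str]]:
    """Hypothetical binner: one bin per call, built around the longest contig."""
    seed = max(contigs, key=contigs.get)
    prefix = seed.split("_")[0]
    return [{c for c in contigs if c.split("_")[0] == prefix}]


def toy_qc(bin_: Set[str], contigs: Contigs) -> bool:
    """Hypothetical QC: total bin length of at least 1,000 bp."""
    return sum(contigs[c] for c in bin_) >= 1000


contigs = {"a_1": 900, "a_2": 600, "b_1": 800, "b_2": 300, "c_1": 200}
accepted, leftover = iterative_binning(contigs, toy_binner, toy_qc)
print(accepted, leftover)
```

In this toy run the second bin is only recovered because the leftover contigs from the first round go through the binner again, which is the point of iterating.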

Here are some key updates between VEBA 1.0 -> VEBA 2.0:

VEBA Modules:

Expanded functionality, streamlined user-interface, and Docker containerization
Fast and memory-efficient genome- and protein-level clustering (Skani, MMseqs2/Diamond DeepClust)
Automatic calculation of feature compression ratios
Large/complex metagenomes and long-read technology support (SPAdes/metaSPAdes/rnaSPAdes/MEGAHIT, Flye/metaFlye)
Bioprospecting and natural product discovery support (antiSMASH)
Ribosomal RNA, transfer RNA, and organelle support (Barrnap, tRNAscan-SE, Tiara)
Genome-resolved taxonomic and pathway profiling (Sylph, HUMAnN)
Identification and classification of mobile genetic elements (geNomad, CheckV)
Native support for candidate phyla radiation quality assessment and memory-efficient genome classification (CheckM2)
Standalone support for generalized multi-split binning
Automated phylogenomic functional category feature engineering support
Visualizations of hierarchical data and phylogenies (FastTree2, VeryFastTree, IQTREE2, ETE3)
Added minimum alignment fraction threshold for genome clustering
Faster HMM protein annotations with PyHMMER
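As an illustration of why a minimum alignment fraction (AF) threshold matters for genome clustering, here is a small Python sketch that groups genomes by single-linkage (connected components) over pairwise ANI/AF values. This is an assumption-laden toy, not VEBA's code: `cluster_genomes` is a hypothetical function, the pairwise values are made up, and the 95% ANI / 30% AF cutoffs are illustrative rather than VEBA's defaults.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str, float, float]  # (genome_a, genome_b, ANI %, AF %)


def cluster_genomes(
    genomes: Iterable[str],
    pairs: Iterable[Pair],
    ani_threshold: float = 95.0,  # illustrative cutoff, not a VEBA default
    af_threshold: float = 30.0,   # illustrative cutoff, not a VEBA default
) -> List[Set[str]]:
    """Connected components over pairs that pass both the ANI and AF cutoffs."""
    genomes = list(genomes)
    parent: Dict[str, str] = {g: g for g in genomes}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, ani, af in pairs:
        # A pair only links two genomes if enough of them actually aligned.
        if ani >= ani_threshold and af >= af_threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters: Dict[str, Set[str]] = {}
    for g in genomes:
        clusters.setdefault(find(g), set()).add(g)
    return sorted(clusters.values(), key=sorted)


genomes = ["g1", "g2", "g3", "g4"]
pairs = [
    ("g1", "g2", 98.0, 80.0),
    ("g2", "g3", 97.0, 5.0),   # high ANI, but only 5% of the genome aligned
    ("g3", "g4", 96.0, 60.0),
]
with_af = cluster_genomes(genomes, pairs)
without_af = cluster_genomes(genomes, pairs, af_threshold=0.0)
print(with_af, without_af)
```

Without the AF cutoff, the g2/g3 pair (high ANI over a tiny aligned region) chains all four genomes into one cluster; with it, they stay as two clusters.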

VEBA Database (VDB_v7):

Completely rebuilt VEBA's Microeukaryotic Protein Database to produce a clustered database, MicroEuk100/90/50, similar to UniRef100/90/50. Available at doi:10.5281/zenodo.10139450.
Expanded protein annotation database
Updated GTDB from r214.1 to r220

If you're interested in running it or have any questions, feel free to reach out here or by e-mail: jespinoz@jcvi.org


metatranscriptomics metagenomics microbiome veba • 1.4k views



Veba means plague in Turkish, so it's a fitting name.

11 months ago by barslmn ★ 2.4k