The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.
This edition of the Herald was brought to you by contribution from Istvan Albert, and was edited by Istvan Albert,
Ultra-efficient, unified discovery from microbial sequencing with SPLASH and precise statistical assembly | bioRxiv (www.biorxiv.org)
Here, we show that SPLASH, our published approach for reference-free discovery and analysis directly from raw reads, and an improved statistical assembly algorithm, compactors, unify diverse tasks in microbial sequence analysis: discovering new mobile elements and CRISPR arrays missing from any reference, and generating rapid, metadata-free strain typing of diverse bacteria.
submitted by: Istvan Albert
edgeR 4.0: powerful differential analysis of sequencing data with expanded functionality and improved support for small counts and larger datasets | bioRxiv (www.biorxiv.org)
This article announces edgeR version 4, which includes new developments across a range of application areas. Infrastructure improvements include support for fractional counts, implementation of model fitting in C++, and a new statistical treatment of the quasi-likelihood pipeline that improves accuracy for small counts. The revised package has new functionality for differential methylation analysis, differential transcript expression, differential transcript and exon usage, testing relative to a fold-change threshold and pathway analysis.
submitted by: Istvan Albert
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btae038/7585532
Here we present BCFtools/liftover, a tool to convert genomic coordinates across genome assemblies for variants encoded in the variant call format with improved support for indels represented by different reference alleles across genome assemblies and full support for multi-allelic variants. It further supports variant annotation fields updates whenever the reference allele changes across genome assemblies.
submitted by: Istvan Albert
Rarefaction is currently the best approach to control for uneven sequencing effort in amplicon sequence analyses (journals.asm.org)
Considering it is common to find as much as 100-fold variation in the number of 16S rRNA gene sequences across samples in a study, researchers need to control for the effect of uneven sequencing effort. How to do this has become a contentious question. Some have argued that rarefying or rarefaction is “inadmissible” because it omits valid data. [...] Although all methods of controlling for uneven sequencing effort had an acceptable false detection rate when samples were randomly assigned to two treatment groups, rarefaction was consistently able to control for differences in sequencing effort when sequencing depth was confounded with treatment group. Finally, the statistical power to detect differences in alpha and beta diversity metrics was consistently the highest when using rarefaction. These simulations underscore the importance of using rarefaction to normalize the number of sequences across samples in amplicon sequencing analyses.
submitted by: Istvan Albert
Fulgor: a fast and compact k-mer index for large-scale matching and color queries | Algorithms for Molecular Biology (link.springer.com)
We describe an efficient colored de Bruijn graph index, arising as the combination of a k-mer dictionary with a compressed inverted index. The proposed index takes full advantage of the fact that unitigs in the colored compacted de Bruijn graph are monochromatic (i.e., all k-mers in a unitig have the same set of references of origin, or color) [...] We implement these methods in a tool called Fulgor, and conduct an extensive experimental analysis to demonstrate the improvement of our tool over previous solutions. For example, compared to Themisto—the strongest competitor in terms of index space vs. query time trade-off—Fulgor requires significantly less space (up to 43% less space for a collection of 150,000 Salmonella enterica genomes), is at least twice as fast for color queries, and is 2–6 faster to construct.
submitted by: Istvan Albert
Want to get the Biostar Herald in your email? Who wouldn't? Sign up righ'ere: toggle subscription