The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.
This edition of the Herald was brought to you by contributions from Istvan Albert, and was edited by Istvan Albert.
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btad614/7295550?login=false
The alevin-fry ecosystem provides a robust and growing suite of programs for single-cell data processing. However, as new single-cell technologies are introduced, as the community continues to adjust best practices for data processing, and as the alevin-fry ecosystem itself expands and grows, it is becoming increasingly important to manage the complexity of alevin-fry’s single-cell preprocessing workflows while retaining the performance and flexibility that make these tools enticing. We introduce simpleaf, a program that simplifies the processing of single-cell data using tools from the alevin-fry ecosystem, and adds new functionality and capabilities, while retaining the flexibility and performance of the underlying tools.
submitted by: Istvan Albert
ChatGPT use shows that the grant-application system is broken (www.nature.com)
The fact that artificial intelligence can do much of the work makes a mockery of the process. It’s time to make it easier for scientists to ask for research funding.
submitted by: Istvan Albert
Reproducibility trial: 246 biologists get different results from same data sets (www.nature.com)
In a massive exercise to examine reproducibility, more than 200 biologists analysed the same sets of ecological data, and got widely divergent results. The first sweeping study of its kind in ecology demonstrates how much results in the field can vary, not because of differences in the environment, but because of scientists’ analytical choices.
submitted by: Istvan Albert
The microbiome field is hit and spicy. Bring it on. Last week was cancer. This week is heritability. https://t.co/poemd9Bdiz
— Seth Bordenstein (@Symbionticism) October 12, 2023
submitted by: Istvan Albert
Relative abundance data can misrepresent heritability of the microbiome | Microbiome | Full Text (microbiomejournal.biomedcentral.com)
We derived an analytical approximation for the heritability that one obtains when using such relative, and not absolute, abundances, based on an underlying quantitative genetic model for absolute abundances. Based on this, we uncovered three problems that can arise when using relative abundances to estimate microbiome heritability: (1) the interdependency between taxa can lead to imprecise heritability estimates. This problem is most apparent for dominant taxa. (2) Large sample size leads to high false discovery rates. With enough statistical power, the result is a strong overestimation of the number of heritable taxa in a community. (3) Microbial co-abundances lead to biased heritability estimates.
submitted by: Istvan Albert
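The first problem named in the abstract above, interdependency between taxa, can be illustrated with a toy calculation. This is only a minimal sketch and not the paper's quantitative genetic model: the host counts and the taxon names `dominant` and `minor` are made up for illustration. The minor taxon has the same absolute abundance in every host, yet its relative abundance varies because the dominant taxon varies.

```python
def relative_abundances(absolute_counts):
    """Convert absolute counts per taxon into fractions that sum to 1."""
    total = sum(absolute_counts.values())
    return {taxon: count / total for taxon, count in absolute_counts.items()}


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical hosts: only the dominant taxon differs between them.
hosts = [
    {"dominant": 600, "minor": 100},
    {"dominant": 900, "minor": 100},
    {"dominant": 300, "minor": 100},
]

abs_minor = [host["minor"] for host in hosts]
rel_minor = [relative_abundances(host)["minor"] for host in hosts]

print("variance of absolute abundance:", variance(abs_minor))
print("variance of relative abundance:", variance(rel_minor))
```

Any between-host variation in the relative abundance of `minor` here is inherited entirely from `dominant`, which is the kind of dependency that could distort a heritability estimate based on relative data.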
Page not available - PMC (www.ncbi.nlm.nih.gov)
Copy-number variations (CNVs) have important clinical implications for several diseases and cancers. We reviewed 50 popular CNV calling tools and included 11 tools for benchmarking in a reference cohort encompassing 39 whole genome sequencing (WGS) samples paired with the current clinical standard, SNP-array-based CNV calling. For nine samples we also performed whole exome sequencing (WES), to address the effect of sequencing protocol on CNV calling. Furthermore, we included the Gold Standard reference sample NA12878, and tested 12 samples with CNVs confirmed by multiplex ligation-dependent probe amplification (MLPA). Tool performance varied greatly in the number of called CNVs and bias for CNV lengths. Some tools had near-perfect recall of CNVs from arrays for some samples, but poor precision. We suggest combining the best tools that are also based on different methodologies: GATK gCNV, Lumpy, DELLY, and cn.MOPS.
submitted by: Istvan Albert
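To make the "near-perfect recall, but poor precision" observation above concrete, here is a minimal sketch of how precision and recall might be computed for a CNV call set against an array-based truth set. The 50% reciprocal-overlap matching rule, the function names, and the example intervals are assumptions for illustration; the paper's actual matching criteria may differ.

```python
def overlaps(a, b, min_fraction=0.5):
    """True if intervals a and b, given as (chrom, start, end), share at
    least min_fraction of each interval's length (reciprocal overlap)."""
    if a[0] != b[0]:
        return False
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return False
    return (shared >= min_fraction * (a[2] - a[1])
            and shared >= min_fraction * (b[2] - b[1]))


def precision_recall(calls, truth, min_fraction=0.5):
    """Precision: fraction of calls matching a truth CNV.
    Recall: fraction of truth CNVs matched by at least one call."""
    matched_calls = sum(
        any(overlaps(c, t, min_fraction) for t in truth) for c in calls)
    matched_truth = sum(
        any(overlaps(t, c, min_fraction) for c in calls) for t in truth)
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical data: every array CNV is recovered, but half the calls are extra.
truth = [("chr1", 1000, 2000), ("chr2", 5000, 9000)]
calls = [("chr1", 1100, 2100), ("chr2", 5000, 9000),
         ("chr3", 100, 200), ("chr4", 1, 500)]

precision, recall = precision_recall(calls, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A caller that reports many extra segments can still recover every array-confirmed CNV, which is why recall alone says little about how usable a tool's output is.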
Unraveling the functional dark matter through global metagenomics | Nature (www.nature.com)
Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database.
submitted by: Istvan Albert
Allo: Accurate allocation of multi-mapped reads enables regulatory element analysis at repeats | bioRxiv (www.biorxiv.org)
Transposable elements (TEs) and other repetitive regions have been shown to contain gene regulatory elements, including transcription factor binding sites. Unfortunately, regulatory elements harbored by repeats have proven difficult to characterize using short-read sequencing assays such as ChIP-seq or ATAC-seq. To address this shortcoming, we developed Allo, a new approach to allocate multi-mapped reads in an efficient, accurate, and user-friendly manner. Allo combines probabilistic mapping of multi-mapped reads with a convolutional neural network that recognizes the read distribution features of potential peaks, offering enhanced accuracy in multi-mapping read assignment.
submitted by: Istvan Albert
A comparison of short-read, HiFi long-read, and hybrid strategies for genome-resolved metagenomics | bioRxiv (www.biorxiv.org)
Our results suggest that while long-read sequencing significantly improves the quality of reconstructed bacterial genomes, it is more expensive and requires deeper sequencing than short-read approaches to recover a comparable amount of reconstructed genomes. The most optimal strategy is study-specific, and depends on how researchers assess the tradeoff between the quantity and quality of recovered genomes.
submitted by: Istvan Albert
Our paper debunking the 2020 cancer microbiome results is now published, in @mbiojournal. We'll see if @nature (or the authors) will retract these deeply flawed results. h/t @ProfBootyPhD @EricTopol https://t.co/hBvfGCvyGR
— Steven Salzberg 💙💛 (@StevenSalzberg1) October 9, 2023
submitted by: Istvan Albert
Want to get the Biostar Herald in your email? Who wouldn't? Sign up righ'ere: toggle subscription