The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.
This edition of the Herald was brought to you by contributions from GenoMax and Istvan Albert, and was edited by Istvan Albert.
Vega Benchtop Sequencer (www.pacb.com)
New benchtop HiFi sequencer from PacBio.
submitted by: GenoMax
Vcfexpress: flexible, rapid user-expressions to filter and format VCFs | bioRxiv (www.biorxiv.org)
Here, we introduce vcfexpress, a new, high-performance toolset for the analysis of VCF files, written in the Rust programming language. It is nearly as fast as BCFtools, but adds functionality to execute user expressions in the Lua programming language for precise filtering and reporting of variants from a VCF or BCF file. We demonstrate performance and flexibility by comparing vcfexpress to other tools using the vembrane benchmark.
code: https://github.com/brentp/vcfexpress
submitted by: Istvan Albert
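For readers new to expression-based filtering, here is a rough sketch of the general idea in plain Python. It is not vcfexpress and does not mirror its interface (vcfexpress is written in Rust and evaluates Lua expressions); the function names `parse_info` and `filter_vcf`, the variable names exposed to the expression, and the sample records are all made up for illustration.

```python
def parse_info(info):
    """Turn an INFO string such as 'DP=35;AF=0.5;DB' into a dict.

    Flags without a value become True; values are converted to int or
    float when possible and left as strings otherwise.
    """
    out = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        if not sep:
            out[key] = True
            continue
        try:
            out[key] = int(value)
        except ValueError:
            try:
                out[key] = float(value)
            except ValueError:
                out[key] = value
    return out


def filter_vcf(lines, expression):
    """Yield header lines plus the records for which `expression` is truthy.

    The expression is ordinary Python evaluated with eval(). Emptying
    __builtins__ is NOT a real sandbox, so only run expressions you trust.
    """
    code = compile(expression, "<expr>", "eval")
    for line in lines:
        if line.startswith("#"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, qual, flt, info = fields[:8]
        env = {
            "CHROM": chrom,
            "POS": int(pos),
            "REF": ref,
            "ALT": alt,
            "QUAL": None if qual == "." else float(qual),
            "FILTER": flt,
            "INFO": parse_info(info),
        }
        if eval(code, {"__builtins__": {}}, env):
            yield line


# Hypothetical records, for illustration only.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=35;AF=0.5",
    "chr1\t200\t.\tC\tT\t12\tPASS\tDP=40",
    "chr1\t300\t.\tG\tA\t60\tPASS\tDP=4;DB",
]
expr = 'QUAL is not None and QUAL > 30 and INFO.get("DP", 0) >= 10'
kept = [l for l in filter_vcf(vcf_lines, expr) if not l.startswith("#")]
```

With these made-up records only the variant at position 100 passes both the quality and the depth condition. A real tool works on parsed VCF/BCF records rather than split text lines, which is where most of the speed comes from.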
Experience of irreproducibility as a risk factor for poor mental health in biomedical science doctoral students: A survey and interview-based study | PLOS ONE (journals.plos.org)
High rates of irreproducibility and of poor mental health in graduate students have been reported in the biomedical sciences in the past ten years, but to date, little research has investigated whether these two trends interact. In this study, we ask whether the experience of failing to replicate an expected finding impacts graduate students’ mental health. [...] We found that almost all participants had experience with irreproducibility: 84% had failed to replicate their own results, 70% had failed to replicate a colleague’s finding, and 58% had failed to replicate a result from the published literature. Participants reported feelings of self-doubt, frustration, and depression while experiencing irreproducibility, and in 24% of cases, these emotional responses were strong enough to interfere with participants’ eating, sleeping, or ability to work. A majority (82%) of participants initially believed that the anomalous results could be attributed to their own error.
submitted by: Istvan Albert
Initialization is critical for preserving global data structure in both t-SNE and UMAP | Nature Biotechnology (www.nature.com)
One of the most ubiquitous analysis tools in single-cell transcriptomics and cytometry is t-distributed stochastic neighbor embedding (t-SNE) [1], which is used to visualize individual cells as points on a two-dimensional scatterplot such that similar cells are positioned close together [2]. A related algorithm, called uniform manifold approximation and projection (UMAP) [3], has attracted substantial attention in the single-cell community [...] Here we show that this alleged superiority of UMAP can be entirely attributed to different choices of initialization in the implementations used by Becht et al.: the t-SNE implementations by default used random initialization, while the UMAP implementation used a technique called Laplacian eigenmaps (LE) [5] to initialize the embedding. We show that UMAP with random initialization preserves global structure as poorly as t-SNE with random initialization, while t-SNE with informative initialization performs as well as UMAP with informative initialization.
submitted by: Istvan Albert
Prof. Nikolai Slavov on Twitter (x.com)
UMAP was introduced to biology with the claim that it 'preserves the global structure of the data' better than t-SNE.
This matters arising article shows that the claimed global structure preservation can be entirely attributed to different choices of initialization.
submitted by: Istvan Albert
UCSC Genome Browser on Twitter (x.com)
We are pleased to announce the release of the Genome in a Bottle Problematic Regions tracks for the hg38 and hs1 human assemblies.
Learn more about the release from the following news post:
https://genome.ucsc.edu/goldenPath/newsarch.html#110424
submitted by: Istvan Albert
The GIAB genomic stratifications resource for human reference genomes | Nature Communications (www.nature.com)
Despite the growing variety of sequencing and variant-calling tools, no workflow performs equally well across the entire human genome. Understanding context-dependent performance is critical for enabling researchers, clinicians, and developers to make informed tradeoffs when selecting sequencing hardware and software. Here we describe a set of “stratifications,” which are BED files that define distinct contexts throughout the genome. We define these for GRCh37/38 as well as the new T2T-CHM13 reference, adding many new hard-to-sequence regions which are critical for understanding performance as the field progresses.
submitted by: Istvan Albert
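To make the idea of a stratification concrete, here is a minimal sketch of how a BED file can be used to split variant calls into "inside" and "outside" a given genomic context. It assumes nothing about the GIAB tooling itself: the helper names (`load_bed`, `in_regions`, `stratify`) and the intervals are hypothetical. The one thing to watch is the coordinate convention: BED intervals are 0-based and half-open, while VCF positions are 1-based.

```python
from bisect import bisect_right
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into {chrom: (starts, ends)} with overlaps merged."""
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))

    regions = {}
    for chrom, intervals in raw.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                # Overlapping or adjacent: extend the previous interval.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        regions[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return regions


def in_regions(regions, chrom, pos):
    """Return True if the 1-based position `pos` lies in any interval."""
    if chrom not in regions:
        return False
    starts, ends = regions[chrom]
    zero_based = pos - 1  # convert VCF-style position to BED coordinates
    i = bisect_right(starts, zero_based) - 1  # last interval starting at or before pos
    return i >= 0 and zero_based < ends[i]


def stratify(regions, variants):
    """Count how many (chrom, pos) variants fall inside vs. outside."""
    counts = {"inside": 0, "outside": 0}
    for chrom, pos in variants:
        key = "inside" if in_regions(regions, chrom, pos) else "outside"
        counts[key] += 1
    return counts


# Hypothetical intervals and variant positions, for illustration only.
bed_lines = ["chr1\t100\t200", "chr1\t150\t300", "chr2\t0\t50"]
variants = [("chr1", 100), ("chr1", 101), ("chr1", 300),
            ("chr1", 301), ("chr2", 1), ("chr3", 10)]
regions = load_bed(bed_lines)
counts = stratify(regions, variants)
```

In this toy example three of the six positions land inside the intervals. Benchmarking tools then report precision and recall separately for each such context, which is what makes the tradeoffs between workflows visible.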
BioArt: Build figures, presentations, and illustrations (bioart.niaid.nih.gov)
Mentioned by @Jared Andrews in the Biostars Slack.
A no-cost alternative to BioRender for making pretty presentations with scientific illustrations.
submitted by: GenoMax