The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.
This edition of the Herald was brought to you by contributions from Mensur Dlakic and Istvan Albert, and was edited by Istvan Albert.
If you're studying #bioinformatics, there comes a time when you have to decide between:
- staying in academia
- going into industry
- give up #bioinformtics and do something else
Well, let me tell you what it was like for me to work as a #bioinformatics in the biotech… pic.twitter.com/svmQBdfLgt
— Liz Tseng (@Magdoll) February 25, 2024
submitted by: Istvan Albert
Commonly used software tools produce conflicting and overly-optimistic AUPRC values | bioRxiv (www.biorxiv.org)
Do we trust our tools unconditionally?
submitted by: Mensur Dlakic
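To illustrate how two tools can report different AUPRC values for the same predictions, here is a minimal sketch in plain Python. It compares a step-wise sum (often called average precision) with a trapezoidal rule over the same precision-recall points. The function names and the toy data are assumptions made for this example; they are not taken from the preprint, and this is only one possible source of disagreement.

```python
def pr_points(labels, scores):
    """Return (recall, precision) points, one per distinct score threshold.

    Tied scores are consumed together so they produce a single point.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)
    points = []
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


def auprc_step(points):
    """Step-wise area: each recall increase times the precision reached there."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


def auprc_trapezoid(points):
    """Trapezoidal area with linear interpolation between consecutive points.

    The curve is assumed to start at recall 0 with the first observed precision.
    """
    area = 0.0
    prev_recall, prev_precision = 0.0, points[0][1]
    for recall, precision in points:
        area += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return area


# Toy data (hypothetical): four predictions, two of them true positives.
labels = [1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.6]
points = pr_points(labels, scores)
print(f"step-wise: {auprc_step(points):.4f}")
print(f"trapezoid: {auprc_trapezoid(points):.4f}")
```

On this toy input the step-wise sum gives 0.75 while the trapezoidal rule gives about 0.708, so the reported number depends on the estimator even though the predictions are identical.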
Significant Updates Coming to the NCBI Datasets APIs and Command-Line Tools - NCBI Insights (ncbiinsights.ncbi.nlm.nih.gov)
As part of our ongoing effort to enhance your experience, we are updating the NCBI Datasets application programming interfaces (APIs). Beginning in June 2024, the v2alpha APIs will be promoted to the stable v2 version. At this time, the v1 API, the command-line interface (CLI) version 13 and older versions, and the Python library v1 will be deprecated and thus no longer supported for bug fixes or updates. Effective December 31, 2024, these will no longer be available for use.
submitted by: Istvan Albert
And Bioinformatics would be ~735% better if formats that are primarily processed by computer programs rather than people were designed and written in sane, efficient, and easily parsable machine-oriented formats rather than ill-posed "human readable" formats :). https://t.co/8Kf9L4bYC1
— 𝕐 (@rob@genomic.social) (@nomad421) February 20, 2024
submitted by: Istvan Albert
https://academic.oup.com/bioinformatics/article/35/3/421/5055585
General-purpose processors can now contain many dozens of processor cores and support hundreds of simultaneous threads of execution. To make best use of these threads, genomics software must contend with new and subtle computer architecture issues. We discuss some of these and propose methods for improving thread scaling in tools that analyze each read independently, such as read aligners.
We implement these methods in new versions of Bowtie, Bowtie 2 and HISAT. We greatly improve thread scaling in many scenarios, including on the recent Intel Xeon Phi architecture. We also highlight how bottlenecks are exacerbated by variable-record-length file formats like FASTQ and suggest changes that enable superior scaling.
submitted by: Istvan Albert
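The point about variable-record-length formats can be sketched with a small Python example. In FASTQ, a reader has to walk the file line by line to find record boundaries, because `@` is also a valid quality character and so a line starting with `@` is not necessarily a header. With fixed-width records, any record can be located by offset arithmetic alone, which is what makes it easy to hand out independent chunks to threads. The fixed-width layout below (8 bytes per record, padded with `.`) is purely hypothetical and is not the change the paper proposes.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def fixed_record(data, index, width):
    """Return record `index` from fixed-width data using offset arithmetic only."""
    start = index * width
    return data[start:start + width]


# Two records; the first quality string happens to start with '@'.
FASTQ_TEXT = "@r1\nACGT\n+\n@III\n@r2\nAC\n+\nII\n"

records = list(read_fastq(io.StringIO(FASTQ_TEXT)))
at_lines = sum(line.startswith("@") for line in FASTQ_TEXT.splitlines())
print(len(records), "records, but", at_lines, "lines start with '@'")

# Hypothetical fixed-width layout: 8 bytes per record, '.' as padding.
FIXED = b"ACGT....AC......"
print(fixed_record(FIXED, 1, 8))
```

A worker that jumps to an arbitrary byte offset in the FASTQ text cannot tell a header from a quality line just by looking for `@`, whereas in the fixed-width layout record `i` always starts at byte `i * width`.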
A high-performance computational workflow to accelerate GATK SNP detection across a 25-genome dataset | BMC Biology | Full Text (bmcbiol.biomedcentral.com)
Here we report an open-source high-performance computing genome variant calling workflow (HPC-GVCW) for GATK that can run on multiple computing platforms from supercomputers to desktop machines. We benchmarked HPC-GVCW on multiple crop species for performance and accuracy with comparable results with previously published reports (using GATK alone). Finally, we used HPC-GVCW in production mode to call SNPs on a “subpopulation aware” 16-genome rice reference panel with ~ 3000 resequenced rice accessions. The entire process took ~ 16 weeks and resulted in the identification of an average of 27.3 M SNPs/genome and the discovery of ~ 2.3 million novel SNPs that were not present in the flagship reference genome for rice (i.e., IRGSP RefSeq).
submitted by: Istvan Albert
The publication of the whole genomes from the US @AllofUsResearch cohort is great to see, but the choice of how to represent an overview of the genetic relationships has (rightly) drawn controversy, in particular how the concepts of ethnicity and race are mapped to it.
— Ewan Birney (@ewanbirney) February 20, 2024
submitted by: Istvan Albert
This paper from @AllofUsResearch needs to be retracted by @nature immediately. Under pretext of inclusivity it features a scientifically invalid representation of genetic diversity and race that is going to feature in racist literature for decades. https://t.co/5RQ46dZgkV
— Michael Eisen (@mbeisen) February 20, 2024
submitted by: Istvan Albert