The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.
This edition of the Herald was brought to you by contributions from Istvan Albert and was edited by Istvan Albert.
x.com (x.com)
The importance of peer-reviewed bioinformatics methods: a short rant about a recent paper by some leading scientists in my field. Most scientists would agree that well-engineered computational methods are critically important in genomics... 1/8
submitted by: Istvan Albert
Winsorization greatly reduces false positives by popular differential expression methods when analyzing human population samples | Genome Biology | Full Text (genomebiology.biomedcentral.com)
A recent study found severely inflated type I error rates for DESeq2 and edgeR, two dominant tools used for differential expression analysis of RNA-seq data. Here, we show that by properly addressing the outliers in the RNA-Seq data using winsorization, the type I error rate of DESeq2 and edgeR can be substantially reduced, and the power is comparable to Wilcoxon rank-sum test for large datasets. Therefore, as an alternative to Wilcoxon rank-sum test, they may still be applied for differential expression analysis of large RNA-Seq datasets.
Editor's note:
Three papers from Genome Biology:
- The first claims that DESeq2 and edgeR suffer from inflated FDRs and that we should all be using the Wilcoxon rank-sum test instead
- The second states that the inflated FDRs reported in the first paper are an artifact of incorrect data generation and that the Wilcoxon test is actually worse
- The third states that we can fix the inflated FDRs reported in the first paper, bringing them in line with the Wilcoxon test, by applying winsorization, an outlier replacement strategy
... there you have it - even less clarity than before
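For readers unfamiliar with the term, here is a minimal sketch of what winsorizing one gene's counts could look like. The function name `winsorize_upper`, the one-sided upper cap, the nearest-rank percentile rule, and the percentile values are all illustrative assumptions, not the settings used in the paper.

```python
import math


def winsorize_upper(values, upper_pct=95.0):
    """Cap every value above the upper_pct percentile at that percentile.

    Hypothetical helper for illustration only: uses the nearest-rank
    percentile definition and leaves the lower tail untouched.
    """
    if not values:
        return []
    ordered = sorted(values)
    # Nearest-rank percentile: the smallest observed value such that at
    # least upper_pct percent of the data are less than or equal to it.
    rank = max(1, math.ceil(upper_pct * len(ordered) / 100))
    cap = ordered[rank - 1]
    # Outliers are replaced by the cap rather than dropped, so the
    # sample size stays the same.
    return [min(v, cap) for v in values]


# Made-up counts for a single gene across ten samples, with one extreme value.
counts = [12, 15, 9, 14, 11, 13, 10, 16, 12, 950]
print(winsorize_upper(counts, upper_pct=90.0))
```

The extreme value is pulled down to the largest non-outlying count instead of being removed, which is what distinguishes winsorization from trimming.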
submitted by: Istvan Albert
Exaggerated false positives by popular differential expression methods when analyzing human population samples | Genome Biology | Full Text (genomebiology.biomedcentral.com)
When identifying differentially expressed genes between two conditions using human population RNA-seq samples, we found a phenomenon by permutation analysis: two popular bioinformatics methods, DESeq2 and edgeR, have unexpectedly high false discovery rates. Expanding the analysis to limma-voom, NOISeq, dearseq, and Wilcoxon rank-sum test, we found that FDR control is often failed except for the Wilcoxon rank-sum test. Particularly, the actual FDRs of DESeq2 and edgeR sometimes exceed 20% when the target FDR is 5%. Based on these results, for population-level RNA-seq studies with large sample sizes, we recommend the Wilcoxon rank-sum test.
submitted by: Istvan Albert
Neglecting the impact of normalization in semi-synthetic RNA-seq data simulations generates artificial false positives | Genome Biology | Full Text (genomebiology.biomedcentral.com)
A recent study reported exaggerated false positives by popular differential expression methods when analyzing large population samples. We reproduce the differential expression analysis simulation results and identify a caveat in the data generation process. Data not truly generated under the null hypothesis led to incorrect comparisons of benchmark methods. We provide corrected simulation results that demonstrate the good performance of dearseq and argue against the superiority of the Wilcoxon rank-sum test as suggested in the previous study.
submitted by: Istvan Albert
Using Bactopia with AllTheBacteria Assemblies - Bactopia (bactopia.github.io)
AllTheBacteria (ATB) is a collection of nearly 2,000,000 bacterial assemblies. In this post you'll learn how to use Bactopia to seamlessly analyze these assemblies with the available Bactopia Tools.
submitted by: Istvan Albert
Want to get the Biostar Herald in your email? Who wouldn't? Sign up righ'ere: toggle subscription
Just FYI, anyone who doesn't have a Twitter/X login can no longer view x.com threads. It would be nice if everyone left that toxic platform, but in the meantime it would also be nice if we could see the threads in some other backup form.
The platform formerly known as Twitter is hardly appropriate for being linked to from a forum like this any longer given the readily apparent supremacist and regressive beliefs of the platform's owner and their supporters.
What's wrong with using Twitter for bioinformatics? If you don't like someone's political views, just don't read their posts.
tl;dr Twitter is full of Nazis and other supremacists whose abhorrent views are being shoved down everyone's throats whether these other people want this or not.
This is no longer really possible on the platform formerly known as Twitter, given that supremacist and regressive broadcasts are forced into the feeds of all users, whether they want it or not. Please refer to relevant reporting here for example: https://fortune.com/2024/10/30/study-shows-elon-musk-tweets-pro-trump-appear-x-users-feeds-within-2-sessions/ .
I will also note that the "political opinions" you suggest one tolerate, as it pertains to the supremacist and regressive beliefs peddled by Twitter's owner and their supporters, are not (and never were) benign. What they collectively peddle is also anti-scientific, and seeks to encourage discriminating against individuals on the basis of (essentially harmless) traits such as gender and phenotypic makeup.
That is not the kind of discourse, nor set of outcomes, that people seeking to engage in the pursuit of science should seek to even inadvertently support. These are not "opinions" that should simply be ignored; they must instead be condemned and protested. Not participating in platforms and spaces hijacked to spread these supremacist and regressive beliefs is one form of condemnation and protest.
We should all be trying to come together, despite our differences (chosen or assigned at random) and not support those that seek to split us apart on the basis of such differences.
Is this the second paper: https://doi.org/10.1186/s13059-024-03231-9 ?
Also, is Winsorization here basically trading false positives for false negatives?