The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.
This edition of the Herald was brought to you by a contribution from Istvan Albert and was edited by Istvan Albert.
Exaggerated false positives by popular differential expression methods when analyzing human population samples | Genome Biology (genomebiology.biomedcentral.com)
When identifying differentially expressed genes between two conditions using human population RNA-seq samples, we found a phenomenon by permutation analysis: two popular bioinformatics methods, DESeq2 and edgeR, have unexpectedly high false discovery rates. Expanding the analysis to limma-voom, NOISeq, dearseq, and Wilcoxon rank-sum test, we found that FDR control is often failed except for the Wilcoxon rank-sum test. Particularly, the actual FDRs of DESeq2 and edgeR sometimes exceed 20% when the target FDR is 5%. Based on these results, for population-level RNA-seq studies with large sample sizes, we recommend the Wilcoxon rank-sum test.
submitted by: Istvan Albert
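The per-gene procedure the paper recommends can be sketched in plain Python. It runs a two-sided Wilcoxon rank-sum test for each gene and then applies a Benjamini-Hochberg adjustment across genes. This is a minimal illustration and not the authors' code. It assumes the inputs are already-normalized expression values per sample, because normalization is outside the scope of this sketch. It also uses the large-sample normal approximation with a tie correction. That approximation suits the large cohorts the paper targets, but it is not appropriate for small groups. The names `wilcoxon_rank_sum_p`, `benjamini_hochberg` and `wilcoxon_de` are hypothetical helpers.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def rank_with_ties(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided rank-sum p-value via the normal approximation with tie correction."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one sample")
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = rank_with_ties(pooled)
    w = sum(ranks[:n1])  # rank sum of the first group
    mean_w = n1 * (n + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:  # every value identical: no evidence of a shift
        return 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def wilcoxon_de(genes: Dict[str, Tuple[Sequence[float], Sequence[float]]]) -> Dict[str, float]:
    """Map gene -> BH-adjusted rank-sum p-value; values are (group A, group B) samples."""
    names = list(genes)
    pvals = [wilcoxon_rank_sum_p(a, b) for a, b in (genes[g] for g in names)]
    return dict(zip(names, benjamini_hochberg(pvals)))


if __name__ == "__main__":
    # Made-up, already-normalized values; groups this small are for illustration only.
    toy = {
        "geneA": ([5.1, 4.8, 5.5, 5.0, 4.9, 5.3], [7.2, 6.9, 7.5, 7.1, 6.8, 7.4]),
        "geneB": ([3.0, 3.2, 2.9, 3.1, 3.3, 3.0], [3.1, 2.8, 3.2, 3.0, 3.1, 2.9]),
    }
    print(wilcoxon_de(toy))
```

For real analyses, a tested implementation is the better choice (for example R's `wilcox.test` followed by `p.adjust(method = "BH")`). The sketch only makes the two steps explicit.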
The HiFi difference - true long reads vs. synthetic long reads (www.pacb.com)
I have always been a huge fan of PacBio HiFi sequencing and strongly recommend it. Leaps and bounds better than the alternatives.
submitted by: Istvan Albert
A thread on differential expression testing with large sample sizes. The null hypothesis of popular methods such as limma, edgeR, DESeq2 etc. is that there is absolutely no difference between the two groups (or, in more general designs, no significant effect in a linear model) /1
— Wolfgang Huber 🇺🇦 (@wolfgangkhuber) March 17, 2022
submitted by: Istvan Albert
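The statistical point behind the thread can be illustrated with a two-sample z-test under a normal approximation. Hold a small true difference fixed and test it against the point null "difference = 0": the p-value keeps shrinking as the sample size grows. A test against an interval null such as |difference| ≤ margin does not behave this way. This is a sketch with illustrative parameters, so `delta`, `sd` and `margin` are assumptions and not values from the thread. The interval-null function is a simple normal-approximation analogue of threshold tests such as limma's `treat` or the `lfcThreshold` argument of DESeq2's `results`. It does not reproduce their implementations.

```python
import math


def point_null_p(delta: float, sd: float, n_per_group: int) -> float:
    """Two-sided z-test p-value for H0: true difference == 0.

    `delta` is the observed mean difference between two groups of size
    `n_per_group` that share a known standard deviation `sd`."""
    se = sd * math.sqrt(2 / n_per_group)
    z = delta / se
    return math.erfc(abs(z) / math.sqrt(2))


def interval_null_p(delta: float, sd: float, n_per_group: int, margin: float) -> float:
    """p-value for H0: |true difference| <= margin, evaluated at the boundary.

    This is the probability of observing |difference| >= |delta| when the true
    difference equals +margin (the least favourable null value), under the
    same normal approximation."""
    se = sd * math.sqrt(2 / n_per_group)
    upper_tail = 0.5 * math.erfc((abs(delta) - margin) / se / math.sqrt(2))
    lower_tail = 0.5 * math.erfc((abs(delta) + margin) / se / math.sqrt(2))
    return upper_tail + lower_tail


if __name__ == "__main__":
    # A fixed, tiny difference of 0.05 SD (an illustrative assumption).
    for n in (100, 1_000, 10_000, 100_000):
        print(n, point_null_p(0.05, 1.0, n), interval_null_p(0.05, 1.0, n, margin=0.1))
```

With `margin=0`, the interval-null p-value reduces to the point-null one. The only difference between the two tests is which null hypothesis is being rejected.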
Urgent need for consistent standards in functional enrichment analysis (journals.plos.org)
Gene set enrichment tests (a.k.a. functional enrichment analysis) are among the most frequently used methods in computational biology. Despite this popularity, there are concerns that these methods are being applied incorrectly and the results of some peer-reviewed publications are unreliable. These problems include the use of inappropriate background gene lists, lack of false discovery rate correction and lack of methodological detail. [...] Using seven independent RNA-seq datasets, we show misuse of enrichment tools alters results substantially. In conclusion, most published functional enrichment studies suffered from one or more major flaws, highlighting the need for stronger standards for enrichment analysis.
submitted by: Istvan Albert
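The background-list problem can be made concrete with a one-sided hypergeometric over-representation test, a test commonly used for this kind of enrichment analysis. The sketch below computes the upper-tail probability exactly with `math.comb`. Before counting, it restricts both the differentially expressed genes and the gene set to the chosen background. The helper names and gene identifiers are hypothetical. The example tests the same overlap against an inflated background, here all annotated genes instead of only the genes detected in the experiment, and the result looks far more significant.

```python
import math
from typing import Set


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws), computed exactly."""
    total = math.comb(population, draws)
    hits = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(k, min(successes, draws) + 1)
    )
    return hits / total


def over_representation_p(de_genes: Set[str], gene_set: Set[str], background: Set[str]) -> float:
    """One-sided over-representation p-value of `gene_set` among `de_genes`.

    Both inputs are first intersected with `background`, so genes that could
    never have been called differentially expressed are not counted."""
    de = de_genes & background
    members = gene_set & background
    overlap = len(de & members)
    return hypergeom_upper_tail(overlap, len(background), len(members), len(de))


if __name__ == "__main__":
    # Made-up identifiers: 100 annotated genes, of which only 20 were detected.
    all_annotated = {f"g{i}" for i in range(100)}
    detected = {f"g{i}" for i in range(20)}
    gene_set = {"g0", "g1", "g2", "g3", "g4"}
    de_genes = {"g0", "g1", "g2", "g10"}
    print("detected-gene background:", over_representation_p(de_genes, gene_set, detected))
    print("whole-annotation background:", over_representation_p(de_genes, gene_set, all_annotated))
```

When many gene sets are tested, the resulting p-values still need a multiple-testing correction such as Benjamini-Hochberg. The lack of such correction is another of the problems the paper lists.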
Computational Biologists: What are your most-used and favorite software tools?
reply in this thread or dm me!
— Han Spinner (@H__Spinner) March 15, 2022
submitted by: Istvan Albert
CIAlign: A highly customisable command line tool to clean, interpret and visualise multiple sequence alignments [PeerJ] (peerj.com)
We present a comprehensive, user-friendly MSA trimming tool with multiple visualisation options. Our highly customisable command line tool aims to give intervention power to the user by offering various options, and outputs graphical representations of the alignment before and after processing to give the user a clear overview of what has been removed.
submitted by: Istvan Albert
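One common trimming step is dropping alignment columns that are dominated by gaps. This is a generic illustration and not CIAlign's implementation or any of its command-line options. The function name, the 50% threshold and the gap characters are all assumptions. The function returns the indices of the removed columns, which mirrors the idea of showing the user exactly what was taken out.

```python
from typing import Dict, List, Tuple


def remove_gappy_columns(
    alignment: Dict[str, str],
    max_gap_fraction: float = 0.5,
    gap_chars: str = "-.",
) -> Tuple[Dict[str, str], List[int]]:
    """Drop columns whose fraction of gap characters exceeds `max_gap_fraction`.

    Returns the trimmed alignment and the 0-based indices of the removed
    columns, so the caller can report or plot what was taken out."""
    if not alignment:
        return {}, []
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    width = lengths.pop()
    n_seqs = len(alignment)

    keep: List[int] = []
    removed: List[int] = []
    for col in range(width):
        gaps = sum(1 for seq in alignment.values() if seq[col] in gap_chars)
        if gaps / n_seqs <= max_gap_fraction:
            keep.append(col)
        else:
            removed.append(col)

    trimmed = {name: "".join(seq[c] for c in keep) for name, seq in alignment.items()}
    return trimmed, removed


if __name__ == "__main__":
    toy = {"seq1": "AC-GT", "seq2": "A--GT", "seq3": "AC-G-"}
    trimmed, removed = remove_gappy_columns(toy)
    print("removed columns:", removed)
    for name, seq in trimmed.items():
        print(name, seq)
```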
Varlociraptor: enhancing sensitivity and controlling false discovery rate in somatic indel discovery | Genome Biology (genomebiology.biomedcentral.com)
Accurate discovery of somatic variants is of central importance in cancer research. However, count statistics on discovered somatic insertions and deletions (indels) indicate that large amounts of discoveries are missed because of the quantification of uncertainties related to gap and alignment ambiguities, twilight zone indels, cancer heterogeneity, sample purity, sampling, and strand bias. We provide a unifying statistical model whose dependency structures enable accurate quantification of all inherent uncertainties in short time. Consequently, false discovery rate (FDR) in somatic indel discovery can now be controlled at utmost accuracy, increasing the amount of true discoveries while safely suppressing the FDR.
submitted by: Istvan Albert
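Controlling FDR from per-call posterior probabilities, which a statistical model like the one described can provide, can be sketched generically. Rank the calls by their posterior probability of being real. Then accept the longest prefix whose expected false discovery rate stays at or below the target, where the expected FDR is the mean of (1 - posterior). This sketch assumes calibrated posteriors are already available. It is not Varlociraptor's code, and the call IDs and probabilities are made up.

```python
from typing import Dict, List


def select_at_expected_fdr(posterior: Dict[str, float], alpha: float) -> List[str]:
    """Return the calls accepted at expected FDR <= alpha.

    `posterior` maps a call ID to the posterior probability that the call is
    real. In the top i calls, the expected number of false discoveries is the
    sum of (1 - posterior), so the expected FDR is that sum divided by i.
    The longest prefix meeting the target is returned."""
    ranked = sorted(posterior, key=posterior.__getitem__, reverse=True)
    expected_false = 0.0
    accepted = 0
    for i, call in enumerate(ranked, start=1):
        expected_false += 1.0 - posterior[call]
        if expected_false / i <= alpha:
            accepted = i
    return ranked[:accepted]


if __name__ == "__main__":
    calls = {"indel1": 0.99, "indel2": 0.97, "indel3": 0.90, "indel4": 0.60}
    print(select_at_expected_fdr(calls, alpha=0.05))
```

The calls are sorted by decreasing posterior, so the running mean of (1 - posterior) never decreases. Taking the longest qualifying prefix is therefore the same as stopping at the first call that pushes the expected FDR over the target.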
Want to get the Biostar Herald in your email? Who wouldn't? Sign up righ'ere.