Question

Herald:The Biostar Herald for Tuesday, March 22, 2022

3.0 years ago

Biostar 3.4k

The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to summarize interesting and relevant information you may have missed. You too can submit links here.

This edition of the Herald was brought to you by contributions from Istvan Albert and was edited by Istvan Albert.

Exaggerated false positives by popular differential expression methods when analyzing human population samples | Genome Biology | Full Text (genomebiology.biomedcentral.com)

When identifying differentially expressed genes between two conditions using human population RNA-seq samples, we found a phenomenon by permutation analysis: two popular bioinformatics methods, DESeq2 and edgeR, have unexpectedly high false discovery rates. Expanding the analysis to limma-voom, NOISeq, dearseq, and Wilcoxon rank-sum test, we found that FDR control is often failed except for the Wilcoxon rank-sum test. Particularly, the actual FDRs of DESeq2 and edgeR sometimes exceed 20% when the target FDR is 5%. Based on these results, for population-level RNA-seq studies with large sample sizes, we recommend the Wilcoxon rank-sum test.

submitted by: Istvan Albert
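For readers who want to see what the recommended per-gene test looks like in practice, here is a minimal sketch using only the Python standard library. It is not the paper's code: the function names are made up for illustration, the rank-sum p-value uses the plain normal approximation (no continuity or tie correction of the variance), and the multiple-testing step is the standard Benjamini-Hochberg adjustment.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, assumed adequate for large groups)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2            # expected rank sum under the null
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):             # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

if __name__ == "__main__":
    # Hypothetical normalized expression values for two genes in two conditions.
    genes = {
        "geneA": ([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]),
        "geneB": ([3, 1, 4, 1, 5], [2, 7, 1, 8, 2]),
    }
    pvals = [rank_sum_pvalue(a, b) for a, b in genes.values()]
    for name, adj in zip(genes, benjamini_hochberg(pvals)):
        print(name, round(adj, 4))
```

In a real analysis one would more likely call an established implementation (for example in SciPy or R) on properly normalized counts; the sketch only shows the shape of the computation.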

The HiFi difference - true long reads vs. synthetic long reads (www.pacb.com)

I have always been a huge fan of PacBio HiFi sequencing and strongly recommend it. It is leaps and bounds better than the alternatives.

submitted by: Istvan Albert

A thread on differential expression testing with large sample sizes. The null hypothesis of popular methods such as limma, edgeR, DESeq2 etc. is that there is absolutely no difference between the two groups (or, in more general designs, no significant effect in a linear model) /1
— Wolfgang Huber 🇺🇦 (@wolfgangkhuber) March 17, 2022

submitted by: Istvan Albert

Urgent need for consistent standards in functional enrichment analysis (journals.plos.org)

Gene set enrichment tests (a.k.a. functional enrichment analysis) are among the most frequently used methods in computational biology. Despite this popularity, there are concerns that these methods are being applied incorrectly and the results of some peer-reviewed publications are unreliable. These problems include the use of inappropriate background gene lists, lack of false discovery rate correction and lack of methodological detail. [...] Using seven independent RNA-seq datasets, we show misuse of enrichment tools alters results substantially. In conclusion, most published functional enrichment studies suffered from one or more major flaws, highlighting the need for stronger standards for enrichment analysis.

submitted by: Istvan Albert
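To illustrate why the choice of background gene list matters, here is a small sketch of the over-representation test that many enrichment tools are built on (a one-sided hypergeometric test). The function name and all the numbers are hypothetical and are not taken from the paper; the point is only that the same overlap yields a different p-value when the background changes.

```python
from math import comb

def enrichment_pvalue(background_size, set_in_background, de_size, overlap):
    """P(X >= overlap) when de_size genes are drawn without replacement from the background.

    background_size   -- number of genes that could have been detected
    set_in_background -- members of the gene set present in that background
    de_size           -- number of differentially expressed genes
    overlap           -- DE genes that belong to the gene set
    """
    upper = min(set_in_background, de_size)
    total = comb(background_size, de_size)
    tail = sum(
        comb(set_in_background, k)
        * comb(background_size - set_in_background, de_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

if __name__ == "__main__":
    # Same hypothetical DE list and gene set, two different backgrounds.
    whole_genome = enrichment_pvalue(20000, 100, 200, 5)
    expressed_only = enrichment_pvalue(12000, 100, 200, 5)
    print("whole-genome background:  ", whole_genome)
    print("expressed-genes background:", expressed_only)
```

Using the whole genome as background, when only expressed genes could ever appear in the DE list, makes the overlap look more surprising than it is. The resulting p-values across all tested gene sets would still need a false discovery rate correction.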

Computational Biologists: What are your most-used and favorite software tools?

reply in this thread or dm me!
— Han Spinner (@H__Spinner) March 15, 2022

submitted by: Istvan Albert

CIAlign: A highly customisable command line tool to clean, interpret and visualise multiple sequence alignments [PeerJ] (peerj.com)

We present a comprehensive, user-friendly MSA trimming tool with multiple visualisation options. Our highly customisable command line tool aims to give intervention power to the user by offering various options, and outputs graphical representations of the alignment before and after processing to give the user a clear overview of what has been removed.

submitted by: Istvan Albert

Varlociraptor: enhancing sensitivity and controlling false discovery rate in somatic indel discovery | Genome Biology | Full Text (genomebiology.biomedcentral.com)

Accurate discovery of somatic variants is of central importance in cancer research. However, count statistics on discovered somatic insertions and deletions (indels) indicate that large amounts of discoveries are missed because of the quantification of uncertainties related to gap and alignment ambiguities, twilight zone indels, cancer heterogeneity, sample purity, sampling, and strand bias. We provide a unifying statistical model whose dependency structures enable accurate quantification of all inherent uncertainties in short time. Consequently, false discovery rate (FDR) in somatic indel discovery can now be controlled at utmost accuracy, increasing the amount of true discoveries while safely suppressing the FDR.

submitted by: Istvan Albert

updated 3.0 years ago by dariober 15k • written 3.0 years ago by Biostar 3.4k

score 1 · Answer 1 · 2022-03-22

Not sure if this is the right place for this comment... I only skimmed through the paper "Exaggerated false positives...".

Figure 1A is surprising because it shows that you get considerably more genes with low FDR in the shuffled datasets (on average) than in the correctly labeled dataset:

[image: Figure 1A from the paper]

I struggle to see how this could be... Here's a thought experiment: in the original dataset, test for DE between patients with an odd vs. even month of birth. No gene should pop up, but the figure suggests you should expect more genes with low FDR than when testing the real condition... How can this be? "Odd vs. even month of birth" is a valid comparison that doesn't violate any assumption.
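As a baseline for this thought experiment, the sketch below simulates what a well-calibrated test should do with an arbitrary label such as odd vs. even birth month. Everything in it is an assumption made for illustration: the data are simulated Gaussian noise rather than RNA-seq counts, the test is a simple permutation test on the difference in means, and none of it reproduces DESeq2, edgeR or the paper's analysis.

```python
import random

def permutation_pvalue(values, labels, n_perm, rng):
    """Two-sided permutation p-value for the difference in group means."""
    def mean_diff(lab):
        a = [v for v, flag in zip(values, lab) if flag]
        b = [v for v, flag in zip(values, lab) if not flag]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_diff(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_diff(shuffled) >= observed:
            hits += 1
    # Adding 1 to both counts keeps the p-value valid (never exactly zero).
    return (hits + 1) / (n_perm + 1)

def null_fraction_below(alpha=0.05, n_genes=300, n_samples=20, n_perm=199, seed=0):
    """Fraction of simulated null genes with p <= alpha under an arbitrary labeling."""
    rng = random.Random(seed)
    # Alternating labels stand in for "odd vs. even month of birth".
    labels = [i % 2 == 0 for i in range(n_samples)]
    below = 0
    for _ in range(n_genes):
        values = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        if permutation_pvalue(values, labels, n_perm, rng) <= alpha:
            below += 1
    return below / n_genes

if __name__ == "__main__":
    print("fraction of null genes with p <= 0.05:", null_fraction_below())
```

With a calibrated test the printed fraction should sit near 0.05 before any multiple-testing correction, and essentially no gene should survive an FDR adjustment. Seeing many low-FDR genes under shuffled labels would suggest the test's assumptions do not fit the data.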
