The peer review process is broken.
Once an article is published, there is no real mechanism to correct any error that is detected in it.
Neither the corresponding author, nor the editor, will generally respond to emails pointing out problems in published articles.
In 2016, every published article should at least have a comments section where the content of the article can be debated.
In frustration at the lack of any means to point out errors in published articles, I have started this thread to list egregious errors in published articles. Inevitably, the article whose results I am asked to reproduce is the one that used an incorrect analysis method, since that is the article that got the desired results.
I'll admit that I myself do occasionally make errors, including perhaps in my interpretation of the articles below, but there really has to be some mechanism to point out mistakes in published papers. At the very least, there should be a forum where controversial points can be debated.
Article: "MicroRNA and mRNA Cargo of Extracellular Vesicles from Porcine Adipose Tissue-Derived Mesenchymal Stem Cells"
In the article cited, Eirin et al. claim to have “comprehensively characterized the mRNA and miRNA expression profile of EVs derived from porcine adipose tissue-MSCs”. Yet, they prepared their library using “poly-A mRNA, purified from total RNA using oligo dT magnetic beads”. So, only poly-adenylated RNAs with a length of at least 50 bases should be present in their libraries. No mature miRNAs should be present.
The programs they use in their analysis, CAP-miRSeq and miRDeep2, were designed specifically for small RNA-Seq libraries. Eirin et al. do not mention anywhere in their article any adaptation they made to the programs to run them on long RNA-Seq libraries. I don't see how a comprehensive miRNA analysis can be performed with miRDeep2 on a long RNA-Seq library, without any short RNAs.
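For comparison, this is roughly what a standard miRDeep2 small-RNA workflow looks like. This is only a sketch to illustrate what the tools expect as input; the file names, adapter sequence and pig genome index below are placeholders, not anything taken from the article.

# 1) adapter-clip, collapse and map short reads (~18-25 nt) from a small RNA-Seq library
mapper.pl small_rna_reads.fastq -e -h -j -k TGGAATTCTCGGGTGCCAAGG -l 18 -m \
    -p sscrofa_bowtie_index -s reads_collapsed.fa -t reads_vs_genome.arf -v

# 2) quantify known miRNAs and predict novel ones from the mapped short reads
miRDeep2.pl reads_collapsed.fa sscrofa_genome.fa reads_vs_genome.arf \
    ssc_mature.fa none ssc_hairpin.fa 2> report.log

None of this makes sense on an oligo-dT-selected poly-A library, where the ~22 nt mature miRNAs should have been lost before library preparation.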
Also, I cannot find a link to the raw or processed data, which should really be obligatory to provide with any NGS study.
Article: "Genome-wide profiling of the cardiac transcriptome after myocardial infarction identifies novel heart-specific long non-coding RNAs."
From the methods section, “Sequence analysis of long RNA reads”:
“100nt paired-end reads from 8 samples (4 Sham, 4 LAD) were mapped to mm9 reference genome using Tophat software version 2.0.5 (Trapnell et al., 2012) with option “Gene model” -G, using mm9 UCSC reference genes GTF (Karolchik et al., 2003). An ab initio transcript reconstruction was performed using Cufflinks, version 2.0.2 (Trapnell et al., 2012). The option “masking” (–G) was used, using mm9 UCSC reference genes GTF. The other parameters were default. The resulting GTFs were merged using Cuffmerge, version 2.0.2 (Roberts et al., 2011), using option –g with mm9 UCSC GTF as reference, allowing distinguishing known and novel transcripts.”
WTF?? The Cufflinks option -G is for masking? Am I misunderstanding how Cufflinks works?
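For the record, here is my reading of the relevant flags, written out as a sketch with placeholder file and index names rather than their exact commands:

# Tophat: -G/--GTF supplies the gene model annotation (consistent with the quote)
tophat2 -G mm9_ucsc_genes.gtf -o tophat_out mm9_bowtie_index sample_R1.fastq sample_R2.fastq

# Cufflinks: -G/--GTF quantifies against the supplied annotation ONLY (no novel transcripts),
# -g/--GTF-guide uses the annotation as a guide and allows novel transcripts (RABT assembly),
# and the actual masking option is -M/--mask-file
cufflinks -g mm9_ucsc_genes.gtf -o cufflinks_out tophat_out/accepted_hits.bam

# Cuffmerge: -g/--ref-gtf merges the per-sample assemblies against the reference annotation,
# which is what lets you separate known from novel transcripts
cuffmerge -g mm9_ucsc_genes.gtf -o merged_asm assembly_gtf_list.txt

If they really ran Cufflinks with -G (capital), no novel transcripts could have come out of it at all, so presumably they meant -g or -M; the methods as written don't say which.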
I believe this is called a typo. These things are supposed to be caught, but sometimes they aren't. It really isn't that uncommon to spot typos like that in papers.
Agreed. The article may have more problems than just a typo though. The fact that they did not pick up on this typo in the Cufflinks parameter could point to more serious problems in the bioinformatics analysis. I'm still struggling to understand how they identified the strand of the novel lncRNAs identified by Cufflinks with unstranded RNA-Seq data.
When they count the reads, they treat the reads as unstranded. "Read counts were then calculated per gene from the alignment bam files using HTSeq (v0.5.4p2) with options -m union --stranded no."
However, when they identify novel lncRNAs, they claim they can identify the strand to which each lncRNA belongs, that is, whether the novel lncRNA is on the same strand as, or the opposite strand from, the closest gene. http://eurheartj.oxfordjournals.org/highwire/filestream/626822/field_highwire_adjunct_files/6/ehu180supp_table5.pdf
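To make the inconsistency concrete, here is the counting step as quoted, as I understand it (HTSeq 0.5.x read SAM from stdin, so the BAM has to be piped through samtools; file names are placeholders):

# counting as described: strand information is discarded
samtools view tophat_out/accepted_hits.bam | \
    htseq-count -m union --stranded=no - mm9_ucsc_genes.gtf > counts.txt

# with a stranded protocol (e.g. dUTP) one would count with --stranded=reverse (or yes)
# instead, and the strand of a novel transcript would follow from the library type

So the counting treats the data as unstranded, yet the supplementary table assigns each novel lncRNA to a strand relative to its closest gene.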
I wrote the post because I couldn't understand their methodology, and the glaring typo concerning the Cufflinks parameter made me question the entire article.
The entire article is based on the output of Cufflinks, a notoriously unreliable tool for identifying novel transcripts, so you would expect them to be rigorous when discussing the Cufflinks parameters used and the output of Cufflinks.
Basically, I was given the dataset by a researcher who asked me to identify novel lncRNAs in their dataset, using the same methodology they used, but with the mm10 reference genome and in a region of specific interest to him.
I think you're being a bit melodramatic. The foundation of science is not crumbling beneath your feet. It is a typo, and a small one at that. I would not write off an entire article because of that. In being critical and detail oriented you're doing exactly what one should do when reading an article. However, a typo, the authors not explaining something to your personal satisfaction, and fundamental problems or errors are all different things.
I think you're touching on a larger issue with how scientific work is shared and the fact that journal articles are becoming increasingly cumbersome in cases where highly complex methods were used. Journals don't really provide a means for authors to give complete protocols. Materials and methods sections aren't designed to be a step by step protocol.
Trying to perform an experiment (wet or dry) using the materials and methods section of a paper, unless it is a Nature Protocols article or something like JoVE, is generally impossible and involves a fair amount of head scratching or trial and error. Even with something as detailed as Nature Protocols, it can be difficult to implement a protocol. The authors of Cufflinks have a Nature Protocols paper, yet this website is filled with questions about cuff*.
Honestly, I think materials and methods sections are only good for providing context for the data presented in the paper. If I want to try to perform the protocol on my own, I need to contact the authors. If they don't get back to me, and I need that exact protocol, I'm on my own.
It sounds like you might want to have a conversation with the person who tasked you with this experiment. I'd discuss with them that the article is missing some details and you need to hear back from the authors about those details before you're able to move forward. You may also want to ask if the person wants the same type of analysis done (i.e. generate the same results) or if they truly want that exact analysis implemented verbatim. They may have some insight into the paper and associated methods that you don't. They may be unaware of the challenges involved in retooling a protocol. Either way, having an open line of dialog with the people you're working with is important.
I think melodramatic really captures it. Reproducing wet-lab protocols must be much more difficult than reproducing informatics protocols, because the latter can be copy-pasted or shared on GitHub. We cannot share culture media like that. Of course the computational environment also matters, but much less than with lab protocols.
In my experience at least, wet lab is tougher. Often many of the nuts-and-bolts values (volume of x) are missing, but there's also a ton of implicit things in the mechanics that are never in the paper and can be tough to get from a protocol. Nothing beats trying several times without success, only to have someone go "oh, you did x? Yeah, you never do x, you were supposed to do y".
Still, the other challenge is learning that there can often be disconnects between their goals and your goals. This can make retooling protocols tougher.
I half suspect that their sequencing was stranded but they used unstranded counting to try and get more counts (this usually doesn't work).
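One quick way to check that suspicion, if the raw data were available, would be RSeQC's infer_experiment.py (a sketch; the BED annotation file name is a placeholder):

# compare read orientation against annotated gene strands
infer_experiment.py -i tophat_out/accepted_hits.bam -r mm9_ucsc_genes.bed

# a roughly 50/50 split between the two orientation patterns means the library is
# effectively unstranded; a strong skew toward one pattern indicates a stranded protocol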
I agree that reproducibility and inter-analyst concordance in bioinformatics need rigorous evaluation. Sometimes mistakes like these happen, and they are often inadvertently missed by everyone on the author list. The authors can quickly submit a corrigendum / erratum and fix them.
Services like CrossMark, F1000, etc. offer tools for continuous publication and easy post-publication updates.
See also: PubMed Commons: A System That Enables Researchers To Share Their Opinions About Scientific Publications