The peer review process is broken.
Once an article is published, there is no real mechanism to correct any error that is detected in it.
Neither the corresponding author, nor the editor, will generally respond to emails pointing out problems in published articles.
In 2016, every published article should at least have a comments section where the content of the article can be debated.
In frustration at the lack of any means to point out errors in published articles, I have started this thread to list egregious errors in published articles. Inevitably, the article whose results I am asked to reproduce is the one that used an incorrect analysis method, since that is the article that got the desired results.
I'll admit that I myself do occasionally make errors, including perhaps in my interpretation of the articles below, but there really has to be some mechanism to point out mistakes in published papers. At the very least, there should be a forum where controversial points can be debated.
Article: "MicroRNA and mRNA Cargo of Extracellular Vesicles from Porcine Adipose Tissue-Derived Mesenchymal Stem Cells"
In the article cited, Eirin et al. claim to have “comprehensively characterized the mRNA and miRNA expression profile of EVs derived from porcine adipose tissue-MSCs”. Yet, they prepared their library using “poly-A mRNA, purified from total RNA using oligo dT magnetic beads”. So, only poly-adenylated RNAs with a length of at least 50 bases should be present in their libraries. No mature miRNAs should be present.
The programs they use in their analysis, CAP-miRSeq and miRDeep2, were designed specifically for small RNA-Seq libraries. Eirin et al. do not mention anywhere in their article any adaptation they made to the programs to run them on long RNA-Seq libraries. I don't see how a comprehensive miRNA analysis can be performed with miRDeep2 on a long RNA-Seq library, without any short RNAs.
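For comparison, this is roughly what a standard miRDeep2 small-RNA workflow looks like. This is only a sketch to illustrate what the tools expect as input; the file names, adapter sequence and pig genome index below are placeholders, not anything taken from the article.

# 1) adapter-clip, collapse and map short reads (~18-25 nt) from a small RNA-Seq library
mapper.pl small_rna_reads.fastq -e -h -j -k TGGAATTCTCGGGTGCCAAGG -l 18 -m \
    -p sscrofa_bowtie_index -s reads_collapsed.fa -t reads_vs_genome.arf -v

# 2) quantify known miRNAs and predict novel ones from the mapped short reads
miRDeep2.pl reads_collapsed.fa sscrofa_genome.fa reads_vs_genome.arf \
    ssc_mature.fa none ssc_hairpin.fa 2> report.log

None of this makes sense on an oligo-dT-selected poly-A library, where the ~22 nt mature miRNAs should have been lost before library preparation.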
Also, I cannot find a link to the raw or processed data, which should really be obligatory to provide with any NGS study.
Article: "Genome-wide profiling of the cardiac transcriptome after myocardial infarction identifies novel heart-specific long non-coding RNAs."
From the methods section, “Sequence analysis of long RNA reads”:
“100nt paired-end reads from 8 samples (4 Sham, 4 LAD) were mapped to mm9 reference genome using Tophat software version 2.0.5 (Trapnell et al., 2012) with option “Gene model” -G, using mm9 UCSC reference genes GTF (Karolchik et al., 2003). An ab initio transcript reconstruction was performed using Cufflinks, version 2.0.2 (Trapnell et al., 2012). The option “masking” (–G) was used, using mm9 UCSC reference genes GTF. The other parameters were default. The resulting GTFs were merged using Cuffmerge, version 2.0.2 (Roberts et al., 2011), using option –g with mm9 UCSC GTF as reference, allowing distinguishing known and novel transcripts.”
WTF?? The Cufflinks option -G is for masking? Am I misunderstanding how Cufflinks works?
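For the record, here is my reading of the relevant flags, written out as a sketch with placeholder file and index names rather than their exact commands:

# Tophat: -G/--GTF supplies the gene model annotation (consistent with the quote)
tophat2 -G mm9_ucsc_genes.gtf -o tophat_out mm9_bowtie_index sample_R1.fastq sample_R2.fastq

# Cufflinks: -G/--GTF quantifies against the supplied annotation ONLY (no novel transcripts),
# -g/--GTF-guide uses the annotation as a guide and allows novel transcripts (RABT assembly),
# and the actual masking option is -M/--mask-file
cufflinks -g mm9_ucsc_genes.gtf -o cufflinks_out tophat_out/accepted_hits.bam

# Cuffmerge: -g/--ref-gtf merges the per-sample assemblies against the reference annotation,
# which is what lets you separate known from novel transcripts
cuffmerge -g mm9_ucsc_genes.gtf -o merged_asm assembly_gtf_list.txt

If they really ran Cufflinks with -G (capital), no novel transcripts could have come out of it at all, so presumably they meant -g or -M; the methods as written don't say which.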
I believe this is called a typo. These things are supposed to be caught, but sometimes they aren't. It really isn't that uncommon to spot typos like that in papers.
Agreed. The article may have more problems than just a typo though. The fact that they did not pick up on this typo in the Cufflinks parameter could point to more serious problems in the bioinformatics analysis. I'm still struggling to understand how they identified the strand of the novel lncRNAs identified by Cufflinks with unstranded RNA-Seq data.
When they count the reads, they treat the reads as unstranded. "Read counts were then calculated per gene from the alignment bam files using HTSeq (v0.5.4p2) with options -m union --stranded no."
However, when they identify novel lncRNAs, they claim they can identify the strand to which each lncRNA belongs, that is, whether the novel lncRNA is on the same strand as, or the opposite strand from, the closest gene. http://eurheartj.oxfordjournals.org/highwire/filestream/626822/field_highwire_adjunct_files/6/ehu180supp_table5.pdf
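To make the inconsistency concrete, here is the counting step as quoted, as I understand it (HTSeq 0.5.x read SAM from stdin, so the BAM has to be piped through samtools; file names are placeholders):

# counting as described: strand information is discarded
samtools view tophat_out/accepted_hits.bam | \
    htseq-count -m union --stranded=no - mm9_ucsc_genes.gtf > counts.txt

# with a stranded protocol (e.g. dUTP) one would count with --stranded=reverse (or yes)
# instead, and the strand of a novel transcript would follow from the library type

So the counting treats the data as unstranded, yet the supplementary table assigns each novel lncRNA to a strand relative to its closest gene.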
I wrote the post because I couldn't understand their methodology, and the glaring typo concerning the Cufflinks parameter made me question the entire article.
The entire article is based on the output of Cufflinks, a notoriously unreliable tool for identifying novel transcripts, so you would expect them to be rigorous when discussing the Cufflinks parameters used and the output of Cufflinks.
Basically, I was given the dataset by a researcher who asked me to identify novel lncRNAs in their dataset, using the same methodology they used, but with the mm10 reference genome and in a region of specific interest to him.
I think you're being a bit melodramatic. The foundation of science is not crumbling beneath your feet. It is a typo, and a small one at that. I would not write off an entire article because of that. In being critical and detail oriented you're doing exactly what one should do when reading an article. However, a typo, the authors not explaining something to your personal satisfaction, and fundamental problems or errors are all different things.
I think you're touching on a larger issue with how scientific work is shared and the fact that journal articles are becoming increasingly cumbersome in cases where highly complex methods were used. Journals don't really provide a means for authors to give complete protocols. Materials and methods sections aren't designed to be a step by step protocol.
Trying to perform an experiment (wet or dry) using the materials and methods section of a paper, unless it is a Nature Protocols article or something like JoVE, is generally impossible and involves a fair amount of head scratching or trial and error. Even with something as detailed as Nature Protocols, it can be difficult to implement a protocol. The authors of Cufflinks have a Nature Protocols paper, yet this website is filled with questions about cuff*.
Honestly, I think materials and methods sections are only good for providing context for the data presented in the paper. If I want to try to perform the protocol on my own, I need to contact the authors. If they don't get back to me, and I need that exact protocol, I'm on my own.
It sounds like you might want to have a conversation with the person who tasked you with this experiment. I'd discuss with them that the article is missing some details and you need to hear back from the authors about those details before you're able to move forward. You may also want to ask if the person wants the same type of analysis done (i.e. generate the same results) or if they truly want that exact analysis implemented verbatim. They may have some insight into the paper and associated methods that you don't. They may be unaware of the challenges involved in retooling a protocol. Either way, having an open line of dialog with the people you're working with is important.
I think melodramatic really captures it. Reproducing wet-lab protocols must be much more difficult than reproducing informatics protocols, because the latter can be copy-pasted or shared on GitHub. We cannot share culture media like that. Of course the computational environment also matters, but much less than with lab protocols.
In my experience at least, wet lab is tougher. Often many of the nuts-and-bolts values (volume of x) are missing, but there's also a ton of implicit things in the mechanics that are never in the paper and can be tough to get from a protocol. Nothing beats trying several times without success, only to have someone go "oh, you did x? Yeah, you never do x, you were supposed to do y".
Still, the other challenge is learning that there can often be disconnects between their goals and your goals. This can make retooling protocols tougher.
I half suspect that their sequencing was stranded but they used unstranded counting to try and get more counts (this usually doesn't work).
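One quick way to check that suspicion, if the raw data were available, would be RSeQC's infer_experiment.py (a sketch; the BED annotation file name is a placeholder):

# compare read orientation against annotated gene strands
infer_experiment.py -i tophat_out/accepted_hits.bam -r mm9_ucsc_genes.bed

# a roughly 50/50 split between the two orientation patterns means the library is
# effectively unstranded; a strong skew toward one pattern indicates a stranded protocol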
I agree that reproducibility and inter-analyst concordance in bioinformatics need rigorous evaluation. Sometimes mistakes like these happen, and they are often inadvertently missed by everyone on the author list. The authors can quickly submit a corrigendum / erratum and fix them.
Services like CrossMark, F1000, etc. offer tools for continuous publication and easy post-publication updates.
See also: PubMed Commons: A System That Enables Researchers To Share Their Opinions About Scientific Publications