It seems like every week I find at least one newly published paper that I'm excited to read after scanning the abstract (normally science-focused or statistical papers, as opposed to bioinformatics methods papers). But after reading the methods section, I notice so many bioinformatics flaws that I'm left scratching my head as to why these issues weren't raised during peer review. Most of these articles appear in respected journals.
Without calling out any specific papers, some of the common RNA-Seq issues that bother me the most are:
- Not using a splice-aware aligner (such as TopHat) when aligning to the genome.
- Aligning to hg18 instead of hg19 (which was released over 4.5 years ago!), or to some other similarly outdated annotation.
- Using a "new" one-off method for differential expression analysis without comparing to commonly used DE tools or explaining why those tools weren't used.
- Not reporting version numbers for annotations and software tools used.
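On the last point, recording versions can be largely automated. Below is a minimal sketch in Python, assuming the pipeline is driven from a script. The function names (`cli_version`, `collect_versions`) and the record layout are my own illustration, not an established convention. The version flag differs between tools, so the command passed in for each tool is something you would need to check yourself.

```python
import json
import subprocess
import sys
from importlib import metadata


def cli_version(command):
    """Run a version command and return the first line it prints.

    `command` is a list such as ["sometool", "--version"]; the correct flag
    is tool-specific and is an assumption the caller has to verify.
    """
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return "unavailable"
    # Some tools print their version to stderr rather than stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else "unknown"


def collect_versions(cli_tools, packages, annotations):
    """Build one record covering command-line tools, Python packages and annotations."""
    record = {"tools": {}, "packages": {}, "annotations": dict(annotations)}
    for name, command in cli_tools.items():
        record["tools"][name] = cli_version(command)
    for name in packages:
        try:
            record["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            record["packages"][name] = "not installed"
    return record


if __name__ == "__main__":
    # Illustrative inputs only: the Python interpreter stands in for an aligner,
    # and the annotation label is typed in by hand.
    record = collect_versions(
        cli_tools={"python": [sys.executable, "--version"]},
        packages=["pip"],
        annotations={"genome": "hg19"},
    )
    print(json.dumps(record, indent=2))
```

Saving the resulting JSON next to the results gives you something to paste straight into the methods section.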
Am I just being too critical, or are others noticing a rise in these types of flawed bioinformatics analyses, especially with respect to RNA-Seq? My guess as to why these errors are not being caught is that the reviewers are experts in their respective fields (biology, medicine, genetics, statistics, CS, etc.), but they are not themselves involved in the regular processing/analysis of the data, so they are unfamiliar with the analysis details.
Should the peer review process change to account for the increasingly complex bioinformatics required to process and analyze sequencing data? Some journals that I've reviewed for ask (1) whether the article under review involves statistical methods and (2) whether I am qualified to review those methods. Should journals start asking the same questions, but for bioinformatics? Would this help ensure that all of the methods are appropriately reviewed by bioinformatics experts? Any other ideas to solve this issue?
I agree that doing such a thorough review is taxing, and that fixing most of the mistakes I see would likely have only a minor impact on the results/conclusions. But I've also seen "groundbreaking" papers where shoddy bioinformatics is responsible for the seemingly novel results, which is pretty disturbing.
For a good example, see this Science article and comments: http://www.ncbi.nlm.nih.gov/pubmed?term=widespread[Title]%20AND%20rna[Title]%20AND%20dna[Title]%20AND%20differences[Title]%20AND%20human[Title]