Hi Everyone,
We have commissioned RNA-seq and analysis from a company, which provided us with raw FASTQ files, BAM files, and a count matrix. They used hard clipping and TopHat for the alignment to GRCm38/mm10. I have attempted to recreate their analysis with HISAT2 (same reference genome), using only the default parameters and no separate trimming/clipping. I used samtools to convert the SAM files to BAM files and compared the company's alignments ("Tophat") with my own ("HISAT2") in IGV. The results are very confusing to me. Most of the genes I have randomly inspected look highly similar between the two sets of BAM files. See this example gene (Tophat in blue, HISAT2 in red):
So far, so good. However, there are also multiple instances where one alignment shows good read coverage while the other does not, and this happens in both directions. See these two example genes:
Finally, there are some genes where one alignment looks oddly skewed. For instance:
Does anyone know what might account for these differences? Or which alignment I should use for downstream analysis? I'd be grateful for any feedback!
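Rather than inspecting genes one by one in IGV, it can help to compare the two pipelines systematically at the count level. Below is a minimal sketch, assuming you have produced a count table from the HISAT2 BAMs with the same annotation and counting approach the company used for its matrix, and have loaded one sample from each table into a dict of gene ID to raw count. The function name, thresholds and gene IDs are hypothetical choices for illustration.

```python
import math

def compare_counts(counts_a, counts_b, min_total=10, max_abs_log2=1.0):
    """Flag genes whose counts differ strongly between two pipelines.

    counts_a, counts_b: dicts mapping gene ID -> raw read count (one sample).
    Returns (gene, count_a, count_b, log2_ratio) tuples, largest |log2| first.
    """
    flagged = []
    for gene in sorted(set(counts_a) | set(counts_b)):
        a = counts_a.get(gene, 0)
        b = counts_b.get(gene, 0)
        # Skip genes too lowly expressed to judge either way.
        if a + b < min_total:
            continue
        # A pseudocount of 1 avoids log(0) when one pipeline has no reads.
        lfc = math.log2((a + 1) / (b + 1))
        if abs(lfc) > max_abs_log2:
            flagged.append((gene, a, b, lfc))
    flagged.sort(key=lambda t: abs(t[3]), reverse=True)
    return flagged

# Hypothetical counts for one sample, TopHat table vs HISAT2 table.
tophat = {"gene1": 5000, "gene2": 0, "gene3": 300, "gene4": 2}
hisat2 = {"gene1": 4900, "gene2": 200, "gene3": 40, "gene4": 3}
for gene, a, b, lfc in compare_counts(tophat, hisat2):
    print(f"{gene}\t{a}\t{b}\t{lfc:+.2f}")
```

The resulting list gives a ranked set of discordant genes to open in IGV, and shows whether the discrepancies are rare outliers or widespread.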
Thomas
Please consider this.
You should know that the old 'Tuxedo' pipeline of TopHat(2) and Cufflinks is no longer the recommended tool set for RNA-seq analysis. The software is deprecated or in low-maintenance mode and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment with salmon.
But even with that in mind, it's still odd that something like Gapdh (a non-repetitive, well-annotated sequence) would look so different.
Have a look at GAPDH in the UCSC Genome Browser, and don't forget to activate the segmental duplications track ("Segmental Dups").
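The point of checking segmental duplications is that reads from duplicated sequence can map to several loci, and two aligners may place, report or filter such multimappers differently. A minimal sketch that estimates the multimapping fraction for a locus, assuming SAM text (for example from `samtools view` on an indexed BAM restricted to a region) and assuming both aligners write the standard `NH` tag (number of reported alignments for the read). The function name is hypothetical.

```python
def multimapper_fraction(sam_lines):
    """Fraction of primary mapped records whose NH tag reports more than one hit."""
    mapped = multi = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & (0x4 | 0x100 | 0x800):
            continue
        mapped += 1
        # Optional tags start at column 12 (index 11).
        for tag in fields[11:]:
            if tag.startswith("NH:i:") and int(tag[5:]) > 1:
                multi += 1
                break
    return multi / mapped if mapped else 0.0
```

Running this on the same region in both BAM sets shows whether a discordant gene is dominated by multimapping reads, which would point to repeat handling rather than a general alignment failure.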
It's clean for mm10 (I am assuming this is mouse based on the genes in the screenshot).
Yes, it's GRCm38/mm10. I should have mentioned that (I just edited the original post to include it). So you think this is an acceptable level of quality for mm10 alignments?
There is not enough information in the original post to draw a solid conclusion. Yes, there appear to be differences, but we have no idea whether one or both analyses have problems.
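One known difference between the two analyses is the clipping strategy. The original post does not make clear whether "hard clipping" means trimming reads before alignment or `H` operations in the CIGAR string. A minimal sketch that covers both cases by reporting soft-clipped (`S`) and hard-clipped (`H`) bases from the CIGAR field, plus the mean stored read length, assuming SAM text input. The function names are hypothetical.

```python
import re

# Each CIGAR operation is a length followed by one of M, I, D, N, S, H, P, =, X.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def clipped_bases(cigar):
    """Count soft-clipped (S) and hard-clipped (H) bases in one CIGAR string."""
    counts = {"S": 0, "H": 0}
    if cigar == "*":  # no CIGAR available
        return counts
    for length, op in CIGAR_RE.findall(cigar):
        if op in counts:
            counts[op] += int(length)
    return counts

def summarize_clipping(sam_lines):
    """Total clipping and mean SEQ length over primary mapped SAM records."""
    total = {"S": 0, "H": 0, "reads": 0}
    seq_bases = seq_reads = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, cigar, seq = int(fields[1]), fields[5], fields[9]
        # Skip unmapped, secondary and supplementary records.
        if flag & (0x4 | 0x100 | 0x800):
            continue
        clip = clipped_bases(cigar)
        total["S"] += clip["S"]
        total["H"] += clip["H"]
        total["reads"] += 1
        if seq != "*":  # SEQ excludes hard-clipped bases
            seq_bases += len(seq)
            seq_reads += 1
    total["mean_seq_len"] = seq_bases / seq_reads if seq_reads else 0.0
    return total
```

Shorter stored reads or many hard-clipped bases in the company's BAMs, against frequent soft clipping in the HISAT2 BAMs, would be consistent with the pipelines treating read ends differently.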
Right, that's exactly my problem! I can't tell whether either of these is better than the other, whether one is seriously flawed, or whether both are acceptable. What additional information would be helpful in determining this?
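Basic alignment statistics for both BAM sets (mapping rate, number of secondary and supplementary records) are a natural first piece of additional information, and `samtools flagstat` reports them. As a rough sketch of what those numbers are built from, assuming SAM text input, the tally below reads only the FLAG field. The function name is hypothetical. Counts are per record, not per fragment, for paired-end data. The mapping rate is only comparable if both files actually contain their unmapped reads; some pipelines write unmapped reads to a separate file.

```python
def alignment_summary(sam_lines):
    """Tally alignment categories from the SAM FLAG field (like a mini flagstat)."""
    counts = {"mapped_primary": 0, "unmapped": 0, "secondary": 0, "supplementary": 0}
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        flag = int(line.split("\t", 2)[1])
        if flag & 0x100:
            counts["secondary"] += 1
        elif flag & 0x800:
            counts["supplementary"] += 1
        elif flag & 0x4:
            counts["unmapped"] += 1
        else:
            counts["mapped_primary"] += 1
    # Rate over primary records only, so each read is counted once.
    reads = counts["mapped_primary"] + counts["unmapped"]
    counts["mapping_rate"] = counts["mapped_primary"] / reads if reads else 0.0
    return counts
```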