I analysed my data with Salmon; a truncated result table is below. I am comparing different methods (Bowtie, Hisat, Rsubread and Salmon) and using the Ensembl annotation. My protein/transcript of interest has 6 splice variants ("ENST00000337653.7", "ENST00000395562.2", "ENST00000351556.7", "ENST00000339797.5", "ENST00000395559.6", "ENST00000640822.1", or the same IDs without the suffix after the dot). For the Salmon index, I am using a pre-built index from refgenomes.databio.org, which, if I understand correctly, should be from NCBI. However, the result table contains only 5 ENSTs (Image 1), and one of them looks wrong: it is written as ENST00000337653.6, though it should be ENST00000337653.7. In addition, the Salmon result table (Image 2) shows a completely different transcript as present in my sample (compare Images 1 and 2).
1) Why is one transcript missing completely (ENST00000395562.2)?
2) How can there be a mistake in the transcript name?
3) Can salmon be wrong when Rsubread, Bowtie2 and Hisat2 show different results?
- salmon
- Bowtie, Hisat, Rsubread
Correct, salmon will only include transcripts in the output that appear in the input file used to build the index. No transcript is ever created/invented/assembled by salmon itself. On the other hand, depending on how the index was built, sequence-identical transcripts could have been collapsed into a single entry.
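To check whether collapsing could explain a missing transcript, you could look for sequence-identical entries in the transcript FASTA you plan to index. The following is only a minimal sketch: the function names and the toy records (`tx_A`, `tx_B`, `tx_C`) are made up for illustration and are not taken from your data or from salmon itself.

```python
from collections import defaultdict


def parse_fasta(text):
    """Yield (record_id, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def duplicate_groups(text):
    """Return lists of record IDs that share exactly the same sequence."""
    by_seq = defaultdict(list)
    for name, seq in parse_fasta(text):
        by_seq[seq].append(name)
    return [names for names in by_seq.values() if len(names) > 1]


# Hypothetical toy input: tx_A and tx_C have identical sequences.
demo_fasta = """>tx_A some description
ACGTACGT
ACGT
>tx_B
ACGTTTTT
>tx_C
ACGTACGTACGT
"""

groups = duplicate_groups(demo_fasta)
print(groups)
```

For a real file you would pass in `open(path).read()`; if your transcript of interest turns up in one of the groups, collapsing of duplicates is a plausible explanation for it being absent from the output.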
To add...
Same transcript, different versions (look at the number after the period; one is version 6 the other is version 7 even though the actual transcript ID, i.e. ENST00000337653, is the same). This clearly indicates you aren't using the same version of the genome between the different tools (i.e. your "pre-built index" was constructed from a different version of the genome/annotation).
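A quick way to see this kind of mismatch is to split each ID into its stable part and its version, then compare the two lists. This is a rough sketch, not part of any tool: the helper names are hypothetical, and the `observed` list contains only the one ID quoted in the question, so treat it as a stand-in for the `Name` column of your own `quant.sf`.

```python
def split_version(transcript_id):
    """Split 'ENST00000337653.7' into ('ENST00000337653', '7')."""
    base, _, version = transcript_id.partition(".")
    return base, version


def compare_ids(expected, observed):
    """Classify each expected ID as match, version mismatch, or missing."""
    seen = {}
    for tid in observed:
        base, version = split_version(tid)
        seen[base] = version
    report = {}
    for tid in expected:
        base, version = split_version(tid)
        if base not in seen:
            report[tid] = "missing"
        elif seen[base] != version:
            report[tid] = f"version mismatch: found {base}.{seen[base]}"
        else:
            report[tid] = "match"
    return report


# Two of the IDs from the question (annotation side).
expected = ["ENST00000337653.7", "ENST00000395562.2"]
# Illustrative only: the single ID the question reports seeing in the salmon table.
observed = ["ENST00000337653.6"]

report = compare_ids(expected, observed)
for tid, status in report.items():
    print(tid, "->", status)
```

Any "version mismatch" line suggests the index and the annotation come from different releases, which is one more reason to build the index from the same Ensembl release as the annotation used with the other tools.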
Thank you, I will try to make my own index then. I don't want to bother you, but how would you explain bowtie/hisat/rsubread showing one result and salmon a very different one? Which one would you trust more?
Answers in this thread should clarify the differences between the program groups referred to above: "Could you explain the difference between STAR, KALLISTO, SALMON etc. to experimental Biologist/non-bioinformatician".
Results should not be vastly different no matter what program you use, as long as your data is of good quality and there are biological replicates.
Thank you very much