Non-existent transcript as a result from Salmon
4 months ago
Vojtěch ▴ 10

I analysed my data with Salmon; a truncated result table is below. I am comparing different methods (Bowtie, Hisat, Rsubread and Salmon) using the Ensembl annotation. My protein/transcript of interest has 6 splice variants ("ENST00000337653.7", "ENST00000395562.2", "ENST00000351556.7", "ENST00000339797.5", "ENST00000395559.6", "ENST00000640822.1", or the same without the suffix after the dot). For the Salmon index, I am using a pre-built index from refgenomes.databio.org, which, if I understand correctly, should be from NCBI. However, the result table contains only 5 ENSTs (Image 1), and one of them appears to be wrong: it is written as ENST00000337653.6, though it should be ENST00000337653.7. In addition, the Salmon result table (Image 2) shows a completely different transcript present in my sample (compare Images 1 and 2).

1) Why is one transcript missing completely (ENST00000395562.2)?

2) How can there be a mistake in the transcript name?

3) Can salmon be wrong when Rsubread, Bowtie2 and Hisat2 show different results?

  • salmon

Salmon results

  • Bowtie, Hisat, Rsubread

Alignment tools results

RNAseq salmon • 658 views
4 months ago
ATpoint 86k

I do not follow. Salmon does not magically make up transcripts. It uses exactly what is in the FASTA you give it to build the index, but, importantly, by default it removes exact sequence duplicates, keeping only the first one in the order of the FASTA file. Hence:

1) Be sure your testing uses the exact same references everywhere, i.e. the same sources and versions.

2) Check whether that sequence might be a duplicate that was removed; the indexing log documents that.

3) Consider rebuilding the salmon index yourself if you're unsure how it was built.
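The duplicate check in point 2 can also be done directly on the transcriptome FASTA, before any index is built. Below is a minimal Python sketch, not salmon's own implementation: it groups transcripts whose sequences are exactly identical. The function names, the header parsing (first token before a space or `|`), the case-insensitive comparison, and the toy records are all assumptions for illustration.

```python
import hashlib
from collections import defaultdict


def read_fasta(lines):
    """Yield (transcript_id, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Assumption: the transcript ID is the first token of the header,
            # terminated by whitespace or "|".
            name = line[1:].split()[0].split("|")[0]
            chunks = []
        else:
            # Assumption: compare sequences case-insensitively.
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def duplicate_groups(records):
    """Return lists of IDs that share an exactly identical sequence."""
    groups = defaultdict(list)
    for name, seq in records:
        # Hash the sequence so long transcripts are not kept as dict keys.
        digest = hashlib.sha256(seq.encode("ascii")).hexdigest()
        groups[digest].append(name)
    return [names for names in groups.values() if len(names) > 1]


# Toy records with hypothetical IDs; ENST_A.1 and ENST_B.2 have the same sequence
# once the wrapped lines of ENST_A.1 are joined.
example = """>ENST_A.1 cdna
ACGT
ACGT
>ENST_B.2 cdna
ACGTACGT
>ENST_C.1 cdna
TTTT
""".splitlines()

dups = duplicate_groups(read_fasta(example))
print(dups)
```

For a real file, pass an open file handle instead of `example`, e.g. `duplicate_groups(read_fasta(open(path)))`, where `path` is the FASTA used to build the index. If the missing transcript shows up in a group behind another ID, duplicate removal is a plausible explanation for its absence.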

Correct, salmon will only include transcripts in the output that appear in the input file. No transcript is ever created/invented/assembled by salmon itself. On the other hand, depending on how the index was built, sequence-identical transcripts could have been collapsed.

To add...

"ENST00000337653.6, though it should be ENST00000337653.7"

Same transcript, different versions. Look at the number after the period: one is version 6, the other is version 7, even though the actual transcript ID, i.e. ENST00000337653, is the same. This clearly indicates you aren't using the same version of the genome between the different tools (i.e. your "pre-built index" was constructed from a different version of the genome/annotation).
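To make the version point concrete, here is a small Python sketch that splits each ID at the period and compares an expected list against the names found in a result table. The helper names are hypothetical, and the `observed` list is illustrative only: it is not the actual content of the poster's table, apart from ENST00000337653.6, which is quoted in the question.

```python
def split_version(transcript_id):
    """Split 'ENST00000337653.7' into ('ENST00000337653', '7')."""
    base, _, version = transcript_id.partition(".")
    return base, version


def compare_ids(expected, observed):
    """Classify each expected ID as 'match', 'missing', or a version mismatch."""
    observed_by_base = {}
    for tid in observed:
        base, version = split_version(tid)
        observed_by_base[base] = version

    report = {}
    for tid in expected:
        base, version = split_version(tid)
        if base not in observed_by_base:
            report[tid] = "missing"
        elif observed_by_base[base] != version:
            report[tid] = f"version mismatch: found {base}.{observed_by_base[base]}"
        else:
            report[tid] = "match"
    return report


# Three of the IDs from the question, as annotated in the poster's Ensembl release.
expected = ["ENST00000337653.7", "ENST00000395562.2", "ENST00000351556.7"]
# Illustrative only: names as they might appear in a result table.
observed = ["ENST00000337653.6", "ENST00000351556.7"]

report = compare_ids(expected, observed)
for tid, status in report.items():
    print(tid, "->", status)
```

In practice the `observed` names would come from the first column (`Name`) of salmon's `quant.sf`. Several version mismatches across a gene are a strong hint that the index and the annotation used for the aligners come from different releases.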

Thank you, I will try to make my own index then. I don't want to bother you, but how would you explain such different results, with bowtie/hisat/rsubread showing one result and salmon another? Which one would you trust more?

Answers in this thread should clarify the differences between the program groups referred to above: "Could you explain the difference between STAR, KALLISTO, SALMON etc. to experimental Biologist/non-bioinformatician".

Results should not be vastly different no matter what program you use, as long as your data is of good quality and there are biological replicates.

Thank you very much
