Question

Non-existent transcript as a result from Salmon

0


4 months ago

Vojtěch ▴ 10

I analysed my data with Salmon; a truncated result table is below. I am comparing different methods (Bowtie, Hisat, Rsubread and Salmon) and I am using the Ensembl annotation. My protein/transcript of interest has 6 splice variants ("ENST00000337653.7", "ENST00000395562.2", "ENST00000351556.7", "ENST00000339797.5", "ENST00000395559.6", "ENST00000640822.1", or the same IDs without the suffix after the dot). For the Salmon index, I am using a pre-built index from refgenomes.databio.org, which should be, if I understand it correctly, from NCBI. However, the result table contains only 5 ENSTs (Image 1), and furthermore one of them seems to be wrong: it is written as ENST00000337653.6, though it should be ENST00000337653.7. In addition, the Salmon result table (Image 2) shows a completely different transcript present in my sample (compare Images 1 and 2).

1) Why is one transcript missing completely (ENST00000395562.2)?

2) How can there be a mistake in transcript name?

3) Can salmon be wrong when Rsubread, Bowtie2 and Hisat2 show different results?

[Image: Salmon results]

[Image: Alignment tools results (Bowtie, Hisat, Rsubread)]

RNAseq salmon • 658 views

ADD COMMENT • link 3 months ago by Vojtěch ▴ 10

score 3 · Accepted Answer · 2024-08-17

3


4 months ago

ATpoint 86k

I do not follow. salmon does not magically make up transcripts. It uses exactly what is in the fasta you give it to build the index but, importantly, by default it removes exact sequence duplicates, keeping only the first one in the order of the fasta file. Hence:

1) Be sure your testing uses the exact same references everywhere, meaning the same sources and versions.

2) Check whether that sequence might be a duplicate that was removed; the indexing log documents that.

3) Consider rebuilding the salmon index yourself if you're unsure how it was built.
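To check the duplicate point without relying on the index log, one option is to look for identical sequences directly in the transcript fasta that was (or will be) used for indexing. The following is only a minimal sketch: the function names and the toy records are made up for illustration, and comparing sequences case-insensitively is an assumption about what counts as an "exact duplicate".

```python
import io
from collections import defaultdict


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from an open FASTA handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header.
            name = line[1:].split()[0]
            chunks = []
        else:
            # Assumption: case differences are ignored when comparing.
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def duplicate_groups(handle):
    """Return lists of record IDs that share an identical sequence.

    IDs inside each group keep their order of appearance in the file,
    so the first ID is the one that would come first in the fasta.
    """
    by_seq = defaultdict(list)
    for name, seq in read_fasta(handle):
        by_seq[seq].append(name)
    return [ids for ids in by_seq.values() if len(ids) > 1]


# Toy input with hypothetical IDs; tx1.1 and tx3.2 have the same sequence.
toy = io.StringIO(">tx1.1\nACGT\nACGT\n>tx2.1\nGGGG\n>tx3.2\nACGTACGT\n")
groups = duplicate_groups(toy)
print(groups)
```

For a real file you would pass `open("transcripts.fa")` (a placeholder path) instead of the toy handle. If the missing transcript shows up as a later member of a group, it was most likely collapsed into the first member of that group.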

ADD COMMENT • link 4 months ago by ATpoint 86k

2


Correct, salmon will only include transcripts in the output that appear in the input file. No transcript is ever created/invented/assembled by salmon itself. On the other hand, depending on how the index has been built, sequence identical transcripts could have been collapsed.

ADD REPLY • link 4 months ago by Rob 6.9k

2


To add...

ENST00000337653.6, though it should be ENST00000337653.7

Same transcript, different versions (look at the number after the period; one is version 6 the other is version 7 even though the actual transcript ID, i.e. ENST00000337653, is the same). This clearly indicates you aren't using the same version of the genome between the different tools (i.e. your "pre-built index" was constructed from a different version of the genome/annotation).
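A quick way to spot this kind of mismatch is to split each ID into its stable part and its version and compare the two sets. This is just a sketch: the helper names are hypothetical, and the `observed` list contains only the one versioned ID quoted in the question, not the full Salmon output.

```python
def split_version(transcript_id):
    """Split 'ENST00000337653.7' into ('ENST00000337653', '7')."""
    base, _, version = transcript_id.partition(".")
    return base, version


def compare_versions(expected_ids, observed_ids):
    """Report, for each expected ID, whether it matches, differs in version, or is missing."""
    observed = dict(split_version(t) for t in observed_ids)
    report = {}
    for tid in expected_ids:
        base, version = split_version(tid)
        if base not in observed:
            report[tid] = "missing"
        elif observed[base] != version:
            report[tid] = f"version differs: {base}.{observed[base]}"
        else:
            report[tid] = "match"
    return report


# Two of the IDs from the question (current annotation) ...
expected = ["ENST00000337653.7", "ENST00000395562.2"]
# ... and the one versioned ID reported from the Salmon table.
observed = ["ENST00000337653.6"]

report = compare_versions(expected, observed)
for tid, status in report.items():
    print(tid, "->", status)
```

Running the same comparison over all transcript IDs of the annotation and of the index would show whether the version drift is limited to a few transcripts or affects the whole reference.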

ADD REPLY • link 4 months ago by dsull ★ 7.0k

0


Thank you, I will try to make my own index then. I don't want to bother you, but how would you explain bowtie/hisat/rsubread showing one result and salmon a very different one? Which one would you trust more?

ADD REPLY • link 4 months ago by Vojtěch ▴ 10

1


Answers in this thread should clarify the differences between the program groups referred to above: "Could you explain the difference between STAR, KALLISTO, SALMON etc. to experimental Biologist/non-bioinformatician"

Results should not be vastly different no matter what program you use, as long as your data is of good quality and there are biological replicates.