Question

Low mapping rate with Salmon

0

Entering edit mode

7 months ago

Patadu94 • 0

Hi,

I am working with a RNAseq deriving from fungus and at the moment I have used Salmon for quantification. However, I have noticed that between STAR and Salmon, the mapping rate has lowered consistently. In fact, in Salmon, the mapping rate is almost the half than the one in STAR.

I could get higher mapping rate using fmd compared to quasi. Nevertheless, I think the percentage is still to low. I have been reading online that it might be due to contamination with rRNA and gDNA. Do you know how I could check that?

This is the code I used for indexing and for quantification:

source ~//miniconda3/etc/profile.d/conda.sh
conda activate Salmon

/media/bulk_01/users/guest21/miniconda3/envs/Salmon/bin/salmon index -t /media/scratchpad_01/guest21/Verticillium_dahliae_VdLs.17_ASM15067v2/data/GCA_000150675.2/VdLS17_gtf.fa \
-i /media/scratchpad_01/guest21/Verticillium_dahliae_VdLs.17_ASM15067v2/data/VdLS17_Indexed_Salmon \
fmd

/media/bulk_01/users/guest21/miniconda3/envs/Salmon/bin/salmon quant -i /media/scratchpad_01/guest21/Verticillium_dahliae_VdLs.17_ASM15067v2/data/VdLS17_Indexed_Salmon \
--libType A -1 /media/scratchpad_01/guest21/Output_RNAseq_30mpi/Vd_CTRL_30mpi_1/Trimmed_Data_Output/Vd_CTRL_R1_forward_paired.fastq.gz \
-2 /media/scratchpad_01/guest21/Output_RNAseq_30mpi/Vd_CTRL_30mpi_1/Trimmed_Data_Output/Vd_CTRL_R2_reverse_paired.fastq.gz \
-o /media/scratchpad_01/guest21/Output_RNAseq_30mpi/Salmon_CTRL_30mpi_1_transcripts.salmon \
--validateMappings

Note: as you can see I am not using decoys, as this is just a reccomended step by Salmon and I also do not know how to obtain the file.

RNA-seq Salmon Quantification • 1.4k views

ADD COMMENT • link updated 7 months ago by i.sudbery 20k • written 7 months ago by Patadu94 • 0

0

Entering edit mode

In fact, in Salmon, the mapping rate is almost the half than the one in STAR.

Those two programs are not doing the same thing. Please see @Devon's answer here --> Alignment and mapping

I am not using decoys, as this is just a reccomended step by Salmon and I also do not know how to obtain the file

If you have the genome sequence of your organism available, you simply append that at the end of your transcriptome. The process for doing that is described in salmon manual --> https://salmon.readthedocs.io/en/latest/salmon.html#preparing-transcriptome-indices-mapping-based-mode

ADD REPLY • link 7 months ago by GenoMax 147k

0

Entering edit mode

Thanks for the link GenoMax. But I was wondering, should not alignment and mapping have a similar rate and thus be correlated? In my case I know that 90% of my read aligned to the genome (so I know precisely that there is a correspondence between my reads and the genome I use) but I only know that 45% are mapped to my genome (based on Devon's "where did they come from"). Does that mean that the reads that are not mapped to my trascriptome are not exons/coding genes? I know it might be a silly question, but I really would like to understand that.

Also, I have tried to update salmon to the newest version, 1.10.2 and the mapped reads slightly increased.

ADD REPLY • link 7 months ago by Patadu94 • 0

0

Entering edit mode

Does that mean that the reads that are not mapped to my trascriptome are not exons/coding genes?

Reads are likely aligning in regions where there is no expressed sequence known to be present. The reads could represent novel RNA's that you did not know about or at worse that could be contamination in your prep (DNA, that is an extreme example).

ADD REPLY • link 7 months ago by GenoMax 147k

0

Entering edit mode

How would I check if these reads are aligning to regions where there is no expressed sequence know? Should I follow the suggestion of i.sudbery? For DNA contamination, isn't checked when the company performs RNAseq?

ADD REPLY • link 7 months ago by Patadu94 • 0

0

Entering edit mode

I would note that even on a good, polyA selected, RNA-seq run, I would only expect 60-75% of mapped reads to map to protein coding exons.

ADD REPLY • link 7 months ago by i.sudbery 20k

0

Entering edit mode

I'll try to make the decoy file again but I remember you also told me that it is only recommended and not mandatory for running salmon. I think I once tried to make the decoy file, but I had some names discrepancies between the files and thus it returned an error.

ADD REPLY • link 7 months ago by Patadu94 • 0

0

Entering edit mode

I would start by checking what fraction of the reads mapped by STAR map to genes and not intergentic/intronic sequence.

ADD REPLY • link 7 months ago by i.sudbery 20k

0

Entering edit mode

For doing this, should I just look at the log.out.file?

ADD REPLY • link 7 months ago by Patadu94 • 0

0

Entering edit mode

Take your STAR alignment and sum all the counts for each gene. You can either do this by providing an annotation to STAR, or by running feature counts on the BAM file that comes out of STAR.

If you start with 30M reads, but when you add up all the counts that map to all the genes, and find only 15M reads, you'll know that 50% of your reads arn't mapping to annotated genes.