Salmon low mapping rates
6 months ago
L.Chang • 0

Hi, I have previously used STAR for RNA-seq and wanted to try Salmon. I find that the mapping rate is rather low, ~40% for every sample. The total RNA was extracted from whole blood and rRNA depleted. Is the 40% mapping rate due to the rRNA depletion? I have tried lowering k and using different reference files, but it doesn't seem to change the mapping rate much. FastQC of the data looks fine.

Some of my code is below:

TRANSCRIPTOME=$RESOURCEDIR/gencode.v46.transcripts.fa.gz
GENOME=$RESOURCEDIR/GRCh38.p14.genome.fa.gz

echo "Creating decoy file"
grep "^>" <(gunzip -c $GENOME) | cut -d " " -f 1 > decoys_gencode.txt
sed -i.bak -e 's/>//g' decoys_gencode.txt

echo "Concatenating transcriptome and genome"
cat $TRANSCRIPTOME $GENOME > gentrome_gencode.fa.gz

# Index the reference fasta
echo "Indexing"
salmon index -t gentrome_gencode.fa.gz -d decoys_gencode.txt -k 31 -p 12 -i salmon_index_gencode --gencode

# Alignment
# Note: this assumes the index built above lives in $RESOURCEDIR
INDEX=$RESOURCEDIR/salmon_index_gencode/
for fn in *_trimmed_R1_001.fastq.gz;
do
SAMPLE=$(basename "${fn}" _trimmed_R1_001.fastq.gz)
echo "Processing sample ${SAMPLE}"
salmon quant --index "$INDEX" --libType A \
         -1 "${SAMPLE}_trimmed_R1_001.fastq.gz" \
         -2 "${SAMPLE}_trimmed_R2_001.fastq.gz" \
         --threads 12 --validateMappings --output "quants/${SAMPLE}_quant"
done

If anyone could share some insight into the low mapping rate, and whether or not I can just proceed to DESeq2, that would be greatly appreciated.

Reference Mapping Salmon RNA-seq • 364 views
1 day ago
jnechacov • 0

Hi! The ~40% mapping rate you’re observing with Salmon could indeed be related to the rRNA depletion process, especially since you’re working with total RNA extracted from whole blood. Even with rRNA depletion, residual ribosomal RNA (rRNA) can remain and reduce the mapping rate to the transcriptome, as Salmon is designed to quantify transcript-level sequences.

Here are a few suggestions to improve your mapping rate:

Confirm rRNA Depletion Efficiency: If a significant amount of rRNA remains, it can interfere with accurate quantification. You can check this by aligning reads against known rRNA sequences and looking at what fraction of reads map to them.

Check for Globin mRNA: Blood samples often contain abundant globin mRNA, which can also reduce the effective mapping rate to the transcriptome.

Use PureRec DSN for Improved rRNA and Globin Depletion: Zymo Research’s PureRec Duplex-Specific Nuclease (DSN) can efficiently reduce both residual rRNA and abundant globin transcripts prior to library preparation. DSN preferentially cleaves double-stranded DNA and the DNA strand of DNA-RNA hybrids, which can enhance the quality of RNA-Seq libraries by removing unwanted high-abundance transcripts. This can lead to better mapping rates and a more comprehensive representation of low-abundance transcripts.

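Evaluate Indexing Parameters: You mentioned lowering the k parameter, which is a reasonable thing to try. The Salmon documentation suggests that k = 31 works well for reads of about 75 bp or longer and that a smaller k mainly helps with shorter reads, so unless your reads are short, k alone is unlikely to explain a 40% rate. Also confirm that the combined transcriptome and genome (gentrome) index built without errors and that every name in the decoy file matches a genome sequence header.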

Proceeding to DESeq2: If the mapping rate remains low, double-check the quality of the quantified data before proceeding with differential expression analysis using DESeq2. Low mapping rates might skew downstream results, so resolving this issue first is advisable.
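
One way to double-check the quantified data is to look at the counters Salmon writes to aux_info/meta_info.json in each output directory, which can show whether reads were lost to the decoys rather than simply left unmapped. The sketch below assumes the quants/<sample>_quant layout from the question and a pretty-printed JSON file with one "key": value pair per line. The field names are the ones I recall from recent Salmon releases, so check them against your own file. The helper names json_field and summarize_quants are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a top-level scalar field from a
# pretty-printed JSON file (assumes one "key": value pair per line).
json_field() {
    local key="$1" file="$2"
    sed -n "s/^[[:space:]]*\"${key}\":[[:space:]]*\([^,]*\),\{0,1\}[[:space:]]*$/\1/p" "$file" | head -n 1
}

# Print one tab-separated summary line per sample directory.
# The directory layout (quants/<sample>_quant) is assumed from the question.
summarize_quants() {
    local dir meta
    for dir in "${1:-quants}"/*_quant; do
        meta="$dir/aux_info/meta_info.json"
        [ -f "$meta" ] || continue   # skip directories without Salmon output
        printf '%s\tprocessed=%s\tmapped=%s\tpercent_mapped=%s\tdecoy=%s\n' \
            "$(basename "$dir")" \
            "$(json_field num_processed "$meta")" \
            "$(json_field num_mapped "$meta")" \
            "$(json_field percent_mapped "$meta")" \
            "$(json_field num_decoy_fragments "$meta")"
    done
}
```

Running `summarize_quants quants` after the quant loop prints one line per sample. A large decoy count relative to the number processed would suggest that many reads come from intronic or intergenic sequence, which is plausible for rRNA-depleted total RNA.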

In summary, addressing rRNA and globin depletion with a tool like PureRec DSN could significantly improve your mapping rate and the overall quality of your RNA-Seq data.
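
To gauge how much of the mapped signal globin transcripts take up in the existing data, a rough check is to sum NumReads (the fifth column of quant.sf) over a list of transcript IDs. This is only a sketch: globin_ids.txt is a hypothetical file you would build yourself, for example from the GENCODE annotation entries for HBA1, HBA2 and HBB, and its IDs must match the Name column exactly, including version suffixes. The function name reads_fraction is also made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of NumReads in a Salmon quant.sf that falls
# on the transcripts whose IDs are listed (one per line) in an ID file.
reads_fraction() {
    local ids="$1" quant="$2"
    awk -F '\t' '
        FILENAME == ARGV[1] { if ($1 != "") wanted[$1] = 1; next }  # ID list
        FNR == 1 { next }                        # skip the quant.sf header row
        { total += $5; if ($1 in wanted) hit += $5 }
        END {
            if (total > 0) printf "%.2f\n", 100 * hit / total
            else print "NA"
        }
    ' "$ids" "$quant"
}
```

For example, `reads_fraction globin_ids.txt quants/SAMPLE_quant/quant.sf` prints a percentage of the mapped reads, not of all sequenced reads, so it complements the overall mapping rate rather than explaining it.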

I hope this helps—good luck with your analysis!
