can't find why certain reads are unmapped by salmon
5.8 years ago
vaushev ▴ 20

I am trying to perform standard quantification with salmon, using its default quasi-mapping mode. The index was built from the latest GENCODE transcriptome (v29):

salmon index -t gencode.v29.transcripts.fa.gz -i gencode_transcriptome_index_salmon

Then I run mapping:

salmon quant -i gencode_transcriptome_index_salmon -l SF -r mate1.fq.gz -o quant1 --writeUnmappedNames --validateMappings

(In reality I have paired-end reads, but to keep the example simple I show it as stranded single-end.) In the output, more than 30% of the reads are unmapped. After a manual BLAST, it seems that the most abundant of them actually belong to ENST00000316193.12, a transcript that looks quite ordinary and is present in GENCODE. Over the 150 nt of the read there is just a single mismatch, with every other base a perfect match, so I really don't understand why it is unmapped. Below is a piece of the actual source FASTQ file with such "unmappable" reads, followed by the alignments from the manual BLAST:

@A00261:111:HFJ5KDSXX:3:1101:28700:5259 1:N:0:AACAACCA+GGTGCGAA
CCCCGAACCACTCAGGGTCCTGTGGACAGCTCACCTAGTGGCAATGGCTCCAGGCTCCCGGACGTCCCTGCTCCTGGCTTTTGCCCTGCTCTGCCTGCCCTGGCTTCAAGAGGCTGGTGCCGTCCAAACCGTTCCGTTATCCAGGCTTTT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00261:111:HFJ5KDSXX:3:1102:25473:13667 1:N:0:AACAACCA+GGTGCGAA
GGACAGCTCACCTAGTGGCAATGGCTCCAGGCTCCCGGACGTCCCTGCTCCTGGCTTTTGCCCTGCTCTGCCTGCCCTGGCTTCAAGAGGCTGGTGCCGTCCAAACCGTTCCGTTATCCAGGCTTTTTGACCACGCTATGCTCCAAGCCC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFF:FFFFFFFFF:FFFFFFFFFFFFFF:FFFFFFFFFFFF,FFFFFFFFFFFF

_

Query  1    CCCCGAACCACTCAGGGTCCTGTGGACAGCTCACCTAGTGGCAATGGCTCCAGGCTCCCGGACGTCCCTGCTCCTGGCTTTTGCCCTGCTCTGCCTGCCCTGGCTTCAAGAGGCTGGTGC  120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  108  CCCCGAACCACTCAGGGTCCTGTGGACAGCTCACCTAGTGGCAATGGCTCCAGGCTCCCGGACGTCCCTGCTCCTGGCTTTTGCCCTGCTCTGCCTGCCCTGGCTTCAAGAGGCTGGTGC  227

Query  121  CGTCCAAACCGTTCCGTTATCCAGGCTTTT  150
            ||||||||||||||| ||||||||||||||
Sbjct  228  CGTCCAAACCGTTCCCTTATCCAGGCTTTT  257

_

Query  1    GGACAGCTCACCTAGTGGCAATGGCTCCAGGCTCCCGGACGTCCCTGCTCCTGGCTTTTGCCCTGCTCTGCCTGCCCTGGCTTCAAGAGGCTGGTGCCGTCCAAACCGTTCCGTTATCCA  120
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||
Sbjct  131  GGACAGCTCACCTAGTGGCAATGGCTCCAGGCTCCCGGACGTCCCTGCTCCTGGCTTTTGCCCTGCTCTGCCTGCCCTGGCTTCAAGAGGCTGGTGCCGTCCAAACCGTTCCCTTATCCA  250

Query  121  GGCTTTTTGACCACGCTATGCTCCAAGCCC  150
            ||||||||||||||||||||||||||||||
Sbjct  251  GGCTTTTTGACCACGCTATGCTCCAAGCCC  280
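
To make the location of the single mismatch explicit, here is a minimal sketch that compares two aligned segments base by base and prints the query and transcript coordinates where they differ. The helper name `mismatch_positions` is hypothetical, and it assumes an ungapped alignment with segments of equal length, which holds for the BLAST output above.

```shell
# Hypothetical helper: print the positions where two equal-length aligned
# segments differ. Arguments: query segment, subject segment,
# query start coordinate, subject start coordinate (1-based, as in BLAST).
# Assumes an ungapped alignment.
mismatch_positions() {
  awk -v q="$1" -v s="$2" -v qstart="$3" -v sstart="$4" 'BEGIN {
    n = length(q)
    for (i = 1; i <= n; i++) {
      a = substr(q, i, 1)
      b = substr(s, i, 1)
      # Output columns: query position, subject position, query base, subject base
      if (a != b) print qstart + i - 1, sstart + i - 1, a, b
    }
  }'
}

# Second BLAST block of the first read (query 121-150, subject 228-257).
mismatch_positions \
  CGTCCAAACCGTTCCGTTATCCAGGCTTTT \
  CGTCCAAACCGTTCCCTTATCCAGGCTTTT \
  121 228
# prints: 136 243 G C
```

Applied to the first block of the second read (query 1-120, subject 131-250), the same comparison also lands on transcript position 243, so both reads carry a G where the transcript has a C at that coordinate.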

I would be grateful if someone could explain what happens here and why such reads cannot be mapped.
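
Since the run above uses `--writeUnmappedNames`, the unmapped read names can be summarized before digging further. The sketch below assumes salmon wrote them to `aux_info/unmapped_names.txt` inside the output directory, one read per line with the read name followed by a short flag (for example `u` when no mapping was found), as I understand the salmon documentation to describe. The helper name `tally_unmapped_flags` is hypothetical.

```shell
# Hypothetical helper: count how many read names carry each flag.
# Assumes each line is "<read name> <flag>", whitespace-separated,
# with the flag as the last field.
tally_unmapped_flags() {
  awk '{ count[$NF]++ } END { for (f in count) print f, count[f] }' "$1" | sort
}

# quant1 is the -o directory from the salmon quant command above.
if [ -f quant1/aux_info/unmapped_names.txt ]; then
  tally_unmapped_flags quant1/aux_info/unmapped_names.txt
fi
```

If the file layout differs in your salmon version, only the field used for the flag (`$NF`) should need adjusting.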

salmon RNA-Seq • 976 views