6 weeks ago
Nicholas
Salmon makes a number of modifications to its output in response to the different biases it detects. Is it bad, then, to remove certain classes of reads beforehand by using another aligner to filter out, say, noncoding sequences before running Salmon? If that isn't okay, is there an alternative way to do that afterwards?
Sorry, I should have noted that I am using Ribo-seq data. I did a little experiment, and it seems that removing noncoding sequences before processing with Salmon results in approximately half the number of transcripts mapped to a particular gene. I wonder if that is because the reads are so short that they easily map to certain genes even if they don't belong there? Whereas if the noncoding sequences are taken away, there are fewer transcripts to map mistakenly?
I think the more likely explanation is that it affects how Salmon does bias correction. I used --gcBias --seqBias --validateMappings.
In full, I first map my files to a RefSeq reference of noncoding sequences, and then I pass to Salmon only the reads that are not filtered out.
It can be a good idea to align against a reference of noncoding sequences using something like bowtie2 and then discard the reads that align. It's what I do before using either pseudoalignment or STAR if I know I have significant noncoding elements like rRNA, tRNA, LINE/SINE, etc.
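For what it's worth, here is a rough sketch of that two-step workflow, written as a small Python script that only builds the two command lines. It assumes single-end reads (typical for Ribo-seq), a bowtie2 index built from your noncoding reference, and an existing Salmon index; every path, the thread count, and the function names are hypothetical placeholders, and the Salmon flags are simply the ones mentioned in the question.

```python
import shlex


def bowtie2_filter_cmd(noncoding_index, reads_fastq, kept_fastq_gz, threads=4):
    """Build a bowtie2 command that keeps only reads NOT aligning to the
    noncoding reference (rRNA, tRNA, etc.). All paths are placeholders."""
    return [
        "bowtie2",
        "-p", str(threads),        # number of threads (assumed value)
        "-x", noncoding_index,     # prefix of the bowtie2 index of noncoding sequences
        "-U", reads_fastq,         # single-end input reads
        "--un-gz", kept_fastq_gz,  # gzip-compressed reads that failed to align
        "-S", "/dev/null",         # discard the alignments themselves
    ]


def salmon_quant_cmd(salmon_index, kept_fastq_gz, out_dir):
    """Build a salmon quant command for the filtered, single-end reads."""
    return [
        "salmon", "quant",
        "-i", salmon_index,        # Salmon transcriptome index
        "-l", "A",                 # let Salmon infer the library type
        "-r", kept_fastq_gz,       # single-end reads that survived filtering
        "--validateMappings", "--seqBias", "--gcBias",
        "-o", out_dir,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    steps = [
        bowtie2_filter_cmd("noncoding_idx", "sample.fastq.gz", "sample.filtered.fastq.gz"),
        salmon_quant_cmd("salmon_idx", "sample.filtered.fastq.gz", "sample_quant"),
    ]
    for cmd in steps:
        print(shlex.join(cmd))
```

The script prints the commands rather than running them, so you can check them first; to execute, each list could be passed to subprocess.run(cmd, check=True). Running Salmon on the same sample with and without the filtering step, and with and without the bias flags, would be one way to see which of the two explanations above accounts for the drop.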