Pre-pseudoalignment preprocessing

Nicholas • 0 · 18 days ago

Salmon makes a number of modifications to its output in response to different biases it detects. Is it bad, then, to remove certain classes of reads with another aligner, for example filtering out non-coding sequences, before running Salmon? If that isn't okay, is there an alternative way to do it afterwards?

ATpoint 85k · 18 days ago

I do not see the point in doing such filtering. If you later want to focus on certain gene biotypes, filter the count table for them; the suggested upstream processing is non-standard and laborious, while subsetting a count table literally takes seconds.
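
As a rough sketch of that subsetting (file names here are hypothetical): given a plain-text list of protein-coding IDs, e.g. pulled from the annotation GTF, a Salmon quant.sf table can be reduced to those rows with a single awk call:

    # keep the header line plus rows whose ID (first column) appears in the biotype list
    awk 'NR==FNR {keep[$1]; next} FNR==1 || ($1 in keep)' \
        protein_coding_ids.txt quant.sf > quant.protein_coding.sf

The same idea applies to a merged gene-level count matrix; only the ID list changes.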

I'm sorry, I didn't note that I am using Ribo-seq data. I did a little experiment, and it seems that removing non-coding sequences before processing with Salmon results in approximately half the number of transcripts mapped to a particular gene. I wonder if that is because the reads, being so short, easily map to certain genes even if they don't belong there, whereas with the non-coding sequences taken away there are fewer transcripts to mistakenly map to?

I think the more likely explanation is that it affects how Salmon does bias correction. I used --gcBias --seqBias --validateMappings.

In full: I first map my files to a RefSeq reference of non-coding sequences, and I only pass to Salmon the reads that are not filtered out.
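
For reference, the quantification step described above would look roughly like this (index name, read file, and thread count are placeholders; -r is used because the Ribo-seq reads are single-end):

    # quantify only the reads that survived the non-coding filter,
    # using the bias flags mentioned above
    salmon quant -i transcripts_index -l A \
        -r filtered_reads.fq.gz \
        --gcBias --seqBias --validateMappings \
        -p 8 -o salmon_out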

It can be a good idea to align against a reference of non-coding sequences using something like bowtie2 and then discard those reads. It's what I do before running either pseudoalignment or STAR if I know I have significant non-coding elements like rRNA, tRNA, LINE/SINE, etc.
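
A minimal sketch of that depletion step, assuming a bowtie2 index built from the non-coding sequences (all file names are placeholders); reads that fail to align to the non-coding reference are written out and can then go into Salmon or STAR as usual:

    # build an index of the non-coding sequences (rRNA, tRNA, LINE/SINE, etc.)
    bowtie2-build ncRNA.fa ncRNA_index

    # align single-end reads; --un-gz writes the reads that do NOT map,
    # i.e. the non-coding-depleted set; the SAM output itself is discarded
    bowtie2 -p 8 -x ncRNA_index -U reads.fq.gz \
        --un-gz reads.ncRNA_depleted.fq.gz -S /dev/null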
