Question

Pre-pseudoalignment preprocessing

0

5 months ago

Nicholas • 0

Salmon makes a number of modifications to its output in response to different biases it detects. Is it bad, then, to remove certain classes of reads beforehand, for example by using another aligner to filter out reads from noncoding sequences before running Salmon? If that isn't okay, is there an alternative way to do that afterwards?

salmon • 780 views

ADD COMMENT • link updated 5 months ago by dsull ★ 7.4k • written 5 months ago by Nicholas • 0

score 1 · Answer 1 · 2024-11-05

1

5 months ago

ATpoint 87k

I do not see the point in doing such filtering. If you later want to focus on certain gene biotypes, then filter the count table for them. The suggested upstream processing is non-standard and laborious, while subsetting a count table literally takes seconds.
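
A minimal sketch of such a count-table subset, using only the Python standard library. The table contents, the gene IDs, and the `filter_by_biotype` helper are hypothetical and only illustrate the idea; in practice the biotype annotation would come from your own GTF or annotation source.

```python
import csv
import io

# Hypothetical toy inputs: a gene-level count table (tab-separated)
# and a gene-to-biotype lookup taken from some annotation.
COUNTS_TSV = (
    "gene_id\tsample_1\tsample_2\n"
    "geneA\t120\t98\n"
    "geneB\t15\t22\n"
    "geneC\t300\t310\n"
)

BIOTYPES = {
    "geneA": "protein_coding",
    "geneB": "lncRNA",
    "geneC": "protein_coding",
}


def filter_by_biotype(counts_tsv, biotypes, keep=("protein_coding",)):
    """Return the rows of a count table whose gene biotype is in `keep`."""
    reader = csv.DictReader(io.StringIO(counts_tsv), delimiter="\t")
    return [row for row in reader if biotypes.get(row["gene_id"]) in keep]


coding_rows = filter_by_biotype(COUNTS_TSV, BIOTYPES)
for row in coding_rows:
    print(row["gene_id"], row["sample_1"], row["sample_2"])
```

Genes missing from the biotype lookup are dropped here; whether that is the right default depends on the annotation being used.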

ADD COMMENT • link 5 months ago by ATpoint 87k

0

Sorry, I should have mentioned that I am using ribo-seq data. I did a small experiment, and it seems that removing noncoding sequences before processing with Salmon results in approximately half the number of reads mapped to a particular gene. I wonder if that is because the reads are so short that they easily map to certain genes even if they don't belong there, whereas if the noncoding sequences are taken away, there are fewer reads to map mistakenly?

I think the more likely explanation is that it affects how Salmon does bias correction. I used --gcBias --seqBias --validateMappings.

In full, I first map my files to a RefSeq reference of noncoding sequences, and I pass to Salmon only the reads that are not filtered out.

ADD REPLY • link 5 months ago by Nicholas • 0

1

It can be a good idea to align against a reference of noncoding sequences using something like bowtie2 and then discard the reads that align. It’s what I do before using either pseudoalignment or STAR if I know I have significant noncoding elements like rRNA, tRNA, LINE/SINE, etc.
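
A rough sketch of that two-step workflow, written as Python helpers that only build the command lines. It assumes single-end reads (typical for ribo-seq) and prebuilt bowtie2 and Salmon indexes. All file and index names are hypothetical, and the helper functions are illustrative, not part of either tool.

```python
import shlex


def bowtie2_filter_cmd(noncoding_index, reads_fq, kept_fq_gz, threads=4):
    """Build a bowtie2 command that keeps reads NOT aligning to the noncoding index."""
    return [
        "bowtie2",
        "-x", noncoding_index,   # index built from rRNA/tRNA/other noncoding sequences
        "-U", reads_fq,          # single-end input reads
        "--un-gz", kept_fq_gz,   # gzip-compressed reads that failed to align
        "-p", str(threads),
        "-S", "/dev/null",       # the alignments themselves are discarded
    ]


def salmon_quant_cmd(salmon_index, kept_fq_gz, out_dir):
    """Build a salmon quant command for the filtered single-end reads."""
    return [
        "salmon", "quant",
        "-i", salmon_index,
        "-l", "A",               # let Salmon infer the library type
        "-r", kept_fq_gz,        # unmated (single-end) reads
        "--validateMappings",
        "-o", out_dir,
    ]


# Hypothetical names, for illustration only.
filter_cmd = bowtie2_filter_cmd("noncoding_idx", "sample.fq.gz", "sample.filtered.fq.gz")
quant_cmd = salmon_quant_cmd("transcripts_idx", "sample.filtered.fq.gz", "sample_quant")
print(shlex.join(filter_cmd))
print(shlex.join(quant_cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to actually run the step. For paired-end data, bowtie2's `--un-conc-gz` would take the place of `--un-gz`.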

ADD REPLY • link 5 months ago by dsull ★ 7.4k