Pre-pseudoalignment preprocessing
6 weeks ago
Nicholas • 0

Salmon makes a number of modifications to its output in response to different biases it detects. Is it bad, then, to remove certain classes of reads with another aligner, say by filtering out non-coding sequences, before running Salmon? If that isn't okay, is there an alternative way to do this afterwards?

salmon • 623 views
6 weeks ago
ATpoint 86k

I do not see the point in doing such filtering. If you later want to focus on certain gene biotypes, then filter the count table for them; the suggested upstream processing is non-standard and laborious, while subsetting a count table literally takes seconds.
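As a minimal sketch of that kind of subset, assuming a Salmon quant.sf and a one-ID-per-line list of the transcript IDs you want to keep (both file names here are placeholders):

    # keep the header plus only the transcripts listed in coding_ids.txt
    awk 'NR==FNR {keep[$1]; next} FNR==1 || ($1 in keep)' coding_ids.txt quant.sf > quant.coding.sf

The same idea applies to a gene-level count matrix: subset the rows by biotype after quantification instead of changing the input to Salmon.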


Sorry, I didn't mention that I am using Ribo-seq data. I did a little experiment, and it seems that removing noncoding sequences before processing with Salmon results in approximately half the number of transcripts mapped to a particular gene. I wonder if that is because the reads, being so short, easily map to certain genes even when they don't belong there, whereas with the noncoding sequences taken away there are fewer transcripts to mistakenly map to?

I think the more likely explanation is that it affects how Salmon does bias correction. I used --gcBias --seqBias --validateMappings.

In full: I first map my reads to a RefSeq reference of noncoding sequences, and I only pass to Salmon the reads that are not filtered out.
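A rough sketch of the Salmon step with the flags above, assuming single-end reads and placeholder file names (-l A just lets Salmon infer the library type):

    # build the index once, then quantify the filtered reads
    salmon index -t transcripts.fa -i salmon_index
    salmon quant -i salmon_index -l A -r filtered_reads.fq \
        --gcBias --seqBias --validateMappings \
        -p 8 -o salmon_out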


It can be a good idea to align against a reference of noncoding sequences using something like bowtie2 and then discard those reads. That is what I do before either pseudoalignment or STAR if I know I have significant noncoding elements such as rRNA, tRNA, LINE/SINE, etc.
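A minimal sketch of that filtering step with bowtie2, assuming single-end reads and placeholder file names (--un writes the reads that fail to align, which are the ones you keep):

    # index the noncoding reference (rRNA, tRNA, etc.), then collect reads that do NOT align to it
    bowtie2-build noncoding.fa noncoding_idx
    bowtie2 -x noncoding_idx -U reads.fq --un reads.filtered.fq -S /dev/null -p 8
    # reads.filtered.fq is then what goes into Salmon (or STAR)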
