Should we keep only the uniquely mapped reads for gene expression
3.4 years ago
User000 ▴ 710

Hello,

I am aligning and quantifying my paired-end RNA-seq reads with two pipelines:

a. HISAT2 --> StringTie --> DESeq2
b. STAR --> Salmon --> DESeq2

Is it necessary to keep only the uniquely mapped reads before counting reads per gene, e.g. with a MAPQ filter like this?

samtools view -b -q 40 -o output.bam alignments.bam
RNA-seq • 2.0k views
3.4 years ago

No. Well, at least not for Salmon, as half the point of it is that it deals with multimappers in an intelligent way via a modified expectation-maximization algorithm. Most other reasonable quantification programs (e.g. RSEM, kallisto) will attempt to deal with multimappers as well. I haven't used StringTie in a very long time, so I don't remember what it does with them.
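To make the expectation-maximization idea concrete, here is a toy sketch in Python. It is not Salmon's actual algorithm: it ignores transcript lengths, alignment scores and bias models, and the function name `em_abundances` and the read sets are made up for illustration. It only shows how uniquely mapped reads can inform where the multimappers most likely came from.

```python
def em_abundances(compat, n_iter=200):
    """Toy EM for read assignment (illustrative only, not Salmon's implementation).

    compat: list of sets; each set holds the transcripts one read is compatible with.
    Returns (theta, counts): relative abundances and expected read counts.
    """
    transcripts = sorted(set().union(*compat))
    # Start from uniform abundances.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    counts = {t: 0.0 for t in transcripts}
    for _ in range(n_iter):
        # E-step: split each read among its transcripts in proportion to theta.
        counts = {t: 0.0 for t in transcripts}
        for targets in compat:
            total = sum(theta[t] for t in targets)
            for t in targets:
                counts[t] += theta[t] / total
        # M-step: re-estimate abundances from the expected counts.
        n = sum(counts.values())
        theta = {t: c / n for t, c in counts.items()}
    return theta, counts


# Hypothetical data: 6 reads unique to A, 2 unique to B, 4 shared by A and B.
reads = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
theta, counts = em_abundances(reads)
# The shared reads end up split about 3:1 in favour of A,
# giving roughly 9 expected reads for A and 3 for B.
```

Dropping the 4 shared reads up front would leave 6 vs 2 and throw away a third of the data, which is why filtering to unique mappers works against this kind of tool.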


The StringTie paper says: "if a fragment aligns in n places, then that fragment alignment will contribute 1/n to the edge capacity". But it is not clear to me at all. Thanks for your reply.
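One way to read the 1/n rule is as simple fractional weighting: a fragment that aligns to n places adds 1/n of a fragment to each of them. The sketch below only illustrates that arithmetic. It is not StringTie's flow-network code, and the function name `fractional_weights` and the fragment and locus names are hypothetical.

```python
from collections import defaultdict


def fractional_weights(alignments):
    """alignments: dict mapping fragment id -> list of loci where it aligns.

    Each fragment contributes 1/n to every one of its n loci.
    """
    weight = defaultdict(float)
    for fragment, loci in alignments.items():
        share = 1.0 / len(loci)
        for locus in loci:
            weight[locus] += share
    return dict(weight)


# Hypothetical example: one unique fragment, one mapping to 2 loci, one to 4 loci.
example = {
    "frag1": ["geneA"],
    "frag2": ["geneA", "geneB"],
    "frag3": ["geneA", "geneB", "geneC", "geneD"],
}
weights = fractional_weights(example)
# geneA: 1 + 1/2 + 1/4 = 1.75, geneB: 1/2 + 1/4 = 0.75, geneC and geneD: 0.25 each
```

The weights still sum to the number of fragments (3 here), so no fragment is counted more than once in total.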


bbmap.sh will allow you to select a random location from among all locations where a read maps equally well (ambig=random option). I can't vouch for the statistical validity of that approach, but it seems logical if you don't want to throw away multi-mapping data.
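For comparison, random placement of ambiguous reads can be sketched like this. This is a toy model of the idea only, not BBMap's code: the function name `assign_randomly`, the seed and the read and locus names are assumptions made for the example.

```python
import random
from collections import Counter


def assign_randomly(alignments, seed=0):
    """alignments: dict mapping read id -> list of equally good loci.

    Each read is placed at exactly one locus, chosen uniformly at random.
    """
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    return {read: rng.choice(loci) for read, loci in alignments.items()}


# Hypothetical example: two unique reads and two ambiguous ones.
example = {
    "read1": ["geneA"],
    "read2": ["geneB"],
    "read3": ["geneA", "geneB"],
    "read4": ["geneA", "geneB", "geneC"],
}
placement = assign_randomly(example)
per_locus = Counter(placement.values())  # integer counts, one per read
```

Unlike fractional weighting, every read stays a whole count, but the choice ignores how abundant each locus is, so the counts for an individual gene can vary from run to run when the seed changes.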


Yeah, I don't think Salmon will work properly unless you keep the multimappers in.

And I just wouldn't use StringTie for quantification. But if I were to, I'd exclude multimapping reads. I'm also not sure it's possible to use DESeq2 on the results of StringTie. I think it outputs TPMs rather than counts.


Thanks for your replies. StringTie outputs FPKM, and I actually used it with DESeq2 as log(FPKM+1). For the Salmon output, should I use tximport to merge all the counts (not the TPMs)? Sorry, I know this is another question.


Yes, follow the tximport vignette for salmon. It handles it quite painlessly.


You can't use log(FPKM+1) in DESeq2; it has to be read counts (it will run, but the results it produces will not be valid). DESeq2 models its input with a negative binomial distribution, which fits read counts but not transformed values. log(FPKM+1) will instead follow something closer to a log-normal distribution.
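A small worked example may help show why FPKM is not a count. It uses the standard FPKM definition (fragments per kilobase of transcript per million mapped fragments); the function name `fpkm` and all the numbers are invented for illustration.

```python
import math


def fpkm(fragments, length_bp, total_fragments):
    """FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)."""
    return fragments * 1e9 / (length_bp * total_fragments)


# Hypothetical library with 20 million mapped fragments.
total = 20_000_000

# Two genes with the same FPKM but very different raw counts:
short_gene = fpkm(15, 200, total)     # 15 fragments on a 200 bp transcript
long_gene = fpkm(150, 2000, total)    # 150 fragments on a 2000 bp transcript
# Both evaluate to 3.75.

# After log(FPKM + 1) the two genes are indistinguishable,
# although 150 fragments carry far more information than 15.
same_after_log = math.isclose(math.log(short_gene + 1), math.log(long_gene + 1))
```

The count-based model needs the raw numbers (15 vs 150) to judge how noisy each estimate is, and that information is gone once the values are length- and depth-normalised and log-transformed.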
