3.4 years ago
User000
Hello,
I am aligning my paired-end RNA-seq reads with two pipelines:
a. HISAT2 --> StringTie --> DESeq2
b. STAR --> salmon --> DESeq2
Is it necessary to keep only the uniquely mapped reads before doing gene counts? For example:
samtools view -b -q 40 -o output.bam alignments.bam
The paper says that if a fragment aligns in n places, then that fragment alignment will contribute 1/n to the edge capacity, but that is not clear to me at all. Thanks for your reply.
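One way to read the 1/n rule quoted above is as fractional counting: a fragment with n alignments adds a weight of 1/n at each location instead of a full count. The sketch below is only a toy illustration of that idea at the gene level. It is not StringTie's flow network, and the function name `fractional_counts` and the toy data are hypothetical.

```python
from collections import defaultdict


def fractional_counts(alignments):
    """Give each alignment of a fragment a weight of 1/n.

    `alignments` maps a fragment ID to the list of genes it aligns to,
    one entry per alignment location (hypothetical toy input format).
    """
    counts = defaultdict(float)
    for fragment_id, genes in alignments.items():
        n = len(genes)  # number of places this fragment aligns
        for gene in genes:
            counts[gene] += 1.0 / n
    return dict(counts)


# Toy data: frag1 is uniquely mapped, frag2 maps to 2 places, frag3 to 4.
toy_alignments = {
    "frag1": ["geneA"],
    "frag2": ["geneA", "geneB"],
    "frag3": ["geneB", "geneC", "geneC", "geneD"],
}

weights = fractional_counts(toy_alignments)
print(weights)
```

Each fragment still contributes a total weight of 1, so the weights sum to the number of fragments; a multimapper is simply spread over its candidate locations rather than discarded.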
bbmap.sh will allow you to select a random location from among all locations where a read maps equally well (the ambig=random option). I can't vouch for the statistical validity of that approach, but it seems logical if you don't want to throw away multi-mapping data.
Yeah, I don't think Salmon will work properly unless you keep multimappers in.
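The random-placement idea mentioned for bbmap.sh above can be sketched in a few lines. This is a toy model only, assuming every listed location for a read is an equally good hit; it does not reproduce how BBMap scores or selects sites, and `assign_randomly` and the toy data are hypothetical names.

```python
import random
from collections import Counter


def assign_randomly(alignments, seed=0):
    """Assign each read to exactly one of its candidate locations at random.

    `alignments` maps a read ID to a list of equally good locations
    (hypothetical toy input format). A fixed seed keeps runs repeatable.
    """
    rng = random.Random(seed)
    counts = Counter()
    for read_id, locations in alignments.items():
        counts[rng.choice(locations)] += 1
    return counts


toy_reads = {
    "read1": ["geneA"],                    # uniquely mapped
    "read2": ["geneA", "geneB"],           # two equally good hits
    "read3": ["geneB", "geneC", "geneD"],  # three equally good hits
}

random_counts = assign_randomly(toy_reads)
print(random_counts)
```

Every read is counted exactly once, so the totals stay integer counts, but which gene receives a multimapper depends on the random draw.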
And I just wouldn't use StringTie for quantification. But if I were to, I'd exclude multimapping reads. I'm also not sure it's possible to use DESeq2 on the results of StringTie. I think it outputs TPMs rather than counts.
Thanks for your replies. StringTie outputs FPKM, and I actually used it with DESeq2 as log(FPKM+1). For the salmon output, should I use tximportData to merge all counts (not TPMs)? Sorry, I know it is another question.
Yes, follow the tximport vignette for salmon. It handles it quite painlessly.
You can't use log(FPKM+1) in DESeq2; it has to be read counts (it will run, but the results it produces will not be valid). Read counts are what the negative binomial distribution used by DESeq2 models. log(FPKM+1) values are continuous and will instead follow something closer to a normal distribution (FPKM itself being roughly log-normal).
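One way to see why FPKM cannot stand in for counts: FPKM divides out gene length and library size, so genes backed by very different numbers of fragments can end up with the same value. The sketch below uses the standard FPKM definition; the gene lengths, counts, and library size are made-up numbers for illustration.

```python
def fpkm(fragment_count, gene_length_bp, total_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragment_count * 1_000_000_000 / (gene_length_bp * total_fragments)


LIBRARY_SIZE = 10_000_000  # hypothetical total mapped fragments

# Two hypothetical genes with a 10-fold difference in supporting fragments.
short_gene = fpkm(fragment_count=100, gene_length_bp=1_000, total_fragments=LIBRARY_SIZE)
long_gene = fpkm(fragment_count=1_000, gene_length_bp=10_000, total_fragments=LIBRARY_SIZE)

print(short_gene, long_gene)
```

Both genes get the same FPKM even though one estimate rests on 100 fragments and the other on 1,000. A count-based model uses that difference in evidence when estimating variance, and it is lost once the values are converted to FPKM.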