Guys,
I have read several manuals and tutorials on RNA-Seq analysis with DESeq2. Some tutorials say to sort the paired-end BAM files by name, while others skip this step entirely. So, do paired-end BAM files generated by TopHat2 need to be sorted by name?
I think they should, but why do the DESeq2 manual and workflow completely ignore this step?
Thank you.
-X
What about using GenomicAlignments to create the count matrix with "summarizeOverlaps" instead of htseq-count? Does the BAM need to be sorted in that case?
As far as I know, GenomicAlignments implements an algorithm similar to htseq-count's, but it has no equivalent of the -r option. Since htseq-count assumes name-sorted input by default, I sort the BAM by name before feeding it to "summarizeOverlaps", to be on the safe side. Is this general practice, or is sorting the BAM unnecessary?
Hi Wei, Sorry, I have never used summarizeOverlaps to generate count data, so I don't know much about it. But as Ian mentioned in his comment, htseq-count may get confused if it doesn't find both partners, so I would guess that sorting by name won't hurt for paired-end data, except that it may add to the running time of the pipeline.
Just to add some experience to AP's answer: if the BAM files given to htseq-count are not sorted by name, it sometimes gets confused and cannot find both partners of a pair. Of course, there are different ways of counting reads into genes, so the required sort order may differ.
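To illustrate why the sort order matters for pairing, here is a toy sketch in Python. It is not how htseq-count or summarizeOverlaps are actually implemented; the `Read` tuple, the `pair_adjacent` helper and the read names are all made up for the example. It assumes a counter that expects the two mates of a pair to sit next to each other in the file, which is only true for name-sorted input.

```python
from collections import namedtuple

# Hypothetical minimal read record: fragment name and alignment start.
Read = namedtuple("Read", ["name", "pos"])


def pair_adjacent(reads):
    """Pair mates by assuming they are adjacent in the input.

    This assumption only holds for name-sorted input. Reads whose
    neighbour has a different name are reported as orphans.
    """
    pairs, orphans = [], []
    i = 0
    while i < len(reads):
        if i + 1 < len(reads) and reads[i].name == reads[i + 1].name:
            pairs.append((reads[i], reads[i + 1]))
            i += 2
        else:
            orphans.append(reads[i])
            i += 1
    return pairs, orphans


# Three made-up fragments whose mates overlap each other's span.
reads = [
    Read("fragA", 100), Read("fragA", 350),
    Read("fragB", 120), Read("fragB", 300),
    Read("fragC", 200), Read("fragC", 420),
]

by_name = sorted(reads, key=lambda r: (r.name, r.pos))  # like a name sort
by_pos = sorted(reads, key=lambda r: r.pos)             # like a coordinate sort

name_pairs, name_orphans = pair_adjacent(by_name)
pos_pairs, pos_orphans = pair_adjacent(by_pos)

print(len(name_pairs), len(name_orphans))  # 3 0
print(len(pos_pairs), len(pos_orphans))    # 0 6
```

With coordinate-sorted input the mates are interleaved with reads from other fragments, so a tool either has to buffer reads until the mate turns up or it loses the pairing. Sorting by name sidesteps that, at the cost of an extra sorting step.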