I have just finished running count commands using HTSeq-Count and am about to begin DGE analysis with DESeq2 using the vignette as a guide. However, I found this blog post online which indicates that there may be issues if you do not take into account the fact that Illumina data produces antisense reads.
Having read the blog post a few times, I'm still a little confused. It seems that there is now an argument to the summarizeOverlap()
function preprocess.reads=invertStrand
that allows you to take strandedness into account. Is this all I need to alter for my data? Or do I need to invert the strand using GenomicRanges? Outside of this, I did not take the strandedness into account with my alignment or counts so I am concerned that my raw count data may also be incorrect.
Here is what I ran for Tophat2 alignment:
tophat2 -G <GTF file> -p 4 -o <output folder> <index> <FASTQ file>
And here is what I ran for HTSeq-Count:
python -m HTSeq.scripts.count -s yes -a 10 <name sorted SAM> <GTF file> > <output>
As you can see, I did not take into account the library type for my tophat2 run. For HTSeq-Count I specified it was stranded but not to reverse it. So I really am not sure where this leaves me...
According the facility that did the sequencing, they used the TruSeq Stranded Total RNA kit. This experiment is single-read from mouse neural tissue (RIPSeq). Let me know if any more information is needed.
In summary, my questions:
- Is my count data okay as is?
- What do I need to do in DESeq2 to correct for this antisense issue?
Versions: Tophat 2.1.1 with Bowtie 2.2.9 and Samtools 0.1.19; HTSeq 0.6.1
By using the command
-s reverse
I got a lot more counts than what I originally had. The strandedness issue is very confusing to me but I think I figured it out after digging through old forum posts. Thanks for your help.Hi
I am new on this, and reading over the forums, I have understood that the -s option in Htseq depends on the protocol. So, if I have files defined as strand-specific in the protocol, for me the logic selection is: -s="yes". So, I am wondering why you insist on try -s="reverse" with htseq-count rather than -s="yes" for recent sequencing data... (and what means recent?..5,5, yrs)... So, I am in blank... I do not have clear why to use -s reverse... & I do not understand what do you mean with ... "...for historical reasons.."... Please any help will be appreciate!
Thanks in advance Cynthia SC