RNA-Seq, htseq-count: counts are ~1M lower than in the reference research paper
3.9 years ago
gdm01 • 0

Hello everyone, I have a question about RNA-Seq and htseq-count. A very brief description:

A research paper from 2016 applies RNA-Seq methods to a few samples. For my project, I follow the same steps on the same samples they used.

The reads have no adapters. I run tophat2 against the reference genome, then samtools, then htseq-count (htseq-count -r pos -f bam -s yes -m intersection-strict --stranded=no .......)

According to the paper, the authors obtained read counts of around 25,000,000. For each sample, I get about 1,000,000 fewer counts than the paper: where they report ~24M, I get ~23M.

Do you have any idea what can cause this? Thanks! ✌️

RNA-Seq rna-seq sequence htseq-count • 1.0k views

In my opinion, that's to be expected. Unless you match every piece of software and its version, including the aligner, and run exactly the same commands, a difference like this is normal.


Thank you for your reply! Also, a loss of ~1M out of ~20M sounds very minor to me. Would you agree?
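
For scale, the gap can be expressed as a fraction of the paper's count. This is only a minimal sketch using the rounded ~24M and ~23M figures quoted in the question; the function name `relative_loss` is made up for illustration.

```python
def relative_loss(reference: float, observed: float) -> float:
    """Fraction of the reference count that is missing from the observed count."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return (reference - observed) / reference


# Rounded figures from the question: ~24M in the paper, ~23M in the rerun.
loss = relative_loss(24_000_000, 23_000_000)
print(f"{loss:.1%}")  # about 4.2%
```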


Most NGS aligners are non-deterministic, i.e. you may not get exactly identical alignments if you run an alignment more than once. That said, there are aligners (e.g. bbmap) that allow deterministic alignments. Rather than focusing on the number of reads, check the final DE result and see whether it is largely concordant (it may not be 100% identical).
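
One rough way to check concordance is to compare the list of DE genes reported in the paper with the list from the rerun. The sketch below is an assumption about how that comparison could be done, not a standard tool: the function `de_concordance` and the gene names are hypothetical, and it looks only at set overlap, not at fold changes or p-values.

```python
def de_concordance(genes_a, genes_b):
    """Summarise the overlap between two lists of DE gene identifiers."""
    a, b = set(genes_a), set(genes_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: shared genes divided by all genes seen in either list.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical gene lists, for illustration only.
paper_de = ["GENE1", "GENE2", "GENE3", "GENE4"]
rerun_de = ["GENE2", "GENE3", "GENE4", "GENE5"]
print(de_concordance(paper_de, rerun_de))
```

A high Jaccard index with only a handful of genes unique to either list would suggest that the ~1M difference in counts has little effect on the conclusions.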


I would agree, but the devil is in the details. Why and when were the reads excluded? Does it affect DE? Do the DE counts match? And so on.
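
A starting point for the "why" is the tail of the htseq-count output, where reads that were not assigned to a feature are tallied under special counters whose names begin with `__` (for example `__no_feature`, `__ambiguous`, `__too_low_aQual`, `__not_aligned`, `__alignment_not_unique`). Here is a minimal sketch that separates those counters from the per-gene counts. It assumes single-sample, two-column, tab-separated output; the function name and the numbers in the example are made up.

```python
def split_htseq_counts(text):
    """Separate per-feature counts from htseq-count's special '__' counters."""
    features, special = {}, {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        name, count = fields[0], int(fields[1])
        # Special counters are the rows whose name starts with two underscores.
        target = special if name.startswith("__") else features
        target[name] = count
    return features, special


# Made-up numbers, for illustration only.
example = "\n".join([
    "geneA\t120",
    "geneB\t30",
    "__no_feature\t40",
    "__ambiguous\t5",
    "__too_low_aQual\t3",
    "__not_aligned\t0",
    "__alignment_not_unique\t2",
])

features, special = split_htseq_counts(example)
assigned = sum(features.values())
tallied = assigned + sum(special.values())
print(f"assigned to features: {assigned} ({assigned / tallied:.1%})")
for name, count in special.items():
    print(f"{name}: {count} ({count / tallied:.1%})")
```

If one counter accounts for most of the missing ~1M per sample, that points at the step to look into: annotation version for `__no_feature`, counting mode for `__ambiguous`, or aligner settings for `__alignment_not_unique`.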
