RNA-Seq, HTSeq-Counts, counts are ~1M less compared to the reference research paper

0

Entering edit mode

3.9 years ago

gdm01 • 0

Hello everyone, I have a question related to RNA-Seq and HTSeq-Counts. A very brief description:

There is a research paper that applies RNA-Seq methods to few samples, from 2016. For my project, I follow the same steps on the same samples they used.

No adapters, I run tophat2 with the reference genome, then samtools, then HTSeq count (htseq-count -r pos -f bam -s yes -m intersection-strict --stranded=no .......)

According to the paper, authors found read counts around ~25,000,000 . I find ~1,000,000 less counts than the paper, for each sample. If It is ~24M, I find ~23M.

Do you have any idea what can cause this? Thanks! ✌️

RNA-Seq rna-seq sequence htseq-count • 1.0k views

ADD COMMENT • link 3.9 years ago by gdm01 • 0

2

Entering edit mode

In my opinion that's to be expected. Unless you match every piece of software and version, including the same aligner, and the same commands, then yeah that's normal.

ADD REPLY • link 3.9 years ago by Mark ★ 1.6k

0

Entering edit mode

Thank you for your reply! Also, ~1M loss in ~20M sounds very minor to me, would you agree on this?

ADD REPLY • link 3.9 years ago by gdm01 • 0

1

Entering edit mode

Most NGS aligners are non-deterministic i.e. you may not get exactly identical alignments if you run an alignment more than once. That said there are aligners (e.g. bbmap) that will allow you to do deterministic alignments. Rather than focusing on number of reads check the final DE result and see if that is largely concordant (it may not be 100% identical).

ADD REPLY • link 3.9 years ago by GenoMax 147k

0

Entering edit mode

I would agree but the details are in the devil. Why and when were the reads excluded? Does it affect DE? do the DE counts match? etc etc etc

ADD REPLY • link 3.9 years ago by Mark ★ 1.6k

Login before adding your answer.