Question

GRCh38 counts significantly lower than for GRCh37?

0

6.7 years ago

ab123 ▴ 50

Hi there,

Quick question:

Are gene annotations specific to the sequencing platform used for RNA-Seq?

I'm looking at a dataset produced with an Illumina HiSeq 2500. I've aligned it against both human GRCh38 and the older GRCh37. So far so good... alignment files are the same size.

When I then count the reads, I get almost 60,000 for the GRCh37 GTF but only 17,000 for the GRCh38 GTF. Both GTF files come from ensembl.org.

I'm not sure what's wrong. Is there any explanation for this?

Many thanks!

RNA-Seq grch annotation ensembl • 1.4k views

ADD COMMENT • link 6.7 years ago by ab123 ▴ 50

0

alignment files are the same size

File size is useless when it's not an extreme value. It is an indicator of nothing, so you cannot predicate a "so far so good" statement on that.

ADD REPLY • link 6.7 years ago by Ram 45k

0

60,000 for the GRCh37 GTF but only 17,000 for the GRCh38 GTF

What do those numbers refer to? Genes in your GTF file? There is a reason major new genome builds are spaced a few years apart since they can include major refinements in information content.

ADD REPLY • link 6.7 years ago by GenoMax 153k

0

The numbers refer to the final counts that are then used for differential expression analysis. I'm not sure whether GRCh38 would really contain only 17,000 genes. Would version 38 contain significantly fewer genes?

ADD REPLY • link 6.7 years ago by ab123 ▴ 50

score 2 · Answer 1 · 2018-12-17

2

6.7 years ago

Emily 24k

Two possibilities:

You're getting counts of genes in GRCh38 and counts of transcripts in GRCh37.
You've used a protein-coding only reference file for GRCh38 and a complete genes reference file for GRCh37.
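
A quick way to check both possibilities is to summarize each GTF directly: count the `gene` and `transcript` records, and tally the gene biotypes. The sketch below is only an illustration. It assumes the standard 9-column, tab-separated GTF layout and that the biotype is stored in a `gene_biotype` attribute (other annotation sources may use a different attribute name). The function name `summarize_gtf` and the tiny in-memory example are made up for demonstration.

```python
import re
from collections import Counter

# Matches attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def summarize_gtf(lines):
    """Count gene/transcript records and tally gene biotypes.

    `lines` is any iterable of GTF lines (an open file handle works).
    Assumes the biotype lives in a `gene_biotype` attribute.
    """
    features = Counter()
    biotypes = Counter()
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        feature = fields[2]
        if feature not in ("gene", "transcript"):
            continue
        features[feature] += 1
        if feature == "gene":
            attrs = dict(ATTR_RE.findall(fields[8]))
            biotypes[attrs.get("gene_biotype", "unknown")] += 1
    return features, biotypes


# Made-up records, just to show the expected input shape.
example = [
    "#!genome-build GRCh38\n",
    '1\tensembl\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";\n',
    '1\tensembl\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    '1\tensembl\ttranscript\t150\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T2";\n',
    '1\tensembl\texon\t100\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    '2\tensembl\tgene\t50\t400\t.\t-\t.\tgene_id "G2"; gene_biotype "lncRNA";\n',
    '2\tensembl\ttranscript\t50\t400\t.\t-\t.\tgene_id "G2"; transcript_id "T3";\n',
]

features, biotypes = summarize_gtf(example)
print(dict(features))
print(dict(biotypes))
```

To run it on a real annotation, pass an open file handle instead of the example list, e.g. `with open(path) as fh: summarize_gtf(fh)`, where `path` points to your own GTF. If the GRCh37 count matches the number of `transcript` records, or the GRCh38 file contains only `protein_coding` genes, that would explain the gap.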

ADD COMMENT • link 6.7 years ago by Emily 24k