featureCounts (Ensembl GTF vs. in-built Entrez annotation): differences in count data
4.2 years ago

Hi everyone

First, thank you in advance for any help you can give. I am new to bioinformatics, and Biostars has been immensely helpful. I am currently processing human RNA-seq data: I have finished the trimming and alignment (with STAR) stages and have just used featureCounts to count the reads in my data.

I have tried two different annotations with featureCounts. Both runs worked, but they gave different counts. First I used the hg38 GTF from Ensembl, and second I used the in-built hg38 annotation from the Rsubread package (Entrez Gene IDs).

Both runs were successful, but when I compared corresponding genes between Ensembl and Entrez Gene, the counts were quite different. The total number of counted reads also differed: 36165730 with Ensembl vs. 38752850 with Entrez Gene.
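Putting the two totals side by side, the in-built Entrez annotation counts about 2.6 million more reads, roughly 7% of the Ensembl total. A quick check of that arithmetic:

```python
# Totals reported in the question (sum of counted reads for each annotation).
ensembl_total = 36165730
entrez_total = 38752850

# Absolute and relative difference, relative to the Ensembl run.
difference = entrez_total - ensembl_total
relative_pct = 100 * difference / ensembl_total
print(f"Entrez Gene counts {difference} more reads ({relative_pct:.2f}% of the Ensembl total)")
```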

Why would the total number of counts be higher with Entrez Gene? That seems strange, considering that Ensembl is larger in scope.
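One way to see where the extra reads come from is to compare the two runs' assignment summaries. featureCounts reports, per sample, how many reads were Assigned and how many fell into each Unassigned_* category; the command-line tool writes this table to `<output>.summary`, and in Rsubread it is the `stat` element of the returned list. The minimal sketch below assumes each run's table has been saved as tab-separated text with a `Status` header row and one sample column; the helper names are hypothetical, not part of either tool.

```python
def parse_featurecounts_summary(text):
    """Parse a featureCounts assignment summary into {status: count}.

    Assumes a tab-separated table whose first row is a header
    ("Status<TAB><sample>") and whose remaining rows are
    "<status><TAB><count>"; only the first sample column is read.
    """
    counts = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for line in lines[1:]:  # skip the header row
        fields = line.split("\t")
        counts[fields[0]] = int(fields[1])
    return counts


def compare_summaries(ensembl, entrez):
    """Return {status: (ensembl_count, entrez_count, entrez_minus_ensembl)}."""
    statuses = sorted(set(ensembl) | set(entrez))
    return {
        s: (ensembl.get(s, 0), entrez.get(s, 0), entrez.get(s, 0) - ensembl.get(s, 0))
        for s in statuses
    }


def print_changed_rows(table):
    """Print only the status rows that differ between the two runs."""
    for status, (ens_n, ent_n, diff) in table.items():
        if diff != 0:
            print(f"{status:<28}{ens_n:>12}{ent_n:>12}{diff:>+12}")


# Usage with hypothetical file names, one summary table per run:
# with open("ensembl.summary") as f:
#     ens = parse_featurecounts_summary(f.read())
# with open("entrez.summary") as f:
#     ent = parse_featurecounts_summary(f.read())
# print_changed_rows(compare_summaries(ens, ent))
```

If both runs used the same BAM files and counting options, each table's rows should add up to the same total, so extra Assigned reads in one run have to come out of its Unassigned_* rows. A much larger Unassigned_Ambiguity count in the Ensembl run, for example, would suggest reads landing on overlapping genes, which featureCounts leaves uncounted by default.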

I understand that Ensembl and Entrez Gene do not match completely, but the differences seem quite dramatic. Is this normal? And is it okay to use the Entrez counts, given that I aligned my data using an Ensembl GTF?
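To compare corresponding genes directly, the two count tables have to be joined through an Ensembl-to-Entrez ID mapping (for example, one exported from Ensembl BioMart). The minimal sketch below assumes both count tables are already loaded as plain dictionaries of gene ID to count and that the mapping is one-to-one; the function names and the toy counts are hypothetical. It ranks genes by the log2 ratio of their counts, adding a pseudocount so that zero counts do not break the logarithm.

```python
import math


def strip_version(gene_id):
    """Drop a trailing version suffix, e.g. "ENSG00000141510.17" -> "ENSG00000141510"."""
    return gene_id.split(".", 1)[0]


def compare_gene_counts(ensembl_counts, entrez_counts, ens_to_entrez, pseudocount=1):
    """Pair genes across the two annotations and rank them by |log2(Entrez / Ensembl)|.

    Genes without a mapped Entrez ID, or whose Entrez ID is absent from the
    in-built annotation's counts, are skipped.
    """
    rows = []
    for ens_id, ens_count in ensembl_counts.items():
        entrez_id = ens_to_entrez.get(strip_version(ens_id))
        if entrez_id is None or entrez_id not in entrez_counts:
            continue
        ent_count = entrez_counts[entrez_id]
        log2_ratio = math.log2((ent_count + pseudocount) / (ens_count + pseudocount))
        rows.append((ens_id, entrez_id, ens_count, ent_count, log2_ratio))
    # Largest disagreements first; ties keep their input order (sort is stable).
    rows.sort(key=lambda r: abs(r[4]), reverse=True)
    return rows


# Toy counts for illustration only (not real data); IDs are TP53, GAPDH, ACTB.
ens_to_entrez = {"ENSG00000141510": "7157", "ENSG00000111640": "2597", "ENSG00000075624": "60"}
ensembl_counts = {"ENSG00000141510": 99, "ENSG00000111640": 1000, "ENSG00000075624": 500}
entrez_counts = {"7157": 99, "2597": 1199, "60": 500}

rows = compare_gene_counts(ensembl_counts, entrez_counts, ens_to_entrez)
for ens_id, entrez_id, ens_n, ent_n, lr in rows:
    print(f"{ens_id}\t{entrez_id}\t{ens_n}\t{ent_n}\t{lr:+.2f}")
```

Real mappings are often one-to-many or many-to-one, so genes that map ambiguously are better inspected separately than silently dropped or merged.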

featureCounts RNA-Seq sequencing