New gencode gtf version for analysis
6.6 years ago
1769mkc ★ 1.2k

I have been using GENCODE v21 as my annotation file, but the latest release is GENCODE v28. I understand the new version comes with better accuracy and other improvements. Would it drastically change my results if I switched to the new annotation?

RNA-Seq • 1.7k views

It depends on your end goal and on what the GENCODE consortium may have included or excluded since the v21 release. Are you interested in just protein-coding mRNA, or in some more obscure non-coding RNA or other transcript class, such as nonsense-mediated decay transcripts, processed or unprocessed pseudogenes, etc.?

Also be careful about the genome build to which each annotation relates. They stopped producing GRCh37 files and now only provide them as lift-overs from GRCh38 (last time I checked).


I'm using GRCh38 as my build. For now I'm mostly interested in protein-coding genes, though possibly long non-coding RNAs as well. I did a rough comparison between v21 and v28: the total number of genes is 60155 in v21 versus 58381 in v28, and the number of protein-coding genes is 19881 in v21 versus 19901 in v28. Would these differences have a large effect on my final analysis?
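A rough per-release tally like the one above could be reproduced with something along these lines. This is only a sketch: it assumes the standard GENCODE GTF layout (nine tab-separated columns, with a `gene_type "..."` attribute on `gene` rows), and the function name `count_gene_types` and the toy records are made up for illustration.

```python
import re
from collections import Counter

# Assumes GENCODE-style attributes, e.g. gene_type "protein_coding";
GENE_TYPE_RE = re.compile(r'gene_type "([^"]+)"')

def count_gene_types(gtf_lines):
    """Count 'gene' features per gene_type in an iterable of GTF lines."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only count gene-level rows
        match = GENE_TYPE_RE.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts

# Toy records (made-up IDs and coordinates), not real annotation data.
toy_gtf = [
    "##description: toy example\n",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "ENSG00000000001.1"; gene_type "protein_coding";\n',
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "ENSG00000000001.1"; gene_type "protein_coding";\n',
    'chr1\tHAVANA\tgene\t2000\t2500\t.\t-\t.\tgene_id "ENSG00000000002.3"; gene_type "lincRNA";\n',
    'chr2\tENSEMBL\tgene\t50\t400\t.\t+\t.\tgene_id "ENSG00000000003.2"; gene_type "protein_coding";\n',
]

toy_counts = count_gene_types(toy_gtf)
print(toy_counts)
```

For a real release you would pass an open file handle instead of the toy list, for example one from `gzip.open(path, "rt")` if the GTF is gzipped.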


You are the best judge of that. What are the extra genes, and are they important for your experiment? They may just be previously assumed 'hypothetical proteins' (like those genes with LOC prefixes) for which evidence supporting their true protein-coding potential has now become available. The genome annotation is under constant refinement based on new evidence.
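One way to see which genes actually differ between two releases is to compare the sets of gene IDs. The sketch below assumes GENCODE-style `gene_id "ENSG....N"` attributes and drops everything after the first dot so that a version bump alone does not count as a change (this also collapses any suffix that follows the version). The helper names and the toy IDs are hypothetical.

```python
import re

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')

def gene_ids(gtf_lines):
    """Collect unversioned gene IDs from the 'gene' rows of a GTF."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = GENE_ID_RE.search(fields[8])
        if match:
            # "ENSG00000000001.4" -> "ENSG00000000001"
            ids.add(match.group(1).split(".")[0])
    return ids

def compare_releases(old_lines, new_lines):
    """Return gene IDs removed from, added to, and shared between releases."""
    old_ids, new_ids = gene_ids(old_lines), gene_ids(new_lines)
    return {
        "removed": old_ids - new_ids,
        "added": new_ids - old_ids,
        "shared": old_ids & new_ids,
    }

# Toy records with made-up IDs, standing in for an older and a newer release.
row = 'chr1\tHAVANA\tgene\t1\t10\t.\t+\t.\tgene_id "{}"; gene_type "protein_coding";\n'
old_release = [row.format("ENSG00000000001.1"), row.format("ENSG00000000002.1")]
new_release = [row.format("ENSG00000000001.4"), row.format("ENSG00000000003.1")]

diff = compare_releases(old_release, new_release)
print(diff)
```

The "removed" and "added" sets are the ones worth checking against the genes that matter for your experiment.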

Generally, so long as you clearly state your version, it should not matter that much in terms of publication. Many are still using GRCh37 / hg19, for example, and some are even using hg18. Then again, in transcriptome profiling studies, some groups just filter in protein-coding genes and ignore everything else, whilst others may focus only on snoRNAs.
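As a minimal sketch of the "filter in protein coding genes" approach, assuming every feature row in the GTF carries a GENCODE-style `gene_type` attribute (the function name `keep_protein_coding` and the toy rows are made up for illustration):

```python
import re

GENE_TYPE_RE = re.compile(r'gene_type "([^"]+)"')

def keep_protein_coding(gtf_lines):
    """Yield header lines plus rows whose gene_type is protein_coding."""
    for line in gtf_lines:
        if line.startswith("#"):
            yield line  # keep header lines so the output is still a valid GTF
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # skip malformed rows
        match = GENE_TYPE_RE.search(fields[8])
        if match and match.group(1) == "protein_coding":
            yield line

# Toy rows with made-up IDs.
toy_rows = [
    "##description: toy example\n",
    'chr1\tHAVANA\tgene\t1\t10\t.\t+\t.\tgene_id "ENSG00000000001.1"; gene_type "protein_coding";\n',
    'chr1\tHAVANA\tgene\t20\t30\t.\t+\t.\tgene_id "ENSG00000000002.1"; gene_type "snoRNA";\n',
]

filtered = list(keep_protein_coding(toy_rows))
print(len(filtered))
```

Swapping the string `"protein_coding"` for another biotype such as `"snoRNA"` gives the opposite kind of study mentioned above.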


"Generally, so long as you clearly state your version": yes, exactly, this was my concern. For my own sake I do use HUGO to map protein-coding genes and filter out whatever is not needed.

"what are the extra genes?; are they important for your experiment? They may just be previously assumed 'hypothetical proteins' (like those genes with LOC prefixes)": I was looking at the README stats on the GENCODE site to see what changed, so I just gave you the numbers.
