Why is annotation for some genes missing from wgEncodeGencodeBasicV24lift37.txt.gz?
8.2 years ago
Apprentice ▴ 170

Hi.

I downloaded wgEncodeGencodeBasicV24lift37.txt.gz from ftp://hgdownload.cse.ucsc.edu/apache/htdocs/goldenPath/hg19/database/ to get the latest GENCODE annotations for UCSC hg19. When I checked the file, I noticed that it seems to lack annotation for some genes that can be found in the UCSC Genome Browser.

For example, the DDX11L1 gene is not in the file, but I could find it in the Pseudogene Annotation Set from GENCODE Version 24lift37 by searching the UCSC Genome Browser (hg19), as shown here: https://genome.ucsc.edu/cgi-bin/hgTracks?hgtgroup_map_close=1&hgtgroup_genes_close=0&hgtgroup_phenDis_close=0&hgtgroup_rna_close=0&hgtgroup_expression_close=0&hgtgroup_regulation_close=0&hgtgroup_compGeno_close=0&hgtgroup_neandertal_close=1&hgtgroup_denisova_close=1&hgtgroup_varRep_close=0&hgtgroup_rep_close=0&hgsid=510548867_U3S67edvjqOSeosFFsl3TinZRqia&position=DDX11L1&hgt.positionInput=DDX11L1&hgt.jump=go&hgt.suggestTrack=knownGene&db=hg19&c=chr19&l=7726484&r=7740594&pix=800&dinkL=2.0&dinkR=2.0

Do you know why the file doesn't include the full annotation?

genome gene • 2.7k views
8.2 years ago
Marge ▴ 320

Hi,

Judging by the name of the file you mention, you are probably not finding all the annotations because it contains only the 'GENCODE Basic Set'. Why that annotation set is not comprehensive is explained in the Methods section of the UCSC GENCODE track description (https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=510652103_Kn0LAoKEHvvhU1JyhFn04fNZ21AM&c=chr1&g=wgEncodeGencodeV24lift37):

"The GENCODE Basic Set is intended to provide a simplified subset of the GENCODE transcript annotations that will be useful to the majority of users. The goal was to have a high-quality basic set that also covered all loci. Selection of GENCODE annotations for inclusion in the basic set was determined independently for the coding and non-coding transcripts at each gene locus."

I hope this helps.

Cheers, Marge

8.2 years ago
Denise CS ★ 5.2k

What does 'wgEncodeGencodeBasicV24lift37.txt.gz' refer to? Would it be GENCODE v24 lifted from GRCh38 back to GRCh37? How bizarre! That is where the problem seems to be. If you use 'wgEncodeGencodeBasicV19.txt.gz', you will find DDX11L1.

DDX11L1 (ENSG00000223972) can be found in the Ensembl GTF file, and only one transcript, ENST00000456328, is tagged as GENCODE basic. The remaining transcripts of this locus, i.e. ENST00000515242, ENST00000518655 and ENST00000450305, are not selected for the basic set, only for the comprehensive one. This can be seen in the table on the Gene page in Ensembl[2]. If you search for ENST00000456328 in 'wgEncodeGencodeBasicV19.txt.gz', you will see it there. However, the Ensembl Gene ID does not seem to be listed in the 'wgEncodeGencodeBasicV19.txt.gz' file.
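
If it helps, here is a minimal Python sketch of that kind of lookup. It assumes the table dump is tab-separated in UCSC's genePred-extended layout with a leading bin column, so that the transcript ID sits in column 1 (`name`) and the gene symbol in column 12 (`name2`); please check this against the matching .sql file before relying on it. The function names `read_rows`, `find_transcript` and `find_gene` are made up for illustration.

```python
import gzip

# Assumed layout (genePred-extended with a leading "bin" column):
# 0 bin, 1 name (transcript ID), 2 chrom, ..., 12 name2 (gene symbol).
NAME_COL = 1
NAME2_COL = 12


def read_rows(path):
    """Yield each line of a gzipped, tab-separated table dump as a list of fields."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            yield line.rstrip("\n").split("\t")


def find_transcript(path, transcript_id):
    """Return rows whose transcript ID matches, with or without a version suffix."""
    hits = []
    for fields in read_rows(path):
        if len(fields) <= NAME_COL:
            continue  # skip malformed or empty lines
        name = fields[NAME_COL]
        # IDs in the file may carry a version, e.g. "ENST00000456328.2".
        if name == transcript_id or name.startswith(transcript_id + "."):
            hits.append(fields)
    return hits


def find_gene(path, symbol):
    """Return rows whose gene symbol (name2 column) equals the given symbol."""
    return [
        fields
        for fields in read_rows(path)
        if len(fields) > NAME2_COL and fields[NAME2_COL] == symbol
    ]
```

For example, `find_transcript("wgEncodeGencodeBasicV19.txt.gz", "ENST00000456328")` would look the transcript up by ID, and `find_gene("wgEncodeGencodeBasicV24lift37.txt.gz", "DDX11L1")` would look for the gene by symbol. An empty list means no row matched under the assumed column layout.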

8.2 years ago
Apprentice ▴ 170

Thank you for your advice. I now understand the GENCODE Basic Set!

8.2 years ago
rsuarez • 0

Thanks, this information is very useful to me.

Ricardo Suarez Caballero Director Formativo en IIEMD.com - Marketing Digital
