Why is the hg38 exome so much bigger than hg19?
4.6 years ago
ej ▴ 70

Hi,

I downloaded the NCBI RefSeq curated file from the Genes and Gene Predictions group in the UCSC Table Browser for hg38, because I want to use the exon coordinates as a target file for calling variants on exome sequencing data.

I noticed, however, that the exon coordinates cover approximately twice as much of the genome as the hg19 exon coordinates did (~80 million bp vs. ~40 million bp). Is it possible that the exome is really twice as large in hg38?

I do not want to call variants on all of these regions since ~30% of these exonic regions are not covered at all in my WES data and another ~10% is covered by <10x. I would definitely like to exclude these regions from the target file but I do not fully understand what these regions are/why they were included in the first place.

Any help would be greatly appreciated.

hg38 refseq exome target • 1.7k views

No, exons should not vary that much from freeze to freeze. But, more importantly, if this is really about exome coverage then use the bed file that came with your kit. If you are interested in coding variants then use CCDS.

4.6 years ago
vkkodali_ncbi ★ 3.8k

I am not entirely sure why you are seeing such a huge difference in exome sizes between hg38 and hg19. Could you please describe in a little more detail how you are computing these values?
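One way to make the computation explicit is to merge overlapping exons before summing, since the same exon is often listed once per transcript, and to report the total with and without non-primary contigs. The snippet below is only a sketch under stated assumptions: the input is a BED export (chrom, start, end in the first three columns), the contig names follow the UCSC style in which alt, random, and fix sequences contain an underscore, and the function name `merged_exon_bp` and the example lines are made up for illustration.

```python
from collections import defaultdict


def merged_exon_bp(bed_lines, primary_only=True):
    """Sum the bases covered by BED intervals, counting overlapping bases once."""
    by_chrom = defaultdict(list)
    for line in bed_lines:
        # Skip blank lines and BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        # Assumption: UCSC-style names, where alt/random/fix contigs contain "_".
        if primary_only and "_" in chrom:
            continue
        by_chrom[chrom].append((int(start), int(end)))

    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or touching interval: extend the current block.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


# Made-up example: two overlapping exons plus one exon on a non-primary contig.
example = [
    "chr1\t100\t200\ttranscriptA_exon1",
    "chr1\t150\t250\ttranscriptB_exon1",
    "chr1_KI270706v1_random\t0\t500\ttranscriptC_exon1",
]
print(merged_exon_bp(example))                      # primary chromosomes only
print(merged_exon_bp(example, primary_only=False))  # all contigs
```

Running the same function on the hg19 and hg38 files, once with each setting, would show whether the difference comes from overlapping transcripts, from extra contigs, or from neither.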

As far as RefSeq data are concerned, I strongly recommend downloading the relevant files from RefSeq rather than from UCSC. The data displayed in the UCSC browser are processed by the folks at UCSC and don't necessarily match RefSeq data exactly.

hg19 or GRCh37

RefSeq no longer actively annotates hg19, though updates are released occasionally. For the latest annotation data, go to NCBI Assembly and search for GRCh37. On the results page, click the 'Download' button in the result card and choose RefSeq as the source and GFF3 as the file format to download the latest version of the annotation (released in September 2019).

hg38 or GRCh38

To download annotation for hg38 or GRCh38, go to NCBI Assembly and search for GRCh38. In the result page, click on the first hit to go to this page and use the blue 'Download Assemblies' button to download the RefSeq GFF3 file.

RefSeq annotation data are also available in other file formats, such as GTF and FASTA, by following the same steps.
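Once a GFF3 file is downloaded, the exon rows can be turned into BED-style target intervals. The following is a minimal sketch, not an official RefSeq tool: the function name `gff3_exons_to_bed` and the example lines are invented for illustration. It relies only on the standard nine-column GFF3 layout, in which coordinates are 1-based and inclusive, whereas BED is 0-based and half-open.

```python
def gff3_exons_to_bed(gff_lines):
    """Yield (seqid, start0, end, strand) for each exon feature in GFF3 lines."""
    for line in gff_lines:
        # Skip directives, comments, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # GFF3 rows have nine tab-separated columns; column 3 is the feature type.
        if len(fields) != 9 or fields[2] != "exon":
            continue
        seqid, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # Convert 1-based inclusive (GFF3) to 0-based half-open (BED).
        yield seqid, start - 1, end, strand


# Invented example lines in GFF3 layout (toy coordinates and IDs).
gff = [
    "##gff-version 3",
    "NC_000001.11\tExample\tgene\t1001\t2000\t.\t+\t.\tID=gene-toy",
    "NC_000001.11\tExample\texon\t1001\t1200\t.\t+\t.\tID=exon-toy-1",
    "NC_000001.11\tExample\texon\t1801\t2000\t.\t+\t.\tID=exon-toy-2",
]
for row in gff3_exons_to_bed(gff):
    print("\t".join(map(str, row)))
```

Note that RefSeq GFF3 files name sequences by accession (for example NC_000001.11) rather than by UCSC-style names such as chr1, so the first column would likely need to be mapped before the intervals can be used with reads aligned to an hg38 reference.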
