Is there any difference between the size of the reference genome in UCSC and Ensembl?
4.7 years ago
star ▴ 350

I have downloaded the human reference genome from both UCSC and Ensembl:

UCSC: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz

Ensembl: ftp://ftp.ensembl.org/pub/release-99/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

When I decompress them with gunzip, the resulting files have very different sizes:

           Compressed   Uncompressed
UCSC       938 MB       3.1 GB
Ensembl    1.0 GB       62 GB

Is something wrong here? I want to build an index for alignment: with the UCSC file this takes about 1 hour, while with the Ensembl file it takes 11 hours.

ensembl alignment assembly genome_build UCSC • 1.1k views
4.7 years ago
GenoMax 148k

The toplevel assembly file you downloaded includes

all sequence regions flagged as toplevel in an Ensembl schema. This includes chromosomes, regions not assembled into chromosomes and N padded haplotype/patch regions.

This file is about 60 GB uncompressed, which is consistent with the size you are seeing. See the README file for more details.

Normally the primary assembly is sufficient for most analyses. It is provided as Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz, which should be more or less equivalent to the UCSC download you have.
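
If you want to check for yourself where the extra size comes from, one option is to count the sequences, total bases and N bases in each file. The following is only a minimal sketch in Python using the standard library; the function name summarize_fasta is made up for illustration and is not part of any Ensembl or UCSC tool. It streams the file line by line, so it should not need to hold the genome in memory, although a pass over the full toplevel file will still take a while.

```python
import gzip


def summarize_fasta(path):
    """Return (n_sequences, total_bases, n_bases) for a FASTA file.

    Hypothetical helper for illustration. Files whose name ends in
    ".gz" are read through gzip; anything else is read as plain text.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    n_sequences = 0
    total_bases = 0
    n_bases = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                # Header line: one per sequence record.
                n_sequences += 1
            else:
                seq = line.strip()
                total_bases += len(seq)
                # Count hard-masked/padding bases in either case.
                n_bases += seq.count("N") + seq.count("n")
    return n_sequences, total_bases, n_bases
```

For example, calling summarize_fasta("hg38.fa.gz") and summarize_fasta("Homo_sapiens.GRCh38.dna.toplevel.fa.gz") and comparing the three numbers should show whether the difference is mostly extra records and N padding, as the README quote above suggests.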
