Coming from farm animal genomes, how do I deal with the large assemblies for mouse and human?
5.3 years ago
colin.kern ★ 1.1k

I mainly do research in farm animals, but I am currently working on a comparative analysis that includes ENCODE data from mouse and human. I was expecting the genome files for these assemblies to be similar to the farm animal genomes, i.e. I expected the 3 GB human genome to be about a 3 GB FASTA file. However, it's 54 GB. Similarly, the mouse assembly is 12 GB. It seems this is due to patches that add alternate sequences to the assembly. Is that right?

This is causing me to doubt a lot of what I'm currently doing. Will the same analysis pipeline I've been using for farm animals be suitable, or do I need to do something special to account for these patches? How similar are some of these alternate sequences? Will I need to deal with multi-mapped reads differently? Can I download genome assemblies without all these extra sequences, and how bad an idea is that?

Also, a more technical question: because of the size of the human assembly, I've been having trouble getting bwa to index the genome in a reasonable amount of time. Is there somewhere I can download these index files?

alignment • 740 views

3 GB human genome to be about a 3 GB fasta file. However it's 54 GB.

Where? Even with haplotypes etc., that should not be the case. The top-level genome file for human is about 1 GB compressed (from Ensembl).

Is there somewhere I can download these index files?

You can use Illumina's iGenomes site to download matched sequence, annotation and index bundles.


I downloaded them from Ensembl. Uncompressed, the human file becomes 54 GB. Compression is very efficient because there is so much repetition of a small alphabet.
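As a rough illustration of the compression point, here is a minimal Python sketch comparing how well a random four-letter sequence compresses against a long run of a single character. The helper names `compression_ratio` and `random_dna` are made up for this example, and whether long single-character runs (such as N padding) are actually what inflates the downloaded file is an assumption the thread does not confirm.

```python
import random
import zlib


def compression_ratio(seq: str, level: int = 6) -> float:
    """Return compressed size / original size (smaller means more compressible)."""
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, level)) / len(raw)


def random_dna(length: int, seed: int = 0) -> str:
    """Build a pseudo-random ACGT string; hypothetical stand-in for real sequence."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", k=length))


if __name__ == "__main__":
    n = 1_000_000
    # Random bases carry about 2 bits each, so they cannot shrink much below 1/4.
    print(f"random ACGT: {compression_ratio(random_dna(n)):.3f}")
    # A run of one repeated character compresses to almost nothing.
    print(f"all N:       {compression_ratio('N' * n):.4f}")
```

A four-letter alphabet on its own only buys roughly a fourfold reduction, so a much larger gap between compressed and uncompressed size points to highly repetitive content rather than alphabet size alone.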


Use the primary assembly unless you specifically need the patches, alternate sequences, etc.

$ du -sh Homo_sapiens.GRCh38.dna.primary_assembly.fa
3.0G    Homo_sapiens.GRCh38.dna.primary_assembly.fa
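To see what a given FASTA file actually contains before choosing between downloads, a small script can tally the length and N count of each record. This is only a sketch: the function names (`open_fasta`, `sequence_stats`, `summarize`) are hypothetical, and it assumes a standard FASTA layout where the sequence name is the first whitespace-delimited token after `>`.

```python
import gzip
import sys
from typing import Dict, IO, Iterable, Iterator, Tuple


def open_fasta(path: str) -> IO[str]:
    """Open a plain or gzip-compressed FASTA file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def sequence_stats(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (name, length, n_count) for each FASTA record, streaming line by line."""
    name = None
    length = n_count = 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length, n_count
            parts = line[1:].split()
            name = parts[0] if parts else ""
            length = n_count = 0
        elif name is not None:
            length += len(line)
            n_count += line.count("N") + line.count("n")
    if name is not None:
        yield name, length, n_count


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    """Return total record count, total bases, and total N bases."""
    totals = {"records": 0, "bases": 0, "n_bases": 0}
    for _, length, n_count in sequence_stats(lines):
        totals["records"] += 1
        totals["bases"] += length
        totals["n_bases"] += n_count
    return totals


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python fasta_stats.py genome.fa[.gz]
    with open_fasta(sys.argv[1]) as handle:
        for name, length, n_count in sequence_stats(handle):
            print(f"{name}\t{length}\t{n_count}")
```

Running it on both files and comparing record names and N totals should show where the extra size comes from.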
