Bwa Index On All Human Grch37 Sequences
2
5
Entering edit mode
13.9 years ago

Hi,

Is there any reason why "bwa index -a bwtsw" would fail on all the current toplevel sequences in the human GRCh37 assembly, including the patches?

wget ftp://ftp.ensembl.org/pub/current_fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.61.dna.toplevel.fa.gz

~/bwa-0.5.9rc1/bwa index -a bwtsw Homo_sapiens.GRCh37.61.dna.toplevel.fa.gz

After a good while, it fails with: BWTIncConstructFromPacked: Cannot determine file length!

Cheers

bwa human • 16k views
ADD COMMENT
5
Entering edit mode

does bwa work with a gzipped file ??

ADD REPLY
5
Entering edit mode

@Pierre: yes, it does.

ADD REPLY
2
Entering edit mode

By the way, following lh3 recomendatations (in this BioStar question)

I will use the 1kgenomes reference (and it is already indexed), that does not have the haplotype regions (prevent some variation calls)

wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz
wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.fai
ADD REPLY
1
Entering edit mode

No, you should NOT include allelic sequences. You lose calls rather than gain them. You need special treatment to handle MHC and the chr17 inversion.

ADD REPLY
5
Entering edit mode
13.9 years ago
lh3 33k

Bwa accepts gzipped fasta/fastq files in indexing and alignment. It automatically tells compression and whether the input is fasta or fastq. The problem is most likely to be caused by the use of the "toplevel" fasta, which is longer than 4GB concatenated. Please use the reference genome from the 1000 genomes project.

ADD COMMENT
2
Entering edit mode
13.5 years ago

@Albert, I don't know if this would be a problem with bwa, but we aware that ensembl fasta files had two Y chromosome fasta entries with the same name 'Y':

$  grep '^>Y' Homo_sapiens.GRCh37.62.dna.toplevel.fa
 >Y dna:chromosome chromosome:GRCh37:Y:1:10000:1
 >Y dna:chromosome chromosome:GRCh37:Y:2649521:59034049:1

And many programs don't like that at all.

ADD COMMENT

Login before adding your answer.

Traffic: 2024 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6