Hi,
Is there any reason why "bwa index -a bwtsw" would fail on all the current toplevel sequences in the human GRCh37 assembly, including the patches?
wget ftp://ftp.ensembl.org/pub/current_fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.61.dna.toplevel.fa.gz
~/bwa-0.5.9rc1/bwa index -a bwtsw Homo_sapiens.GRCh37.61.dna.toplevel.fa.gz
After a good while, it fails with: BWTIncConstructFromPacked: Cannot determine file length!
Cheers
Does bwa work with a gzipped file?
@Pierre: yes, it does.
By the way, following lh3's recommendation (in this BioStar question),
I will use the 1kgenomes reference instead. It is already indexed and does not include the haplotype regions, which can prevent some variant calls.
No, you should NOT include allelic sequences. You lose calls rather than gain them. You need special treatment to handle MHC and the chr17 inversion.
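To see how much sequence the toplevel file actually contains once patches and haplotype regions are included, one rough check is to sum the sequence lengths before indexing. The sketch below is only an illustration: `count_bases` is a hypothetical helper name, not part of bwa, and it assumes the input is a gzipped FASTA file in which every line not starting with `>` is sequence.

```shell
# Hypothetical helper (not part of bwa): print the total number of
# sequence characters in a gzipped FASTA file.
count_bases() {
    # Decompress to stdout, skip header lines (those starting with '>'),
    # and add up the length of every remaining line.
    gzip -cd "$1" | awk '!/^>/ { n += length($0) } END { print n + 0 }'
}
```

For the file in the question this would be run as `count_bases Homo_sapiens.GRCh37.61.dna.toplevel.fa.gz`. Comparing the result with the same count for a reference without the allelic sequences shows how much extra sequence the toplevel file carries.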