I am re-aligning the same .fastq files to both hg38 and CHM13 to compare the alignments. My hg38 alignment runs fine but CHM13 is getting this error:
[E::bwa_idx_load_from_disk] fail to locate the index files
Code:
/app/software/BWA/0.7.17-GCC-8.3.0/bin/bwa mem -t 4 -M -R '@RG\tID:no_id\tLB:no_library\tPL:no_platform\tPU:no_unit\tSM:test' /grp/reference/T2T-CHM13/RefSeq_v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.fna results/fastqs/test_fastq1.fq.gz results/fastqs/test_fastq2.fq.gz
We have tried reindexing with bwa index GCF_009914755.1_T2T-CHM13v2.0_genomic.fna
multiple times and received the same error. A colleague in a different group has reported the same issues with using BWA to align to CHM13.
I assume this is a formatting issue but I don't know what the incompatibility is. My understanding is that .fa and .fna are equivalent formats, and BWA builds a .fai from our .fna without throwing any errors. CHM13 is 3.0 GB total so the chromosomes should not be over the 2 GB limit. Any suggestions on how to trace the source of this issue?
Note: I've also submitted an issue on the BWA github page. Please let me know if there's a more appropriate forum for me to ask this!
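One way to check the size reasoning in the question is to list the length of every sequence in the FASTA directly. The sketch below is a minimal example, assuming an uncompressed FASTA file; the function name fasta_lengths is hypothetical and is not part of bwa or samtools.

```shell
# Hypothetical helper: print "<sequence name><TAB><length>" for each record
# in an uncompressed FASTA file.
fasta_lengths() {
    awk '
        # Header line: emit the previous record, then start a new one.
        /^>/ {
            if (seen) print name "\t" len
            seen = 1
            name = substr($1, 2)   # first word of the header, without ">"
            len = 0
            next
        }
        # Sequence line: add its length to the current record.
        { len += length($0) }
        END { if (seen) print name "\t" len }
    ' "$1"
}
```

For example, `fasta_lengths GCF_009914755.1_T2T-CHM13v2.0_genomic.fna | sort -k2,2nr | head` would list the longest sequences first.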
The bwa GitHub page is basically dead. Maybe an ambitious user will give you an answer, but don't wait for the developer.
Can you share the logs of the indexing and the output of
ls /grp/reference/T2T-CHM13/RefSeq_v2.0/
The suffix of the file has no meaning. bwa will happily index a file named
football.123uga
if it is in proper FASTA format, so that cannot be the issue. My best guess is that a memory shortage got the indexing job killed without anyone noticing, after some corrupted files had already been produced.
That's about what I expected re: GitHub.
I don't think we have any logs saved from the indexing but I can rerun it tonight. Here's the reference directory:
Edit: Accidentally posted as separate reply initially whoops
Just to be sure, in your comment the path is:
and in the toplevel bwa mem command it was:
...is it a typo that
/fh/fast/ha_g/
is missing?
Fair question. I initially truncated the path in my post for privacy reasons but then was too lazy to remove it for the full
ls
copy-paste.
We were originally running this all through our usual snakemake pipeline so path integrity is one issue we can disregard with a fair amount of confidence. Snakemake would throw an error for a file not existing before the alignment job was even submitted to our HPC.
Oh yes, a memory issue sounds plausible. I'm not the one who's been running the indexing so I'm not sure if they did it on our HPC or locally.
I'm making a fresh /dir/ and deleting the current .fai to eliminate possible silly mistakes. Will update with log files when I have them. Thanks AT :)
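For a re-run like this, it may help to keep both the log and the exit status, since a job killed for lack of memory can leave partial files behind without an obvious message. The following is a minimal sketch, assuming a POSIX shell; run_logged and the log file name are hypothetical, and only the bwa index call itself comes from the thread.

```shell
# Hypothetical wrapper: run a command, send stdout and stderr to a log file,
# and report a non-zero exit status.
run_logged() {
    log="$1"
    shift
    "$@" > "$log" 2>&1
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "command exited with status $status; see $log" >&2
    fi
    return "$status"
}

# Example (not run here):
# run_logged bwa_index.log bwa index GCF_009914755.1_T2T-CHM13v2.0_genomic.fna
```

When a process is killed with SIGKILL, which is what the Linux out-of-memory killer sends, the shell typically reports exit status 137 (128 + 9). A cluster scheduler may report a kill differently, so its own job accounting is worth checking too.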
Update: I ran the indexing and didn't have any errors but it didn't output a new .fai file either. It did seemingly update/overwrite the other files though.
Maybe me deleting the old .fai file was a problem? I deleted it in case the issue was BWA not wanting to overwrite the existing .fai file (which was only 915 bytes so definitely not a full index file)
Here's
ls -lh
again:
And here's the indexing log (I think all the version changes are for other modules in our pipeline but I'm including it in case it's relevant):
So has the error gone away after reindexing?
No, now I have no .fai file at all.
I am running indexing with the 2.0 CHM file. Will update here once the job is done.
Indexing completed. Here are the files produced:
bwa mem
was able to use these index files to create a SAM file. No problem like the one you report. Running VN:0.7.17-r1188. Just did a simple
bwa mem GCF_009914755.1_T2T-CHM13v2.0_genomic.fna.gz read1.fq.gz read2.fq.gz
for this test.
Same here. Worked for me without a problem. The .fai is not a bwa index file; it comes from faidx (samtools), so you don't need it.
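To make the distinction concrete: bwa index writes five files next to the reference (.amb, .ann, .bwt, .pac and .sa), and bwa mem looks for them under the prefix it is given, while the .fai comes from samtools faidx. Below is a small sketch that checks for those five files, assuming the index was built with the FASTA path as its prefix (the bwa index default); check_bwa_index is a hypothetical helper, not a bwa command.

```shell
# Hypothetical helper: verify that the five bwa index files exist and are
# non-empty for a given reference prefix. Returns 0 if all are present.
check_bwa_index() {
    prefix="$1"
    missing=0
    for ext in amb ann bwt pac sa; do
        if [ ! -s "${prefix}.${ext}" ]; then
            echo "missing or empty: ${prefix}.${ext}"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_bwa_index /grp/reference/T2T-CHM13/RefSeq_v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.fna` would name any index file that bwa mem would fail to find. A file that is present but truncated would still pass this check, so it only rules out the simplest failure.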
Update: I ran
bwa mem
on my test data last night to make sure my problem was solved. Everything went smoothly.
I mistakenly assumed that .fai was the index file for .fa (like .bam -> .bai) and was hoping that was the issue. The other files in the directory seem to be more or less unchanged, so I am still not sure what was wrong with the original indexing, or what specific error my other colleagues encountered. Maybe a clean dir and re-indexing was all we needed.
Thank you AT & Max for the advice & sanity checks :)