I have four FASTQ files in total (two bulks, with forward and reverse reads for each) containing Illumina reads, which I am trying to align to a reference file. The reference file has the extension ".fa", but when the script fails, the error shows that the algorithm is looking for, and failing to find, that same file with ".pac" appended.
Looking around on this site, I can tell that .pac files are index files that BWA creates and uses internally. So then why would it lose track of, or fail to create, this file? What kinds of things should I do to ensure that my files are appropriate for this function call? I already ran them through fastqc and read quality looks fine.
This is being run on a remote server via SSH, so read/write permission issues are possible, but the error log should document such things if so, and it does not. A colleague has performed alignment using the same reference genome kept in the same directory, with fewer permissions, and my script is modeled closely on his, so I can only think there must be some issue with my inputs.
Below is the function call with variable names slightly anonymized:
time bwa mem -t 8 /work/lab/reference_genomes/ref_v2.fa /scratch/user/qtlSeq/eReadsForward.fastq.gz /scratch/user/qtlSeq/eReadsReverse.fastq.gz > eAligned.sam
And the error:
[bns_restore_core] fail to open file '/work/lab/reference_genomes/ref_v2.fa.pac' : No such file or directory
General advice about how to interpret this error would be appreciated.
Did you create indexes for ref_v2.fa for use with BWA, using bwa index? Can you show us the output of the following command?
ls -lh /work/lab/reference_genomes/ref_v2.fa*
Ahhhh, I see now that my colleague's pipeline has some cart-before-horse going on. He created the indexes for the reference genome in advance, referring to the files where they are needed in the pipeline, but placed the creation of the indexes downstream from their first use on the assumption that they would be retained for future users. They must have been cleaned out by a clumsy mv command or suchlike. I will reproduce the index file and see if that resolves the issue.
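For anyone checking the same thing: bwa index writes five files next to the FASTA (.amb, .ann, .bwt, .pac and .sa), and bwa mem needs all of them. The following is only a minimal sketch of that check, not part of anyone's pipeline in this thread; the function name missing_bwa_index is made up for illustration.

```shell
#!/usr/bin/env bash
# Print the BWA index files that are missing (or empty) for a reference.
# missing_bwa_index is a hypothetical helper name, not a bwa subcommand.
missing_bwa_index() {
    local ref="$1" ext
    for ext in amb ann bwt pac sa; do
        # -s is true only if the file exists and is not empty
        [ -s "${ref}.${ext}" ] || printf '%s\n' "${ref}.${ext}"
    done
}

# Example, using the path from the question:
# missing_bwa_index /work/lab/reference_genomes/ref_v2.fa
```

No output means all five files are present; any path printed is a file that bwa mem will fail to open.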
RESULT:
Function call:
Error message:
This makes me suspect even more that I am just having permission issues with the host server.
FYI, /work/lab/reference_genomes contains the following files (notably, none with the extension .fa.pac):
The quick solution here looks like getting my colleague to reproduce the .pac file for me, and the long-term solution is to talk to the server admins and ensure I have adequate permissions in /work/lab/reference_genomes. Thanks very much for the quick responses. Additional feedback is still welcome.
It looks like someone mixed indexes for BLAST+ and BWA in the same directory. Not a great idea. Your account does not seem to have write permissions to /work/lab/reference_genomes/. I would suggest that you create the indexes again in a directory where you have write permissions.
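One way to follow that suggestion without copying the genome is to symlink the FASTA into a writable directory and index the link. By default bwa index uses the input path as the output prefix, so the index files end up beside the link. This is a rough sketch under those assumptions; index_in_writable_dir and the ref_index directory are hypothetical names, and the source path should be absolute so the symlink resolves.

```shell
#!/usr/bin/env bash
# Sketch: build a private BWA index in a directory you can write to.
# Usage: index_in_writable_dir /abs/path/reference.fa /writable/dir
# Prints the path to hand to `bwa mem` as the reference.
index_in_writable_dir() {
    local src="$1" dest_dir="$2"
    local ref="${dest_dir}/$(basename "$src")"
    mkdir -p "$dest_dir" || return 1
    # Link instead of copying; the index files are written beside the link.
    ln -sfn "$src" "$ref" || return 1
    # Re-index only when the .pac file (the one named in the error) is absent.
    if [ ! -s "${ref}.pac" ]; then
        bwa index "$ref" >&2 || return 1
    fi
    printf '%s\n' "$ref"
}
```

With the paths from the question this could be used as follows (ref_index is an assumed directory name):
ref=$(index_in_writable_dir /work/lab/reference_genomes/ref_v2.fa /scratch/user/qtlSeq/ref_index)
bwa mem -t 8 "$ref" /scratch/user/qtlSeq/eReadsForward.fastq.gz /scratch/user/qtlSeq/eReadsReverse.fastq.gz > eAligned.sam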
I've recently struggled a lot with indexing a customized human genome (with certain regions masked). I found a similar issue here:
But I used ls -lh and confirmed that the .bwt file exists.
In my case, lack of memory is not likely to be the reason, since I already have 120 GB allocated to this shell (by PBS Pro) and only one bwa index job is running. BTW, the command I ran is bwa index -a bwtsw <in.fasta>
Furthermore, /usr/bin/time gives memory profiling, and the peak RAM usage seems to be only around 4596492 KB (about 4.4 GB).
Does anyone know what might be causing this error?
Do you have read/write permissions to the following location?
Is that directory available on the worker node where this job runs (you are using PBS Pro, so the job goes through a scheduler)?
This ended up being my issue. I was using Docker, and the Docker user did not have permission to write to my mounted directory. Thanks.
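A quick way to test for this kind of problem is to try creating a file in the target directory as the same user, and in the same environment (job script or container), that runs bwa index. The sketch below is one possible check, not something from the posts above; can_write_dir is a made-up name.

```shell
#!/usr/bin/env bash
# Return 0 if the current user can create a file in the given directory.
# can_write_dir is a hypothetical helper; it probes by creating a temp file.
can_write_dir() {
    local dir="$1" probe
    # mktemp fails if the directory is missing or not writable by this user.
    probe=$(mktemp "${dir}/.write_probe.XXXXXX" 2>/dev/null) || return 1
    rm -f "$probe"
}

# Example, using the directory from the question:
# can_write_dir /work/lab/reference_genomes || echo "cannot write here" >&2
```

Running it on the login node only shows what the login node sees, so it is more informative when placed at the top of the PBS job script or run inside the container.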