bwa mem: fail to locate the index files
2
0
Entering edit mode
5.3 years ago
little_more ▴ 70

Hi all,

I have a problem with running bwa in parallel. I'm using this script (as kindly suggested in this topic):

# reference
hg1kv37=/dir/to/reference/and/index/files/human_g1k_v37.fasta

# create a function 
bwatosam() {
  id="$1"

  bwa mem -t 10 -R "@RG\tID:$id\tSM:$id\tPL:ILLUMINA\tLB:$id_exome" -v 1 -M $hg1kv37 "$id"_R1_fq.gz "$id"_R2_fq.gz \
  | samtools view -Sb -o "$id".bam
}
export -f bwatosam

# --plus enables {%...} to remove a substring    
ls *_R1_fq.gz |
  parallel --plus bwatosam {%_R1_fq.gz}

And I keep getting the following error:

[E::bwa_idx_load_from_disk] fail to locate the index files

Although the contents of a /dir/to/reference/and/index/files/ are:

human_g1k_v37.fasta.bwt  
human_g1k_v37.dict
human_g1k_v37.fasta.fai  
human_g1k_v37.fasta
human_g1k_v37.fasta.amb    
human_g1k_v37.fasta.ann        
human_g1k_v37.fasta.idx.1.bt2
human_g1k_v37.fasta.idx.4.bt2
human_g1k_v37.fasta.idx.rev.1.bt2                  
human_g1k_v37.fasta.idx.rev.2.bt2      
human_g1k_v37.fasta.idx.2.bt2      
human_g1k_v37.fasta.idx.3.bt2      
human_g1k_v37.fasta.pac
human_g1k_v37.fasta.sa

The index files were created some time ago but they should be OK because if I run a loop over all files (using a list with their names):

for line in `cat ids_list.txt`
do
bwa mem \
-t 10 \
-M $hg1kv37 \
"$line"_R1_fq.gz \
"$line"_R2_fq.gz \
-v 1 -R '@RG\tID:$line\tSM:$line\tPL:ILLUMINA\tLB:$line_exome' \
-M | samtools view -Sb - > $line.bam
done

everything works. Any ideas?

bwa • 4.8k views
ADD COMMENT
0
Entering edit mode

try option -v 4 to get better debugging output.

ADD REPLY
2
Entering edit mode
5.3 years ago
ATpoint 85k

As far as I know parallel does not see variables defined outside of the function that it runs. I therefore always add outside variables like:

INDEX=/foo/bar/index.fa

function BWA {

  Basename=$1
  Index=$2

  bwa mem ${Index} ${Basename}R1.fastq.gz ${Basename}R2.fastq.gz > (...)

}; export -f BWA

ls *R1.fastq.gz | awk -F "R1" '{print $1 | "sort -u"}' | parallel "BWA {} ${INDEX}"
ADD COMMENT
0
Entering edit mode

and as specified in the previous answers, you should use a workflow manager : C: using GNU parallel for bwa mem and samtools

ADD REPLY
0
Entering edit mode

it works now, thank you very much!

ADD REPLY
1
Entering edit mode
5.3 years ago
ole.tange ★ 4.5k

GNU Parallel does not run its jobs in the same shell session as it is started. That means that non-exported variables and non-exported functions cannot be seen.

However, env_parallel can make a copy of the environment in the current session and pass that to GNU Parallel. By using env_parallel --session only new definitions will be given to GNU Parallel.

# start recording new environment names
env_parallel --session

# reference
hg1kv37=/dir/to/reference/and/index/files/human_g1k_v37.fasta

# create a function 
bwatosam() {
  id="$1"

  bwa mem -t 10 -R "@RG\tID:$id\tSM:$id\tPL:ILLUMINA\tLB:$id_exome" -v 1 -M $hg1kv37 "$id"_R1_fq.gz "$id"_R2_fq.gz |
    samtools view -Sb -o "$id".bam
}
# export is not needed due to --session
# export -f bwatosam

# --plus enables {%...} to remove a substring    
ls *_R1_fq.gz |
  env_parallel --plus bwatosam {%_R1_fq.gz}

# finish the session (not needed if this is the end of the script)
env_parallel --end-session
ADD COMMENT

Login before adding your answer.

Traffic: 1595 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6