Samtools indexing problem during bwa mem alignment
0
0
Entering edit mode
2.5 years ago
GlouGlou • 0

Hello,

I'm trying to align some resequensing data to a reference genome. I made a bash loop faur this. I used bwa mem and in the same script I want to build the samtool index and other stuff. Here is my script :

#!/usr/bin/env bash
#SBATCH --time=150:00:00
#SBATCH --cpus-per-task=26
#SBATCH --mem=150000

#SBATCH --output=alignment.log
#SBATCH --error=alignment.err

ASSEMBLY="/mnt/data/scaffold.fasta"

READ_DIR=/mnt/data/reads/filtered

for SAMPLE in T1 T2 T3 T4 T5 T6;
    do
    mkdir -p ${SAMPLE}
    bwa mem -R "@RG\tID:${SAMPLE}\tPU:x\tSM:${SAMPLE}\tPL:Illumina\tLB:x" -t 18 ${ASSEMBLY}  \
             <(zcat ${READ_DIR}/${SAMPLE}/${SAMPLE}.final_1.fastq.gz) \         
             <(zcat ${READ_DIR}/${SAMPLE}/${SAMPLE}.final_2.fastq.gz) | \
             samtools fixmate -@ 2 -m - - | \
             samtools sort -@ 4 -m 12G |  \
             samtools markdup -@ 2 - ${SAMPLE}/${SAMPLE}.mkdup.bam 

    samtools index ${SAMPLE}/${SAMPLE}.mkdup.bam
        mosdepth -t 4 ${SAMPLE}/${SAMPLE}.mkdup ${SAMPLE}/${SAMPLE}.mkdup.bam
    done


But it seems to be a problem with the indexing. Here is the bottom of my error log :

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[E::main_mem] fail to open file ` '.
/var/lib/slurm-llnl/slurmd/job417492/slurm_script: line 18: /dev/fd/62: Permission denied
[bam_mating_core] ERROR: Couldn't read header
samtools sort: failed to read header from "-"
[markdup] error reading header
samtools index: "T1/T1.mkdup.bam" is in a format that cannot be usefully indexed

Thank you for your help.

EDIT: sorry the space between 18 and assembly was here but disapear when I copy paste here. And thanks for fixing the post.

Alignment bwa samtools • 2.2k views
ADD COMMENT
1
Entering edit mode
   <(zcat ${READ_DIR}/${SAMPLE}/${SAMPLE}.final_1.fastq.gz)

it's useless, bwa mem can read gz files.

ADD REPLY
0
Entering edit mode
for SAMPLE in 1 2 3 4 5 6;

you should use a workflow manager like snakemake or nextflow...

ADD REPLY
0
Entering edit mode
-t 18${ASSEMBLY} 

space missng between 18 and ASSEMBLY

ADD REPLY
0
Entering edit mode
ASSEMBLY=".../scaffold.fasta"

as far as I know, three dots '...' doesn't mean anything in linux.

ADD REPLY
0
Entering edit mode

I would use absolute paths to define these variables. This relative definition is error-prone. In general, make a small subset of the data, say the first 400000 lines of the fastq file pairs and test that locally, and once it works go big on the cluster.

ADD REPLY
0
Entering edit mode

[E::main_mem] fail to open file '.`

This means there is a hickup in parsing the files. Remove the zcat's, try to simplify the script, test with a subset first.

ADD REPLY
0
Entering edit mode

I tried but nothing change, same error and I don't know the cause.

ADD REPLY
0
Entering edit mode

before the line ASSEMBLY=, add

set -o pipefail
set -x
set -e
set -u

and show us the output of stdout and stderr

ADD REPLY
0
Entering edit mode
  • set -e
  • set -u
  • ASSEMBLY=/mnt/data/scaffold.fasta
  • READ_DIR=/mnt/data/reads/filtered
  • for SAMPLE in T1 T2 T3 T4 T5 T6
  • mkdir -p T1
  • bwa mem -R '@RG\tID:T1\tPU:x\tSM:T1\tPL:Illumina\tLB:x' -t 18 /mnt/data/scaffold.fasta /dev/fd/63 ' ' ++ zcat /mnt/data/reads/filtered/T1/T1.final_1.fastq.gz [M::bwa_idx_load_from_disk] read 0 ALT contigs [E::main_mem] fail to open file ` '.

Thank you

ADD REPLY
0
Entering edit mode

Without zcat :

set -e
set -u
ASSEMBLY=/mnt/data/scaffold.fasta
READ_DIR=/mnt/data/reads/filtered
for SAMPLE in T1 T2 T3 T4 T5 T6
   mkdir -p T1
   bwa mem -R '@RG\tID:1_19\tPU:x\tSM:T1\tPL:Illumina\tLB:x' -t 18 /mnt/data/scaffold.fasta samtools fixmate -@ 2 -m - -
samtools sort -@ 4 -m 12G
samtools markdup -@ 2 - T1/T1.mkdup.bam
mem: invalid option -- '@'
samtools sort: failed to read header from "-"
[markdup] error reading header
ADD REPLY
0
Entering edit mode

That misses do and done and the pipe operator.

ADD REPLY
0
Entering edit mode

If these pipe-like constructs are above your current bash-fu then simply make it serial, nothing wrong with that. Save each output to a file and read it in the next operation, after all what counts is that you get the job done. Files that are produced on the way can be deleted if they take up space.

ADD REPLY
0
Entering edit mode

I'm sorry I'm quite new in bioinf can you elaborate how to do that ? And what do you mean by "If these pipe-like constructs are above your current bash-fu" ?

Thank you

ADD REPLY

Login before adding your answer.

Traffic: 1091 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6