Entering edit mode
2.5 years ago
GlouGlou
•
0
Hello,
I'm trying to align some resequensing data to a reference genome. I made a bash loop faur this. I used bwa mem and in the same script I want to build the samtool index and other stuff. Here is my script :
#!/usr/bin/env bash
#SBATCH --time=150:00:00
#SBATCH --cpus-per-task=26
#SBATCH --mem=150000
#SBATCH --output=alignment.log
#SBATCH --error=alignment.err
ASSEMBLY="/mnt/data/scaffold.fasta"
READ_DIR=/mnt/data/reads/filtered
for SAMPLE in T1 T2 T3 T4 T5 T6;
do
mkdir -p ${SAMPLE}
bwa mem -R "@RG\tID:${SAMPLE}\tPU:x\tSM:${SAMPLE}\tPL:Illumina\tLB:x" -t 18 ${ASSEMBLY} \
<(zcat ${READ_DIR}/${SAMPLE}/${SAMPLE}.final_1.fastq.gz) \
<(zcat ${READ_DIR}/${SAMPLE}/${SAMPLE}.final_2.fastq.gz) | \
samtools fixmate -@ 2 -m - - | \
samtools sort -@ 4 -m 12G | \
samtools markdup -@ 2 - ${SAMPLE}/${SAMPLE}.mkdup.bam
samtools index ${SAMPLE}/${SAMPLE}.mkdup.bam
mosdepth -t 4 ${SAMPLE}/${SAMPLE}.mkdup ${SAMPLE}/${SAMPLE}.mkdup.bam
done
But it seems to be a problem with the indexing. Here is the bottom of my error log :
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[E::main_mem] fail to open file ` '.
/var/lib/slurm-llnl/slurmd/job417492/slurm_script: line 18: /dev/fd/62: Permission denied
[bam_mating_core] ERROR: Couldn't read header
samtools sort: failed to read header from "-"
[markdup] error reading header
samtools index: "T1/T1.mkdup.bam" is in a format that cannot be usefully indexed
Thank you for your help.
EDIT: sorry the space between 18 and assembly was here but disapear when I copy paste here. And thanks for fixing the post.
it's useless, bwa mem can read gz files.
you should use a workflow manager like snakemake or nextflow...
space missng between 18 and ASSEMBLY
as far as I know, three dots '...' doesn't mean anything in linux.
I would use absolute paths to define these variables. This relative definition is error-prone. In general, make a small subset of the data, say the first 400000 lines of the fastq file pairs and test that locally, and once it works go big on the cluster.
[E::main_mem] fail to open file
'.`This means there is a hickup in parsing the files. Remove the
zcat
's, try to simplify the script, test with a subset first.I tried but nothing change, same error and I don't know the cause.
before the line ASSEMBLY=, add
and show us the output of stdout and stderr
Thank you
Without zcat :
That misses do and done and the pipe operator.
If these pipe-like constructs are above your current bash-fu then simply make it serial, nothing wrong with that. Save each output to a file and read it in the next operation, after all what counts is that you get the job done. Files that are produced on the way can be deleted if they take up space.
I'm sorry I'm quite new in bioinf can you elaborate how to do that ? And what do you mean by "If these pipe-like constructs are above your current bash-fu" ?
Thank you