Question

Ending up with only one BAM file

0

Entering edit mode

2.5 years ago

salman_96 ▴ 70

Hi,

I am working on a variant calling pipeline. Currently, I am trying to generate BAM files from raw fastq files.

The files are paired end end seq files and named like this (there are more than 20 files)

AB03_T1_R1_001_trimmed.fastq AB03_T1_R2_001_trimmed.fastq
AB05_T1_2D_R1_001_trimmed.fastq AB05_T1_2D_R2_001_trimmed.fastq
AB09T1_2D_R1_001_trimmed.fastq AB09T1_2D_R2_001_trimmed.fastq

The code is:

cd /data/Data/Results
genome=/data/Data/Genome/GRCh38.primary_assembly.genome.fa

for fq1 in /data/Data/*_R1_001_trimmed.fastq

 do  
    echo "working with file $fq1"
    base=$(basename $fq1 _R1_001_trimmed.fastq)
    echo "basename is $base"

    fq1=/data/Data/${base}_R1_001_trimmed.fastq
    fq2=/data/Data/${base}_R2_001_trimmed.fastq
    STAR \
      --genomeDir /dataData/Genome \
      --runThreadN 30 \
      --readFilesIn ${fq1} ${fq2} \
      --outSAMtype BAM SortedByCoordinate \
      --outFileNamePrefix STARpaired
done

The resulting file I am getting is only one .bam file named STARpairedAligned.sortedByCoord.out.bam

Can anyone assist why do I get only one file and how to fix this problem as I need individual files for each sample as I want to run Variant call on individual samples?

Regards,
Salman

bam variant-calling star • 930 views

ADD COMMENT • link updated 23 months ago by Ram 44k • written 2.5 years ago by salman_96 ▴ 70

0

Entering edit mode

for fq1 in /data/Data/*_R1_001_trimmed.fastq

you should use a workflow manager like nextflow or snakemake

ADD REPLY • link 2.5 years ago by Pierre Lindenbaum 165k

0

Entering edit mode

You should get used right away to never work with uncompressed files. It just doesn't scale and takes up disk space which can be limited on shared clusters with quotas. Gzip your fastq files, most trimmers offer that. Same with the BAM files from STAR, use --outBAMcompression 5 for the STAR command to get BAM compression, otherwise the BAM files take up just hilariously much space. STAR is so bad at providing good defaults for simple things like that, but it is great at mapping and that is what counts most.

ADD REPLY • link 2.5 years ago by ATpoint 86k

score 2 · Answer 1 · 2022-08-17

2

Entering edit mode

2.5 years ago

Pierre Lindenbaum 165k

you're always using the very same output file whatever is the input in the loop.

--outFileNamePrefix STARpaired

use something like

 --outFileNamePrefix "${base}STARpaired"

ADD COMMENT • link 2.5 years ago by Pierre Lindenbaum 165k