Entering edit mode
2.5 years ago
salman_96
▴
70
Hi,
I am working on a variant calling pipeline. Currently, I am trying to generate BAM files from raw fastq files.
The files are paired end end seq files and named like this (there are more than 20 files)
- AB03_T1_R1_001_trimmed.fastq AB03_T1_R2_001_trimmed.fastq
- AB05_T1_2D_R1_001_trimmed.fastq AB05_T1_2D_R2_001_trimmed.fastq
- AB09T1_2D_R1_001_trimmed.fastq AB09T1_2D_R2_001_trimmed.fastq
The code is:
cd /data/Data/Results
genome=/data/Data/Genome/GRCh38.primary_assembly.genome.fa
for fq1 in /data/Data/*_R1_001_trimmed.fastq
do
echo "working with file $fq1"
base=$(basename $fq1 _R1_001_trimmed.fastq)
echo "basename is $base"
fq1=/data/Data/${base}_R1_001_trimmed.fastq
fq2=/data/Data/${base}_R2_001_trimmed.fastq
STAR \
--genomeDir /dataData/Genome \
--runThreadN 30 \
--readFilesIn ${fq1} ${fq2} \
--outSAMtype BAM SortedByCoordinate \
--outFileNamePrefix STARpaired
done
The resulting file I am getting is only one .bam file named STARpairedAligned.sortedByCoord.out.bam
Can anyone assist why do I get only one file and how to fix this problem as I need individual files for each sample as I want to run Variant call on individual samples?
Regards,
Salman
you should use a workflow manager like nextflow or snakemake
You should get used right away to never work with uncompressed files. It just doesn't scale and takes up disk space which can be limited on shared clusters with quotas. Gzip your fastq files, most trimmers offer that. Same with the BAM files from STAR, use
--outBAMcompression 5
for the STAR command to get BAM compression, otherwise the BAM files take up just hilariously much space. STAR is so bad at providing good defaults for simple things like that, but it is great at mapping and that is what counts most.