Hello all,
I am new with programming within bioinformatics and long story short, I'm practicing writing pipeline scripts starting with the fastq to VCF pipeline. I am basically at the point where I went from fastq to sorted-bam files, and as I went to check on the status of the files using IGV, no visual mapping showed up for the bam files and am not sure why. I have 2 fastq's (because they are paired end) per sample (one tumor and one normal), but I only used one of the two fastq's for my own simplicity. Is this the issue on why it hasn't show up on IGV? I'm guessing I should have used both fastq's when using bwa mem to create the SAM file. Below is my code (yes I know, plenty of if/then statements because I'm still new to programming), let me know if anyone has a potential answer to this.
Thanks!
#!/bin/bash
chmod +x pipeline.sh
#state reference FASTA and FASTQ's
refpath="/mnt/c/Users/RK/working/ref/human_g1k_v37.fasta"
NFASTQ="/mnt/c/Users/RK/working/FASTQ/JS16-11705.fastqN.gz"
TFASTQ="/mnt/c/Users/RK/working/FASTQ/JS16-11705.fastqT.gz"
#index reference
if [ ! -f /mnt/c/Users/RK/working/ref/human_g1k_v37.fasta.pac ]; then
bwa index $refpath -a bwtsw; else
echo "Reference file already indexed."
fi
#fastq to SAM
if [ ! -f /mnt/c/Users/RK/working/FASTQ/bwamem_alignN.sam ]; then
bwa mem $refpath $NFASTQ > /mnt/c/Users/RK/working/FASTQ/bwamem_alignN.sam; else
echo "BWA alignment already completed for normal."
fi
if [ ! -f /mnt/c/Users/RK/working/FASTQ/bwamem_alignT.sam ]; then
bwa mem $refpath $TFASTQ > /mnt/c/Users/RK/working/FASTQ/bwamem_alignT.sam; else
echo "BWA alignment already completed for tumor."
fi
#SAM to BAM
if [ ! -f /mnt/c/Users/RK/working/FASTQ/bwamem_alignN.bam ]; then
samtools view -bS /mnt/c/Users/RK/working/FASTQ/bwamem_alignN.sam -o /mnt/c/Users/RK/working/FASTQ/bwamem_alignN.bam; else
echo "SAM to BAM already completed for normal."
fi
if [ ! -f /mnt/c/Users/RK/working/FASTQ/bwamem_alignT.bam ]; then
samtools view -bS /mnt/c/Users/RK/working/FASTQ/bwamem_alignT.sam -o /mnt/c/Users/RK/working/FASTQ/bwamem_alignT.bam; else
echo "SAM to BAM already completed for tumor."
fi
#sort and index bam
if [ -f /mnt/c/Users/RK/working/FASTQ/bwamem_alignN.bam ] && [ -f /mnt/c/Users/RK/working/FASTQ/bwamem_alignT.bam ]; then
if [ ! -f /mnt/c/Users/RK/working/FASTQ/bwamem_alignN.sorted.bam ]; then
samtools sort /mnt/c/Users/RK/working/FASTQ/bwamem_alignN.bam -o /mnt/c/Users/RK/working/FASTQ/bwamem_alignN.sorted.bam
samtools index /mnt/c/Users/RK/working/FASTQ/bwamem_alignN.sorted.bam; else
echo "Already indexed."
fi
if [ ! -f /mnt/c/Users/RK/working/FASTQ/bwamem_alignT.sorted.bam ]; then
samtools sort /mnt/c/Users/RK/working/FASTQ/bwamem_alignT.bam -o /mnt/c/Users/RK/working/FASTQ/bwamem_alignT.sorted.bam
samtools index /mnt/c/Users/RK/working/FASTQ/bwamem_alignT.sorted.bam; else
echo "Already indexed."
fi
else
echo "Error occurred, check to see if BAM file is present."
fi
you're re-inventing the wheel. What you wrote is a Makefile https://www.gnu.org/software/make/
see Generating shell scripts
It is better that you keep track of each step in an individual file otherwise file being existent does not necessarily mean that it is complete.
I would suggest you learn nextflow or snakemake to get yourself a more complete solution.