Hello,
I'm using bcftools to call invariant sites in parallel. It appears that only scaffolds that are present across all bam files are included, while any scaffold missing in an individual in the dataset is omitted. What can I do to ensure all scaffolds are included in the invariant site calling? Thanks for your help-I am using the following tutorial: https://www.danielecook.com/using-gnu-parallel-for-bioinformatics/
Here's my code:
split chromosomes from bam files
function bam_chromosomes { samtools idxstats $1 | cut -f 1 | grep -v '*' }
genome=/path_to_reference export genome
apply operations to each scaffold individually in parallel
function parallel_call { bcftools mpileup --fasta-ref ${genome} --regions $2 --output-type u --bam-list $1 | \ bcftools call --multiallelic-caller --variants-only --output-type v - > ${1/.bam/}.$2.vcf }
export -f parallel_call
chrom_set=bam_chromosomes test.bam
parallel --verbose -j 25 parallel_call $1 ::: ${chrom_set}
while parallel is a great tool, you'd better use a workflow manager like nextflow or snakemake.
see also : Generating shell scripts