help with calling invariant sites in parallel: scaffolds not present in all bam files are skipped
0
0
Entering edit mode
3.9 years ago

Hello,

I'm using bcftools to call invariant sites in parallel. It appears that only scaffolds that are present across all bam files are included, while any scaffold missing in an individual in the dataset is omitted. What can I do to ensure all scaffolds are included in the invariant site calling? Thanks for your help-I am using the following tutorial: https://www.danielecook.com/using-gnu-parallel-for-bioinformatics/

Here's my code:

split chromosomes from bam files

function bam_chromosomes { samtools idxstats $1 | cut -f 1 | grep -v '*' }

genome=/path_to_reference export genome

apply operations to each scaffold individually in parallel

function parallel_call { bcftools mpileup --fasta-ref ${genome} --regions $2 --output-type u --bam-list $1 | \ bcftools call --multiallelic-caller --variants-only --output-type v - > ${1/.bam/}.$2.vcf }

export -f parallel_call

chrom_set=bam_chromosomes test.bam

parallel --verbose -j 25 parallel_call $1 ::: ${chrom_set}

invariant bam bcftools scaffold • 1.1k views
ADD COMMENT
0
Entering edit mode

while parallel is a great tool, you'd better use a workflow manager like nextflow or snakemake.

ADD REPLY
0
Entering edit mode
ADD REPLY

Login before adding your answer.

Traffic: 1682 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6