Consensus sequence calling with normalisation of indels
1
0
Entering edit mode
3.0 years ago

I'm following the workflow suggested by samtools here to produce a fasta with the consensus sequence.

https://samtools.github.io/bcftools/howtos/consensus-sequence.html

The workflow goes like this:

# call variants
bcftools mpileup -Ou -f reference.fa alignments.bam | bcftools call -mv -Oz -o calls.vcf.gz
bcftools index calls.vcf.gz

# normalize indels
bcftools norm -f reference.fa calls.vcf.gz -Ob -o calls.norm.bcf

# filter adjacent indels within 5bp
bcftools filter --IndelGap 5 calls.norm.bcf -Ob -o calls.norm.flt-indels.bcf

# apply variants to create consensus sequence
cat reference.fa | bcftools consensus calls.vcf.gz > consensus.fa

It's not clear to me if/where/how the calls.norm.flt-indels.bcf file I've made is used by the final bcftools consensus function??

samtools bcftools consensus • 1.8k views
ADD COMMENT
0
Entering edit mode

Hi, did you find out the answer to this?

I am also unclear what this output is for

ADD REPLY
0
Entering edit mode
23 months ago
barslmn ★ 2.3k

It seems like a mishap on the documentation part. The smaller example part above says "This assumes we have already made the calls, normalized indels and filtered".

So your command should be:

cat reference.fa | bcftools consensus calls.norm.flt-indels.bcf > consensus.fa

To make sure you can create an issue and ask to author directly here.

ADD COMMENT
0
Entering edit mode

Agreed, it simply looks like a missing step talking about variant filtering.

Note since this workflow was written, we also added samtools consensus which may or may not be a better fit. It's a totally different beast as the purpose of this is to produce a (potentially heterozygous) consensus given a BAM file and is aimed more at refinining assembly consensus post realignment, as a means to generate a reference-synchronised fasta for CRAM embedded references, or as a fast alternative to things like ivar consensus. However that said clearly there are similarities between the two pipelines. The samtools method is far simpler, faster, doesn't rely on first having the reference to start from (which someone knew in order to do the original alignments, but this may be not the person with the BAM/CRAM), and doesn't have the frankly bewildering calling conventions of bcftools consensus. However it's also much more rigid and lacks the huge flexibility of bcftools or the ability to do filtering first. It's also considerably less robust around indels as it has no indel-realignment step, so be wary of that around long-read technologies.

Anyway, I mention it as it may be a viable alternative to some users.

ADD REPLY

Login before adding your answer.

Traffic: 2161 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6