I mapping fasta reads (200bp) to a short (~5kb) reference that has N's in it. Coverage is very low (1-4x), as this isn't NGS data, but rather a consensus calling project.
The procedure, in short, is as follows:
# index reference
smalt index reference.fasta reference.fasta
# map reads to reference, sort, save as bam
smalt map reference.fasta read.fasta | samtools sort - > sorted.bam
# make pileup, call consensus, and index
bcftools mpileup -Ou -f reference.fasta sorted.bam | bcftools call -P 9.9e-1 -mv -Oz -o calls.vcf.gz ; tabix calls.vcf.gz
# get consensus
cat reference.fasta | bcftools consensus calls.vcf.gz > consensus.fa
The issue I am having is that none of the N's are replaced, even when there is coverage. SNPs are called when applicable, but none of the N's are resolved. I am assuming that it does that because there is only a 1x coverage.
Here is a snippet of the result of samtools mpileup:
1_seq 5403 N 1 c ~
1_seq 5404 N 1 a ~
1_seq 5405 N 1 g ~
1_seq 5406 N 1 g ~
1_seq 5407 N 1 c ~
1_seq 5408 N 1 a ~
1_seq 5409 N 1 t ~
1_seq 5410 C 1 , ~
1_seq 5411 A 1 , ~
1_seq 5412 A 1 , ~
1_seq 5413 A 1 , ~
Question: Give this (non-standard) use case, is there a way to tune the consensus calling parameters to favor replacement of N's?
Thanks in advance!