Generating consensus sequences
1
1
Entering edit mode
8.0 years ago
Lina F ▴ 200

Hi all,

I am interested in generating consensus sequences from a bwa alignment. I generated the consensus sequences as follows:

samtools mpileup -uf reference.fasta sorted.bam -d 8000 | bcftools call -mv -Oz -o sorted.vcf.gz
tabix sorted.vcf.gz
cat reference.fasta | bcftools consensus sorted.vcf.gz > consensus.fasta

However, I noticed that even for sequences in my reference where none of the reads matched, I got a potential "consensus" sequence back. I assume this is because there were no entries in the VCF file that indicated SNPs, which could mean a perfect alignment.

However, this is misleading in my case, because if nothing mapped to a sequence in the reference set then I don't want to be confused with a consensus sequence. On the other hand, if a lot of reads mapped perfectly to a sequence in the reference set, then of course I'd like to see the correct consensus sequence.

How would you recommend I modify this approach?

Is it better to filter the bam somehow to only keep things that have a certain mapping depth? Or should I filter the VCF?

Thanks in advance!

consensus fasta bwa vcftools • 3.1k views
ADD COMMENT
0
Entering edit mode
8.0 years ago
vmicrobio ▴ 290

you may try in parallel to remove non-aligned sequences:

samtools view -b -F4 sorted.bam > sorted_align.bam
ADD COMMENT
0
Entering edit mode

Thank you for the suggestion, I will try that!

ADD REPLY

Login before adding your answer.

Traffic: 2567 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6