Could anybody point towards the direction in how to generate a “consensus” fasta sequence file including indels from a BAM file? Thanks
Could anybody point towards the direction in how to generate a “consensus” fasta sequence file including indels from a BAM file? Thanks
samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq
seqtk fq2fa cns.fq >cns.fa
seqtk was released in the latest samtools verion.
The GATK package has a utility called AlternateReferenceMaker to do this.
bcftools view -cg is now deprecated (version 1.1). bcftools call -vm should be used instead.
samtools mpileup -uf hg19.fa in.bam \ | bcftools call -vm -Oz - > out.vcf.gz tabix -p vcf out.vcf.gz
bcftools consensus seems to be deprecated too (version 1.1). vcf-consensus may be used as an alternative:
samtools faidx hg19.fa chr22:29000000-29200000 \ | vcf-consensus out.vcf.gz \ | bgzip > out.fasta.gz
chr22:29000000-29200000 may be changed for any other region/s of interest
Coming in 4 year later, I've tried these lines of code for my single-end BAM file but the consensus sequence fills in gaps with bases from the reference.fasta file. Is there a way to just get a consensus of the BAM file or even the mpileup file to also keep gaps where there was no depth of coverage against the reference.fasta?
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
I am fairly sure that this method will only include SNVs, but not INDELs. Please correct me if I'm wrong!