Majority consensus from bam file
1
2
Entering edit mode
6.1 years ago
erneheay ▴ 20

Hi,

I am trying to obtain a strict majority consensus from a bam file, i.e. Using only the reads aligned and NOT the reference used to align. So for example, in the case1, because the ref is completely covered by the reads, the consensus will have the same length than the reference, without gaps:

ex1

But for the 2nd case, I expect to have N's or a shorter consensus. In this case, using the reference to fill the gaps would be ok.

ex2

I have tried with samtools/bcftools and gatk but it is not working properly, i.e. it uses variations from the reads with low frequencies.

Many thanks,

E.

Edit1.

The samtools commands:

bwa mem -B 20 -A 3 -O 30 -E 3  -t 4 ref.fa R1.fastq R2.fastq > file.sam
samtools view -bT ref.fa file.sam | samtools sort > file.bam
samtools index file.bam
samtools mpileup -uf ref.fa file.bam | bcftools call -mv -Oz -o output.vcf
cat ref.fa | bcftools consensus output.vcf > output.fa

The gatk ones (from the bwa/samtools bam output):

java -jar picard.jar MarkDuplicates INPUT=file.bam O=file.dup.bam M=file.metrics
gatk HaplotypeCaller --reference ref.fa --input file.dup.bam --output gatk.vcf
gatk FastaAlternateReferenceMaker -R ref.fa --variant gatk.vcf -o gatk.fa
consensus alignment • 4.0k views
ADD COMMENT
1
Entering edit mode

Hello,

I have tried with samtools/bcftools and gatk but it is not working properly

please be more specific of what have you done. What were the exact command?

Usually you first have to do variant calling to obtain a vcf file. After that you can use bcftools consensus to get the consensus sequence.

fin swimmer

ADD REPLY
0
Entering edit mode

Thanks finswimmer, question edited. bcftools consensus output is not a strict majority consensus, I think.

ADD REPLY
0
Entering edit mode

Hello again,

now I understand it better. What you could try is:

  • get depth at each position of the reference using samtools depth
  • extract those regions where the depth is 0 (or any other value you like) and make a bed file out of it
  • mask those regions using bedtools maskfasta
  • use bcftools consensus on that new reference to incorporate your variants

fin swimmer

ADD REPLY
1
Entering edit mode
6.1 years ago
trausch ★ 1.9k

This should work using Alfred

./alfred consensus -f bam -t ont -p chr1:218992200 <ont_pacbio_illumina.bam>

The alignment scoring depends on the sequencing technology. If you have phased SNPs and long reads you should split the BAMs into haplotypes prior to the consensus computation.

ADD COMMENT

Login before adding your answer.

Traffic: 2566 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6