Hi,
I am trying to obtain a strict majority consensus from a bam file, i.e. Using only the reads aligned and NOT the reference used to align. So for example, in the case1, because the ref is completely covered by the reads, the consensus will have the same length than the reference, without gaps:
But for the 2nd case, I expect to have N's or a shorter consensus. In this case, using the reference to fill the gaps would be ok.
I have tried with samtools/bcftools and gatk but it is not working properly, i.e. it uses variations from the reads with low frequencies.
Many thanks,
E.
Edit1.
The samtools commands:
bwa mem -B 20 -A 3 -O 30 -E 3 -t 4 ref.fa R1.fastq R2.fastq > file.sam
samtools view -bT ref.fa file.sam | samtools sort > file.bam
samtools index file.bam
samtools mpileup -uf ref.fa file.bam | bcftools call -mv -Oz -o output.vcf
cat ref.fa | bcftools consensus output.vcf > output.fa
The gatk ones (from the bwa/samtools bam output):
java -jar picard.jar MarkDuplicates INPUT=file.bam O=file.dup.bam M=file.metrics
gatk HaplotypeCaller --reference ref.fa --input file.dup.bam --output gatk.vcf
gatk FastaAlternateReferenceMaker -R ref.fa --variant gatk.vcf -o gatk.fa
Hello,
please be more specific of what have you done. What were the exact command?
Usually you first have to do variant calling to obtain a
vcf
file. After that you can use bcftools consensus to get the consensus sequence.fin swimmer
Thanks finswimmer, question edited. bcftools consensus output is not a strict majority consensus, I think.
Hello again,
now I understand it better. What you could try is:
samtools depth
bed
file out of itbedtools maskfasta
bcftools consensus
on that new reference to incorporate your variantsfin swimmer