Question

Samtools Mpileup And Bcftools Calling Snps Againt Non-Existent N Characters

1

Entering edit mode

13.5 years ago

Travis ★ 2.9k

Hi,

I am using the standard mpileup/bcftools route to call variants with some publicly available Ion Torrent E.coli DH10B data. In my final results I get a couple of SNPs called where the reference base is an N character. The problem is that I have checked the entire genome and can't find any N characters! The bam file I am using was also publicly available so I am wondering if maybe the alignments were originally done against a version of the genome that contained Ns however I am not certain on the mechanics of how samtools/bcftools call the variants so am unsure if this explanation could be correct. Can anyone lend their opinion?

Thanks in advance.

next-gen sequencing samtools snp bcftools • 4.7k views

ADD COMMENT • link updated 13.5 years ago by Swbarnes2 ★ 1.6k • written 13.5 years ago by Travis ★ 2.9k

score 2 · Answer 1 · 2011-11-25

2

Entering edit mode

13.5 years ago

Swbarnes2 ★ 1.6k

Check the gneome for other ambiguous characters; MRWSYK. mpileup treats them as N's.

ADD COMMENT • link 13.5 years ago by Swbarnes2 ★ 1.6k

1

Entering edit mode

If mpileup thinks that every single position is an N, that means that there's some discrepancy between the reference sequence you used, and what mpileup thinks you used. For instance, if the reference name in the fasta differs from the name in the .sam file, or if the fasta.fai was made wrong (or if that step failed, the software will yell, but it will gamely go on without it), you get that result.

ADD REPLY • link 13.5 years ago by Swbarnes2 ★ 1.6k

0

Entering edit mode

I found literally two non A|T|G|C characters in the genome. These don't account for the number of variants I am seeing called against N's unfortunately...

ADD REPLY • link 13.5 years ago by Travis ★ 2.9k

0

Entering edit mode

I just performed the same operation with a publicly available MiSeq dataset for the same genome - a SNP against an N character was called for every genomic position covered by a read!

ADD REPLY • link 13.5 years ago by Travis ★ 2.9k

0

Entering edit mode

Cheers for the input. I'll have a look at it asap and will get back to you. The perils of doing too much at once!

ADD REPLY • link 13.5 years ago by Travis ★ 2.9k