Question

Calling SNPs with 1X depth (samtools/bcftools)

2

Entering edit mode

9.8 years ago

Coryza ▴ 30

Hi,

I am trying to call SNPs with at least 1X depth (My estimation is that most of the SNPs I want to see have indeed 1X / 2X depth). I have mapped consensus sequences to the Tomato reference genome (generating in theory 1X/2X depth on specific target positions) with Bowtie2 and perform SNP/INDEL calling with samtools mpileup and bcftools. However, I can't retrieve SNPs with 1X depth although I do mention it on the command line. Can anyone help me?

Commands:

for directory in *; do for file in $directory/*sorted*; do prefix=$(echo $file | rev | cut -f1 -d '/' | rev | cut -f1 -d '.'); samtools mpileup -uf ../../Data/SolanumLycopersicum_gDNA_2.40.fa $file | bcftools view -bvcg - > $directory/$prefix.bcf & done; done
for directory in *; do for file in $directory/*bcf*; do prefix=$(echo $file | rev | cut -f1 -d '/' | rev | cut -f1 -d '.'); bcftools view $file | vcfutils.pl varFilter -d1 -D100 > $directory/$prefix.vcf & done; done

samtools bcftools • 3.8k views

ADD COMMENT • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by Coryza ▴ 30

0

Entering edit mode

The variant calling is done by samtools mpileup, so you'll need to make changes there. I don't actually know that it's possible to call variants from 1x coverage with samtools (most NGS data isn't reliable enough to call variants from 1x coverage).

ADD REPLY • link 9.8 years ago by Devon Ryan 104k

0

Entering edit mode

I seem to recall someone posting a tool on seqanswers a while back (years?) that was designed for variant calling from very low coverage (maybe not 1X). If I can find that I'll post the link, but maybe someone else knows what I am talking about and can comment.

ADD REPLY • link 9.8 years ago by SES 8.6k

0

Entering edit mode

Someone on SEQanswers today posted this link that includes a program that might do what you want. You should completely ignore his discussion about samtools and varscan, the author doesn't actually have a clue what he's doing. However, I suspect that the C++ program he wrote will do what you need and function reasonably to do what you want.

ADD REPLY • link 9.8 years ago by Devon Ryan 104k

1

Entering edit mode

I think I actually found a much more easier way. Bcftools has a -p option, which shows all filtered variation (including those with 1 depth). I assume that this solves my issue? ;)

ADD REPLY • link 9.8 years ago by Coryza ▴ 30

0

Entering edit mode

Most likely, good catch

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by Devon Ryan 104k

Ram · Answer 1 · 2015-02-25

2

Entering edit mode

9.8 years ago

swbarnes2 14k

NGS data is too error prone to call SNPs based on a single read. Most of what you would call would be Illumina errors, not true sequence differences.

ADD COMMENT • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by swbarnes2 14k

1

Entering edit mode

I agree. And be aware that you won't be able to public these data in any reliable journal.

ADD REPLY • link 9.8 years ago by marina.v.yurieva ▴ 580

0

Entering edit mode

Thanks, we are aware of those issues ^^ But I think that consensus sequences of ~2KB, build up from ~>100 raw reads do not represent many sequencing errors anymore ;)

ADD REPLY • link 9.8 years ago by Coryza ▴ 30

0

Entering edit mode

Correct, but that's not what most tools are designed for. Presumably there's something for error corrected PacBio reads, which these would essentially represent then.

ADD REPLY • link 9.8 years ago by Devon Ryan 104k

0

Entering edit mode

You still think that a consensus of lets say 100 or more raw PacBio subreads contains sequencing errors?

ADD REPLY • link 9.8 years ago by Coryza ▴ 430

0

Entering edit mode

Not to any significant degree. There's likely a switch to mpileup to just have it emit even the lowest confidence calls, which wouldn't really be low confidence in this case. As I wrote, only tools designed with PacBio reads in mind are meant to handle datasets like this.

ADD REPLY • link 9.8 years ago by Devon Ryan 104k