Calling SNPs with 1X depth (samtools/bcftools)
1
2
Entering edit mode
9.8 years ago
Coryza ▴ 30

Hi,

I am trying to call SNPs with at least 1X depth (My estimation is that most of the SNPs I want to see have indeed 1X / 2X depth). I have mapped consensus sequences to the Tomato reference genome (generating in theory 1X/2X depth on specific target positions) with Bowtie2 and perform SNP/INDEL calling with samtools mpileup and bcftools. However, I can't retrieve SNPs with 1X depth although I do mention it on the command line. Can anyone help me?

Commands:

for directory in *; do for file in $directory/*sorted*; do prefix=$(echo $file | rev | cut -f1 -d '/' | rev | cut -f1 -d '.'); samtools mpileup -uf ../../Data/SolanumLycopersicum_gDNA_2.40.fa $file | bcftools view -bvcg - > $directory/$prefix.bcf & done; done
for directory in *; do for file in $directory/*bcf*; do prefix=$(echo $file | rev | cut -f1 -d '/' | rev | cut -f1 -d '.'); bcftools view $file | vcfutils.pl varFilter -d1 -D100 > $directory/$prefix.vcf & done; done
samtools bcftools • 3.8k views
ADD COMMENT
0
Entering edit mode

The variant calling is done by samtools mpileup, so you'll need to make changes there. I don't actually know that it's possible to call variants from 1x coverage with samtools (most NGS data isn't reliable enough to call variants from 1x coverage).

ADD REPLY
0
Entering edit mode

I seem to recall someone posting a tool on seqanswers a while back (years?) that was designed for variant calling from very low coverage (maybe not 1X). If I can find that I'll post the link, but maybe someone else knows what I am talking about and can comment.

ADD REPLY
0
Entering edit mode

Someone on SEQanswers today posted this link that includes a program that might do what you want. You should completely ignore his discussion about samtools and varscan, the author doesn't actually have a clue what he's doing. However, I suspect that the C++ program he wrote will do what you need and function reasonably to do what you want.

ADD REPLY
1
Entering edit mode

I think I actually found a much more easier way. Bcftools has a -p option, which shows all filtered variation (including those with 1 depth). I assume that this solves my issue? ;)

ADD REPLY
0
Entering edit mode

Most likely, good catch

ADD REPLY
2
Entering edit mode
9.8 years ago

NGS data is too error prone to call SNPs based on a single read. Most of what you would call would be Illumina errors, not true sequence differences.

ADD COMMENT
1
Entering edit mode

I agree. And be aware that you won't be able to public these data in any reliable journal.

ADD REPLY
0
Entering edit mode

Thanks, we are aware of those issues ^^ But I think that consensus sequences of ~2KB, build up from ~>100 raw reads do not represent many sequencing errors anymore ;)

ADD REPLY
0
Entering edit mode

Correct, but that's not what most tools are designed for. Presumably there's something for error corrected PacBio reads, which these would essentially represent then.

ADD REPLY
0
Entering edit mode

You still think that a consensus of lets say 100 or more raw PacBio subreads contains sequencing errors?

ADD REPLY
0
Entering edit mode

Not to any significant degree. There's likely a switch to mpileup to just have it emit even the lowest confidence calls, which wouldn't really be low confidence in this case. As I wrote, only tools designed with PacBio reads in mind are meant to handle datasets like this.

ADD REPLY

Login before adding your answer.

Traffic: 1600 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6