SNPs Calling Tools
7.8 years ago
dd.catania • 0

Hi, I'm looking for tools to perform SNP calling on BAM alignment data. The data I am working with come from Illumina (average coverage 300x) and PacBio (average coverage 30x). Can anyone suggest which tools are considered the best in the literature?

SNP alignment variant calling RNA-Seq • 3.0k views

Here is a very recent comparison of commonly used tools, using non-matched exome data.

7.8 years ago
bharata1803 ▴ 560

This paper discusses and compares almost all of the available tools: http://www.nature.com/articles/srep17875. You can use samtools, GATK, or freebayes. VarScan2 is another option.


Thanks, I read the paper and the data used come from Ion Proton and Illumina. Is there any specific tool for PacBio data? Can I use the same tools?


I think if it is in BAM format, you can use it. The tools are not specific to one platform.


Tools like GATK will perform suboptimally on PacBio data, since GATK is mainly designed for Illumina data. You can use them, but more specific variant callers would do better. However, I'm not aware of any PacBio-specific variant callers; perhaps this page can point you in the right direction: http://www.pacb.com/products-and-services/analytical-software/smrt-analysis/analysis-applications/resequencing-variant-detection/

7.4 years ago

As far as I can see, no common variant callers work on PacBio Sequel data (bacterial, read length ~3-5 kbp, coverage 200-300x).

VCF 4 support is essential for PacBio's SMRT Analysis software. I currently have two PacBio projects for resequencing (and also methylation detection). As far as I remember, the resequencing part of the more recent SMRT Analysis tools only generates VCF 3.3. However, VCF 4.0 (at least) is essential for working with downstream VCF analysis tools such as SnpEff, vt, etc.
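A quick way to check which VCF version a caller actually wrote is to read the `##fileformat=` meta-line at the top of the file. The following is only a minimal sketch: the function names `vcf_version` and `is_vcf4_or_later` are made up for illustration, and it assumes the file starts with a header of the form `##fileformat=VCFv<major>.<minor>`.

```python
import re

def vcf_version(lines):
    """Return (major, minor) from the ##fileformat meta-line, or None if absent."""
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information lines are over, stop searching
        match = re.match(r"##fileformat=VCFv(\d+)\.(\d+)", line.strip())
        if match:
            return int(match.group(1)), int(match.group(2))
    return None

def is_vcf4_or_later(lines):
    """True only if a fileformat line is present and declares version 4.0 or later."""
    version = vcf_version(lines)
    return version is not None and version >= (4, 0)
```

For a file on disk this could be called as `with open("calls.vcf") as fh: print(vcf_version(fh))`, where `calls.vcf` is a placeholder name. Note that this only inspects the declared version; it does not validate that the rest of the file conforms to it.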

Current leading variant callers such as FreeBayes do not work on PacBio BAMs (I believe the MD tags were too long). I need to try Samtools next, but at the moment I cannot recommend PacBio for resequencing.

It would be great if anyone has any solutions to this issue. Even if the callers perform less than optimally it would be good to find one suitable.


BBMap's CallVariants will run fine on PacBio data, and produce output. It produces VCF 4.2 compliant files. As for the quality of the results... I'm not sure, since it's optimized for Illumina. Error-corrected PacBio reads should work well, though.


Thanks Brian. I haven't been successful with Samtools 1.4 so far: although it produces output, the I16 fields are all 0, despite setting --min-BQ lower, as in this command:

samtools mpileup --reference /lager2/rcug/seqres/HS/mito.fa -o 1K2_SNV1.vcf --min-BQ 5 --max-depth 6000 -u --VCF --output-tags DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR 91.combined.alignmentset_RG.bam
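To confirm that the I16 values really are zero at every position, rather than just in the first few records, the VCF text can be scanned with a few lines of Python. This is a rough sketch, not part of samtools: the helper names `i16_all_zero` and `count_zero_i16` are hypothetical, and it assumes uncompressed VCF text with `I16=` as a comma-separated entry in the INFO column.

```python
def i16_all_zero(record_line):
    """True if the record has an I16 INFO entry whose values are all zero."""
    fields = record_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return False  # not a complete VCF record
    for entry in fields[7].split(";"):
        if entry.startswith("I16="):
            values = entry[len("I16="):].split(",")
            return all(float(v) == 0 for v in values)
    return False  # no I16 entry in this record

def count_zero_i16(lines):
    """Return (records with all-zero I16, total records), skipping header lines."""
    total = 0
    zero = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        total += 1
        if i16_all_zero(line):
            zero += 1
    return zero, total
```

If the two numbers returned by `count_zero_i16` are equal, every record has an all-zero I16, which suggests that no bases are passing the filters at any position.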

I will try BBMap CallVariants next.

Edit:

So it turns out I can generate results using the current (bioconda) bbmap callvariants.sh. I had to experiment, lowering nearly all of the filters to very low values (but not 0, since that outputs every base position).

So progress has been made.

Command (I have already tried many variations of this):

callvariants.sh in=91.combined.alignmentset_RG.bam out=1K2_SNV3.vcf ref=/lager2/rcug/seqres/HS/mito.fa minquality=1.0 minqualitymax=2 overwrite=t ploidy=1 minscore=2.0 minpairingrate=0 usepairing=f useidentity=f
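With the caller's filters set this low, one option is to apply a stricter cutoff afterwards on the QUAL column of the resulting VCF. The sketch below is just an illustration under stated assumptions: `filter_by_qual` is a hypothetical helper, the default threshold of 20 is an arbitrary placeholder rather than a recommended value for PacBio data, and the input is assumed to be uncompressed VCF text.

```python
def filter_by_qual(lines, min_qual=20.0):
    """Yield header lines unchanged and records whose QUAL is >= min_qual."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep all meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[5] == ".":
            continue  # malformed record or missing QUAL: drop it
        if float(fields[5]) >= min_qual:
            yield line
```

It could be used as `out.writelines(filter_by_qual(fh, 30.0))` on two open file handles, where `fh` and `out` are placeholder names. Records with a missing QUAL (`.`) are dropped here; whether that is the right choice depends on the caller.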

Thanks for your help Brian, I will take your advice and remap with the canu-corrected reads.


Great to hear; I'd certainly be interested in any additional feedback you have about what it gets wrong, or the best defaults for PacBio reads. But certainly, I expect CallVariants (or any variant-caller) to perform much better with error-corrected PacBio data.


Hi Brian, I created this page.

bbmap callvariants.sh on pacbio

I can't comment on general variant calling very much since this is just for high-depth mitochondria - but the next project is bacterial genomes with PacBio where SNVs should also be called, so maybe I can add more then.

Colin
