I have PacBio reads from three bacterial genomes as three samples. They were de-novo assembled using HGAP4 and polished. One of samples was taken as reference. And subreads from each sample was mapped back to this reference for variant calling. Variant calling was performed using two workflows:
subreads -> ngmlr -> Sniffles -> VCF
subreads -> pbmm2 -> pbsv -> VCF
Both VCF files have no DP, MQ or any of the standard quality metrics. Is this normal?
- How can we access the quality of these variants? Visual examination seems tricky due to huge number of indels in the BAM file.
- Another point is that I couldn't find any ploidy option in any of these tools. Is that not important? I am guessing these tools are developed for diploids.
Here is the PBSV vcf (696 rows in total)
pb_501_001 8 pbsv.CNV.12 T <CNV> . NearContigEnd SVTYPE=cnv;END=132000;SVLEN=131992;SHADOWED CN 3
pb_501_001 8 pbsv.BND.pb_501_001:8-pb_501_001:5167147 T ]pb_501_001:5167147]T . NearContigEnd SVTYPE=BND;CIPOS=-7,95;MATEID=pbsv.BND.pb_501_001:5167147-pb_501_001:8;MATEDIST=5167139 GT:AD:DP 1/1:0,343:343
Here is the VCF from Sniffles (891 rows in total)
pb_501_001 1 0 N <DUP> . PASS SUPP=3;SUPP_VEC=111;SVLEN=5167155;SVTYPE=DUP;SVMETHOD=SURVIVOR1.0.7;CHR2=pb_501_001;END=5167156;CIPOS=0,0;CIEND=0,0;STRANDS=-+ GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO 1/1:NA:5167155:0,136:-+:.:DUP:0:NA:NA:pb_501_001_1-pb_501_001_5167156 1/1:NA:5167155:0,219:-+:.:DUP:0:NA:NA:pb_501_001_1-pb_501_001_5167156 1/1:NA:5167155:0,100:-+:.:DUP:0:NA:NA:pb_501_001_1-pb_501_001_5167156
pb_501_001 1612 1 N <INS> . PASS SUPP=3;SUPP_VEC=111;SVLEN=321;SVTYPE=INS;SVMETHOD=SURVIVOR1.0.7;CHR2=pb_501_001;END=1612;CIPOS=0,0;CIEND=0,0;STRANDS=+- GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO 0/0:NA:321:24,2:+-:.:INS:1:NA:NA:pb_501_001_1612-pb_501_001_1612 0/0:NA:321:48,2:+-:.:INS:1:NA:NA:pb_501_001_1612-pb_501_001_16120/0:NA:321:19,3:+-:.:INS:1:NA:NA:pb_501_001_1612-pb_501_001_1612
These are the types of variants in the sniffles VCF.
cat sniffles.vcf | grep -v "^#" | cut -f5 | sort | uniq -c
101 <DEL>
86 <DUP>
12 <DUP/INS>
340 <INS>
162 <INV>
154 <INVDUP>
36 <INV/INVDUP>
I had to use subreads because I wasn't able to generate CCS from this data. Are there any other tools to call variants (mostly SNPs) from bacterial pacbio data? I tried PBHoover, but unfortunately it just crashes when I run it.