Hi,
I am new to long-read sequencing and am trying to call variants on the GIAB trio samples from PacBio data.
I first aligned the reads with pbmm2, then called variants with DeepVariant 1.5 and phased them with WhatsHap.
My queries are as follows. As with Illumina NextSeq/NovaSeq data, what quality parameters should be taken care of (adapter trimming, read filtering, minimum data size)? How should adapters be trimmed, and what tools can be used for QC of FastQ/BAM input from PacBio?
I have phased data after WhatsHap, in which phased genotypes are represented with "|" in the VCF. Does that mean I should filter out the variants that have "/" before downstream analysis and annotation? (Are they bad-quality variants?)

Hi William, thank you for the explanation. From the VCF, should I omit variants with "/" and keep the ones with phased "|" for further annotation / variant reporting? If so, in the variant calls from GATK HaplotypeCaller I don't see anything like phased/unphased variants; all the variant calls in the VCF are unphased "/". Just a confusion, and sorry if it's a very naive question: why is there a separate phasing step for PacBio variant calls from DeepVariant and not for GATK (Illumina), where all calls are "/"?

There's no reason to omit unphased genotypes, in general. You'll notice that many of the unphased ("/") genotypes are homozygous. These are still valid, high-quality genotypes; it just isn't meaningful to assign them to one haplotype, because the allele is present on _both_ haplotypes.
Unlike GATK HaplotypeCaller, there isn't a post-calling filter step applied to variant calls from DeepVariant. You'll see in the VCF that there are some sites with the RefCall FILTER, which is described in the DeepVariant documentation.
You can potentially apply your own filters, based on QUAL, GQ, or other features, using bcftools filter, but phased/unphased status doesn't reflect quality in any way.

Thanks a lot, William, I am clear now.
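
For anyone landing on this thread later, here is a minimal Python sketch of the point made in the answer above: it tallies genotypes by phase status and zygosity, skips records whose FILTER is not PASS (for example RefCall), and optionally applies a GQ threshold. It assumes a single-sample, uncompressed VCF with GT (and optionally GQ) in the FORMAT column. The function names, the `min_gq` parameter, and the example records are made up for illustration; they are not part of DeepVariant, WhatsHap, or bcftools.

```python
from collections import Counter


def classify_genotype(gt):
    """Return (phase, zygosity) for a VCF GT string such as '0|1' or '1/1'."""
    phase = "phased" if "|" in gt else "unphased"
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        zygosity = "missing"
    elif len(set(alleles)) == 1:
        zygosity = "hom"
    else:
        zygosity = "het"
    return phase, zygosity


def summarize(vcf_lines, min_gq=0):
    """Count (phase, zygosity) pairs over PASS records of a single-sample VCF.

    min_gq is a hypothetical threshold: records with a GQ below it are skipped.
    """
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample column to inspect
        if fields[6] not in ("PASS", "."):
            continue  # e.g. RefCall sites emitted by DeepVariant
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        gq = sample.get("GQ", ".")
        if gq != "." and float(gq) < min_gq:
            continue
        counts[classify_genotype(sample.get("GT", "."))] += 1
    return counts


# Made-up records for illustration only (not real GIAB calls).
EXAMPLE_RECORDS = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "chr1\t100\t.\tA\tG\t45.2\tPASS\t.\tGT:GQ:PS\t0|1:40:100",
    "chr1\t200\t.\tC\tT\t50.1\tPASS\t.\tGT:GQ\t1/1:48",
    "chr1\t300\t.\tG\tA\t0.1\tRefCall\t.\tGT:GQ\t0/0:12",
    "chr1\t400\t.\tT\tC\t30.0\tPASS\t.\tGT:GQ\t0/1:25",
]

for key, n in sorted(summarize(EXAMPLE_RECORDS).items()):
    print(key, n)
```

On a real file you could pass an open file handle, for example `summarize(open("sample.vcf"))`, where `sample.vcf` is a placeholder path. The unphased homozygous genotypes are counted like any other PASS call, which mirrors the advice above: filter on quality fields if you need to, not on "/" versus "|".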