I have 14 plant cultivars and I want to find CNVs relative to a wild type. I am running lumpyexpress on each cultivar separately, which gives me one .vcf file per cultivar. My problem is that I do not know how to proceed from there. Any ideas?
You should merge the SVs across samples and then genotype them. Hopefully your plant is diploid, since svtyper is designed for diploid organisms. It may be able to handle higher ploidies, but I am not sure.
I would follow the methods outlined in this paper: https://www.nature.com/articles/s41467-018-06159-4
The developers of lumpy/svtyper contributed their recommendations for subsequent steps in SV calling.
Some changes to consider:
The authors recommend merging breakpoints within +/- 20 bp. You may want to increase this to 1 kbp, or to roughly the length of your sequencing fragments.
There are other common filters for SV calling. These include removing SVs with significant overlap (50%-66% or higher) with segmental duplications, STRs, centromeres, telomeres, assembly gaps, and other regions where short reads are hard to align.
lumpy often calls extremely large SVs. In human blood samples, I have seen SVs the length of chr1 (about 250 Mb). That is not plausible and is likely due to reads mapping to the repetitive regions of the telomeres. Depending on your context (I can only speak for humans), you may want to omit anything larger than 10-30 Mb, although SVs of that size do occur in some human diseases.
It can sometimes help to filter an SV if either breakpoint (including its confidence interval in the VCF) overlaps one of the problematic regions listed above.
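To make the suggestions above concrete, here is a minimal Python sketch of the merge window and the three filters. It is not the lumpy/svtyper authors' pipeline. The `SV` record, the function names, and the default thresholds are all hypothetical, and the defaults simply echo the human-oriented numbers mentioned above, so they would need tuning for a plant genome. It assumes you have already parsed each call's chromosome, POS, END, SVTYPE, CIPOS, and CIEND out of the VCF, and that the problematic regions are non-overlapping, half-open `(chrom, start, end)` tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal record for one SV call (fields taken from the VCF)."""
    chrom: str
    start: int                # left breakpoint (POS)
    end: int                  # right breakpoint (INFO/END)
    svtype: str               # e.g. "DEL", "DUP"
    cipos: tuple = (0, 0)     # CIPOS offsets around start
    ciend: tuple = (0, 0)     # CIEND offsets around end


def same_event(a, b, slop=20):
    """True if two calls have the same type and chromosome and both
    breakpoints lie within +/- slop bp of each other."""
    return (a.svtype == b.svtype and a.chrom == b.chrom
            and abs(a.start - b.start) <= slop
            and abs(a.end - b.end) <= slop)


def overlap_fraction(sv, regions):
    """Fraction of the SV span covered by the problematic regions.
    Assumes the regions do not overlap each other."""
    length = sv.end - sv.start
    if length <= 0:
        return 0.0
    covered = sum(max(0, min(sv.end, e) - max(sv.start, s))
                  for c, s, e in regions if c == sv.chrom)
    return covered / length


def breakpoint_hits(sv, regions):
    """True if either breakpoint, widened by its confidence interval,
    touches a problematic region."""
    intervals = [(sv.start + sv.cipos[0], sv.start + sv.cipos[1]),
                 (sv.end + sv.ciend[0], sv.end + sv.ciend[1])]
    return any(c == sv.chrom and lo < e and hi >= s
               for lo, hi in intervals for c, s, e in regions)


def keep_sv(sv, regions, max_len=10_000_000, max_overlap=0.5,
            check_breakpoints=False):
    """Apply the size, overlap, and (optional) breakpoint filters."""
    if sv.end - sv.start > max_len:
        return False
    if overlap_fraction(sv, regions) >= max_overlap:
        return False
    if check_breakpoints and breakpoint_hits(sv, regions):
        return False
    return True
```

For example, with `regions = [("chr1", 1000, 2000)]`, a call `SV("chr1", 2010, 4000, "DUP", cipos=(-20, 20))` passes the overlap filter but is dropped once `check_breakpoints=True`, because its left confidence interval reaches into the region. Keeping the breakpoint check optional reflects that it is the stricter filter of the two and is only sometimes applied.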
Thank you for your reply.
Are you aware of any R packages that can process lumpy and svtyper output?