For CNV/SV filtering, what percentage of overlap is most widely used to filter out a CNV/SV segment after comparison with DGV and dbVar?
Also, DGV contains a few segments that are annotated with both gain and loss. When filtering our CNV data, how should we filter our variants (individual CNV loss and CNV gain calls) against DGV variants that have both gain and loss?
Just to let you know, CNVs and SVs are different. CNVs will generally be much smaller than SVs, and filtering CNVs against SVs detected in healthy individuals in DGV and dbVar is not the correct way. If you have CNVs from your data and do not know whether they are germline or somatic, you can use the ExAC database to remove those of your CNVs that overlap CNVs reported there. Ideally, the cut-off should be chosen based on your biological question and on how many genes usually fall within that CNV. You can also check in browsers like ADVISER. For finding lethal SVs in your data, or for removing SVs that are seen in normal individuals, DGV or dbVar is OK. You have to annotate the CNVs or SVs and prioritize them.
Thank you...
Yes, filtering CNVs against SVs detected in healthy individuals in DGV and dbVar is not correct. But DGV and dbVar classify SVs and CNVs separately by type: copy loss, copy gain, deletion, insertion, duplication, etc. So I filter against the same variant type. For example, I filter deletions in my data against deletions in DGV and dbVar, copy losses in my data against copy losses in DGV and dbVar, and so on. Is this a correct approach?
I understand that the cut-off has to be selected based on the biological question. I feel that with a 100% overlap a variant can safely be filtered out, provided the SV/CNV type in my data matches the SV/CNV type in DGV and dbVar. Is that correct?
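To make the type-matched filtering described above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `Segment` class, the `"gain"`/`"loss"`/`"gain+loss"` labels, and the coordinates are hypothetical and do not reflect the actual DGV or dbVar file formats. In particular, treating a `"gain+loss"` database entry as matching either query type is an assumption, not an established rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    kind: str   # hypothetical labels: "gain", "loss" or "gain+loss"


def overlap_fraction(query: Segment, ref: Segment) -> float:
    """Fraction of the query segment that is covered by the reference segment."""
    length = query.end - query.start
    if query.chrom != ref.chrom or length <= 0:
        return 0.0
    shared = min(query.end, ref.end) - max(query.start, ref.start)
    return max(shared, 0) / length


def types_match(query_kind: str, ref_kind: str) -> bool:
    """Assumption: a 'gain+loss' reference entry matches either query type."""
    return query_kind in ref_kind.split("+")


def is_common(query: Segment, references, min_fraction: float = 1.0) -> bool:
    """True if any same-type reference covers at least min_fraction of the query."""
    return any(
        types_match(query.kind, ref.kind)
        and overlap_fraction(query, ref) >= min_fraction
        for ref in references
    )


# Made-up reference entries, for illustration only.
dgv = [
    Segment("chr1", 100, 500, "gain+loss"),
    Segment("chr1", 1000, 2000, "loss"),
]

fully_covered_gain = Segment("chr1", 150, 400, "gain")
wrong_type_gain = Segment("chr1", 1100, 1500, "gain")
partly_covered_loss = Segment("chr1", 900, 1500, "loss")
```

With the default `min_fraction=1.0`, only `fully_covered_gain` would be flagged as common. `partly_covered_loss` is flagged only if the threshold is lowered (for example to 0.8), which is exactly the judgment call being discussed in this thread.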
I suspect the answer to this question largely depends on the biological question that you are trying to answer.
Thank you. Suppose I want to remove the variants present in a healthy control population. In that case, what is the recommended cut-off?
I would think that even a segment that overlaps for 90% isn't justified to be filtered out. The remaining 10% might contain a crucial exon/sequence/regulatory fragment.
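One way to act on this concern is to look at the part of the CNV that the database entry does not cover, and check whether it touches an annotated feature, before deciding to filter. The sketch below assumes half-open coordinates on a single chromosome; the function names and all coordinates are hypothetical.

```python
def uncovered_parts(start, end, ref_start, ref_end):
    """Sub-intervals of [start, end) that are not covered by [ref_start, ref_end)."""
    parts = []
    if ref_start > start:
        parts.append((start, min(end, ref_start)))
    if ref_end < end:
        parts.append((max(start, ref_end), end))
    return parts


def hits_feature(parts, features):
    """True if any uncovered part overlaps any feature interval (e.g. an exon)."""
    return any(
        p_start < f_end and f_start < p_end
        for p_start, p_end in parts
        for f_start, f_end in features
    )


# Made-up example: the database entry covers 90% of the CNV.
cnv = (1000, 2000)
db_entry = (1000, 1900)
exons = [(1950, 1980)]  # hypothetical exon inside the uncovered 10%

remainder = uncovered_parts(*cnv, *db_entry)
keep_for_review = hits_feature(remainder, exons)
```

Here the uncovered remainder is the last 100 bases of the CNV, which contain the hypothetical exon, so `keep_for_review` is true even though the overlap is 90%.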
Yes, it's true. Thank you :)