Dear all,
After some analysis, I used a tool to call copy numbers from my sequencing data. The output looks like this:
CHROM  START  END   CopyNumber
chr1   0      1000  2.151000
chr2   0      1000  4.478000
chr2   1000   2000  5.431000
I ran this analysis for 50 patients, so I have 50 files like the one above, each with about 10,000 CNVs. Now I want to find out which CNVs are disease-causing. My plan is:
1.) Take the common CNVs that are present in all 50 patients.
2.) Filter out the ones that are already present in a database (DGV).
Is there a better strategy (a pipeline, a filtering method, or a way to visualize all samples at once) for finding novel CNVs in this kind of data?
Thanks and Best regards,
Vikas
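For what it's worth, the two steps proposed in the question could be sketched in Python roughly as follows. This is only a minimal sketch under several assumptions that are not in the original post: every file uses the same fixed bins, a bin counts as a gain or loss when its copy number crosses the hypothetical thresholds `GAIN_MIN` and `LOSS_MAX`, and the DGV regions have already been converted to half-open intervals on the same genome build. The first toy patient reuses the rows shown in the question; the second patient and the "known" interval are made up for illustration.

```python
from collections import Counter

# Hypothetical thresholds for a diploid genome; tune them for your data.
GAIN_MIN = 2.5
LOSS_MAX = 1.5


def parse_bins(lines):
    """Yield (chrom, start, end, copy_number) from whitespace-separated rows."""
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or fields[0] == "CHROM":
            continue  # skip the header and blank or malformed rows
        chrom, start, end, cn = fields
        yield chrom, int(start), int(end), float(cn)


def call_state(cn):
    """Classify one bin as 'gain', 'loss', or None (copy-neutral)."""
    if cn >= GAIN_MIN:
        return "gain"
    if cn <= LOSS_MAX:
        return "loss"
    return None


def recurrent_bins(patients, min_fraction=1.0):
    """Return {(chrom, start, end, state): n_patients} for bins that are
    altered in at least min_fraction of the patients."""
    counts = Counter()
    for lines in patients:
        seen = set()  # count each bin at most once per patient
        for chrom, start, end, cn in parse_bins(lines):
            state = call_state(cn)
            if state:
                seen.add((chrom, start, end, state))
        counts.update(seen)
    needed = min_fraction * len(patients)
    return {key: n for key, n in counts.items() if n >= needed}


def overlaps_known(chrom, start, end, known):
    """True if the half-open interval [start, end) overlaps a known region."""
    return any(s < end and start < e for s, e in known.get(chrom, []))


def drop_known(bins, known):
    """Keep only bins that do not overlap a known (e.g. DGV) interval."""
    return {key: n for key, n in bins.items()
            if not overlaps_known(key[0], key[1], key[2], known)}


# Toy input: patient_a is the example from the question, patient_b is made up.
patient_a = ["CHROM START END CopyNumber", "chr1 0 1000 2.151000",
             "chr2 0 1000 4.478000", "chr2 1000 2000 5.431000"]
patient_b = ["CHROM START END CopyNumber", "chr1 0 1000 1.9",
             "chr2 0 1000 3.8", "chr2 1000 2000 2.1"]
known_regions = {"chr2": [(1000, 3000)]}  # made-up stand-in for DGV entries

recurrent = recurrent_bins([patient_a, patient_b])
novel = drop_known(recurrent, known_regions)
print(novel)
```

In practice, requiring a CNV in all 50 patients is very strict, so lowering `min_fraction` may be worth trying. With real data you would read each patient's file instead of the inline lists.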
What is your phenotype? Is this tumor vs normal tissue? Several of the replies here, I think, assume you are looking at a cancer phenotype. However, if you want to identify "disease-causing CNVs" in a phenotype associated with germ-line mutations/CNVs, you need to know the parental inheritance status (inherited CNVs are less likely to be pathogenic), and it really comes down to size of CNV (larger = more likely pathogenic) and gene content (very small CNVs can be pathogenic if the right gene is deleted).
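To make the size and gene-content point concrete, here is a rough Python sketch that merges adjacent altered bins into segments, orders them by length, and lists the genes each one overlaps. It assumes the bins are already classified as "gain" or "loss" and sorted by chromosome and start. The gene table and the name `GENE_A` are hypothetical placeholders, not real annotations. The ordering is only a way to triage candidates, not a pathogenicity call.

```python
def merge_bins(bins):
    """Merge adjacent bins with the same chromosome and state into segments.

    bins: iterable of (chrom, start, end, state), sorted by chrom then start.
    """
    segments = []
    for chrom, start, end, state in bins:
        last = segments[-1] if segments else None
        if last and last[0] == chrom and last[3] == state and last[2] == start:
            # Extend the previous segment instead of starting a new one.
            segments[-1] = (chrom, last[1], end, state)
        else:
            segments.append((chrom, start, end, state))
    return segments


def genes_hit(segment, genes):
    """Return the sorted names of genes overlapping a segment.

    genes: {name: (chrom, start, end)} with half-open coordinates.
    """
    chrom, start, end, _ = segment
    return sorted(name for name, (g_chrom, g_start, g_end) in genes.items()
                  if g_chrom == chrom and g_start < end and start < g_end)


def rank_segments(segments, genes):
    """Return (segment, length, genes) rows, largest segment first."""
    rows = [(seg, seg[2] - seg[1], genes_hit(seg, genes)) for seg in segments]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up bins and a hypothetical gene table, for illustration only.
altered = [("chr2", 0, 1000, "gain"), ("chr2", 1000, 2000, "gain"),
           ("chr3", 5000, 6000, "loss")]
gene_table = {"GENE_A": ("chr3", 5500, 9000)}

ranked = rank_segments(merge_bins(altered), gene_table)
for segment, length, hit in ranked:
    print(segment, length, hit)
```

A short segment that still hits a gene, like the made-up chr3 deletion here, is the kind of small CNV that may deserve a closer look despite its size.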
@Alex: Can you please explain what you mean by "and it really comes down to size of CNV (larger = more likely pathogenic) and gene content (very small CNVs can be pathogenic if the right gene is deleted)"?
Just curious: which tool did you use in the end?
I used mrCaNaVaR.
@Vikas, take a look at the 2011 review by Girirajan, Campbell, and Eichler (PMID:21854229). I think that paper gives a good overview.