I have generated an ideogram of chromosome 1 for our first sample; the annotations are CNVs identified in this sample with Breakdancer. Here is the output figure:
Nice, you got the plot running! Of course it looks awesome graphically (there is some room for fine-tuning the colors and collapse levels, though), but that's possibly not what you meant.
The data displayed indeed look disturbing to me, and that would make me double or triple check each step. I was expecting short variants that do not overlap (too much).
But your variant calls are all over the place, seem to cover large parts of the chromosome, and also overlap each other. How can that possibly be true?
So I would check:
Are your data really depicted correctly, or is there an error in the plotting? Try to make a preliminary plot with another tool.
Compare with the variant call files, do they really contain that many huge variants?
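For the second check, a quick size summary of the call file can tell you whether the huge variants are really in the data or only in the plot. Here is a minimal sketch in Python. It assumes a tab-separated Breakdancer output with the columns Chr1, Pos1, Orientation1, Chr2, Pos2, Orientation2, Type, Size, Score, num_Reads in that order (please verify against the header of your own file). The function names, the 1 Mb "large" threshold and the demo rows are made up for illustration.

```python
import io
from statistics import median

# Assumed 0-based column layout: 0 Chr1, 1 Pos1, 3 Chr2, 4 Pos2, 6 Type, 7 Size.
def read_calls(handle):
    """Parse same-chromosome calls into (chrom, start, end, type, size) tuples."""
    calls = []
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        f = line.rstrip("\n").split("\t")
        if f[0] != f[3]:
            continue  # inter-chromosomal calls have no meaningful span
        start, end = sorted((int(f[1]), int(f[4])))
        calls.append((f[0], start, end, f[6], abs(int(f[7]))))
    return calls

def summarize(calls, large=1_000_000):
    """Count calls and report median/max size and the number of 'large' calls."""
    sizes = [c[4] for c in calls]
    if not sizes:
        return {"n": 0, "median": None, "max": None, "n_large": 0}
    return {"n": len(sizes), "median": median(sizes), "max": max(sizes),
            "n_large": sum(s >= large for s in sizes)}

def covered_bases(calls, chrom):
    """Number of bases on one chromosome covered by at least one call."""
    intervals = sorted((s, e) for c, s, e, _, _ in calls if c == chrom)
    total, cur_start, cur_end = 0, None, None
    for s, e in intervals:
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)  # overlapping call extends the block
    if cur_end is not None:
        total += cur_end - cur_start
    return total

# Made-up demo rows, not real results.
DEMO = (
    "#Chr1\tPos1\tOrientation1\tChr2\tPos2\tOrientation2\tType\tSize\tScore\tnum_Reads\n"
    "chr1\t1000\t10+0-\tchr1\t1500\t0+10-\tDEL\t480\t99\t10\n"
    "chr1\t200000\t3+0-\tchr1\t5200000\t0+3-\tDEL\t4999000\t40\t3\n"
    "chr1\t4000000\t2+2-\tchr1\t9000000\t2+2-\tINV\t5000000\t35\t2\n"
)

calls = read_calls(io.StringIO(DEMO))
print(summarize(calls))
print(covered_bases(calls, "chr1"))
```

If the covered bases come out as a large share of the chromosome length, the plot is probably faithful and the problem sits upstream in the calling.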
Indeed, I checked the file and the data do contain large variant calls. I think Breakdancer didn't do the job properly, so I will check whether a second run of Breakdancer gives the same output. The R script works well, though :)
Yes, this looks like a large over-prediction. Maybe the default parameters are not very strict, so that wrong read pairs or misalignments cause this. I would set a minimum number of read pairs and also QC the reads, removing all incorrectly paired reads and low-quality alignments before feeding the SAM files into Breakdancer. I would also remove all pairs that are much more distant from each other than expected. I don't know, is there a maximum possible insert size for the pairs? If so, that should be the cutoff for a CNV length, shouldn't it?
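To make the read QC idea concrete, here is a rough per-record filter on plain SAM text lines, written in Python. It only relies on the standard SAM columns (FLAG in column 2, MAPQ in column 5, TLEN in column 9) and the standard flag bits. The function name, the thresholds (MAPQ 30, 1000 bp) and the `require_proper_pair` switch are hypothetical choices for illustration, not recommended settings.

```python
# Standard SAM flag bits.
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800

def keep_record(line, min_mapq=30, max_abs_tlen=1000, require_proper_pair=False):
    """Return True if a SAM line passes the (illustrative) QC thresholds."""
    if line.startswith("@"):
        return True  # always keep header lines
    f = line.rstrip("\n").split("\t")
    flag, mapq, tlen = int(f[1]), int(f[4]), int(f[8])
    if flag & (UNMAPPED | MATE_UNMAPPED | SECONDARY | DUPLICATE | SUPPLEMENTARY):
        return False  # unmapped, non-primary, or duplicate records
    if mapq < min_mapq:
        return False  # low-quality alignment
    # TLEN is 0 when the mates map to different references,
    # so such pairs are not caught by this distance check.
    if abs(tlen) > max_abs_tlen:
        return False  # mates much further apart than expected
    if require_proper_pair and not flag & PROPER_PAIR:
        return False  # aligner did not mark the pair as proper
    return True

def filter_sam(lines, **thresholds):
    """Yield only the SAM lines that pass keep_record."""
    for line in lines:
        if keep_record(line, **thresholds):
            yield line
```

Note that this checks each record on its own, so one mate of a pair can survive while the other is dropped. Also, since Breakdancer calls variants from anomalous read pairs, I would tune the distance cutoff and the proper-pair switch carefully rather than apply them blindly, otherwise you may remove the signal together with the noise.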