Dear All,
I need some suggestions regarding CNV analysis. I want to know to what extent the CNVs I found in a normal/tumor pair overlap with the CNVs from normal/iPSC (induced pluripotent stem cells derived from the tumor). To do this, I extracted the CNV regions from both normal/tumor (hereafter tumor.bed) and normal/iPSC (ipsc.bed) as BED files. If I want to report the regions that overlap between tumor.bed and ipsc.bed, which options of bedtools intersect should I use? tumor.bed has 121 rows and ipsc.bed has 199. Both files contain only the chromosome, start and end coordinates.
I want to know to what extent the CNVs in tumor.bed are conserved in ipsc.bed. Below are the bedtools commands I have tried. Which should I use to get this information?
bedtools intersect -a tumor.bed -b ipsc.bed -wa -wb -f 1.0 | wc -l
45
(this shows the features of tumor.bed that are 100% overlapped by a feature in ipsc.bed). This gives the same count as
bedtools intersect -a tumor.bed -b ipsc.bed -u -f 1.0 | wc -l
45
However, if I just run
bedtools intersect -a tumor.bed -b ipsc.bed -wa -wb | wc -l
122
or
bedtools intersect -a tumor.bed -b ipsc.bed -u | wc -l
92
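The difference between the 122 and 92 counts above comes from what is being counted: -wa -wb writes one line per overlapping tumor/iPSC pair, while -u writes each tumor feature at most once. The following is a minimal pure-Python sketch of that distinction, not bedtools itself. The interval values and the helper names (overlap_bp, overlapping_pairs, unique_a) are made up for illustration.

```python
# Toy intervals (chrom, start, end) in BED-style half-open coordinates.
# These values are hypothetical; they are not the poster's data.
tumor = [("chr1", 100, 200), ("chr1", 500, 900), ("chr2", 50, 150)]
ipsc = [("chr1", 150, 250), ("chr1", 400, 600), ("chr1", 800, 1000), ("chr2", 0, 300)]


def overlap_bp(a, b):
    """Number of bases shared by two intervals (0 if on different chromosomes)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def overlapping_pairs(a_list, b_list, min_frac=0.0):
    """One entry per overlapping (A, B) pair, roughly what -wa -wb reports.

    min_frac imitates -f: the overlap must cover at least this
    fraction of the A interval.
    """
    pairs = []
    for a in a_list:
        for b in b_list:
            ov = overlap_bp(a, b)
            if ov > 0 and ov / (a[2] - a[1]) >= min_frac:
                pairs.append((a, b))
    return pairs


def unique_a(pairs):
    """Each A interval at most once, roughly what -u reports."""
    seen = []
    for a, _ in pairs:
        if a not in seen:
            seen.append(a)
    return seen


any_pairs = overlapping_pairs(tumor, ipsc)
full_pairs = overlapping_pairs(tumor, ipsc, min_frac=1.0)
print(len(any_pairs), len(unique_a(any_pairs)))    # pairs vs. unique tumor features
print(len(full_pairs), len(unique_a(full_pairs)))  # same, requiring 100% of A
```

In this toy data the second tumor interval overlaps two iPSC intervals, so the pair count exceeds the unique count. For the question "how many tumor CNVs are found in the iPSC", the unique count is the one to report.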
So I am a bit confused about which command is the right one for my case. My goal is to see whether the CNVs found in the tumor remain conserved in the iPSC. I would appreciate any suggestions.
Regards
Yes, I have tried intermediate values such as -f 0.50 and -f 0.75 and found considerably more hits. I would not expect the CNV regions found in the tumor to be conserved in the iPSC as identical regions, since the iPSC is a single clone. One thing I understood from your reply is that I should apply an overlap restriction: simply intersecting the BED files will not serve my purpose, and I do not want to count one-base overlaps. I am looking for CNV regions that are conserved between the tumor and its iPSC. So I should try different values of -f and look through the output.
Exactly. What percentage is optimal will probably depend on how big the CNVs are and how you called them/what sort of technology you used. In any case, if you get roughly a third complete overlap (45 of 121 tumor CNVs) then regardless of the criterion there's a LOT of overlap in the CNVs. Given that your iPSCs came from the tumor, that makes sense.
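To see how the number of retained tumor CNVs changes with the -f threshold, it can help to tabulate several cutoffs side by side. This is a minimal pure-Python sketch that imitates -u combined with -f (fraction of the A interval overlapped by a single B interval). The intervals and the function name count_a_with_hit are hypothetical.

```python
# Hypothetical toy intervals (chrom, start, end); not the poster's data.
tumor = [("chr1", 0, 100), ("chr1", 200, 300), ("chr1", 400, 500), ("chr1", 600, 700)]
ipsc = [("chr1", 0, 100), ("chr1", 220, 400), ("chr1", 450, 520), ("chr1", 690, 800)]


def count_a_with_hit(a_list, b_list, min_frac):
    """Count A intervals with at least one B interval covering >= min_frac of A."""
    count = 0
    for chrom_a, start_a, end_a in a_list:
        length_a = end_a - start_a
        for chrom_b, start_b, end_b in b_list:
            if chrom_a != chrom_b:
                continue
            ov = min(end_a, end_b) - max(start_a, start_b)
            if ov > 0 and ov / length_a >= min_frac:
                count += 1
                break  # count each A interval once, as -u does
    return count


thresholds = [0.0, 0.5, 0.75, 1.0]
counts = {f: count_a_with_hit(tumor, ipsc, f) for f in thresholds}
for f in thresholds:
    print(f"min fraction {f}: {counts[f]} of {len(tumor)} tumor intervals")
```

A threshold of 0.0 here stands in for the default "any overlap" behaviour. On real data the same table can be produced by running bedtools intersect with -u and each value of -f, then counting lines.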
I used exome data and called the CNVs with Control-FREEC using the default window of minimum 500 and step 250 with ploidy 2. The CNVs are in fact quite large. The median CNV length is 475249 bases in my tumor data and 1033749 bases in my iPSC data. The CNVs are much larger in the iPSC, which should be the likely scenario. So -f 0.50 should be adequate for seeing to what extent the tumor CNV regions are present in the iPSC, with at least 50% of each tumor region overlapped. The primary idea is that on reprogramming the tumor to its iPSC the genomic background is not completely compromised and that CNVs carry over from the tumor to its reprogrammed clone. I am not denying that the iPSC will also acquire CNVs, but my actual concern is the extent to which tumor CNVs are present in the iPSC. I have already tried -f 0.50, 0.75 and 1.0. My question here is about maintenance of the genomic background. Do you have any more suggestions, Devon Ryan?
I'd have to put some thought into that and get back to you if anything that you've likely not thought of comes to mind.
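One point that may matter when asking "to what extent" a tumor CNV is present in the iPSC: in bedtools intersect the -f fraction is evaluated for each tumor/iPSC pair separately, so a tumor CNV covered piecewise by several iPSC segments can fail -f 0.50 even though most of it is covered. The sketch below, in plain Python with made-up intervals and a hypothetical helper covered_fraction, computes the fraction of a tumor interval covered by the union of all iPSC intervals. On real data, bedtools coverage reports a comparable per-feature fraction.

```python
# Hypothetical toy data: one tumor CNV covered by three iPSC segments.
tumor_cnv = ("chr1", 0, 100)
ipsc = [("chr1", 0, 40), ("chr1", 30, 50), ("chr1", 60, 120), ("chr2", 0, 100)]


def covered_fraction(a, b_list):
    """Fraction of interval a covered by the union of intervals in b_list."""
    # Clip every same-chromosome B interval to the bounds of a.
    clipped = sorted(
        (max(a[1], s), min(a[2], e))
        for chrom, s, e in b_list
        if chrom == a[0] and min(a[2], e) > max(a[1], s)
    )
    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Close the previous merged block and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent segment: extend the current block.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (a[2] - a[1])


# Largest fraction achieved by any single iPSC segment (what -f tests).
best_single = max(
    max(0, min(tumor_cnv[2], e) - max(tumor_cnv[1], s)) / (tumor_cnv[2] - tumor_cnv[1])
    for chrom, s, e in ipsc
    if chrom == tumor_cnv[0]
)
total = covered_fraction(tumor_cnv, ipsc)
print(best_single, total)
```

Here no single iPSC segment reaches 50% of the tumor CNV, yet the segments together cover most of it. Whether fragmented coverage should count as "conserved" is a judgment call that depends on how the CNV caller segments the genome.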
Thanks a lot, I appreciate that.