Dear all,
I am working on a sliding-window approach for CNV calling, and I am confused about how to handle reads that overlap two consecutive windows. I want to calculate the read depth in each window and then take the average read depth. Say I have a reference 1000 bp long, divided into 10 non-overlapping windows of 100 bp each: 1-100, 101-200, 201-300, and so on. My reads are 36 bp long. Reads that lie entirely within a window are counted in that window, and there is no problem. But suppose a read spans two windows, e.g. it maps to the reference at 82-117 bp. Now roughly half of the read is in the first window and half is in the second. Which window should this read be assigned to?
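To make the example concrete, here is a minimal Python sketch that reports how many bases of a read fall in each window. It assumes 1-based inclusive coordinates and fixed 100 bp windows as described above; the function name `window_overlaps` and the 0-based window index are just illustrative choices.

```python
def window_overlaps(read_start, read_end, window_size=100):
    """Map window index (0-based) to the number of read bases inside it.

    Coordinates are assumed to be 1-based and inclusive, so window 0
    covers 1-100, window 1 covers 101-200, and so on.
    """
    overlaps = {}
    first = (read_start - 1) // window_size
    last = (read_end - 1) // window_size
    for w in range(first, last + 1):
        w_start = w * window_size + 1
        w_end = (w + 1) * window_size
        # Length of the intersection of [read_start, read_end] and [w_start, w_end]
        overlaps[w] = min(read_end, w_end) - max(read_start, w_start) + 1
    return overlaps


# The 36 bp read from the example, mapped at 82-117
print(window_overlaps(82, 117))  # {0: 19, 1: 17}
```

So the example read has 19 bases in the first window and 17 in the second.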
I have read about some solutions. First, I can count the read in the window that contains its 5' end. (The problem: only the first base of the read may be in the first window while the remaining 35 bases are in the second.)
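A rough sketch of this 5'-end rule in Python, assuming reads are given as hypothetical `(start, end, strand)` tuples with 1-based inclusive coordinates. For a reverse-strand read, the 5' end is taken as the rightmost mapped position.

```python
from collections import Counter


def count_by_five_prime(reads, window_size=100):
    """Assign each read to the window containing its 5' end.

    `reads` is an iterable of (start, end, strand) tuples; this layout
    is an assumption made for the example, not a standard format.
    """
    counts = Counter()
    for start, end, strand in reads:
        # Forward reads start at the leftmost base, reverse reads at the rightmost
        five_prime = start if strand == "+" else end
        counts[(five_prime - 1) // window_size] += 1
    return counts


reads = [(82, 117, "+"), (100, 135, "+"), (82, 117, "-")]
print(dict(count_by_five_prime(reads)))  # {0: 2, 1: 1}
```

The second read (100-135) shows the problem: it is counted in the first window although only one of its bases lies there.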
Second, maybe I can divide the read equally between both windows, e.g. if 2 reads fall in the boundary region between 2 windows, count 1 for each window. (The problem: if only the first base of a read is in the first window and the remaining 35 bases are in the second, the CNV signal should go to the second window, but this approach still splits the read count equally between both windows.)
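The equal-split idea could look something like the following Python sketch, again assuming 1-based inclusive `(start, end)` tuples and 100 bp windows. The function name is illustrative only.

```python
from collections import defaultdict


def count_equal_split(reads, window_size=100):
    """Split each read's count equally among the windows it touches."""
    counts = defaultdict(float)
    for start, end in reads:
        first = (start - 1) // window_size
        last = (end - 1) // window_size
        n_windows = last - first + 1
        for w in range(first, last + 1):
            counts[w] += 1.0 / n_windows
    return dict(counts)


# Two reads spanning the 100/101 boundary: each window gets a count of 1
print(count_equal_split([(82, 117), (100, 135)]))  # {0: 1.0, 1: 1.0}
```

Note that the read at 100-135 contributes 0.5 to the first window even though it has a single base there, which is exactly the concern raised above.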
Third, maybe I can take a window of 100 bp and slide it by only 50 bp. (The problem is that reads in the overlapping regions will be counted twice.)
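For the third option, a small Python sketch of 100 bp windows sliding by 50 bp. It assumes a read is counted in every window it overlaps by at least one base; that counting rule and the function names are assumptions for illustration.

```python
def sliding_windows(ref_length, window_size=100, step=50):
    """Return 1-based inclusive (start, end) windows that fit in the reference."""
    return [
        (s, s + window_size - 1)
        for s in range(1, ref_length - window_size + 2, step)
    ]


def count_overlapping(reads, windows):
    """Count each read once in every window it overlaps by at least one base."""
    counts = [0] * len(windows)
    for start, end in reads:
        for i, (w_start, w_end) in enumerate(windows):
            if start <= w_end and end >= w_start:
                counts[i] += 1
    return counts


windows = sliding_windows(1000)
print(len(windows))                              # 19
print(sum(count_overlapping([(60, 95)], windows)))  # 2
```

The read at 60-95 lies inside both 1-100 and 51-150, so it is counted twice.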
Can you please suggest any papers, and share your views and solutions?
Thanks and Best regards, Vikas
Hello Vishal, I am working on genome sequencing data and want to identify structural variants. After a lot of reading, I have found that read depth is the best way to identify structural variants. Can you suggest how I can do this analysis? I would be grateful to you.
Thanks in advance