Overlapping Reads In Window Approach
12.7 years ago
Vikas Bansal ★ 2.4k

Dear all,

I am working on a sliding-window approach for CNV calling, and I am confused about how to handle reads that overlap two consecutive windows. I want to calculate the read depth in each window and then take the average read depth. Say I have a reference 1000 bp long, divided into 10 non-overlapping windows of 100 bp each: 1-100, 101-200, 201-300, and so on. My reads are 36 bp long. Reads that lie entirely within a window are counted in that window without any problem. But suppose a read spans two windows, e.g. it maps to positions 82-117 of the reference. Roughly half of the read is in the first window (19 bases) and the rest is in the second (17 bases). In which window should I place this read?

I have read about some solutions. First, I can count the read in the window that contains its 5' end. (The problem: only the first base of the read might be in the first window, with the remaining 35 bases in the second.)

Second, I could divide the read equally between both windows, e.g. if 2 reads span the boundary between 2 windows, count 1 for each window. (The problem: if only the first base of a read is in the first window and the other 35 bases are in the second, the CNV signal belongs in the second window, but this approach still splits the count equally.)

Third, I could use a 100 bp window and slide it by only 50 bp. (The problem: reads in the overlapping regions will be counted twice.)
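To make the trade-off concrete, here is a minimal Python sketch of two ways to place the 82-117 read from the example. `count_by_start` is the first option (leftmost base decides). `count_by_overlap` is a variant of the second option that is not proposed in this thread: instead of an equal split, it splits each read in proportion to the bases falling in each window. The function names and the single toy read are assumptions for illustration only.

```python
WINDOW = 100  # window size in bp, as in the example above

def window_index(pos, window=WINDOW):
    """Return the 0-based index of the window containing 1-based position pos."""
    return (pos - 1) // window

def count_by_start(reads, n_windows, window=WINDOW):
    """Assign each read to the window that holds its leftmost base."""
    counts = [0.0] * n_windows
    for start, _end in reads:
        counts[window_index(start, window)] += 1
    return counts

def count_by_overlap(reads, n_windows, window=WINDOW):
    """Split each read across windows in proportion to its overlapping bases."""
    counts = [0.0] * n_windows
    for start, end in reads:
        length = end - start + 1
        first, last = window_index(start, window), window_index(end, window)
        for w in range(first, last + 1):
            w_start, w_end = w * window + 1, (w + 1) * window
            overlap = min(end, w_end) - max(start, w_start) + 1
            counts[w] += overlap / length
    return counts

# Toy input: the single read from the example, as 1-based inclusive (start, end).
reads = [(82, 117)]
by_start = count_by_start(reads, 10)
by_overlap = count_by_overlap(reads, 10)
print(by_start[:2], by_overlap[:2])
```

Both functions keep the total equal to the number of reads, so no read is counted twice; they differ only in how the one count is distributed.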

Can you please suggest any papers, your views and solutions?

Thanks and Best regards, Vikas

analysis cnv read overlap • 3.8k views

Hello Vishal, I am working on genome sequencing data and want to identify structural variants. After a lot of research, I have found that read depth is the best way to identify structural variants. Can you suggest how I can do this analysis? I would be grateful.

Thanks in advance

12.7 years ago

Assuming that you are comparing to some "reference", then just make sure that you are counting both the same way. Everything will come out in the wash.


A difference of one read between windows should not make a big difference to a read-counting method; you'll need to make your windows large enough to ensure that is the case. So, choose a method that counts each read only once (whatever that method is).
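As a rough back-of-the-envelope check on "large enough", the expected number of read starts per window is approximately coverage × window size / read length, and moving a single read changes a window's count by about one over that number. The sketch below assumes uniform coverage and uses a hypothetical 30x depth; only the 36 bp read length and the 100 bp window come from the question.

```python
def expected_reads_per_window(coverage, window, read_length):
    """Approximate read count per window under uniform coverage."""
    return coverage * window / read_length

def one_read_effect(coverage, window, read_length):
    """Relative change in a window's count when a single read is moved."""
    return 1.0 / expected_reads_per_window(coverage, window, read_length)

# 30x coverage is an assumed value for illustration, not from the thread.
small = one_read_effect(coverage=30, window=100, read_length=36)
large = one_read_effect(coverage=30, window=1000, read_length=36)
print(f"100 bp windows: {small:.2%} per read; 1000 bp windows: {large:.2%} per read")
```

Under these assumptions, a tenfold larger window makes the placement of any one boundary read matter ten times less.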


Thanks for your reply. The thing is, I am not comparing with a reference. I do not have any normal (control) sample.


Again, I don't think it makes a difference. You may find that using read counts is problematic without correcting for biases such as GC content and without comparing to one or more normals.
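One simple style of GC correction, sketched here under assumptions rather than taken from this thread, is to group windows by GC fraction and rescale each window's count by the ratio of the overall median count to the median count of windows with similar GC. The function name, the bin count, and the toy numbers below are all hypothetical.

```python
from collections import defaultdict
from statistics import median

def gc_correct(counts, gc_fractions, n_bins=10):
    """Rescale each window count by overall median / median of its GC bin."""
    def gc_bin(gc):
        # Clamp so that a GC fraction of exactly 1.0 falls in the last bin.
        return min(int(gc * n_bins), n_bins - 1)

    overall = median(counts)
    by_bin = defaultdict(list)
    for count, gc in zip(counts, gc_fractions):
        by_bin[gc_bin(gc)].append(count)
    bin_median = {b: median(values) for b, values in by_bin.items()}

    corrected = []
    for count, gc in zip(counts, gc_fractions):
        m = bin_median[gc_bin(gc)]
        corrected.append(count * overall / m if m > 0 else 0.0)
    return corrected

# Toy data: two low-GC windows with low counts, two higher-GC windows with high counts.
counts = [10, 12, 20, 24]
gc = [0.35, 0.38, 0.55, 0.58]
corrected = gc_correct(counts, gc)
print([round(c, 2) for c in corrected])
```

In this toy case the correction removes the twofold difference between the two GC groups while preserving the relative differences within each group. Real data would need many windows per GC bin for the medians to be stable.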


Yes, you are right. I will correct for GC bias, but I still need to decide in which window to place such a read. Note: I am not talking about reads that map to different positions.

12.7 years ago

We usually consider only the leftmost position of the read. What you are observing is a random process, so you simply ask, "How many times does event X happen in window W?" Event X can be "the leftmost position of the read", "the central position of the read", or whatever you prefer. Depending on how you define it, there may be some bias in the FIRST and LAST windows, especially if the read length is comparable to the window length (as in your example). The other windows, however, are fine. As Sean Davis says, just make sure you count each event only once.
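A minimal Python sketch of this "one event per read" idea: the event is passed in as a function, so switching between the leftmost and the central position changes nothing else in the counting. The read coordinates are made-up toy values (1-based, inclusive), and the names `count_events`, `leftmost`, and `center` are hypothetical.

```python
from collections import Counter

def count_events(reads, event, window=100):
    """Count one event per read; `event` maps (start, end) to a 1-based position."""
    counts = Counter()
    for start, end in reads:
        counts[(event(start, end) - 1) // window] += 1
    return counts

def leftmost(start, end):
    """Event X = leftmost position of the read."""
    return start

def center(start, end):
    """Event X = central position of the read (rounded down)."""
    return (start + end) // 2

# Toy 36 bp reads; two of them span the boundary between windows 0 and 1.
reads = [(10, 45), (82, 117), (95, 130), (150, 185)]
left_counts = count_events(reads, leftmost)
center_counts = count_events(reads, center)
print(dict(left_counts), dict(center_counts))
```

The per-window counts differ between the two definitions, but the total is the same in both cases because every read contributes exactly one event.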


Thanks for your answer. I think I will use the same method: the leftmost position of the read.
