Dear community members,
I often come across the phrase "these mutations happen most frequently at transcribed strand" in the literature (the organism is diploid).
How do we determine the strand of a mutation from NGS data? I have been reading papers, but I still cannot grasp the idea. To me, a few replication cycles should erase all strand information, no? I understand how strandedness can be "inferred" in some cases (e.g. leading/lagging strands for replication next to an origin of replication), but after replication we still see the mutation on both strands, so I doubt we can assign every mutation to a strand.
UPD: this is the closest I have come to understanding it, but does it mean that mutations can only be classified within annotated replication domains? What are valleys and peaks, and what does "slope of 250 rtu" mean? https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1509-y#Sec8
"Direction of replication Left- and right-replicating domains were taken from [11] where replication timing profiles were generated in six lymphoblastoid cell lines [68], valleys and peaks (defined as regions with a slope with a magnitude lower than 250 rtu per Mb) were removed, after which left- and right-replicating domains were defined as timing transition regions with a negative and positive slope, respectively [11]. In the left-replicated regions, the reference strand is used as a template for the leading strand, while the opposite strand is used as a template for the lagging strand, and vice versa for the right-replicated regions. Each domain (called territory in the original source code and data) is 20 kbp wide and annotated with the direction of replication and with replication timing."
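A minimal sketch of how the quoted rule could be read, assuming "rtu" stands for replication timing units and that each domain already comes with a timing slope in rtu per Mb. The function name `classify_domain` and the example slopes are hypothetical and not taken from the paper's code. Domains with a slope flatter than the cutoff (the valleys and peaks, where the fork direction is presumably ambiguous) are dropped, and the sign of the slope gives the direction for the rest.

```python
# Cutoff from the quoted passage: |slope| below 250 rtu per Mb means valley/peak.
SLOPE_CUTOFF = 250.0


def classify_domain(slope_rtu_per_mb, cutoff=SLOPE_CUTOFF):
    """Return 'left', 'right', or None for a removed valley/peak domain.

    Hypothetical helper: negative slope -> left-replicating,
    positive slope -> right-replicating, as stated in the quoted passage.
    """
    if abs(slope_rtu_per_mb) < cutoff:
        return None  # too flat: treated as a valley or peak and removed
    return "left" if slope_rtu_per_mb < 0 else "right"


# Made-up slopes for three 20 kbp domains, for illustration only.
slopes = [-400.0, 120.0, 600.0]
labels = [classify_domain(s) for s in slopes]
print(labels)  # ['left', None, 'right']
```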
During DNA replication, the two strands are copied in two different ways (see Figure 1a in the paper), so the process is not the same for the leading and lagging strands. As a result, new mutations can arise in one copy that are not present in the other.
Yes, but what I don't understand is how the imbalance is detected: after the next replication cycle both strands will carry the mutation, so how can we see it in bulk NGS data? I can see that the bars have different heights, but why?
UPD: sorry, I think the "annotated left/right-replicating regions" are the key. With them, everything is easy.
Yes, they used methods that can sequence only a specific region relative to the replication origin
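To make the role of the left/right annotation concrete, here is a rough sketch under stated assumptions. Variant calls are reported relative to the reference strand, so a call of C>T is the same event as G>A on the opposite strand. According to the quoted passage, the reference strand is the leading-strand template in left-replicating regions and the opposite strand is the leading-strand template in right-replicating regions. Re-expressing every call as it reads on the leading-strand template lets you compare the counts of a substitution and its complement; if they differ, that is the strand imbalance. The function names and the toy calls below are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def on_leading_template(ref, alt, direction):
    """Express a reference-strand call as read on the leading-strand template."""
    if direction == "left":
        # Reference strand is the leading-strand template: keep the call as is.
        return f"{ref}>{alt}"
    if direction == "right":
        # Opposite strand is the leading-strand template: complement both bases.
        return f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"
    raise ValueError(f"unknown replication direction: {direction!r}")


def count_on_leading_template(calls):
    """Count substitutions after orienting each call to the leading template."""
    return Counter(on_leading_template(r, a, d) for r, a, d in calls)


# Toy calls (ref, alt, domain direction); invented for illustration only.
calls = [
    ("C", "T", "left"),
    ("G", "A", "right"),
    ("C", "T", "left"),
    ("G", "A", "left"),
]
counts = count_on_leading_template(calls)
print(counts["C>T"], counts["G>A"])  # 3 1
```

In this toy example C>T is seen three times on the leading-strand template and its complement G>A only once. With real data, a comparison of this kind is what the bars of different height would represent, even though every individual mutation is present on both strands after replication.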