6.3 years ago
gdaly9000
In "Comment on 'DNA damage is a pervasive cause of sequencing errors, directly confounding variant identification'" (Stewart et al., Science 361, eaas9824 (2018)), the authors point out, regarding the method of variant analysis proposed by Chen et al. (Reports, 17 February 2017, p. 752), that:
...software implementation is limited by using reads aligned only to the forward strand...
Talking with my collaborators, we have been asking how information about the reverse reference strand would change the variant calling results. Shouldn't knowing the forward reference base provide adequate information for variant calling?
I think the comment by Stewart et al. is "why throw away half of your data, if it is perfectly usable?".
Yes, assuming that is what is occurring, that would be a very reasonable conclusion. In fact, another criticism by Stewart et al. is that Chen et al. chose quality parameters that were too high, thus throwing away much of their data, so that may be the reason for their criticism. I am still unclear about the mechanism.
I guess my question is: if you only account for the forward reference strand, are you actually throwing away data?
For example, Chen et al. split reads into forward and reverse (with respect to Illumina adapters) and then call variants, apparently only against the forward reference strand.
Assume a single genomic position where the reference base on the forward strand is T. Take an example where an Illumina Read 1 has a T at that position and the complementary Read 2 has an A.
Obviously the T from Read 1 would map to the T of the reference. The question is how to account for the A in Read 2. I would assume that a reasonable variant caller would see that the A maps to the reverse complement (the reverse strand of the reference), as opposed to calling it a variant.
Is this what Stewart et al. are alluding to?
The aligner will take care of that, not the variant caller.
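To make the T/A example concrete, here is a minimal sketch in plain Python of what the aligner does with strand. In the SAM format, a read aligned to the reverse strand gets flag bit 0x10 set and its sequence is stored reverse-complemented, so it is reported in forward-reference orientation. The helper names (`reverse_complement`, `base_as_stored_in_sam`) are made up for illustration and are not from any aligner or from the Chen et al. code.

```python
# SAM flag bit 0x10: the read aligned to the reverse strand and its
# SEQ field is stored reverse-complemented (forward-reference orientation).
REVERSE_FLAG = 0x10

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def base_as_stored_in_sam(sequenced_base, flag):
    """Hypothetical helper: the base as it appears in the alignment record.

    Reverse-strand alignments are reported reverse-complemented,
    forward-strand alignments are reported as sequenced.
    """
    if flag & REVERSE_FLAG:
        return reverse_complement(sequenced_base)
    return sequenced_base


reference_base = "T"
read1_base = base_as_stored_in_sam("T", flag=0)             # forward strand
read2_base = base_as_stored_in_sam("A", flag=REVERSE_FLAG)  # reverse strand

# Both reads agree with the forward reference base, so neither is a variant.
print(read1_base == reference_base, read2_base == reference_base)
```

So by the time the variant caller sees the reads, the Read 2 A has already been turned into a T against the forward reference, with the strand recorded in the flag.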
Examining the source code, the Chen et al. code does eliminate all reads mapped to the reverse reference strand.
Assuming global damage which is randomly distributed (with respect to the forward or reverse strand), this may be a reasonable assumption, though.
Basically the aligner dealt with stranding but the downstream software ignored this information in this case.
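A rough sketch of what dropping reverse-strand alignments means for the data, using toy records rather than the actual Chen et al. code (which I am not reproducing here). The read names and the `forward_only` helper are hypothetical; the only real convention used is SAM flag bit 0x10 for reverse-strand alignments.

```python
REVERSE_FLAG = 0x10  # SAM flag bit: read aligned to the reverse strand


def is_reverse(flag):
    """True if the SAM flag marks a reverse-strand alignment."""
    return bool(flag & REVERSE_FLAG)


def forward_only(reads):
    """Hypothetical filter: keep only forward-strand alignments."""
    return [(name, flag) for name, flag in reads if not is_reverse(flag)]


# Toy (name, flag) records for two properly paired fragments.
# 99 and 163 are forward-strand mates, 147 and 83 are reverse-strand mates.
toy_reads = [
    ("frag1/1", 99),
    ("frag1/2", 147),
    ("frag2/1", 83),
    ("frag2/2", 163),
]

kept = forward_only(toy_reads)
discarded = len(toy_reads) - len(kept)
print(f"kept {len(kept)} of {len(toy_reads)} reads, discarded {discarded}")
```

In a standard paired-end library each properly paired fragment contributes one forward and one reverse alignment, so a filter like this discards about half of the aligned reads, which matches the "why throw away half of your data" reading of the comment.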